I'm starting a new thread for Toom related stuff, as our last thread got a bit messed up. :-)
I did a calculation of how much better Toom 4 should be than Toom 3. But it didn't really show what I thought it would. My hypothesis was that for numbers of limbs that are a power of 2, Toom 4 should be the clear winner, as Toom 3 always needs to "round up" the limb sizes. I think this is not really such a big issue in practice. Below is my calculation anyway, just in case it spurs someone to think of something, or I need it later. Note, I realise this is a pointless computation as I've not taken overheads and additions into account, etc. In practice for 2048 limbs, Paul Zimmermann's implementation of Toom 4 is about 7% faster than our existing Toom 3 implementation. [I guess I'm still banging my head over the fact that basically we are 10% behind in multiplication on K8, 30% behind in division and 10% ahead in RSA compared with the up and coming GMP. A faster FFT is only going to make 20% difference to the final multiplication bench. Unless they have worked hard on reducing overheads for very small multiplications, I'm failing to see how they've gotten times down by anything like what is required to give a 10% difference on the overall multiply bench....] Anyhow, for the computation: ---------- Take the example from mpirbench of multiplying two 131072 bit numbers. On a 64 bit machine this is 2048 limbs. Let's suppose crossovers were 30 limbs for karatsuba, 90 limbs for Toom 3 and 500 limbs for Toom 4. First let's use Toom 3: ceil(2048/3) = 683 limbs. So there'd be one multiplication of 682 limbs and 4 of 683. ceil(682/3) = 228, so there'd be one 226 limbs and four 228 limbs. ceil(683/3) = 228, so there'd be one 227 limbs, and four 228 limbs. ceil(228/3) = 76, so there'd be five 76 limbs. ceil(227/3) = 76 so there'd be one 75 limbs, four 76 ceil(226/3) = 76 so there'd be one 74 limbs, four 76 We have a total of ((74 + 4*76) + 4*(5*76)) + 4*((75 + 4*76) + 4* (5*76))) = 74 + 4*75 + 120*76 limb multiplications. Now lets use Toom 4: 2048/4 = 512 limbs, so there'd be seven multiplications of 512 limbs. 512/4 = 128 limbs, so there'd be seven multiplications of 128 limbs. ceil(128/3) = 43, so there'd be one of 42 limbs and 4 of 43 limbs We have a total of 49*42 +196*43 limb mults. Let's be charitable and simply with 125 x 74 limb mults for Toom 3 and 245 x 43 limbs mults for Toom 4. Now 74/2 = 37 so there'd be three 37 limb mults ceil(37/2) = 19, so there'd be two 19 limb mults and one 18 limb mult. In total we have 3*(2*19^2 + 18^2) = 3138. ceil(43/2) = 22, so there'd be two 22 limb mults and one 21 limb mult. In total we have 2*22^2 + 21^2 = 1409. So in linear terms, Toom 4 takes roughly 1409 * 245 = 345205 whereas Toom 3 takes 3138 * 125 = 392250. (Read apples vs bananas for units.) Bill. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---