I'm starting a new thread for Toom related stuff, as our last thread
got a bit messed up. :-)

I did a calculation of how much better Toom 4 should be than Toom 3.
But it didn't really show what I thought it would. My hypothesis was
that for numbers of limbs that are a power of 2, Toom 4 should be the
clear winner, as Toom 3 always needs to "round up" the limb sizes. I
think this is not really such a big issue in practice.

Below is my calculation anyway, just in case it spurs someone to think
of something, or I need it later. Note, I realise this is a pointless
computation as I've not taken overheads and additions into account,
etc.

In practice for 2048 limbs, Paul Zimmermann's implementation of Toom 4
is about 7% faster than our existing Toom 3 implementation.

[I guess I'm still banging my head over the fact that basically we are
10% behind in multiplication on K8, 30% behind in division and 10%
ahead in RSA compared with the up and coming GMP. A faster FFT is only
going to make 20% difference to the final multiplication bench. Unless
they have worked hard on reducing overheads for very small
multiplications, I'm failing to see how they've gotten times down by
anything like what is required to give a 10% difference on the overall
multiply bench....]

Anyhow, for the computation:

----------

Take the example from mpirbench of multiplying two 131072 bit numbers.
On a 64 bit machine this is 2048 limbs.

Let's suppose crossovers were 30 limbs for karatsuba, 90 limbs for
Toom 3 and 500 limbs for Toom 4.

First let's use Toom 3:

ceil(2048/3) = 683 limbs. So there'd be one multiplication of 682
limbs and 4 of 683.

ceil(682/3) = 228, so there'd be one 226 limbs and four 228 limbs.
ceil(683/3) = 228, so there'd be one 227 limbs, and four 228 limbs.

ceil(228/3) = 76, so there'd be five 76 limbs.
ceil(227/3) = 76 so there'd be one 75 limbs, four 76
ceil(226/3) = 76 so there'd be one 74 limbs, four 76

We have a total of ((74 + 4*76) + 4*(5*76)) + 4*((75 + 4*76) + 4*
(5*76))) = 74 + 4*75 + 120*76 limb multiplications.

Now lets use Toom 4:

2048/4 = 512 limbs, so there'd be seven multiplications of 512 limbs.

512/4 = 128 limbs, so there'd be seven multiplications of 128 limbs.

ceil(128/3) = 43, so there'd be one of 42 limbs and 4 of 43 limbs

We have a total of 49*42 +196*43 limb mults.

Let's be charitable and simply with 125 x 74 limb mults for Toom 3 and
245 x 43 limbs mults for Toom 4.

Now 74/2 = 37 so there'd be three 37 limb mults

ceil(37/2) = 19, so there'd be two 19 limb mults and one 18 limb
mult.

In total we have 3*(2*19^2 + 18^2) = 3138.

ceil(43/2) = 22, so there'd be two 22 limb mults and one 21 limb mult.

In total we have 2*22^2 + 21^2 =  1409.

So in linear terms, Toom 4 takes roughly 1409 * 245 = 345205 whereas
Toom 3 takes 3138 * 125 = 392250. (Read apples vs bananas for units.)

Bill.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to