In the x86_64 division branch I've put a x86_64 mpn_divexact_byBm1of which is 
ment to stand for divexact_by (B-1)/f ie B minus 1 over f
this is for exact division by any factor of B-1 ie 3,5,17,257,... , runs at 
the same speed as divexact_by3 
I could get exact division by 1.5 or 2.5   ie a one lshift for free
for exact division by 6 or 10 the one rshift should be cheap , two rshifts are 
dearer.
This should give us a few percent

On Tuesday 17 March 2009 20:35:52 Bill Hart wrote:
> I'm starting a new thread for Toom related stuff, as our last thread
> got a bit messed up. :-)
>
> I did a calculation of how much better Toom 4 should be than Toom 3.
> But it didn't really show what I thought it would. My hypothesis was
> that for numbers of limbs that are a power of 2, Toom 4 should be the
> clear winner, as Toom 3 always needs to "round up" the limb sizes. I
> think this is not really such a big issue in practice.
>
> Below is my calculation anyway, just in case it spurs someone to think
> of something, or I need it later. Note, I realise this is a pointless
> computation as I've not taken overheads and additions into account,
> etc.
>
> In practice for 2048 limbs, Paul Zimmermann's implementation of Toom 4
> is about 7% faster than our existing Toom 3 implementation.
>
> [I guess I'm still banging my head over the fact that basically we are
> 10% behind in multiplication on K8, 30% behind in division and 10%
> ahead in RSA compared with the up and coming GMP. A faster FFT is only
> going to make 20% difference to the final multiplication bench. Unless
> they have worked hard on reducing overheads for very small
> multiplications, I'm failing to see how they've gotten times down by
> anything like what is required to give a 10% difference on the overall
> multiply bench....]
>
> Anyhow, for the computation:
>
> ----------
>
> Take the example from mpirbench of multiplying two 131072 bit numbers.
> On a 64 bit machine this is 2048 limbs.
>
> Let's suppose crossovers were 30 limbs for karatsuba, 90 limbs for
> Toom 3 and 500 limbs for Toom 4.
>
> First let's use Toom 3:
>
> ceil(2048/3) = 683 limbs. So there'd be one multiplication of 682
> limbs and 4 of 683.
>
> ceil(682/3) = 228, so there'd be one 226 limbs and four 228 limbs.
> ceil(683/3) = 228, so there'd be one 227 limbs, and four 228 limbs.
>
> ceil(228/3) = 76, so there'd be five 76 limbs.
> ceil(227/3) = 76 so there'd be one 75 limbs, four 76
> ceil(226/3) = 76 so there'd be one 74 limbs, four 76
>
> We have a total of ((74 + 4*76) + 4*(5*76)) + 4*((75 + 4*76) + 4*
> (5*76))) = 74 + 4*75 + 120*76 limb multiplications.
>
> Now lets use Toom 4:
>
> 2048/4 = 512 limbs, so there'd be seven multiplications of 512 limbs.
>
> 512/4 = 128 limbs, so there'd be seven multiplications of 128 limbs.
>
> ceil(128/3) = 43, so there'd be one of 42 limbs and 4 of 43 limbs
>
> We have a total of 49*42 +196*43 limb mults.
>
> Let's be charitable and simply with 125 x 74 limb mults for Toom 3 and
> 245 x 43 limbs mults for Toom 4.
>
> Now 74/2 = 37 so there'd be three 37 limb mults
>
> ceil(37/2) = 19, so there'd be two 19 limb mults and one 18 limb
> mult.
>
> In total we have 3*(2*19^2 + 18^2) = 3138.
>
> ceil(43/2) = 22, so there'd be two 22 limb mults and one 21 limb mult.
>
> In total we have 2*22^2 + 21^2 =  1409.
>
> So in linear terms, Toom 4 takes roughly 1409 * 245 = 345205 whereas
> Toom 3 takes 3138 * 125 = 392250. (Read apples vs bananas for units.)
>
> Bill.
> 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to