Great. I should get back to Toom 4 soon.

Bill.

2009/3/19 Jason Moxham <ja...@njkfrudils.plus.com>:
>
>
> In the x86_64 division branch I've put a x86_64 mpn_divexact_byBm1of which is
> ment to stand for divexact_by (B-1)/f ie B minus 1 over f
> this is for exact division by any factor of B-1 ie 3,5,17,257,... , runs at
> the same speed as divexact_by3
> I could get exact division by 1.5 or 2.5   ie a one lshift for free
> for exact division by 6 or 10 the one rshift should be cheap , two rshifts are
> dearer.
> This should give us a few percent
>
> On Tuesday 17 March 2009 20:35:52 Bill Hart wrote:
>> I'm starting a new thread for Toom related stuff, as our last thread
>> got a bit messed up. :-)
>>
>> I did a calculation of how much better Toom 4 should be than Toom 3.
>> But it didn't really show what I thought it would. My hypothesis was
>> that for numbers of limbs that are a power of 2, Toom 4 should be the
>> clear winner, as Toom 3 always needs to "round up" the limb sizes. I
>> think this is not really such a big issue in practice.
>>
>> Below is my calculation anyway, just in case it spurs someone to think
>> of something, or I need it later. Note, I realise this is a pointless
>> computation as I've not taken overheads and additions into account,
>> etc.
>>
>> In practice for 2048 limbs, Paul Zimmermann's implementation of Toom 4
>> is about 7% faster than our existing Toom 3 implementation.
>>
>> [I guess I'm still banging my head over the fact that basically we are
>> 10% behind in multiplication on K8, 30% behind in division and 10%
>> ahead in RSA compared with the up and coming GMP. A faster FFT is only
>> going to make 20% difference to the final multiplication bench. Unless
>> they have worked hard on reducing overheads for very small
>> multiplications, I'm failing to see how they've gotten times down by
>> anything like what is required to give a 10% difference on the overall
>> multiply bench....]
>>
>> Anyhow, for the computation:
>>
>> ----------
>>
>> Take the example from mpirbench of multiplying two 131072 bit numbers.
>> On a 64 bit machine this is 2048 limbs.
>>
>> Let's suppose crossovers were 30 limbs for karatsuba, 90 limbs for
>> Toom 3 and 500 limbs for Toom 4.
>>
>> First let's use Toom 3:
>>
>> ceil(2048/3) = 683 limbs. So there'd be one multiplication of 682
>> limbs and 4 of 683.
>>
>> ceil(682/3) = 228, so there'd be one 226 limbs and four 228 limbs.
>> ceil(683/3) = 228, so there'd be one 227 limbs, and four 228 limbs.
>>
>> ceil(228/3) = 76, so there'd be five 76 limbs.
>> ceil(227/3) = 76 so there'd be one 75 limbs, four 76
>> ceil(226/3) = 76 so there'd be one 74 limbs, four 76
>>
>> We have a total of ((74 + 4*76) + 4*(5*76)) + 4*((75 + 4*76) + 4*
>> (5*76))) = 74 + 4*75 + 120*76 limb multiplications.
>>
>> Now lets use Toom 4:
>>
>> 2048/4 = 512 limbs, so there'd be seven multiplications of 512 limbs.
>>
>> 512/4 = 128 limbs, so there'd be seven multiplications of 128 limbs.
>>
>> ceil(128/3) = 43, so there'd be one of 42 limbs and 4 of 43 limbs
>>
>> We have a total of 49*42 +196*43 limb mults.
>>
>> Let's be charitable and simply with 125 x 74 limb mults for Toom 3 and
>> 245 x 43 limbs mults for Toom 4.
>>
>> Now 74/2 = 37 so there'd be three 37 limb mults
>>
>> ceil(37/2) = 19, so there'd be two 19 limb mults and one 18 limb
>> mult.
>>
>> In total we have 3*(2*19^2 + 18^2) = 3138.
>>
>> ceil(43/2) = 22, so there'd be two 22 limb mults and one 21 limb mult.
>>
>> In total we have 2*22^2 + 21^2 =  1409.
>>
>> So in linear terms, Toom 4 takes roughly 1409 * 245 = 345205 whereas
>> Toom 3 takes 3138 * 125 = 392250. (Read apples vs bananas for units.)
>>
>> Bill.
>>
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to