[mpir-devel] Re: Status on parallel algorithms and benchmark

Bill Hart Sat, 18 Apr 2009 18:18:58 -0700

Another data point. At 700 limbs, our Toom 7, which is slightly less
optimised than the Toom 4, is consistently about 0.5% faster than the
Toom 4.


I'm sure you'll be proved right in the end. But I wouldn't like to
call it either way at this stage.

Bill.

2009/4/19 Bill Hart <goodwillh...@googlemail.com>:
> Here's some things to consider:
>
> * I timed MPIR and GMP 4.3.0 at 700 limbs, just below our Toom 4/7
> cutoff. There is a 6 % difference. Adjusting for the speed of the
> assembly code which will be in our next release, there is zero
> difference.
>
> * Our Toom 3 is optimised in a completely different way to GMP 4.3.0,
> perhaps pushing our cutoffs higher. Whether Toom 4 can be optimised as
> effectively as Toom 3 remains to be seen.
>
> * At 700 limbs Toom 4 gives GMP 4.3.0 3% over its Toom 3. The
> optimisations we make to our Toom 3 give us about 2% over our original
> Toom 3. I didn't look to see if any improvements have been made to the
> GMP Toom 3 since the last version, or whether they make any measurable
> difference. I recall reading of a saving of half an operation on
> Bodrato's website. I'm pretty sure our original Toom 3 doesn't have
> this.
>
> There is no question our Toom 4 can be optimised beyond its current
> level but it is unclear if this will make more than 1% difference. In
> the case of Toom 7, things seem hopeless. Optimising any single
> operation, except one of the mp multiplies makes no measurable
> difference. I've tried optimising up to three scalar operations at a
> time and still not been able to measure the effect.
>
> It is true that switching to a "fixed" precision solution like GMP
> instead of the current sign/size solution, would make it easier to
> optimise. But this affects other things negatively.
>
> At present there are factors of 2 or 4 to be had elsewhere in the
> library or even optimisations which will give a uniform speedup for
> almost everything of a few percent. Those seem far more important at
> present.
>
> Bill.
>
> 2009/4/19 Bill Hart <goodwillh...@googlemail.com>:
>> Both of the algorithms are similarly optimised. Actually Toom 4 is
>> slightly more optimised than Toom 7.
>>
>> You may be right, and no "premature decision" has been made. However
>> it occurs to me that Toom 7 breaks into smaller chunks, so the
>> multiplications are cheaper.
>>
>> The crossover is currently 700. Toom 7 breaks into chunks of 100 and
>> 13 multiplies are done. Toom 4 breaks into chunks of 175 and 5
>> multiplies are done. It seems to me to be quite possible for Toom 7 to
>> always beat Toom 4, or nearly so purely because of this sort of thing.
>> I'm not saying this is the case. I'm just saying it may be the case.
>>
>> We are working on lots of assembly code to optimise Toom 4 and Toom 7.
>> As for managing the sign/size stuff, I haven't decided whether I want
>> to do it this way in the long run or not. It has some advantages which
>> the other method does not. One thing I can say for sure is that the
>> difference in speed in the generic case is negligible. The other
>> optimisations we are going to be doing will make a much greater
>> difference.
>>
>> Bill.
>>
>> 2009/4/19 David Harvey <dmhar...@cims.nyu.edu>:
>>>
>>>
>>>
>>> On Apr 15, 7:11 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>>>
>>>> Clicking on the graph does make it bigger, but yeah, the critical
>>>> region we are talking about is a bit compressed.
>>>
>>> I just took a look at the toom4 code. It looks like there is a lot of
>>> unnecessary sign/size management and data copying. I would suggest
>>> some serious optimisation work before making a premature decision to
>>> dump toom4 in favour of toom7. I can't think of any earthly reason why
>>> a toom7 would beat toom4/5/6 near the toom3-toom4 threshold. It's like
>>> claiming that a strassen matrix multiply written in C is fast for a
>>> 10x10 matrix because you're comparing against an n^3 algorithm written
>>> in python. Maybe the C-python analogy is unfair, but you get my drift.
>>>
>>> david
>>>
>>> >>>
>>>
>>
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

[mpir-devel] Re: Status on parallel algorithms and benchmark

Reply via email to