I found and fixed the bug in kara_sqr_n. It was calling the multiplication
code up to the wrong threshold (on penryn and k10 it shouldn't have called
it at all).
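
As a rough illustration of the kind of dispatch involved (a sketch only,
not the actual MPIR source): the macro names below are modelled on the
per-architecture tuning thresholds, the values are made up, and the
helpers pick_sqr_routine and count_sqr_leaves are hypothetical. Setting the
first threshold to 0 means the multiplication basecase is never chosen for
a square.

```c
#include <assert.h>

/* Hypothetical crossover values, for illustration only; real values are
   tuned per architecture. */
#define SQR_BASECASE_THRESHOLD    0  /* below this, squaring uses mul_basecase */
#define SQR_KARATSUBA_THRESHOLD  30  /* at or above this, recurse with Karatsuba */

typedef enum {
    USE_MUL_BASECASE = 0,
    USE_SQR_BASECASE = 1,
    USE_KARA_SQR     = 2
} sqr_routine;

/* Choose the routine for squaring an n-limb operand. */
static sqr_routine pick_sqr_routine(int n)
{
    assert(n > 0);
    if (n < SQR_BASECASE_THRESHOLD)
        return USE_MUL_BASECASE;   /* never taken when the threshold is 0 */
    if (n < SQR_KARATSUBA_THRESHOLD)
        return USE_SQR_BASECASE;
    return USE_KARA_SQR;
}

/* Count which basecase routines a Karatsuba squaring of n limbs ends up in.
   Simplification: all three sub-squares are taken to have ceil(n/2) limbs. */
static void count_sqr_leaves(int n, int counts[3])
{
    sqr_routine r = pick_sqr_routine(n);
    if (r != USE_KARA_SQR) {
        counts[r]++;
        return;
    }
    int half = (n + 1) / 2;
    for (int i = 0; i < 3; i++)
        count_sqr_leaves(half, counts);
}
```

With these made-up values, a 64-limb square recurses twice and lands in
sqr_basecase nine times and in mul_basecase not at all.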

We now need to retune the crossovers for all arches.

The performance issue has now gone. In fact the performance improvement for
small squares was huge!

Bill.


On 17 February 2014 20:57, Bill Hart <goodwillh...@googlemail.com> wrote:

> The GMP sqr_basecase code was faster on westmere, sandybridge and
> netburst.
>
> Jeff Gilchrist had a nehalem.
>
> Leif Lionhardy had a bobcat.
>
> The build farm used to have an atom, but it is down at the moment.
>
> I also think there is a (performance) bug in the kara_sqr_n code, as it is
> faster at 31 limbs than at 30 across numerous architectures. I'll look into
> that now.
>
> Bill.
>
>
> On 17 February 2014 17:52, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>> I finished comparing the speed of squaring between GMP 5.1 and MPIR 2.6
>> (with some new patches).
>>
>> Our sqr_basecase was much slower on both the k10 and Penryn. So I have
>> now switched MPIR to use the GMP sqr_basecase.asm on these machines. (It's
>> in the mpn/x86_64 directory, Brian, if you are interested in it for Windows,
>> though it uses a jump table for which there are some small m4 macros in
>> mpn/x86_64/x86_64-defs.m4.)
>>
>> I also found a substantial slowdown in the fft squaring. It was calling
>> mpn_mul_n rather than mpn_sqr in the pointwise mults, when squaring. I've
>> now fixed this.
>>
>> Now on both machines MPIR is as fast as or faster than GMP for squaring
>> across all ranges, with the following exceptions:
>>
>> penryn: karatsuba between 20-30 limbs GMP is faster by a bit
>> k10: karatsuba between 30-40 limbs GMP is faster by a *lot*
>>
>> penryn: fft up to 25000 limbs, GMP sometimes wins by a very small margin,
>> e.g. 1-5% at most.
>>
>> I think the first two are the same problem. We obviously have some
>> overhead in our mpn_kara_sqr_n function for small sizes. I'll make a ticket
>> for this. Perhaps it is because we do not pass temporary space into the
>> function.
>>
>> The third issue is somewhat surprising but can presumably be fixed by
>> removing some fft overheads for small sizes. I'll also make a ticket for
>> this.
>>
>> At this stage I have not checked whether the GMP sqr_basecase is faster
>> on sandybridge, netburst, westmere, nehalem, bobcat or atom. I expect it
>> will be. I'll also make a ticket for this.
>>
>> Bill.
>>
>>
>> On 17 February 2014 14:35, Bill Hart <goodwillh...@googlemail.com> wrote:
>>
>>> Hi all,
>>>
>>> I just did some timings of GMP 5.1.3 vs MPIR 2.6.0 to see if we could
>>> benefit from using any of the GMP (balanced) integer multiplication code,
>>> especially toom6, which we don't currently have.
>>>
>>> I did timings on an AMD k10 and an Intel Core2 Penryn in the following
>>> ranges:
>>>
>>> basecase
>>> karatsuba
>>> toom3
>>> toom4
>>> toom6h (GMP only)
>>> toom8h
>>> fft (up to 100000 limbs)
>>>
>>> In the basecase range we are usually slightly faster on both machines,
>>> with only a handful of exceptions where GMP has a slight win. There's
>>> nothing we can do about those exceptions.
>>>
>>> In the karatsuba, toom3 and toom4 ranges we always win, as far as I can
>>> see.
>>>
>>> GMP does not use toom6h on Penryn. On the K10 our toom4 was faster
>>> than GMP's toom6h in the relevant range. Even if we had toom6h, it would
>>> essentially not be used.
>>>
>>> Our toom8h is exactly the same as in GMP, but because of our faster
>>> basecase, we still win in this range.
>>>
>>> Our FFT is usually faster, with GMP having some wins at around 20000
>>> limbs on both machines and 30000 limbs on the Penryn. After this point our
>>> FFT becomes a lot faster than the current GMP FFT.
>>>
>>> So it seems that we can't do anything to speed up balanced
>>> multiplication.
>>>
>>> As our division code is now quite a bit faster after the two new
>>> algorithms I implemented, I think we are in reasonable shape for basic
>>> operations.
>>>
>>> I'll also take a look at squaring and report back. I had the impression
>>> we were slower in the basecase range, which we may be able to fix.
>>>
>>> Bill.
>>>
>>
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.