The GMP sqr_basecase code was faster on westmere, sandybridge and netburst.

Jeff Gilchrist had a nehalem.

Leif Lionhardy had a bobcat.

The build farm used to have an atom, but it is down at the moment.

I also think there is a (performance) bug in the kara_sqr_n code, as it is
faster at 31 limbs than at 30 across numerous architectures. I'll look into
that now.
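For reference, here is a minimal sketch of Karatsuba squaring in portable C, assuming 32-bit limbs. kara_sqr, kara_sqr_itch, the helper functions and KARA_SQR_THRESHOLD are all hypothetical names; this is not MPIR's mpn_kara_sqr_n. The sketch splits a = hi*B^h + lo with h = n/2 (integer division), so an odd size gives halves of n/2 and n/2 + 1 limbs. It uses 2*lo*hi = lo^2 + hi^2 - (hi - lo)^2, so all three recursive calls are squarings. The caller passes in scratch space sized by kara_sqr_itch. That is one way to avoid per-call allocation, which is the possibility raised in the quoted message below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;           /* 32-bit limbs keep the carry arithmetic simple */
#define KARA_SQR_THRESHOLD 8       /* hypothetical cutoff, not a tuned MPIR value */

/* r[0..2n) = a[0..n)^2, schoolbook; used below the threshold. */
static void sqr_basecase(limb_t *r, const limb_t *a, size_t n) {
    memset(r, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t c = 0;
        for (size_t j = 0; j < n; j++) {
            c += (uint64_t)a[i] * a[j] + r[i + j];
            r[i + j] = (limb_t)c;
            c >>= 32;
        }
        r[i + n] = (limb_t)c;
    }
}

/* r[0..rn) += a[0..an) with an <= rn; returns the carry out. */
static limb_t add_into(limb_t *r, size_t rn, const limb_t *a, size_t an) {
    uint64_t c = 0;
    for (size_t i = 0; i < rn; i++) {
        c += (uint64_t)r[i] + (i < an ? a[i] : 0);
        r[i] = (limb_t)c;
        c >>= 32;
    }
    return (limb_t)c;
}

/* r[0..n) = a - b; returns the borrow out. r may alias a. */
static limb_t sub_n(limb_t *r, const limb_t *a, const limb_t *b, size_t n) {
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = (uint64_t)a[i] - b[i] - borrow;
        r[i] = (limb_t)d;
        borrow = (limb_t)(d >> 63);   /* wrapped around => borrow */
    }
    return borrow;
}

/* r[0..n) = B^n - r (two's complement negation). */
static void neg_n(limb_t *r, size_t n) {
    uint64_t c = 1;
    for (size_t i = 0; i < n; i++) {
        c += (limb_t)~r[i];
        r[i] = (limb_t)c;
        c >>= 32;
    }
}

/* Scratch limbs kara_sqr needs for an n-limb input: d, t, u, plus recursion. */
size_t kara_sqr_itch(size_t n) {
    if (n < KARA_SQR_THRESHOLD) return 0;
    size_t l = n - n / 2;
    return 5 * l + 1 + kara_sqr_itch(l);
}

/* r[0..2n) = a[0..n)^2; scratch holds kara_sqr_itch(n) limbs; r must not alias a. */
void kara_sqr(limb_t *r, const limb_t *a, size_t n, limb_t *scratch) {
    if (n < KARA_SQR_THRESHOLD) { sqr_basecase(r, a, n); return; }
    size_t h = n / 2, l = n - h;          /* a = hi*B^h + lo; hi has l >= h limbs */
    const limb_t *lo = a, *hi = a + h;
    limb_t *d = scratch, *t = d + l, *u = t + 2 * l, *next = u + 2 * l + 1;

    memcpy(t, lo, h * sizeof(limb_t));    /* t = lo, zero-padded to l limbs */
    if (l > h) t[h] = 0;
    if (sub_n(d, hi, t, l)) neg_n(d, l);  /* d = |hi - lo| */

    kara_sqr(t, d, l, next);              /* t = (hi - lo)^2, 2l limbs */
    kara_sqr(r, lo, h, next);             /* r[0..2h)  = lo^2 */
    kara_sqr(r + 2 * h, hi, l, next);     /* r[2h..2n) = hi^2 */

    memcpy(u, r + 2 * h, 2 * l * sizeof(limb_t));  /* u = hi^2 ... */
    u[2 * l] = 0;
    add_into(u, 2 * l + 1, r, 2 * h);              /* ... + lo^2 */
    u[2 * l] -= sub_n(u, u, t, 2 * l);             /* ... - (hi - lo)^2 = 2*lo*hi */

    limb_t carry = add_into(r + h, 2 * n - h, u, 2 * l + 1);  /* r += u * B^h */
    assert(carry == 0);                   /* a^2 always fits in 2n limbs */
    (void)carry;
}
```

A caller would allocate kara_sqr_itch(n) limbs once and reuse them, so the recursion itself never allocates.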

Bill.


On 17 February 2014 17:52, Bill Hart <goodwillh...@googlemail.com> wrote:

> I finished comparing the speed of squaring between GMP 5.1 and MPIR 2.6
> (with some new patches).
>
> Our sqr_basecase was much slower on both the k10 and Penryn. So I have now
> switched MPIR to use the GMP sqr_basecase.asm on these machines (it's in
> the mpn/x86_64 directory, Brian, if you are interested in it for Windows --
> though it uses a jump table, for which there are some small m4 macros in
> mpn/x86_64/x86_64-defs.m4).
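This minimal sketch shows why a dedicated squaring basecase pays off. For an n-limb a with limb base B:

a^2 = 2*sum_{i<j} a_i*a_j*B^(i+j) + sum_i a_i^2*B^(2i)

So only the n(n-1)/2 cross products are needed, versus n^2 products for a general n-by-n multiply, followed by a one-bit doubling and n diagonal squares. The portable C below assumes 32-bit limbs and uses a hypothetical name, sqr_basecase_sketch. It is not GMP's sqr_basecase.asm, which is hand-scheduled assembly built around the jump table mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* 32-bit limbs so a 64-bit type holds any limb product */

/* r[0..2n) = a[0..n)^2 via
   a^2 = 2 * sum_{i<j} a_i a_j B^(i+j) + sum_i a_i^2 B^(2i),  B = 2^32. */
void sqr_basecase_sketch(limb_t *r, const limb_t *a, size_t n)
{
    assert(n > 0);
    for (size_t k = 0; k < 2 * n; k++)
        r[k] = 0;

    /* 1. Off-diagonal products a[i]*a[j] with i < j, each computed only once. */
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t)a[i] * a[j] + r[i + j] + carry;
            r[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        r[i + n] = (limb_t)carry;
    }

    /* 2. Double the off-diagonal sum with a one-bit left shift. */
    limb_t out = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        limb_t top = r[k] >> 31;
        r[k] = (limb_t)(r[k] << 1) | out;
        out = top;
    }
    assert(out == 0);       /* 2 * sum_{i<j} a_i a_j < B^(2n) */

    /* 3. Add the diagonal squares a[i]^2 at limb position 2i. */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)a[i] * a[i];
        uint64_t lo = (uint64_t)r[2 * i] + (limb_t)sq + carry;
        uint64_t hi = (uint64_t)r[2 * i + 1] + (sq >> 32) + (lo >> 32);
        r[2 * i] = (limb_t)lo;
        r[2 * i + 1] = (limb_t)hi;
        carry = hi >> 32;
    }
    assert(carry == 0);     /* a^2 fits in 2n limbs */
}
```

For example, squaring the limbs {1, 2, 3} (least significant first) yields {1, 4, 10, 12, 9, 0}. These are the coefficients of (1 + 2x + 3x^2)^2, since no carries occur.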
>
> I also found a substantial slowdown in the fft squaring. It was calling
> mpn_mul_n rather than mpn_sqr in the pointwise mults, when squaring. I've
> now fixed this.
>
> Now on both machines MPIR is as fast as or faster than GMP in all ranges for
> squaring, with the following exceptions:
>
> penryn: karatsuba between 20-30 limbs GMP is faster by a bit
> k10: karatsuba between 30-40 limbs GMP is faster by a *lot*
>
> penryn: fft up to 25000 limbs, GMP sometimes wins by a very small margin,
> e.g. 1-5% at most.
>
> I think the first two are the same problem. We obviously have some
> overhead in our mpn_kara_sqr_n function for small sizes. I'll make a ticket
> for this. Perhaps it is because we do not pass temporary space into the
> function.
>
> The third issue is somewhat surprising but can presumably be fixed by
> removing some fft overheads for small sizes. I'll also make a ticket for
> this.
>
> At this stage I have not checked whether the GMP sqr_basecase is faster on
> sandybridge, netburst, westmere, nehalem, bobcat or atom. I expect it will
> be. I'll also make a ticket for this.
>
> Bill.
>
>
> On 17 February 2014 14:35, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>> Hi all,
>>
>> I just did some timings of GMP 5.1.3 vs MPIR 2.6.0 to see if we could
>> benefit from using any of the GMP (balanced) integer multiplication code,
>> especially toom6, which we don't currently have.
>>
>> I did timings on an AMD k10 and an Intel Core2 Penryn in the following
>> ranges:
>>
>> basecase
>> karatsuba
>> toom3
>> toom4
>> toom6h (GMP only)
>> toom8h
>> fft (up to 100000 limbs)
>>
>> In the basecase range we are usually slightly faster on both machines,
>> with only a handful of exceptions where GMP has a slight win. There's
>> nothing we can do about those exceptions.
>>
>> In the karatsuba, toom3 and toom4 ranges we always win, as far as I can
>> see.
>>
>> GMP does not use toom6h on the Penryn. On the K10 our toom4 was faster than
>> GMP's toom6h in the relevant range. Even if we had toom6h, it would
>> essentially not be used.
>>
>> Our toom8h is exactly the same as in GMP, but because of our faster
>> basecase, we still win in this range.
>>
>> Our FFT is usually faster, with GMP having some wins at around 20000
>> limbs on both machines and 30000 limbs on the Penryn. After this point our
>> FFT becomes a lot faster than the current GMP FFT.
>>
>> So it seems that we can't do anything to speed up balanced multiplication.
>>
>> As our division code is now quite a bit faster after the two new
>> algorithms I implemented, I think we are in reasonable shape for basic
>> operations.
>>
>> I'll also take a look at squaring and report back. I had the impression
>> we were slower in the basecase range, which we may be able to fix.
>>
>> Bill.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/mpir-devel.