[mpir-devel] Re: Toom6 in MPIR

Bill Hart Mon, 17 Feb 2014 08:52:30 -0800

I finished comparing the speed of squaring between GMP 5.1 and MPIR 2.6
(with some new patches).

Our sqr_basecase was much slower on both the k10 and the Penryn, so I have now
switched MPIR to use the GMP sqr_basecase.asm on these machines. Brian, if you
are interested in it for Windows, it's in the mpn/x86_64 directory; note that
it uses a jump table, for which there are some small m4 macros in
mpn/x86_64/x86_64-defs.m4.
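
For anyone wondering why a dedicated sqr_basecase pays off at all: squaring
needs only about half the limb products of a general multiply, because each
cross product a_i*a_j with i < j occurs twice in the square. Below is a minimal
portable C sketch of that idea, assuming 32-bit limbs; the function names are
hypothetical, and the GMP sqr_basecase.asm mentioned above is hand-written
x86_64 assembly, not this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;   /* 32-bit limbs so a uint64_t holds a limb product plus carries */

/* Reference schoolbook product: rp[0..2n-1] = ap[0..n-1] * bp[0..n-1] (n*n limb products). */
static void mul_basecase(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    memset(rp, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            uint64_t t = (uint64_t)ap[i] * bp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t)carry;
    }
}

/* Hypothetical squaring basecase: rp[0..2n-1] = ap[0..n-1]^2 using about n*n/2
   limb products, since every cross product a_i*a_j (i < j) appears twice. */
static void sqr_basecase_sketch(limb_t *rp, const limb_t *ap, size_t n)
{
    memset(rp, 0, 2 * n * sizeof(limb_t));

    /* 1. Off-diagonal sum: sum over i < j of a_i*a_j*B^(i+j). */
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t carry = 0;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t)ap[i] * ap[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t)carry;
    }

    /* 2. Double it with a one-bit left shift; the sum is below B^(2n)/2, so no bit is lost. */
    limb_t hi = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        limb_t top = rp[k] >> 31;
        rp[k] = (limb_t)(rp[k] << 1) | hi;
        hi = top;
    }

    /* 3. Add the diagonal squares a_i^2 * B^(2i). */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)ap[i] * ap[i];
        uint64_t t = (uint64_t)rp[2 * i] + (limb_t)sq + carry;
        rp[2 * i] = (limb_t)t;
        t = (uint64_t)rp[2 * i + 1] + (sq >> 32) + (t >> 32);
        rp[2 * i + 1] = (limb_t)t;
        carry = t >> 32;
    }
    assert(carry == 0);   /* the full square always fits in 2n limbs */
}
```

The doubling is done once over the whole off-diagonal sum rather than per
product, which keeps the inner loop identical in shape to a plain multiply.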

I also found a substantial slowdown in the fft squaring: when squaring, it
was calling mpn_mul_n rather than mpn_sqr for the pointwise multiplications.
I've now fixed this.
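
A minimal C sketch of the shape of that fix: the pointwise step checks whether
it is squaring and, if so, calls the squaring routine instead of the general
multiply. Here squaring is detected by both transformed inputs being the same
array, which is an assumption (a flag would do equally well). coeff_mul,
coeff_sqr and pointwise_mults are hypothetical stand-ins for mpn_mul_n, mpn_sqr
and MPIR's FFT internals; the counters only exist to show which path was taken,
and the reduction a real FFT applies to each pointwise product is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;

/* Instrumentation for the sketch: how often each path was taken. */
static int mul_calls = 0, sqr_calls = 0;

/* Plain schoolbook product rp[0..2n-1] = ap * bp, shared by both stand-ins below. */
static void schoolbook(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    memset(rp, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            uint64_t t = (uint64_t)ap[i] * bp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t)carry;
    }
}

/* Hypothetical stand-ins for mpn_mul_n and mpn_sqr. In the real libraries the
   squaring routine is genuinely cheaper; here only the counters differ. */
static void coeff_mul(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    mul_calls++;
    schoolbook(rp, ap, bp, n);
}

static void coeff_sqr(limb_t *rp, const limb_t *ap, size_t n)
{
    sqr_calls++;
    schoolbook(rp, ap, ap, n);
}

/* Pointwise step: `count` coefficients of `n` limbs each; product k is stored
   as 2n limbs at rp + 2*n*k. When squaring, ap == bp, so use the squaring routine. */
static void pointwise_mults(limb_t *rp, const limb_t *ap, const limb_t *bp,
                            size_t count, size_t n)
{
    assert(rp != ap && rp != bp);   /* output must not alias the inputs */
    for (size_t k = 0; k < count; k++) {
        limb_t *r = rp + 2 * n * k;
        const limb_t *a = ap + n * k;
        if (ap == bp)
            coeff_sqr(r, a, n);
        else
            coeff_mul(r, a, bp + n * k, n);
    }
}
```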

Now on both machines MPIR is as fast as or faster than GMP across all size
ranges for squaring, with the following exceptions:

penryn: karatsuba, between 20 and 30 limbs, GMP is faster by a bit
k10: karatsuba, between 30 and 40 limbs, GMP is faster by a *lot*

penryn: fft up to 25000 limbs, GMP sometimes wins by a very small margin
(1-5% at most).

I think the first two are the same problem. We obviously have some overhead
in our mpn_kara_sqr_n function for small sizes. I'll make a ticket for
this. Perhaps it is because we do not pass temporary space into the
function.
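
To illustrate the temporary-space idea, here is a minimal C sketch of Karatsuba
squaring in which the caller allocates all scratch once (kara_sqr_scratch says
how much) and every recursive call reuses it, so the recursion never allocates.
The names, the 32-bit limbs and KARA_SQR_THRESHOLD are assumptions for
illustration, not MPIR's mpn_kara_sqr_n interface. With a = a1*B^l + a0 it uses
the subtractive middle term a0^2 + a1^2 - (a0 - a1)^2 = 2*a0*a1, so all three
recursive calls are squarings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t limb_t;
#define KARA_SQR_THRESHOLD 4   /* hypothetical cutoff; real code tunes this per CPU */

/* Schoolbook square: rp[0..2n-1] = ap[0..n-1]^2. */
static void sqr_schoolbook(limb_t *rp, const limb_t *ap, size_t n)
{
    memset(rp, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t c = 0;
        for (size_t j = 0; j < n; j++) {
            c += (uint64_t)ap[i] * ap[j] + rp[i + j];
            rp[i + j] = (limb_t)c;
            c >>= 32;
        }
        rp[i + n] = (limb_t)c;
    }
}

/* r[0..rn-1] += a[0..an-1] (an <= rn); returns the carry out. */
static limb_t add_into(limb_t *r, size_t rn, const limb_t *a, size_t an)
{
    uint64_t c = 0;
    for (size_t i = 0; i < rn; i++) {
        c += (uint64_t)r[i] + (i < an ? a[i] : 0);
        r[i] = (limb_t)c;
        c >>= 32;
    }
    return (limb_t)c;
}

/* r[0..rn-1] -= a[0..an-1] (an <= rn); returns the borrow out. */
static limb_t sub_from(limb_t *r, size_t rn, const limb_t *a, size_t an)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < rn; i++) {
        uint64_t t = (uint64_t)r[i] - (i < an ? a[i] : 0) - borrow;
        r[i] = (limb_t)t;
        borrow = (limb_t)(t >> 63);   /* wrapped around means a borrow */
    }
    return borrow;
}

/* Scratch limbs kara_sqr needs: 5l+1 per level plus the shared scratch of one half-size call. */
static size_t kara_sqr_scratch(size_t n)
{
    if (n < KARA_SQR_THRESHOLD)
        return 0;
    size_t l = (n + 1) / 2;
    return 5 * l + 1 + kara_sqr_scratch(l);
}

/* rp[0..2n-1] = ap[0..n-1]^2 using caller-supplied scratch of kara_sqr_scratch(n) limbs. */
static void kara_sqr(limb_t *rp, const limb_t *ap, size_t n, limb_t *scratch)
{
    if (n < KARA_SQR_THRESHOLD) {
        sqr_schoolbook(rp, ap, n);
        return;
    }
    size_t l = (n + 1) / 2, m = n / 2;   /* a0 = low l limbs, a1 = high m limbs */
    const limb_t *a0 = ap, *a1 = ap + l;
    limb_t *diff = scratch;              /* l limbs:    |a0 - a1|   */
    limb_t *d2 = diff + l;               /* 2l limbs:   (a0 - a1)^2 */
    limb_t *mid = d2 + 2 * l;            /* 2l+1 limbs: 2*a0*a1     */
    limb_t *next = mid + 2 * l + 1;      /* reused by all three recursive calls */

    /* diff = |a0 - a1|: subtract, then negate (two's complement) if it borrowed. */
    memcpy(diff, a0, l * sizeof(limb_t));
    if (sub_from(diff, l, a1, m)) {
        uint64_t c = 1;
        for (size_t i = 0; i < l; i++) {
            c += (limb_t)~diff[i];
            diff[i] = (limb_t)c;
            c >>= 32;
        }
    }

    kara_sqr(rp, a0, l, next);           /* rp[0..2l-1]  = a0^2 */
    kara_sqr(rp + 2 * l, a1, m, next);   /* rp[2l..2n-1] = a1^2 */
    kara_sqr(d2, diff, l, next);         /* d2 = (a0 - a1)^2    */

    /* mid = a0^2 + a1^2 - (a0 - a1)^2 = 2*a0*a1 < 2*B^n, then add it in at B^l. */
    memcpy(mid, rp, 2 * l * sizeof(limb_t));
    mid[2 * l] = 0;
    limb_t flags = add_into(mid, 2 * l + 1, rp + 2 * l, 2 * m);
    flags |= sub_from(mid, 2 * l + 1, d2, 2 * l);
    flags |= add_into(rp + l, 2 * n - l, mid, n + 1);
    assert(flags == 0);                  /* no carry or borrow may escape */
    (void)flags;
}
```

Sizing the scratch once at the top also makes the per-call cost at small sizes
visible: apart from the threshold check, nothing is allocated or freed on the
way down the recursion.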

The third issue is somewhat surprising but can presumably be fixed by
removing some fft overheads for small sizes. I'll also make a ticket for
this.

At this stage I have not checked whether the GMP sqr_basecase is faster on
sandybridge, netburst, westmere, nehalem, bobcat or atom. I expect it will
be. I'll also make a ticket for this.

Bill.

On 17 February 2014 14:35, Bill Hart <goodwillh...@googlemail.com> wrote:

> Hi all,
>
> I just did some timings of GMP 5.1.3 vs MPIR 2.6.0 to see if we could
> benefit from using any of the GMP (balanced) integer multiplication code,
> especially toom6, which we don't currently have.
>
> I did timings on an AMD k10 and an Intel Core2 Penryn in the following
> ranges:
>
> basecase
> karatsuba
> toom3
> toom4
> toom6h (GMP only)
> toom8h
> fft (up to 100000 limbs)
>
> In the basecase range we are usually slightly faster on both machines,
> with only a handful of exceptions where GMP has a slight win. There's
> nothing we can do about those exceptions.
>
> In the karatsuba, toom3 and toom4 ranges we always win, as far as I can
> see.
>
> GMP does not use toom6h on the Penryn. On the K10 our toom4 was faster than
> GMP's toom6h in the relevant range. Even if we had toom6h, it would
> essentially not be used.
>
> Our toom8h is exactly the same as in GMP, but because of our faster
> basecase, we still win in this range.
>
> Our FFT is usually faster, with GMP having some wins at around 20000 limbs
> on both machines and 30000 limbs on the Penryn. After this point our FFT
> becomes a lot faster than the current GMP FFT.
>
> So it seems that we can't do anything to speed up balanced multiplication.
>
> As our division code is now quite a bit faster after the two new
> algorithms I implemented, I think we are in reasonable shape for basic
> operations.
>
> I'll also take a look at squaring and report back. I had the impression we
> were slower in the basecase range, which we may be able to fix.
>
> Bill.
>
