The GMP sqr_basecase code was faster on westmere, sandybridge and netburst.
Jeff Gilchrist had a nehalem. Leif Lionhardy had a bobcat. The build farm used to have an atom, but it is down at the moment.

I also think there is a (performance) bug in the kara_sqr_n code, as it is faster at 31 limbs than at 30 across numerous architectures. I'll look into that now.

Bill.

On 17 February 2014 17:52, Bill Hart <goodwillh...@googlemail.com> wrote:

> I finished comparing the speed of squaring between GMP 5.1 and MPIR 2.6
> (with some new patches).
>
> Our sqr_basecase was much slower on both the k10 and Penryn. So I have now
> switched MPIR to use the GMP sqr_basecase.asm on these machines (it's in
> the mpn/x86_64 directory Brian if you are interested in it for Windows --
> though it uses a jump table for which there are some small m4 macros in
> mpn/x86_64/x86_64-defs.m4).
>
> I also found a substantial slowdown in the fft squaring. It was calling
> mpn_mul_n rather than mpn_sqr in the pointwise mults, when squaring. I've
> now fixed this.
>
> Now on both machines MPIR is as fast or faster than GMP for all ranges for
> squaring, with the following exceptions:
>
> penryn: karatsuba between 20-30 limbs GMP is faster by a bit
> k10: karatsuba between 30-40 limbs GMP is faster by a *lot*
>
> penryn: fft up to 25000 limbs, GMP sometimes wins by a very small margin,
> e.g. 1-5% at most.
>
> I think the first two are the same problem. We obviously have some
> overhead in our mpn_kara_sqr_n function for small sizes. I'll make a ticket
> for this. Perhaps it is because we do not pass temporary space into the
> function.
>
> The third issue is somewhat surprising but can presumably be fixed by
> removing some fft overheads for small sizes. I'll also make a ticket for
> this.
>
> At this stage I have not checked whether the GMP sqr_basecase is faster on
> sandybridge, netburst, westmere, nehalem, bobcat or atom. I expect it will
> be. I'll also make a ticket for this.
>
> Bill.
>
> On 17 February 2014 14:35, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>> Hi all,
>>
>> I just did some timings of GMP 5.1.3 vs MPIR 2.6.0 to see if we could
>> benefit from using any of the GMP (balanced) integer multiplication code,
>> especially toom6, which we don't currently have.
>>
>> I did timings on an AMD k10 and an Intel Core2 Penryn in the following
>> ranges:
>>
>> basecase
>> karatsuba
>> toom3
>> toom4
>> toom6h (GMP only)
>> toom8h
>> fft (up to 100000 limbs)
>>
>> In the basecase range we are usually slightly faster on both machines,
>> with only a handful of exceptions where GMP has a slight win. There's
>> nothing we can do about those exceptions.
>>
>> In the karatsuba, toom3 and toom4 ranges we always win, as far as I can
>> see.
>>
>> GMP does not use toom6h on Penryn. On the K10 our toom4 was faster than
>> GMP's toom6h in the relevant range. Even if we had toom6h, it would
>> essentially not be used.
>>
>> Our toom8h is exactly the same as in GMP, but because of our faster
>> basecase, we still win in this range.
>>
>> Our FFT is usually faster, with GMP having some wins at around 20000
>> limbs on both machines and 30000 limbs on the Penryn. After this point our
>> FFT becomes a lot faster than the current GMP FFT.
>>
>> So it seems that we can't do anything to speed up balanced multiplication.
>>
>> As our division code is now quite a bit faster after the two new
>> algorithms I implemented, I think we are in reasonable shape for basic
>> operations.
>>
>> I'll also take a look at squaring and report back. I had the impression
>> we were slower in the basecase range, which we may be able to fix.
>>
>> Bill.
>>
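For readers wondering why calling mpn_mul_n where mpn_sqr would do counts as a slowdown, here is a rough sketch in Python. It is a toy model, not the GMP or MPIR code: the helper names (`to_limbs`, `from_limbs`, `mul_basecase`, `sqr_basecase`) are made up for illustration, and the real sqr_basecase is hand-written assembly. It only counts limb products to show that a schoolbook square can skip the mirrored cross terms.

```python
# Toy model: a number is a little-endian list of 64-bit limbs.
# All names here are hypothetical; none of this is the GMP/MPIR implementation.
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_basecase(a, b):
    """Schoolbook product of two equal-length operands.

    Returns (product, number of limb-by-limb multiplications).
    """
    acc, products = 0, 0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            acc += (ai * bj) << (LIMB_BITS * (i + j))
            products += 1
    return acc, products


def sqr_basecase(a):
    """Schoolbook square that uses a[i]*a[j] == a[j]*a[i].

    Each off-diagonal product is computed once and doubled (the extra
    shift by one bit), so only n*(n+1)/2 limb products are needed.
    """
    n = len(a)
    acc, products = 0, 0
    for i in range(n):
        acc += (a[i] * a[i]) << (LIMB_BITS * 2 * i)  # diagonal term
        products += 1
        for j in range(i + 1, n):
            acc += (a[i] * a[j]) << (LIMB_BITS * (i + j) + 1)  # doubled cross term
            products += 1
    return acc, products
```

On a 30-limb operand the general routine does 900 limb products and the squaring routine 465, for the same result. That is roughly the saving given up at every pointwise product when a squaring path goes through the general multiply.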