[mpir-devel] Re: FFT Tune Question & App Benchmark

jason Thu, 26 Feb 2009 16:01:24 -0800

On Thursday 26 February 2009 22:15:35 Bill Hart wrote:
> Hi Jeff,
>
> The FFT currently kicks in at about 5000-7000 limbs or so. Given that
> each 64 bit limb is about 19 decimal digits that is at least 100,000
> decimal digits before the FFT is used. In a later version of MPIR
> we'll have an FFT which kicks in much earlier (about 2000 limbs), but
> it is a big project to change it over.
>
> I guess you know that tuneup only prints the values out. It doesn't
> change the values in the relevant gmp-param.h.
>
> But what you are probably observing is that the default values are
> probably relatively well tuned for your hardware.
>
> As for MPIR 1.0.0, we can't wait either. Soon I hope...
>
> Thanks for your comments and timings. That sort of feedback encourages us.
>
> Bill.
>
> 2009/2/26 Jeff Gilchrist <jeff.gilchr...@gmail.com>:
> > First let me say that it has been very interesting following this
> > devel list with all the talk about optimizing MPIR on AMD and Core2
> > architectures.  I can't wait for release 1.0 so I can start using that
> > with nice Windows support for some Windows factoring binaries I
> > maintain (http://gilchrist.ca/jeff/factoring/).
> >
> > I have a question about the tune/tuneup application.  It says in the
> > MPIR docs that if you are dealing with very large numbers, I should
> > use -f NNN also very large but it doesn't say what is a very large
> > number.  Would a 25088 digit number (2^83339+1)/3 be considered very
> > large?
> >
> > On an AMD Opteron 248 system with 64bit Linux 2.6.9-67, I tried
> > compiling my application with the default ./configure which selected
> > mpn/x86_64/amd64/gmp-mparam.h, I also did a make tune, and tried
> > tuneup -f 15000000.  I just picked 15 million since I didn't know what
> > value of NNN I should be choosing.  The -f 15M gmp-mparam.h generated
> > the following FFT values:
> >
> > #define MUL_FFT_TABLE  { 592, 1440, 3136, 5888, 15360, 36864, 114688,
> > 327680, 1310720, 5242880, 12582912, 0 }
> > #define MUL_FFT_MODF_THRESHOLD          608
> > #define MUL_FFT_THRESHOLD              4992
> >
> > #define SQR_FFT_TABLE  { 656, 1312, 3136, 5888, 15360, 28672, 81920,
> > 327680, 1310720, 3145728, 12582912, 0 }
> > #define SQR_FFT_MODF_THRESHOLD          672
> > #define SQR_FFT_THRESHOLD              4736
> >
> > When I timed my code, I basically got identical times for the generic
> > config, ./tuneup with no command line, and ./tuneup -f 15000000 so I'm
> > wondering if my code would even be using the FFT for multiplies at all
> > or if large number means something much larger?  Or at what point does
> > the FFT code start getting used for multiplies?
> >
> > My code does a large number of squaring, in a loop.  r quickly gets to
> > the same size as 2^q, in this case q = 83339 so 25088 digits:
> > // q = 83339
> > // r = 6
> > // p = (2^q+1) / 3
> > q_uint = mpz_get_ui(q);
> > for (j=1; j <= q_uint; j++)
> > {
> >        // t = r^2 - 2
> >        mpz_mul(t, r, r);
> >        mpz_sub_ui(t, t, 2);
> >        // r = t | p
> >        mpz_mod(r, t, p);
> > }
> >


If this is a typical example that you are interested in , then a faster way is 
to reduce t mod 2^q+1 which can be done with one subtraction , this gives you 
a remainder whichis nearly right , then while(remainder >(2^q+1)/3)subtract 
(2^q+1)/2 from the remaninder , you will only need to do this at most twice.

So we have swapped a mod operation for a few subtractions , this will be 
around 3 to 4 times faster (assuming 1 mod=2 to 3 muls)
If q is divisible by a high power of two then we can double the speed yet 
again.

so to reduce t mod 2^q+1 , subtract the upper q bits from the lower q bits , 
and add back in any borrow from that subtraction

> > The nice thing is that MPIR 0.9 is much faster than stock GMP 4.2.4
> > with this real-world application:
> > MPIR 0.9 (./configure + ./tuneup -f 15000000) = 2m13.679s
> > MPIR 0.9 (./configure) = 2m13.842s
> > MPIR 0.9 (./configure + ./tuneup) = 2m13.979s
> > GMP 4.2.4 (./configure) = 2m56.145s
> >
> > For comparison, I keep forgetting how unbearably slow 32bit math code
> > is (using \mpn\x86\pentium4\sse2\gmp-mparam.h):
> > Intel Core2 E6600 @ 2.4GHz (Vista 32bit)
> > GMP 4.2.4 (./configure) = 8m24.912s
> > MPIR 0.9 (./configure) = 8m38.189s
> >
> > Regards,
> > Jeff.
>
> 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

[mpir-devel] Re: FFT Tune Question & App Benchmark

Reply via email to