On Thursday 26 February 2009 22:15:35 Bill Hart wrote: > Hi Jeff, > > The FFT currently kicks in at about 5000-7000 limbs or so. Given that > each 64 bit limb is about 19 decimal digits that is at least 100,000 > decimal digits before the FFT is used. In a later version of MPIR > we'll have an FFT which kicks in much earlier (about 2000 limbs), but > it is a big project to change it over. > > I guess you know that tuneup only prints the values out. It doesn't > change the values in the relevant gmp-param.h. > > But what you are probably observing is that the default values are > probably relatively well tuned for your hardware. > > As for MPIR 1.0.0, we can't wait either. Soon I hope... > > Thanks for your comments and timings. That sort of feedback encourages us. > > Bill. > > 2009/2/26 Jeff Gilchrist <jeff.gilchr...@gmail.com>: > > First let me say that it has been very interesting following this > > devel list with all the talk about optimizing MPIR on AMD and Core2 > > architectures. I can't wait for release 1.0 so I can start using that > > with nice Windows support for some Windows factoring binaries I > > maintain (http://gilchrist.ca/jeff/factoring/). > > > > I have a question about the tune/tuneup application. It says in the > > MPIR docs that if you are dealing with very large numbers, I should > > use -f NNN also very large but it doesn't say what is a very large > > number. Would a 25088 digit number (2^83339+1)/3 be considered very > > large? > > > > On an AMD Opteron 248 system with 64bit Linux 2.6.9-67, I tried > > compiling my application with the default ./configure which selected > > mpn/x86_64/amd64/gmp-mparam.h, I also did a make tune, and tried > > tuneup -f 15000000. I just picked 15 million since I didn't know what > > value of NNN I should be choosing. The -f 15M gmp-mparam.h generated > > the following FFT values: > > > > #define MUL_FFT_TABLE { 592, 1440, 3136, 5888, 15360, 36864, 114688, > > 327680, 1310720, 5242880, 12582912, 0 } > > #define MUL_FFT_MODF_THRESHOLD 608 > > #define MUL_FFT_THRESHOLD 4992 > > > > #define SQR_FFT_TABLE { 656, 1312, 3136, 5888, 15360, 28672, 81920, > > 327680, 1310720, 3145728, 12582912, 0 } > > #define SQR_FFT_MODF_THRESHOLD 672 > > #define SQR_FFT_THRESHOLD 4736 > > > > When I timed my code, I basically got identical times for the generic > > config, ./tuneup with no command line, and ./tuneup -f 15000000 so I'm > > wondering if my code would even be using the FFT for multiplies at all > > or if large number means something much larger? Or at what point does > > the FFT code start getting used for multiplies? > > > > My code does a large number of squaring, in a loop. r quickly gets to > > the same size as 2^q, in this case q = 83339 so 25088 digits: > > // q = 83339 > > // r = 6 > > // p = (2^q+1) / 3 > > q_uint = mpz_get_ui(q); > > for (j=1; j <= q_uint; j++) > > { > > // t = r^2 - 2 > > mpz_mul(t, r, r); > > mpz_sub_ui(t, t, 2); > > // r = t | p > > mpz_mod(r, t, p); > > } > >
If this is a typical example that you are interested in , then a faster way is to reduce t mod 2^q+1 which can be done with one subtraction , this gives you a remainder whichis nearly right , then while(remainder >(2^q+1)/3)subtract (2^q+1)/2 from the remaninder , you will only need to do this at most twice. So we have swapped a mod operation for a few subtractions , this will be around 3 to 4 times faster (assuming 1 mod=2 to 3 muls) If q is divisible by a high power of two then we can double the speed yet again. so to reduce t mod 2^q+1 , subtract the upper q bits from the lower q bits , and add back in any borrow from that subtraction > > The nice thing is that MPIR 0.9 is much faster than stock GMP 4.2.4 > > with this real-world application: > > MPIR 0.9 (./configure + ./tuneup -f 15000000) = 2m13.679s > > MPIR 0.9 (./configure) = 2m13.842s > > MPIR 0.9 (./configure + ./tuneup) = 2m13.979s > > GMP 4.2.4 (./configure) = 2m56.145s > > > > For comparison, I keep forgetting how unbearably slow 32bit math code > > is (using \mpn\x86\pentium4\sse2\gmp-mparam.h): > > Intel Core2 E6600 @ 2.4GHz (Vista 32bit) > > GMP 4.2.4 (./configure) = 8m24.912s > > MPIR 0.9 (./configure) = 8m38.189s > > > > Regards, > > Jeff. > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---