Quoting John R Pierce <[EMAIL PROTECTED]>:

> but the 64 bit FPU multiply has only 52(?) bits of significance (the rest
> goes to exponent), and it generates a 52(?) bit result, so doing a high
> precision multiply requires MORE operations.

The largest difference this can cause is a floating-point FFT needing a run length twice that of an integer FFT. Prime95 includes all sorts of non-power-of-two FFTs to reduce this worst-case difference; even if it did not, doubling the FFT size increases the runtime by ~2.1-2.5x, much less than the typical slowdown from switching to an integer FFT. Error-free squarings are nice though... you can take shortcuts that would be impossible for an FPU.

Remember also that an integer FFT requires all results to be reduced modulo some prime number. For general primes this modular multiplication requires at least four 64x64->64 multiplies, and these typically cannot pipeline at all (if you have a pipelined multiplier, you've got to overlap several independent modmul operations; if you don't have a pipelined multiplier, you'll get dismal performance no matter what you do).

There are special primes for which modular reductions are cheap; using one of these primes and a Fast Galois Transform, the fastest integer FFT squaring I've managed is ~15% slower than Mlucas on an Opteron.

jasonp

_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
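
To illustrate the remark about special primes with cheap modular reductions, here is a minimal sketch. It assumes the Mersenne prime 2^61 - 1 purely as an example (the post does not say which prime was used), and the function name mulmod_mersenne is hypothetical. Because 2^61 is congruent to 1 modulo this prime, the high bits of a product can simply be folded onto the low bits with a shift, a mask and an add, with no division and no extra multiplies.

```python
P_BITS = 61
P = (1 << P_BITS) - 1  # 2**61 - 1, an assumed example prime (not named in the post)


def mulmod_mersenne(a: int, b: int) -> int:
    """Return (a * b) mod P for 0 <= a, b < P, reducing with shifts and adds only."""
    t = a * b  # full product, up to 122 bits
    # 2**61 == 1 (mod P), so the bits above position 61 fold onto the low 61 bits.
    t = (t & P) + (t >> P_BITS)  # first fold: result fits in 62 bits
    t = (t & P) + (t >> P_BITS)  # second fold: result is now in [0, P]
    return 0 if t == P else t    # P itself is congruent to 0


if __name__ == "__main__":
    x, y = 1234567890123456789, 987654321098765432
    # Compare against Python's generic (division-based) reduction.
    print(mulmod_mersenne(x, y) == (x * y) % P)
```

In a real 64-bit implementation the full product would come from a double-width multiply (high and low halves), but the folding step stays the same; a general prime would instead need something like Montgomery or Barrett reduction, which is where the extra multiplies come from.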
