On Sat, 25 Sep 1999, Guillermo Ballester Valor wrote:

> Yes, certainly I've been able to adapt lucdwt and McLucasUNIX in four
> days. On the other hand, my only goal was to find out whether working
> with FFTW is a good idea, and the timings obtained make me think it
> could be.


For really big FFTs you can get major gains by using FFTW as a building
block in a bigger program, rather than having it do your entire FFT with
a single function call. As Ernst mentioned, the building-block approach
lets you fold some of the forward and inverse FFT computations together,
which saves a lot of time by avoiding cache misses. On the UltraSPARC,
using FFTW for the pieces of your computation rather than the whole thing
is somewhere between 2 and 10 times faster than FFTW alone.
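
A minimal sketch of the building-block idea, assuming the classic
"four-step" decomposition of a length n1*n2 transform into many short
transforms plus a twiddle multiply. This is not the code discussed in
this thread: small_dft() is a hypothetical stand-in for a short library
transform (for example an FFTW plan), and four_step_fft() is an
illustrative name. The point is only that the outer program owns the
data layout and calls the library on pieces small enough to stay in
cache.

```python
import cmath

def small_dft(x):
    """Hypothetical stand-in for a short library FFT call (O(n^2) here)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT assembled from short transforms.

    Input index is j = j1*n2 + j2, output index is k = k1 + n1*k2.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 transforms of length n1, one per column j2.
    cols = [small_dft([x[j1 * n2 + j2] for j1 in range(n1)])
            for j2 in range(n2)]
    out = [0j] * n
    for k1 in range(n1):
        # Step 2: twiddle factors exp(-2*pi*i*j2*k1/n).
        row = [cols[j2][k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n)
               for j2 in range(n2)]
        # Step 3: one transform of length n2 for this k1.
        row_f = small_dft(row)
        # Step 4: scatter into the transposed output order.
        for k2 in range(n2):
            out[k1 + n1 * k2] = row_f[k2]
    return out

if __name__ == "__main__":
    data = [complex(i % 5, (3 * i) % 7) for i in range(32)]
    direct = small_dft(data)
    pieces = four_step_fft(data, 4, 8)
    print(max(abs(a - b) for a, b in zip(direct, pieces)))
```

In a real program the twiddle multiply in step 2 is a natural place to
fold in other per-element work, since the data is already being touched
there.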

> If your comparison carried over to Intel machines, which it does not,
> your code would run nearly as fast as mprime! You say your code is
> twice as fast as FFTW, and I am sure it is, *BUT* on my Pentium-166 the
> short code I wrote does an iteration of my current exponent 3975659 in
> 0.901 seconds, while mprime takes only 0.359. That gives an RPI of 40%.
> Your code would then reach nearly 90%, and without lots of assembler
> code!

On the Pentium, assembly-coded small FFTs run more than twice as fast
as FFTW. Even from C, you can do better on the Pentium (do a web search
for djbfft, a free Pentium-optimized FFT package). For a recursive
split-radix FFT, you need about 200 lines of assembly; surely twice the
speed is worth that!
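
For readers who have not seen one, here is a rough sketch of the
recursive split-radix structure, written in Python purely to show the
shape of the recursion. It is not the assembly code mentioned above and
not taken from djbfft; the function names are illustrative, and the
input length is assumed to be a power of two.

```python
import cmath

def split_radix_fft(x):
    """Recursive split-radix DFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    # One half-length transform on the even samples, and two
    # quarter-length transforms on samples 1 mod 4 and 3 mod 4.
    u = split_radix_fft(x[0::2])
    z = split_radix_fft(x[1::4])
    zp = split_radix_fft(x[3::4])
    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * zp[k]
        s = a + b
        d = -1j * (a - b)
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] + d
        out[k + 3 * q] = u[k + q] - d
    return out

def naive_dft(x):
    """O(n^2) reference transform, used only to check the sketch."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

if __name__ == "__main__":
    data = [complex(i % 3, i % 5) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(split_radix_fft(data), naive_dft(data)))
    print(err)
```

A tuned version would precompute the twiddle factors and unroll the
small cases, which is where most of the hand-written assembly goes.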

jasonp


_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
