[EMAIL PROTECTED] (Guillermo Ballester Valor) writes:

>The first timing is no good as I expected. These are the result:
>
>F90 compiler : Microsoft Fortran Power station 4.0 
>   Optimizations: all optimizations enabled.
>
>                           mprime     lfftw      Mlucas
>                           (sec/iter)
>Exponent test M(3975659)   0.359      0.901      1.604

I'm not too surprised at this. Since my code appears to be faster than
FFTW on most high-end CPUs, that tells me that FFTW is probably optimized
more for the x86 (very few FP registers) than mine, which is geared toward
hardware with at least 32 FPR's. Normally this means that one uses smaller
complex FFT radices (say, 4 or 8) on machines with 8-16 FPR's, but Jason
Papadopoulos tells me that FFTW uses radices as high as 32. Perhaps they
use conditional compilation to decide what radices to use depending on the
underlying hardware, and use smaller radices on x86. (Any insights, Jason?)

If they do use radices >= 8 on x86, they are probably arranging the code
to minimize register pressure - this could be worth looking at.
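
To make the register-pressure point concrete, here is a minimal sketch in C
of a twiddle-free radix-4 butterfly. It is not taken from FFTW or Mlucas; the
names cplx and radix4_butterfly are made up for illustration. The four complex
inputs are already 8 doubles, and the intermediate sums and differences add 8
more, so without careful scheduling this wants far more than the 8 x87 FP
registers. A radix-8 pass doubles the input data alone to 16 doubles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type; the real codes may lay out data differently. */
typedef struct { double re, im; } cplx;

/* Forward 4-point DFT without twiddle factors:
   y[k] = sum over j of x[j] * exp(-2*pi*i*j*k/4), k = 0..3. */
void radix4_butterfly(const cplx x[4], cplx y[4])
{
    assert(x != NULL && y != NULL);

    /* First pass: 4 complex temporaries = 8 live doubles. */
    cplx t0 = { x[0].re + x[2].re, x[0].im + x[2].im };
    cplx t1 = { x[0].re - x[2].re, x[0].im - x[2].im };
    cplx t2 = { x[1].re + x[3].re, x[1].im + x[3].im };
    cplx t3 = { x[1].re - x[3].re, x[1].im - x[3].im };

    /* Second pass: multiplying t3 by -i or +i is only a swap and a sign. */
    y[0].re = t0.re + t2.re;  y[0].im = t0.im + t2.im;
    y[2].re = t0.re - t2.re;  y[2].im = t0.im - t2.im;
    y[1].re = t1.re + t3.im;  y[1].im = t1.im - t3.re;   /* t1 - i*t3 */
    y[3].re = t1.re - t3.im;  y[3].im = t1.im + t3.re;   /* t1 + i*t3 */
}
```

On a machine with 32 FPRs all 16 values fit in registers at once; on x86 the
compiler (or the programmer) has to order the operations so that only a few
are live at any moment, or else spill to memory.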

>THE CLOCK TIMINGS WRITTEN BY MLUCAS ARE INCORRECT!. I had to timing with
>my hand-clock. It writes a lot more time than real.

Note that Mlucas reports elapsed (wall-clock) time rather than CPU time, so
if other processes are running, the printed time will be larger than the CPU
time. But if your own elapsed-time measurement disagrees, that suggests a bug
in the MS compiler's f90 date_and_time intrinsic. Try coding a super-simple
test program (hacked from mine, if you like - look in the source to see how
the character function char_time gets used in conjunction with the above
intrinsic). If you can reproduce the wrong-timing effect, e-mail your sample
code to MS compiler support.
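
For comparison on the C side, here is a rough sketch of the elapsed-vs-CPU
distinction using only standard C library calls (timespec_get from C11 for
wall-clock time, clock for CPU time). The function names and the busy_work
loop are hypothetical stand-ins, not anything from Mlucas or lfftw, and this
does not exercise the f90 date_and_time intrinsic itself.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock seconds since the TIME_UTC epoch (C11 timespec_get). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Processor time as reported by clock(), in seconds. */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for one iteration of real work:
   partial sum of the harmonic series. */
double busy_work(long n)
{
    assert(n > 0);
    volatile double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += 1.0 / (double)i;
    return s;
}

/* Time one call to busy_work with both clocks. */
void time_work(long n, double *wall, double *cpu)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    busy_work(n);
    *wall = wall_seconds() - w0;
    *cpu  = cpu_seconds() - c0;
}
```

Calling time_work(n, &wall, &cpu) and printing both numbers next to a
stopwatch reading should show quickly whether the elapsed-time figure or the
library routine is the one that is off.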

I've had no problems with the time stuff on various Unix systems.

>If the rest of the lfftw code have a similar performance, you perhaps
>will reach a RPI>100%.

Oh, I already do, just not on every platform - but I'm working on it. :)

>I can sent you the executable Mlucas and my tested, no buggy, c-code.
>lfftw.c.

Sure, go ahead and send them, preferably Win-or-PK-zipped. You should
make at least the C code ftp'able as well, that way others can check it
out. If it's close to MacLucasUNIX on some systems, we would at least 
have a generic C code which allows non-power-of-2 runlengths (MLU doesn't).

>Well, sorry to mail you on Sunday.

I don't mind - I only check my e-mail if I have time (and on weekends,
the inclination) to do so.

Best regards,
Ernst

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
