I have ftp'd a new beta release of the mers package to:

http://www.garlic.com/~wedgingt/beta.tgz
                                beta.tar.gz
                                beta.zip
                                beta.shar

I'm announcing it to the mailing list because it includes a brand new
Lucas-Lehmer test program called MacLucasFFTW.  It was modified by
Guillermo Ballester Valor from MacLucasUNIX to use MIT's FFTW
("Fastest Fourier Transform in the West") library, which is available
at www.fftw.org.  I have made some small changes to get it to pass the
mers package's simple Makefile tests on my machine, and no one else has
tried it yet, so it's perhaps more of a late alpha version than a
beta, but ...

... I had this odd feeling that people would want it quickly, because
Guillermo reported that it's faster than MacLucasUNIX by a factor of
nearly three (!) (for the same FFT lengths) on his PentiumPro and it's
about a factor of two faster on my Pentium III 450 MHz.  Some of my
changes - short as they were - likely slowed it down, unfortunately.
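
For anyone new to the list, the reason FFT speed matters so much: the
Lucas-Lehmer test is almost entirely repeated squaring modulo 2^p - 1,
and MacLucasFFTW does that squaring with FFTW's transforms.  Below is a
plain-integer sketch of the test in Python.  It is the textbook form,
not code from the mers package, and it is only practical for small
exponents.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 squarings in total
        s = (s * s - 2) % m   # the squaring is what the FFT speeds up
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23):
        print(p, lucas_lehmer(p))
```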

More good news: FFTW supports non-power-of-two FFT lengths.  It also
"tunes" itself to the hardware and software around it at run time, and
MacLucasFFTW saves this tuning information externally in a file for
next time, so more tuning only needs to be done for new, mostly
larger, FFT lengths.
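
To see why non-power-of-two lengths help, compare the next power of
two with the next length built only from small primes.  This is a rough
Python sketch, not taken from MacLucasFFTW.  It assumes, for
illustration only, that lengths whose prime factors all lie in
{2, 3, 5, 7} are acceptable, and min_len is a stand-in for whatever
minimum length an exponent needs (how that minimum is chosen is not
covered here).

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n has no prime factors outside `primes` (assumed set)."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_smooth_length(min_len, primes=(2, 3, 5, 7)):
    """Smallest length >= min_len whose prime factors are all in `primes`."""
    n = max(1, min_len)
    while not is_smooth(n, primes):
        n += 1
    return n


def next_power_of_two(min_len):
    """Smallest power of two >= min_len."""
    n = 1
    while n < min_len:
        n *= 2
    return n


if __name__ == "__main__":
    # Hypothetical minimum lengths, chosen only to show the gap.
    for min_len in (70000, 140000, 300000):
        print(min_len, next_smooth_length(min_len), next_power_of_two(min_len))
```

Whenever the minimum falls just above a power of two, a power-of-two
program has to nearly double its FFT length, while a mixed-radix one
can stop much sooner.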

Guillermo was able to use most of the mers package functions, so the
usage and input/output formats and so on are the same as the other
mers package LL test programs.

I've included Guillermo's README file, renamed to README.MacLucasFFTW,
essentially verbatim.

Guillermo also sent me a short program that does the tuning outside of
MacLucasFFTW; I will include that in a later release if there are
requests for it, but MacLucasFFTW can do the tuning itself just as
well, and even prints when it's tuning and for what FFT length.

FFTW itself installs with the usual configure and make, and also
includes a test suite, invoked via make as well.

However, there are some (small) drawbacks: MacLucasFFTW may not be
able to read checkpoint files produced by the other programs; I know
it cannot do so on Intel hardware, because the other programs use
'long double' there, a type with a 64-bit mantissa, and FFTW does not
appear to work with that.  Guillermo and I have no non-Intel hardware
to test with, so someone out there will have to try that.  Of course,
for many exponents, MacLucasFFTW will use a smaller FFT length, so ...

The other Lucas-Lehmer test programs in the mers package will not be
able to read MacLucasFFTW checkpoint files correctly, at least for the
non-power-of-two FFT lengths.  They should, however, simply reject the
MacLucasFFTW checkpoint files that they can't use as invalid rather
than corrupting them or trying to "resume" from bogus data.
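
The "reject rather than resume" behavior amounts to checking the FFT
length recorded in a checkpoint before using any of its data.  Here is
a minimal Python sketch of that idea for a power-of-two-only program.
The function names are made up for this example, and nothing here
reflects the actual mers package checkpoint format.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and the rest."""
    return n > 0 and (n & (n - 1)) == 0


def can_resume(checkpoint_fft_length):
    """Hypothetical check: accept only FFT lengths this program supports.

    A power-of-two-only program should refuse anything else and leave
    the checkpoint file untouched rather than resume from it.
    """
    return is_power_of_two(checkpoint_fft_length)


if __name__ == "__main__":
    for length in (65536, 81920, 131072):
        print(length, "usable" if can_resume(length) else "rejected")
```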

Lastly, I'm not sure which platforms FFTW has been ported to; machines
not running some flavor of UNIX may or may not be able to compile it.
I simply haven't had time to check.

The timings on my PIII/450MHz of the Makefile tests are:

time -v ./fftlucas -o- testLL.in | sed -e 's/, n = .*//' > test.fft
        User time (seconds): 784.40
        System time (seconds): 0.15
        Percent of CPU this job got: 49%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 26:25.99
time -v ./mersenne1 -o stdout testLL.in | sed -e 's/, n = .*//' > test.mers1
        User time (seconds): 711.40
        System time (seconds): 0.05
        Percent of CPU this job got: 49%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 23:47.97
time -v ./mersenne2 testLL.in | sed -e 's/, n = .*//' > test.mers2
        User time (seconds): 745.58
        System time (seconds): 0.03
        Percent of CPU this job got: 49%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 25:00.22
time -v ./MacLucasUNIX `cat testLL.in` | sed -e 's/, n = .*//' > test.mlu
        User time (seconds): 305.73
        System time (seconds): 0.02
        Percent of CPU this job got: 49%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 10:14.87
time -v ./MacLucasFFTW -ostdout `cat testLL.in` | sed -e 's/, n = .*//' -e '/^[^M]/d' -e '/^$/d' > test.mlf
        User time (seconds): 133.56
        System time (seconds): 0.06
        Percent of CPU this job got: 49%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:29.95

MacLucasFFTW had already done the tuning for the FFT lengths used
prior to this run, but I believe it would still be faster than
MacLucasUNIX even if it had not already done the tuning.

(The 49% CPU usage is because my computer was also doing a long-term
ecm3 run, including during the FFTW tuning.)
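
As a quick check on the factor-of-two figure, the user times above can
be turned into ratios.  This is only a Python sketch, not part of the
mers package.  The dictionary simply copies the numbers reported above,
and speedup_over is a name made up for this example.

```python
# User times in seconds, copied from the timing output above.
USER_SECONDS = {
    "fftlucas": 784.40,
    "mersenne1": 711.40,
    "mersenne2": 745.58,
    "MacLucasUNIX": 305.73,
    "MacLucasFFTW": 133.56,
}


def speedup_over(baseline, program, times=USER_SECONDS):
    """Ratio of user times: how many times faster `program` ran than `baseline`."""
    return times[baseline] / times[program]


if __name__ == "__main__":
    for name in USER_SECONDS:
        if name != "MacLucasFFTW":
            ratio = speedup_over(name, "MacLucasFFTW")
            print(f"MacLucasFFTW vs {name}: {ratio:.2f}x")
```

User time is used rather than wall clock time because the machine was
shared with another job during these runs.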

Feel free to send me any questions, bug reports, and so on.  As he
notes in comments and his README file, I believe Guillermo welcomes
feedback as well.

                                                Will

http://www.garlic.com/~wedgingt/mersenne.html
                                beta.tar.gz
                                beta.tgz
                                beta.zip
                                beta.shar