Simon Burge wrote:

> I'll let you know of some results soon.

My early tests on a 200MHz UltraSparc are not that encouraging.

        ./MacLucasUNIX -C -S 10 1400001
        speed: 10 iters in 1.668 seconds, 0.167 iters/sec (fft len 128k)
        speed: 10 iters in 1.688 seconds, 0.169 iters/sec (fft len 128k)
        ./MacLucasFFTW -C -S 10 1400001
        speed: 10 iters in 1.471 seconds, 0.147 iters/sec (fft len 72k)
        speed: 10 iters in 1.484 seconds, 0.148 iters/sec (fft len 72k)
        ./MacLucasFFTW2 -C -S 10 1400001
        speed: 10 iters in 1.229 seconds, 0.123 iters/sec (fft len 72k)
        speed: 10 iters in 1.251 seconds, 0.125 iters/sec (fft len 72k)

        ./MacLucasUNIX -C -S 10 4609273
        speed: 10 iters in 3.820 seconds, 0.382 iters/sec (fft len 256k)
        speed: 10 iters in 3.783 seconds, 0.378 iters/sec (fft len 256k)
        ./MacLucasFFTW -C -S 10 4609273
        speed: 10 iters in 7.513 seconds, 0.751 iters/sec (fft len 240k)
        speed: 10 iters in 7.550 seconds, 0.755 iters/sec (fft len 240k)
        ./MacLucasFFTW2 -C -S 10 4609273
        speed: 10 iters in 5.009 seconds, 0.501 iters/sec (fft len 240k)
        speed: 10 iters in 5.011 seconds, 0.501 iters/sec (fft len 240k)

        ./MacLucasUNIX -C -S 3 11600001
        speed: 3 iters in 4.845 seconds, 1.615 iters/sec (fft len 1024k)
        speed: 3 iters in 4.809 seconds, 1.603 iters/sec (fft len 1024k)
        ./MacLucasFFTW -C -S 3 11600001
        speed: 3 iters in 6.165 seconds, 2.055 iters/sec (fft len 640k)
        speed: 3 iters in 6.187 seconds, 2.062 iters/sec (fft len 640k)
        ./MacLucasFFTW2 -C -S 3 11600001
        speed: 3 iters in 4.138 seconds, 1.379 iters/sec (fft len 640k)
        speed: 3 iters in 4.141 seconds, 1.380 iters/sec (fft len 640k)

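-C means never checkpoint, and -S N means print a speed update every
N iterations.  Note that the figure labelled "iters/sec" in the output
is really seconds per iteration (elapsed time divided by iteration
count), so lower is better.  MacLucasFFTW2 is hard-coded to use 2
threads.  The case for 4609273 is interesting: the FFT lengths are
nearly identical (256k vs. 240k), yet MacLucasFFTW takes about twice
as long per iteration as MacLucasUNIX.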
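
For anyone who wants to compare the runs above without doing the
division by hand, here is a rough sketch in Python.  It is not part of
MacLucas or of my timing diff; the helper names (seconds_per_iter,
relative_time) are made up for illustration, and the regular expression
simply assumes the "speed:" line format shown above.  It works out
seconds per iteration from the iteration count and elapsed time, then
the ratio against a baseline run.

```python
import re

# Assumed line format, taken from the timing output quoted above:
#   speed: 10 iters in 3.820 seconds, 0.382 iters/sec (fft len 256k)
SPEED_RE = re.compile(
    r"speed:\s+(\d+)\s+iters in\s+([\d.]+)\s+seconds.*\(fft len (\d+)k\)"
)


def seconds_per_iter(line):
    """Return (seconds per iteration, FFT length in K) for one speed line."""
    m = SPEED_RE.search(line)
    if m is None:
        raise ValueError("not a speed line: %r" % line)
    iters = int(m.group(1))
    secs = float(m.group(2))
    fft_k = int(m.group(3))
    return secs / iters, fft_k


def relative_time(baseline_line, other_line):
    """Ratio of per-iteration times; a value above 1 means 'other' is slower."""
    base, _ = seconds_per_iter(baseline_line)
    other, _ = seconds_per_iter(other_line)
    return other / base


# The three 4609273 runs (first sample of each) from the results above.
unix = "speed: 10 iters in 3.820 seconds, 0.382 iters/sec (fft len 256k)"
fftw = "speed: 10 iters in 7.513 seconds, 0.751 iters/sec (fft len 240k)"
fftw2 = "speed: 10 iters in 5.009 seconds, 0.501 iters/sec (fft len 240k)"

if __name__ == "__main__":
    print("UNIX: %.3f s/iter" % seconds_per_iter(unix)[0])
    print("FFTW  / UNIX = %.2f" % relative_time(unix, fftw))
    print("FFTW2 / UNIX = %.2f" % relative_time(unix, fftw2))
```

A ratio above 1 means the run is slower than the MacLucasUNIX baseline;
the same two functions can be pointed at the 1400001 and 11600001 lines
to get the comparison for those exponents.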

I'm assuming that you're seeing such a speed-up on Intel because it
lacks the registers that MacLucasUNIX likes, and FFTW does a better
job under those conditions.

Will - I'll send you a diff that I used for the timing stuff.

Simon.
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
