Simon Burge writes:

   Unless you're doing a timed run, maybe "kill -STOP pid" and "kill -CONT
   pid" on the ecm3 run might give more accurate results - I hate to think
   of what's happening to the cache...  I use this on machines that have
   mersenne1 running when users notice X load showing a constant load
   average of 1.0.

I tried this just before uploading the new beta with tunefftw.c in it
and MacLucasFFTW's speed improved by less than 3% over the run with
the tuning done while ecm3 was running.  So, as a guess, either the
cache is not the bottleneck or Linux's context switching with 128 MB
RAM is quite good.
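
For anyone who wants to repeat the comparison, here is a minimal sketch
of the pause/resume idea: suspend the background job with SIGSTOP, time
the work, then resume the job with SIGCONT.  It is Unix-only, and the
helper names (paused, timed, demo) and the sleeping child process that
stands in for ecm3 are hypothetical, not anything taken from the actual
tuning run.

```python
import os
import signal
import subprocess
import sys
import time
from contextlib import contextmanager


@contextmanager
def paused(pid):
    """Suspend process `pid` with SIGSTOP; resume it with SIGCONT on exit."""
    os.kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Always resume the job, even if the timed work raises.
        os.kill(pid, signal.SIGCONT)


def timed(func):
    """Return the wall-clock seconds taken by one call of func()."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def demo():
    # Hypothetical stand-in for the background job (ecm3 in this thread):
    # a child Python process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        with paused(child.pid):
            elapsed = timed(lambda: sum(i * i for i in range(100_000)))
        # After SIGCONT the child should still be alive and running.
        still_running = child.poll() is None
        return elapsed, still_running
    finally:
        child.kill()
        child.wait()
```

Timing the same work once inside "with paused(pid):" and once without
it gives the kind of with/without comparison described above.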

   My early tests on a 200MHz UltraSparc are not that encouraging.
   [...times deleted...]

Sigh. :(  Though it does look like MacLucasFFTW is faster when the FFT
length is sufficiently lower ... by, hm, about 10%?

But MacLucasFFTW2 using two CPUs isn't as fast as two MacLucasUNIX's
running at the same time, is it?  I would guess not, from those
numbers.

   The -C means don't checkpoint ever and -S N means print a speed
   update every N iterations.  MacLucasFFTW2 is hard coded to use 2
   threads.  The case for 4609273 is interesting, with nearly identical
   FFT lengths...

Yeah.:(

   I'm assuming that you're seeing such a speed-up on Intel because of the
   lack of registers that MacLucasUNIX likes, and FFTW is doing a better
   job under these conditions.

Looks that way, or something similar.

   Will - I'll send you a diff that I used for the timing stuff.

Yes, please do when you have time; it'll save me from having to
reimplement it.
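
As a rough picture of what a "-S N" style speed update involves, here
is a hypothetical sketch: run a step function repeatedly and report the
average seconds per iteration after every N calls.  The function name,
its parameters and the output format are my own assumptions; this is
neither Simon's diff nor MacLucasFFTW's actual code.

```python
import time


def run_with_speed_updates(step, iterations, every, report=print):
    """Call step() `iterations` times; report sec/iter every `every` calls."""
    rates = []
    start = time.perf_counter()
    for i in range(1, iterations + 1):
        step()
        if every > 0 and i % every == 0:
            now = time.perf_counter()
            # Average time per iteration over the last `every` calls.
            per_iter = (now - start) / every
            rates.append(per_iter)
            report(f"iteration {i}: {per_iter:.6f} sec/iter")
            start = now
    return rates
```

Passing a list's append method as report collects the update lines
instead of printing them, which is handy when comparing two runs.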

                                                        Will
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
