On Sun, 6 Feb 2000, Lars Lindley wrote:

> > However, the FFT itself is very amenable to parallel processing
> > techniques - on a processor with N independent compute pathways, you
> > can compute N elements in the same time that a single element would
> > take to compute just one.
> >
> How much of the mersenne-test is used for the FFT then?
> Isn't the FFT the single most time-consuming part?
> 
> Wouldn't it make sense to make a parallelized version of prime95 that
> distributes the FFT over N processors?

Parallel FFTs typically have three steps: 1) crunch like mad on local
data, 2) global scatter-gather (everybody trades data with everybody
else), 3) crunch like mad on local data. In the GIMPS case, you want
an inverse FFT too, so that the order is 12321. With the real FFTs that
Prime95 uses it's worse...step 3 has some global communication too.
The communicaton is much more a problem for systems with distributed
memory, but since speed is of the essence here the communication means
cache thrashing. Running two processors in parallel will speed things up,
but I would think that the speedup wouldn't be as great as running two
copies of Prime95. You could probably get some cool single machine
statistics though :)

Part of the problem here is that to know for sure you'd have to
multithread the code, which would be messy. It would be *really*
interesting to multithread things if your machine performed a context 
switch while waiting for a cache miss (in an effort to do useful work
while the CPU would otherwise be stalled), but I think only a sparc
can context switch that fast.

jasonp

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to