>>Parallel FFTs typically have three steps: 1) crunch like mad on local
>>data, 2) global scatter-gather (everybody trades data with everybody
>>else), 3) crunch like mad on local data. In the GIMPS case, you want
>>an inverse FFT too, so that the order is 12321. With the real FFTs that
>>Prime95 uses it's worse...step 3 has some global communication too.
>>The communicaton is much more a problem for systems with distributed
>>memory, but since speed is of the essence here the communication means
>>cache thrashing. Running two processors in parallel will speed things up,
>>but I would think that the speedup wouldn't be as great as running two
>>copies of Prime95. You could probably get some cool single machine
>>statistics though :)
>
>Wouldn't this help tremendously the problem of having to do enormous
>10-million digit primes?
>
>I'm guessing, a client hasn't been attempted, but I'd certainly love to
have
>one. I have way too many machines that are Dual and Quad Processor Intels
that
>could seriously take advantage of this.
Well, I'd say that the main problem with try to parallelize the FFT process
is that it ain't so easy to do, and isn't really worth it in the long run
(i.e. Run 2 copies of Prime95).
a) 99.9% of the math in Prime95 is done using assembly. And multi-threading
w/ assembly isn't exactly easy.
b) Depending on the FFT size, there a several optimizations that could be
used. (Such as radix-4 or radix-8 FFTs)
c) The cost of inter-process communications outweights any performance
gains.
A while back, I thought up a non-FFT algorithm that would be worst case O(N
log N), but I need to work on it a little more to make sure I'm not crazy...
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers