On 3 Nov 2001, at 21:40, Kel Utendorf wrote:

> At 21:01 11/03/2001 -0500, George Woltman wrote:
> >Can prime95 take advantage of SMT?  I'm skeptical.  If the FFT is
> >broken up to run in two threads, I'm afraid L2 cache pollution will
> >negate any advantage of SMT.  Of course, I'm just guessing - to test
> >this theory out we should compare our throughput running 1 vs. 2
> >copies of prime95 on an SMT machine.

I'm not sure I fully understand the way in which an SMT processor
would utilise cache. But I can't see how the problem could be
worse than running two copies of a program on an SMP system.
This seems to work fairly well in both Windows and Linux regimes
(attaching a thread to a processor and therefore its associated
cache, rigidly in the case of Windows, loosely but intelligently in
the case of Linux).

If an SMT processor has a unified cache, cache pollution surely
should not be too much of a problem? Running one copy, and thereby
getting the benefit of the full cache, would run that one copy faster
(just as happens with SMP systems, where memory bandwidth can
be crucial), but the total throughput with two copies running would
surely be greater. Especially on a busy system, where two threads
get twice as many timeslices as one!
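
To put the throughput argument in numbers, here is a minimal sketch.
The function names and the timings are invented for illustration; they
are not measurements from any SMT or SMP machine. Two copies come out
ahead whenever each shared copy takes less than twice as long per
iteration as a single copy running alone.

```python
def total_throughput(copies, seconds_per_iteration):
    """Iterations completed per second, summed over all running copies."""
    return copies / seconds_per_iteration


def two_copies_win(t_alone, t_shared):
    """True if two copies sharing the processor beat one copy running alone.

    t_alone  -- seconds per iteration for a single copy with the whole cache
    t_shared -- seconds per iteration for EACH of two copies sharing the cache
    """
    return total_throughput(2, t_shared) > total_throughput(1, t_alone)


if __name__ == "__main__":
    # Hypothetical timings, chosen only to show both outcomes.
    t_alone = 0.100
    for t_shared in (0.150, 0.250):
        verdict = "wins" if two_copies_win(t_alone, t_shared) else "loses"
        print(f"shared copy at {t_shared:.3f}s/iter: running two copies {verdict}")
```

The only way to get real values for the two timings is the experiment
George suggests: run 1 copy, then 2 copies, and compare.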

If there is some way in which the FFT could be broken down into
roughly equal-sized chunks, it _might_ be worth synchronizing two
streams so that e.g. the "transform in" on one thread always ran in
parallel with the "transform out" on the other, and vice versa.
Obviously you'd need to be running on two different exponents but
using the same FFT length to gain from this technique. Finding out
whether this would be any better than running unsynchronized would
probably require experimentation.
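
The lockstep idea could be sketched roughly as follows. This only
shows the scheduling pattern: the phase names, the run_interleaved
function and the use of a barrier are my assumptions, the "work" is a
placeholder, and none of it is how prime95 is actually structured.
(Python threads also do not execute in parallel the way two hardware
threads would; the point is just the alternation.)

```python
import threading

PHASES = ("transform_in", "transform_out")


def run_interleaved(steps):
    """Run two workers in lockstep so their phases are always opposite.

    Returns a sorted list of (step, worker_id, phase) records.
    """
    barrier = threading.Barrier(2)  # both workers meet here after each phase
    log = []
    log_lock = threading.Lock()

    def worker(worker_id):
        for step in range(steps):
            # Worker 0 starts on transform_in, worker 1 on transform_out,
            # and each worker swaps phase on every step.
            phase = PHASES[(step + worker_id) % 2]
            # Placeholder: the real chunk of FFT work would go here.
            with log_lock:
                log.append((step, worker_id, phase))
            barrier.wait()  # neither worker starts its next phase early

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log)


if __name__ == "__main__":
    for record in run_interleaved(4):
        print(record)
```

The barrier forces the faster stream to wait for the slower one at the
end of every phase, which is exactly the cost that would have to be
weighed against any cache benefit.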
> 
> Could things be setup so that factoring and LL-testing went on 
> "simultaneously?"  This would speed up the overall amount of work
> being done.

Because trial factoring, or P-1/ECM on _small_ exponents, places a
very low load on the memory bus, running an LL test and factoring in
parallel on a dual-processor SMP system makes a lot of sense. I
suspect the same situation would apply in an SMT environment.
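
For anyone unfamiliar with the two kinds of work, here is a textbook
sketch of each, run side by side. This is not prime95's code (prime95
does the LL squaring with FFT multiplication); the function names and
the run_both helper are mine. The trial factoring loop needs only a
few machine words of state, while the LL test repeatedly squares a
p-bit number, which is what makes the pairing attractive.

```python
from concurrent.futures import ThreadPoolExecutor


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one squaring of a p-bit number per iteration
    return s == 0


def trial_factor(p, max_k):
    """Return the smallest factor q = 2*k*p + 1 of 2**p - 1 with k <= max_k.

    Returns None if no such factor exists. Intended for odd prime p.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        # Prime factors of 2**p - 1 are congruent to 1 or 7 modulo 8.
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
    return None


def run_both(ll_exponent, tf_exponent, max_k):
    """Run one LL test and one trial factoring job side by side."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ll = pool.submit(lucas_lehmer, ll_exponent)
        tf = pool.submit(trial_factor, tf_exponent, max_k)
        return ll.result(), tf.result()


if __name__ == "__main__":
    print(run_both(13, 29, 10))
```

In practice the two jobs would be two separate copies of the program,
one per processor, rather than two threads in one interpreter.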

The "problem" with mass deployment (almost everyone being in this
position, instead of only a few of us) is that far more LL testing
effort is required than trial factoring effort, so running
two LL tests in parallel but inefficiently would bring us to
"milestones" faster than the efficient LL/trial factoring split.


Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
