Hi all:

As we all know, the P4 runs L-L tests at 'light speed' thanks to its SSE2
instructions, its better memory management and, above all, G. Woltman.

I have not yet had a chance to optimize Glucas for SSE2, because I have had
no access to a machine that supports it and, to tell the truth, because
lately I have not had much spare time.

This week I read that there are already some prototypes of AMD's Hammer-family
processors running demos. These new processors will include the SSE2
extensions and so (I think) will be compatible with P4 code. It would be nice
to see how fast they run Prime95.

Back to the subject: I'm wondering how fast we could run two L-L tests in
parallel using these SSE2 extensions. Basically, I'm thinking of using two
nearby exponents that share the same FFT length. The memory access pattern in
the FFT phase would be the same, and so would the trig data; the most
difficult part would be the carry-and-normalization pass. Most of the code for
a single L-L test could be reused with small modifications, using
(double,double) as the basic float type for SSE2 instead of (double) in the
normal code. As a result, we would get one L-L result, and a second one a few
seconds (iterations) later, since the larger exponent needs a few more
iterations.
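To make the (double,double) idea concrete, here is a minimal sketch, assuming a hypothetical interleaved layout in which lane 0 of every pair belongs to the test of the first exponent and lane 1 to the second. The names `pair_t`, `cpair_t` and `butterfly2` are illustrative only, not Glucas or Prime95 code. It shows a single radix-2 FFT butterfly applied to both transforms at once: the twiddle factor (the shared trig data) is broadcast to both lanes, and each SSE2 instruction does the work for both tests. A scalar fallback is used when SSE2 is not available. The carry-and-normalization pass is not shown.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical layout: v[0] belongs to the test of exponent p1,
   v[1] to the test of exponent p2 (same FFT length). */
typedef struct { double v[2]; } pair_t;

/* One complex FFT element for the two interleaved transforms. */
typedef struct { pair_t re, im; } cpair_t;

/* Radix-2 butterfly a' = a + w*b, b' = a - w*b, done for both lanes.
   The twiddle w = (wr, wi) is the same for both tests. */
static void butterfly2(cpair_t *a, cpair_t *b, double wr, double wi)
{
    assert(a != b); /* the two operands must be distinct elements */
#if defined(__SSE2__)
    __m128d vwr = _mm_set1_pd(wr), vwi = _mm_set1_pd(wi); /* broadcast trig data */
    __m128d ar = _mm_loadu_pd(a->re.v), ai = _mm_loadu_pd(a->im.v);
    __m128d br = _mm_loadu_pd(b->re.v), bi = _mm_loadu_pd(b->im.v);
    /* t = w*b (complex multiply) for both lanes at once */
    __m128d tr = _mm_sub_pd(_mm_mul_pd(vwr, br), _mm_mul_pd(vwi, bi));
    __m128d ti = _mm_add_pd(_mm_mul_pd(vwr, bi), _mm_mul_pd(vwi, br));
    _mm_storeu_pd(a->re.v, _mm_add_pd(ar, tr));
    _mm_storeu_pd(a->im.v, _mm_add_pd(ai, ti));
    _mm_storeu_pd(b->re.v, _mm_sub_pd(ar, tr));
    _mm_storeu_pd(b->im.v, _mm_sub_pd(ai, ti));
#else
    /* Scalar fallback: the same arithmetic, one lane at a time. */
    for (int k = 0; k < 2; k++) {
        double tr = wr * b->re.v[k] - wi * b->im.v[k];
        double ti = wr * b->im.v[k] + wi * b->re.v[k];
        double ar = a->re.v[k], ai = a->im.v[k];
        a->re.v[k] = ar + tr;
        a->im.v[k] = ai + ti;
        b->re.v[k] = ar - tr;
        b->im.v[k] = ai - ti;
    }
#endif
}
```

In a full transform every butterfly would be replaced this way, so the loop structure, memory access order and trig tables of the single-test code stay unchanged. Only the element type doubles in width.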

Comments? Suggestions?

Regards.

Guillermo.
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
