Hi all,

As we all know, the P4 runs L-L tests at 'light speed' thanks to the SSE2 instructions, better memory management and, above all, to G. Woltman.
So far I have had no chance to optimize Glucas for SSE2, because I have no access to a machine that supports it and, to tell the truth, because lately I don't have much spare time. This week I read that some prototypes of AMD's Hammer-family processors are already running demos. These new processors will include the SSE2 extensions and so should be compatible with P4 code (I think). It would be nice to see how fast they run Prime95.

Back to the subject: I'm wondering how fast we could do two L-L tests in parallel using these SSE2 extensions. Basically, I'm thinking of using two nearby exponents with the same FFT length. The memory access pattern in the FFT phase would be the same, and so would the trig data; the most difficult part would be the carry-and-normalization pass. Most of the code for a single L-L test could be reused with small modifications, using the packed (double,double) SSE2 type as the basic float type instead of the (double) used in the normal code. As a result, we would get one L-L result and, some seconds (iterations) later, a second L-L result.

Comments? Suggestions?

Regards.
Guillermo.

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
