On Thursday 28 February 2002 10:18, Guillermo Ballester Valor wrote:

[... snip ...]

> Back to the subject, I'm wondering about how fast can we do two L-L test in
> parallel using this SSE2 extensions. Basically, I'm thinking in use two
> nearest exponents with the same FFT-length. The memory access in FFT phase
> would be the same, the trig data also the same, the most difficult part
> would be the carry-and-normalization pass. Most of the code for a single
> L-L test could be reused with small modifications using the basic float
> type (double,double) for SSE2 instead of (double) for normal code. As
> result, we would get a L-L result and some seconds (iterations) after a new
> L-L result.
I'd need some convincing that this would be any better than George's method.

What you're saying is that, with two parallel streams, you'd run one
assignment in one stream and a second assignment in the other. What the
Prime95 SSE2 code does is run only one assignment at a time, with
odd-numbered FFT elements being processed in one stream and the adjacent
even-numbered FFT elements being processed in the other.

The difference is that your method generates memory bus traffic at twice
the rate. George's method takes advantage of the fact that (with properly
aligned operands) fetching the "odd" element data automatically fetches the
adjacent "even" element data.

Memory bandwidth is a serious constraint here. I think you need to
demonstrate that your suggested method has some _big_ advantage, because
something major is going to be needed to offset the inefficiency caused by
the memory bottleneck.

Regards
Brian Beesley
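
To put rough numbers on the two layouts being compared, here is a minimal
sketch. It only counts the FFT data streamed in one pass and assumes 8-byte
doubles packed two to a 128-bit SSE2 operand. The function names and the
FFT length are made up for illustration, and trig data and cache effects
are ignored entirely, so treat it as bookkeeping rather than a model of
either program.

```python
DOUBLE_BYTES = 8
PACKED_BYTES = 2 * DOUBLE_BYTES  # one 128-bit SSE2 operand holds two doubles


def one_test_pass(fft_len):
    """One test: adjacent elements (2k, 2k+1) share a packed operand.

    Returns (packed operands touched, bytes of FFT data streamed).
    """
    packed_ops = fft_len // 2
    return packed_ops, packed_ops * PACKED_BYTES


def two_test_pass(fft_len):
    """Two tests: element k of test A and element k of test B share a
    packed operand, i.e. the (double,double) basic type from the proposal.

    Returns (packed operands touched, bytes of FFT data streamed).
    """
    packed_ops = fft_len
    return packed_ops, packed_ops * PACKED_BYTES


if __name__ == "__main__":
    n = 1 << 20  # hypothetical FFT length, chosen only for illustration
    ops1, bytes1 = one_test_pass(n)
    ops2, bytes2 = two_test_pass(n)
    print("one test : %d packed operands, %d bytes per pass" % (ops1, bytes1))
    print("two tests: %d packed operands, %d bytes per pass" % (ops2, bytes2))
    print("ratio of bytes per pass: %.1f" % (bytes2 / bytes1))
```

Under these assumptions a pass over the paired layout moves twice as many
bytes as a pass over the single-test layout, while advancing two tests
instead of one. Whether that is a net loss depends on how much of the data
stays in cache, which this sketch does not attempt to model.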
