On Thursday 28 February 2002 10:18, Guillermo Ballester Valor wrote:
[... snip ...]
> Back to the subject, I'm wondering how fast we could do two L-L tests in
> parallel using these SSE2 extensions. Basically, I'm thinking of using two
> nearby exponents with the same FFT length. The memory access in the FFT phase
> would be the same, and the trig data would also be the same; the most
> difficult part would be the carry-and-normalization pass.  Most of the code
> for a single L-L test could be reused with small modifications, using the
> basic float type (double,double) for SSE2 instead of (double) for normal
> code.  As a result, we would get one L-L result and, some seconds
> (iterations) later, a second L-L result.

I'd need some convincing that this would be any better than George's method.

What you're saying is that, with two parallel streams, you'd run one 
assignment in one stream and a second assignment in the other. What the 
Prime95 SSE2 code does is to run only one assignment at a time, with odd 
numbered FFT elements being processed in one stream and adjacent even 
numbered FFT elements being processed in the other.

The difference here is that your method generates memory bus traffic at twice 
the rate. George's method takes advantage of the fact that (with properly 
aligned operands) fetching the "odd" element data automatically fetches the 
adjacent "even" element data. 
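
The two ways of filling a 128-bit register described above can be sketched
in portable C. This is only an illustration, not Prime95 code: the names
packed2, load_adjacent, load_two_streams and working_set_bytes are
hypothetical, and it assumes the two-test scheme keeps each test's data in
its own array. An interleaved layout would behave differently for loads,
though two tests would still need twice the FFT data of one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit SSE2 register (two doubles). */
typedef struct { double lo, hi; } packed2;

/* One test: both lanes come from adjacent elements of the same array,
   so a single 16-byte read starting at x[2*i] fills the register. */
static packed2 load_adjacent(const double *x, size_t i)
{
    packed2 p;
    memcpy(&p, &x[2 * i], sizeof p);
    return p;
}

/* Two tests (assumed to live in separate arrays): each lane comes from
   a different array, so filling the register takes two 8-byte reads
   at unrelated addresses. */
static packed2 load_two_streams(const double *a, const double *b, size_t i)
{
    packed2 p;
    p.lo = a[i];
    p.hi = b[i];
    return p;
}

/* Bytes of FFT data that must stay resident for `tests` concurrent
   L-L tests, each using fft_len doubles. */
static size_t working_set_bytes(size_t fft_len, size_t tests)
{
    return fft_len * tests * sizeof(double);
}
```

For any FFT length, working_set_bytes(n, 2) is exactly twice
working_set_bytes(n, 1), which is the extra data the caches and memory bus
would have to carry under the two-test scheme.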

Memory bandwidth is a serious constraint here. I think you need to demonstrate 
that your suggested method has some _big_ advantage, because something major 
is going to be needed to offset the inefficiency caused by the memory 
bottleneck.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
