Hi,

On Friday 01 Mar 2002 21:22, Brian J. Beesley wrote:
> [... snip ...]
>

> > The memory bottleneck was the first thing I thought, and I was near to
> > discard the idea when I realized that the trig data would be the same,
> > and the required memory access would be less than double the single
> > stream scheme. If a double stream version cost less than double the
> > single one then we can speed up the project a bit.
>
> On Friday 01 March 2002 00:37, George Woltman wrote:
> > Well, that would be true if SSE2 had a multiply vector by scalar
> > instruction. That is, to multiply two values by the same trig value, you
> > must either load two copies of the trig value or add instructions to copy
> > the value into both halves
> > of the SSE2 register.
>
> I can't see that being a major problem. Surely there's only one main memory
> fetch to load the two halves of the SSE2 register with the same value, and
> surely the loads can be done in parallel since there's no interaction.
> ( M -> X; then X -> R1 & X -> R2 in parallel, where X is one of the
> temporary registers available to the pipeline)
>

We would have to weigh the memory traffic of loading data that already has
both halves the same against loading the two different values and then
duplicating each one into its own XMM register. I have no experience with
SSE2 and no machine to try it on. This morning I skimmed the Intel PDF
manual, and it looks to me like SSE2 was designed by Intel engineers with
multimedia in mind rather than mathematics (or GIMPS). There are some
elementary ops they could have implemented to make complex number
multiplication easy, or a vector-by-scalar mul, or an exchange between
halves .... :-(.  Perhaps in SSE3 :)
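
To make the "load once, then copy into both halves" option concrete, here is
a rough sketch in C with the SSE2 intrinsics from <emmintrin.h>. It is
untested on real SSE2 hardware and says nothing about the actual cost. The
function name mul_two_streams and its argument layout are made up for this
example, and the scalar fallback is only there so it builds without SSE2.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: multiply one value from each of two independent
   streams (a and b) by the same trig value, read from memory only once.
   Result: out[0] = a * trig[0], out[1] = b * trig[0]. */
static void mul_two_streams(const double *trig, double a, double b,
                            double out[2])
{
    assert(trig && out);
#if defined(__SSE2__)
    __m128d t = _mm_load_sd(trig);      /* low half = trig, high half = 0 */
    t = _mm_unpacklo_pd(t, t);          /* copy the low half into both halves */
    __m128d x = _mm_set_pd(b, a);       /* low half = a, high half = b */
    _mm_storeu_pd(out, _mm_mul_pd(x, t));
#else
    /* Scalar fallback for builds without SSE2. */
    out[0] = a * trig[0];
    out[1] = b * trig[0];
#endif
}
```

The extra unpack is the instruction George mentions. The alternative is to
store every trig value twice in memory and load it with one 128-bit load,
which saves the unpack but doubles the trig table.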

> On Thursday 28 February 2002 21:20, Steinar H. Gunderson wrote:
> > Testing a number in parallel with itself is obviously a bad idea if there
> > occurs an undetected error. :-)
>
> Sure. But the only way there would be a problem here (given that the data
> values are independent because of the different random offsets) is if there
> was a major error like miscounting the number of iterations. This is
> relatively easy to test out.
>
> I'm sort of marginally uneasy, rather than terrified, about running a
> double-check in parallel with the first test on the same system at the same
> time. Also, I think most people would rather complete one assignment in
> time T rather than two assignments in time 2T with both results unknown
> till they both complete. Against this is that Guillermo's suggestion does
> something to counter the relatively low rate at which DCs are completed.
>

I was also worried about that idea, but the more I think about it, the less
absurd it seems to me.

OTOH, I don't know how difficult the carry and normalization code of the
DWT would be for two _different_ exponents. As a first approximation, I
recall some branch-free code I wrote for Glucas, which actually processes
two streams at once. So perhaps the cost is small.
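
To show what I mean by branch-free carry code on two streams, here is a
small sketch in plain C. It is NOT the Glucas code, only an illustration
written for this mail. The names (carry_two_streams, base_a, base_b) are
made up, the per-digit bases are just passed in as arrays, and the DWT
weights are left out. The rounding trick assumes IEEE doubles in
round-to-nearest mode with no extra precision, and values much smaller
than 2^51.

```c
#include <assert.h>

/* 3 * 2^51: adding and then subtracting it rounds a double to the nearest
   integer without a branch (assumes round-to-nearest, no excess precision). */
#define ROUND_MAGIC 6755399441055744.0

static double round_nearest(double x)
{
    return (x + ROUND_MAGIC) - ROUND_MAGIC;
}

/* Hypothetical sketch: round and carry-propagate two independent streams
   in the same loop. Each stream has its own per-digit bases, as it would
   for two different exponents. The final carries go to carry_out[0..1]. */
static void carry_two_streams(double *xa, const double *base_a,
                              double *xb, const double *base_b,
                              int n, double carry_out[2])
{
    double ca = 0.0, cb = 0.0;
    for (int i = 0; i < n; i++) {
        assert(base_a[i] > 0.0 && base_b[i] > 0.0);
        double ta = round_nearest(xa[i]) + ca;   /* stream A */
        double tb = round_nearest(xb[i]) + cb;   /* stream B */
        ca = round_nearest(ta / base_a[i]);      /* new carries, no branches */
        cb = round_nearest(tb / base_b[i]);
        xa[i] = ta - ca * base_a[i];             /* balanced digits */
        xb[i] = tb - cb * base_b[i];
    }
    carry_out[0] = ca;
    carry_out[1] = cb;
}
```

The two streams never interact inside the loop, so the only per-stream data
are the base arrays. That is why I suspect the cost for two different
exponents is small, but it would have to be measured.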

Regards.

Guillermo.
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers