Hi,

On Friday 01 Mar 2002 21:22, Brian J. Beesley wrote:
> [... snip ...]
>
> > The memory bottleneck was the first thing I thought of, and I was close
> > to discarding the idea when I realized that the trig data would be the
> > same, and the required memory access would be less than double that of
> > the single-stream scheme. If a double-stream version cost less than
> > double the single one, then we could speed up the project a bit.
>
> On Friday 01 March 2002 00:37, George Woltman wrote:
> > Well, that would be true if SSE2 had a multiply vector by scalar
> > instruction. That is, to multiply two values by the same trig value, you
> > must either load two copies of the trig value or add instructions to copy
> > the value into both halves of the SSE2 register.
>
> I can't see that being a major problem. Surely there's only one main memory
> fetch to load the two halves of the SSE2 register with the same value, and
> surely the loads can be done in parallel since there's no interaction.
> ( M -> X; then X -> R1 & X -> R2 in parallel, where X is one of the
> temporary registers available to the pipeline)
>

We would have to compare the memory-traffic cost of loading data whose two
halves are the same against loading two different values and then
duplicating each into two XMM registers. I don't have any SSE2 skill, nor a
machine to try it on. This morning I skimmed the Intel PDF manual, and my
impression is that SSE2 was designed by Intel engineers thinking more of
multimedia than of mathematics (or GIMPS). There are some elementary
operations they could have implemented to make complex-number
multiplication easy, such as a vector-by-scalar multiply or an exchange of
the two halves... :-( Perhaps in SSE3 :)

> On Thursday 28 February 2002 21:20, Steinar H. Gunderson wrote:
> > Testing a number in parallel with itself is obviously a bad idea if an
> > undetected error occurs. :-)
>
> Sure.
> But the only way there would be a problem here (given that the data
> values are independent because of the different random offsets) is if there
> was a major error like miscounting the number of iterations. This is
> relatively easy to test out.
>
> I'm sort of marginally uneasy, rather than terrified, about running a
> double-check in parallel with the first test on the same system at the same
> time. Also, I think most people would rather complete one assignment in
> time T than two assignments in time 2T with both results unknown until
> they both complete. Against this is that Guillermo's suggestion does
> something to counter the relatively low rate at which DCs are completed.
>

I was also worried about that idea, but every time I think about it, it
seems less absurd to me. OTOH, I don't know how difficult the carry and
normalization code of the DWT would be for two _different_ exponents. As a
first approximation, I recall some branch-free code I wrote for Glucas,
which actually processes two streams at once. So perhaps the cost is small.

Regards.
Guillermo.

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
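The SSE2 point George and Brian discuss can be illustrated with a minimal C sketch. It assumes a hypothetical layout in which lane 0 of each XMM register carries stream A and lane 1 carries stream B. The shared twiddle (c, s) is read once from the trig table, but it has to be duplicated into both lanes with `_mm_load1_pd` before the packed multiply. The type and function names (`pair_complex`, `pair_make`, `lane`, `twiddle_mul_pair`) are illustrative only. They are not taken from Prime95 or Glucas.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (baseline on x86-64) */

/* Hypothetical two-stream layout: lane 0 belongs to stream A, lane 1 to
   stream B, so one pair of registers holds one complex value per stream. */
typedef struct {
    __m128d re;  /* [reA, reB] */
    __m128d im;  /* [imA, imB] */
} pair_complex;

/* Pack one complex value from each stream. _mm_set_pd takes (lane1, lane0). */
static pair_complex pair_make(double reA, double imA, double reB, double imB)
{
    pair_complex p;
    p.re = _mm_set_pd(reB, reA);
    p.im = _mm_set_pd(imB, imA);
    return p;
}

/* Read lane i (0 = stream A, 1 = stream B) back out of a register. */
static double lane(__m128d v, int i)
{
    double out[2];
    assert(i == 0 || i == 1);
    _mm_storeu_pd(out, v);
    return out[i];
}

/* Multiply both streams by the SAME twiddle c + i*s, read from one shared
   table entry. SSE2 has no vector-by-scalar multiply, so each trig value is
   first duplicated into both lanes; that duplication is the extra work
   under discussion. */
static pair_complex twiddle_mul_pair(pair_complex x, const double *c,
                                     const double *s)
{
    __m128d cc = _mm_load1_pd(c);  /* [c, c] */
    __m128d ss = _mm_load1_pd(s);  /* [s, s] */
    pair_complex r;
    /* (re + i*im)(c + i*s) = (re*c - im*s) + i*(re*s + im*c) */
    r.re = _mm_sub_pd(_mm_mul_pd(x.re, cc), _mm_mul_pd(x.im, ss));
    r.im = _mm_add_pd(_mm_mul_pd(x.re, ss), _mm_mul_pd(x.im, cc));
    return r;
}
```

Take c = 0.6 and s = 0.8, with stream A holding 1 + 2i and stream B holding 3 + 4i. One call then yields -1 + 2i and -1.4 + 4.8i, up to rounding. On plain SSE2 a compiler typically implements `_mm_load1_pd` as a scalar load followed by a shuffle or unpack. SSE3 later added `MOVDDUP`, which loads a double and duplicates it in a single instruction.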
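Brian says the two runs' data are independent because of their different random offsets. A toy Lucas-Lehmer test in C shows why that holds. The sketch rests on one assumption: the offset is modelled as keeping the working value multiplied by 2^k mod M_p, which is a k-bit rotation. Real clients apply the shift to their FFT data, and this sketch uses plain 64-bit integers, so it only handles p <= 31. The helper names `rotl_p` and `ll_residue_shifted` are hypothetical. Every intermediate value differs between offsets, yet the un-rotated final residue comes out identical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a p-bit value left by r bits. This is multiplication by 2^r
   modulo M_p = 2^p - 1, because 2^p == 1 (mod M_p). */
static uint64_t rotl_p(uint64_t x, unsigned r, unsigned p)
{
    const uint64_t mask = (UINT64_C(1) << p) - 1;
    r %= p;
    if (r == 0)
        return x;
    return ((x << r) | (x >> (p - r))) & mask;
}

/* Lucas-Lehmer residue s_{p-2} mod M_p, computed with the working value
   stored pre-shifted by 'shift' bits. Toy sketch: 64-bit arithmetic only,
   so p must be an odd prime <= 31. */
static uint64_t ll_residue_shifted(unsigned p, unsigned shift)
{
    const uint64_t m = (UINT64_C(1) << p) - 1;
    unsigned k, i;
    uint64_t y;
    assert(p >= 3 && p <= 31);
    k = shift % p;
    y = rotl_p(4, k, p);                    /* stored value: 4 * 2^k mod M */
    for (i = 0; i < p - 2; i++) {
        y = (y * y) % m;                    /* (s*2^k)^2 = s^2 * 2^(2k) */
        k = (2 * k) % p;                    /* so the shift doubles mod p */
        y = (y + m - rotl_p(2, k, p)) % m;  /* subtract 2, shifted by k */
    }
    return rotl_p(y, p - k, p) % m;         /* undo the shift: * 2^(p-k) */
}
```

For example, `ll_residue_shifted(7, k)` returns 0 for every offset k, since M7 = 127 is prime. For p = 11 (2047 = 23 * 89), every offset gives the same non-zero residue. A run that miscounted iterations would still disagree, which matches Brian's caveat.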
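Guillermo asks how hard the carry step is for two different exponents. A hedged sketch, not the Glucas code, suggests an answer: a carry pass over two streams with different exponents, interleaved in one branch-free loop. It makes several assumptions:

- a hypothetical 4-word transform;
- word sizes from the usual irrational-base layout, in which word i starts at bit ceil(p*i/n);
- non-negative integer digits;
- no DWT weights, balanced digits, or floating-point rounding.

All function names are illustrative. In this simplified model, the only per-stream difference is a table of word sizes.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 4  /* hypothetical tiny transform length; assumes p >= NWORDS */

/* Word i of an n-word representation mod 2^p - 1 starts at bit
   ceil(p*i/n), so word sizes differ by at most one bit. */
static unsigned word_offset(unsigned p, unsigned i)
{
    return (p * i + NWORDS - 1) / NWORDS;
}

/* Fill bits[i] with the size of word i for exponent p. */
static void word_sizes(unsigned p, unsigned bits[NWORDS])
{
    unsigned i;
    for (i = 0; i < NWORDS; i++)
        bits[i] = word_offset(p, i + 1) - word_offset(p, i);
}

/* One carry pass over two streams with DIFFERENT exponents, interleaved
   in a single loop. Each step is an add, a shift and a mask, with no
   data-dependent branches. */
static void carry_two_streams(uint64_t a[NWORDS], const unsigned bits_a[NWORDS],
                              uint64_t b[NWORDS], const unsigned bits_b[NWORDS])
{
    uint64_t ca = 0, cb = 0;
    unsigned i;
    for (i = 0; i < NWORDS; i++) {
        uint64_t xa = a[i] + ca;
        uint64_t xb = b[i] + cb;
        ca = xa >> bits_a[i];
        cb = xb >> bits_b[i];
        a[i] = xa & ((UINT64_C(1) << bits_a[i]) - 1);
        b[i] = xb & ((UINT64_C(1) << bits_b[i]) - 1);
    }
    /* The carry out of the top word has weight 2^p == 1 (mod 2^p - 1),
       so it wraps into word 0, which may briefly exceed its size. */
    a[0] += ca;
    b[0] += cb;
}

/* Value represented by the words, reduced mod 2^p - 1 (p <= 31). */
static uint64_t words_value_mod(const uint64_t w[NWORDS], unsigned p)
{
    const uint64_t m = (UINT64_C(1) << p) - 1;
    uint64_t v = 0;
    unsigned i;
    assert(p >= NWORDS && p <= 31);
    for (i = 0; i < NWORDS; i++) {
        uint64_t scale = (UINT64_C(1) << word_offset(p, i)) % m;
        v = (v + (w[i] % m) * scale) % m;
    }
    return v;
}
```

With p = 31 and p = 29 the word sizes come out as 8, 8, 8, 7 and 8, 7, 7, 7. The carry pass leaves each stream's value mod its own Mersenne number unchanged.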
