Mersenne Digest Saturday, August 28 1999 Volume 01 : Number 619 ---------------------------------------------------------------------- Date: Tue, 24 Aug 1999 22:20:50 EDT From: [EMAIL PROTECTED] Subject: Mersenne: targeting multiply/add (Was: K7 vs. x86) Hi, Jason: >Or, if you just want to cheapen a complex multiply, turn the standard >form > > xr,xi * yr,ri -> xr*yr-xi*yi, xr*yi+xi*yr > >into > > xr,xi * yr,ri -> xr*(yr - xi/xr * yi), xr*(yi + xi/xr * yr) > >and precompute xr and xi/xr. Presto! 2 multiplies and 2 multiply-adds. An excellent suggestion - why rely on the compiler perhaps being smarter than it really is, when you drop a broad hint with very little modification of the normal code? I tried the above in just the forward radix-16 FFT loop of my code, replacing sines with tangents in the precomputation of the FFT sincos data. No timing change on the MIPS R10K, indicating that the MIPSPro f90 compiler was likely already doing such a replacement for me. But on the Alpha 21064 (which has no MADD instruction) my times for large FFT lengths dropped about 5%! (I expect another 5% when I modify the inverse FFT similarly). Weird, but welcome. Any ideas how the above replacement might improve pipelineability of a twiddle-multiply/add/subtract sequence, assuming just FMUL and FADD are available? Onward and Upward, - -Ernst _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 25 Aug 1999 05:06:11 +0200 (MET DST) From: [EMAIL PROTECTED] Subject: Re: Mersenne: targeting multiply/add (Was: K7 vs. x86) Ernst Mayer writes: - - Hi, Jason: - - - - >Or, if you just want to cheapen a complex multiply, turn the standard - - >form - - > - - > xr,xi * yr,ri -> xr*yr-xi*yi, xr*yi+xi*yr - - > - - >into - - > - - > xr,xi * yr,ri -> xr*(yr - xi/xr * yi), xr*(yi + xi/xr * yr) - - > - - >and precompute xr and xi/xr. Presto! 2 multiplies and 2 multiply-adds. - - - - An excellent suggestion - why rely on the compiler perhaps being smarter - - than it really is, when you drop a broad hint with very little modification - - of the normal code? - - - - I tried the above in just the forward radix-16 FFT loop of my code, replacing - - sines with tangents in the precomputation of the FFT sincos data. No timing - - change on the MIPS R10K, indicating that the MIPSPro f90 compiler was likely - - already doing such a replacement for me. But on the Alpha 21064 (which has - - no MADD instruction) my times for large FFT lengths dropped about 5%! - - (I expect another 5% when I modify the inverse FFT similarly). - - Weird, but welcome. Any ideas how the above replacement might improve - - pipelineability of a twiddle-multiply/add/subtract sequence, assuming just - - FMUL and FADD are available? Compared to the straightforward coding, Jason's formula uses the same instruction count but in a different order. Assume yr, yi, and xr, xi (or xr, xi/xr) are in registers, and we want the result to go into registers (zr, zi). On a machine with multiply and multiply-(add/subtract/reverse subtract) available, the translations might be Standard Jason temp1 = xi*yi temp1 = yr - (xi/xr)*yi temp2 = xi*yr temp2 = yi + (xi/xr)*yr zr = xr*xr - temp1 zr = xr*temp1 zi = xr*yi + temp2 zi = xr*temp2 Jason's method needs fewer temporaries on a machine with separate add and multiply instructions. The same register might be used for (xi/xr)*yi, temp1, and zr. The standard method needs separate temporaries for xr*yr and xi*yi. Jason's method lets the add instructions start earlier, during the twiddle computation -- the standard method saturates the multipliers. Jason's method may let the multiplies needed for zr and zi be combined with additions later in the code, on architectures with multiply-add. When (xr, xi) denotes cos(theta) + i*sin(theta), Jason's method needs tables with cos(theta) and tan(theta). The standard method lets one use sin(theta) = cos(pi/2 - theta) to reduce table space for trigonometric functions. Jason's method needs special casing if xr might be zero, in which case (xr, xi) is pure imaginary. I am not sure which method has more potential round-off error. Peter Montgomery [EMAIL PROTECTED] _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 25 Aug 1999 09:53:24 +0200 From: Laurent Desnogues <[EMAIL PROTECTED]> Subject: Mersenne: Regarding: multiply/add Jason Stratos Papadopoulos wrote: > > There was a paper in (I believe) Appl. Math and Comp. from about two > years ago that reformulated radix-2,3,4 and 5 FFT butterfiles to use > multiply-adds wherever possible. The savings can be impressive: a radix-2 > FFT butterfly has 4 multiplies and six adds, but this boils down to > six multiply-adds. Can't the RS6000 do two multiply-adds per clock? I have found a paper titled "Implementation of Efficient FFT Algorithms on Fused Multiply-Add Architectures" by E. Linzer and E. Feig, IEEE Trans. on Signal Processing, vol. 41, no 1, Jan 93. They call their method scaled FFT. For a radix-8, with size 8, Cooley-Tukey FFT has 56 m/a ops and their method has 52 m/a ops. The number of m/a ops for radix 8 is: - Cooley-Tukey: 7/2 n log_2(n) - 57/14 n + 32/7 - scaled FFT: 11/4 n log_2(n) - 57/28 n + 16/7. Regarding the errors (root-mean-square error per point): DFT inverse DFT - Cooley-Tukey: 5.3500 10^-15 3.3999 10^-15 - scaled FFT: 5.3441 10^-15 3.2386 10^-15 They say the impovement in error is probably due to the fast that a m/a op is more precise than a mult followed by an add. Their experiments were done on an RS/6000 model 530. I can't comment on that article, since I'm not mathematically-gifted enough to just read it without a lot of time with a pencil! Laurent _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 25 Aug 1999 21:14:23 EDT From: [EMAIL PROTECTED] Subject: Mersenne: Compaq dumps NT-on-Alpha Check out www.cnn.com/tech/computing/9908/24/nt.dump.idg/index.html I wonder whether this tells us more about WinNT, the Alpha platform's future, or Compaq's financial state? - -Ernst _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 25 Aug 1999 22:40:55 -0400 (EDT) From: Chip Lynch <[EMAIL PROTECTED]> Subject: Re: Mersenne: Compaq dumps NT-on-Alpha This confuses me... slashdot carried this story in the last 48 hours (at least I think it was them; I can't find the link now), but I thought it was very clear that CompaQ was dropping only the 32-Bit Windows dvelopment from Alpha, and KEEPING 64-Bit. The CNN story definitely leads you to believe that ALL NT support is being dropped. As much as I applaud forcing Microsoft to evolve by threatening them with rival operating systems, I can't imagine Compaq could completely drop NT support from the Alpha and not lost major revenue. Anyone closer to this know the story? - ---Chip On Wed, 25 Aug 1999 [EMAIL PROTECTED] wrote: > Check out > > www.cnn.com/tech/computing/9908/24/nt.dump.idg/index.html > > I wonder whether this tells us more about WinNT, the Alpha platform's future, > or Compaq's financial state? > > -Ernst > _________________________________________________________________ > Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm > Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers > \\ ^ // (o o) ---oOO--(_)--OOo------------------------------------ | Chip Lynch | Computer Guru | | [EMAIL PROTECTED] | | | (703) 465-4176 (w) | (202) 362-7978 (h) | ---------------------------------------------------- _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 25 Aug 1999 20:13:08 -0700 From: "John R Pierce" <[EMAIL PROTECTED]> Subject: Re: Mersenne: Compaq dumps NT-on-Alpha > This confuses me... slashdot carried this story in the last 48 hours (at > least I think it was them; I can't find the link now), but I thought it > was very clear that CompaQ was dropping only the 32-Bit Windows dvelopment > from Alpha, and KEEPING 64-Bit. The CNN story definitely leads you to > believe that ALL NT support is being dropped. That was the spin of the various stories a few days ago. And yes, news.com has a similar story today saying its apparently ALL nt/alpha development. > As much as I applaud forcing Microsoft to evolve by threatening them with > rival operating systems, I can't imagine Compaq could completely drop NT > support from the Alpha and not lost major revenue. Anyone closer to this > know the story? It appears Compaq is bleeding red ink, and Alpha/NT is not very profitable for them short term. Further, in effect, by putting Alpha 64 bit development resources, they are really helping out Intel's IA-64 Merced by insuring there will be a 64 bit NT/Win2000 ready when Merced actually rolls out. - -jrp _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Thu, 26 Aug 1999 08:21:34 -0600 From: "Aaron Blosser" <[EMAIL PROTECTED]> Subject: RE: Mersenne: Compaq dumps NT-on-Alpha > > As much as I applaud forcing Microsoft to evolve by threatening > them with > > rival operating systems, I can't imagine Compaq could completely drop NT > > support from the Alpha and not lost major revenue. Anyone > closer to this > > know the story? > > It appears Compaq is bleeding red ink, and Alpha/NT is not very profitable > for them short term. Further, in effect, by putting Alpha 64 bit > development resources, they are really helping out Intel's IA-64 Merced by > insuring there will be a 64 bit NT/Win2000 ready when Merced > actually rolls > out. This has me boggled as well. At the Compaq conference a few weeks ago (at which Capellas, the new CEO, gave the opening remarks), there was no indication of this whatsoever. The Alphas they had setup were running Tru64 for the most part, but they did have a couple Alpha systems running Win2K, showing those off. Hmm... Once I get my CD covering the conference events, I'll have to go over his opening remarks and look for clues that this would happen. I do know that Capellas intended to shake things up at Compaq, but this seems so bizarre. Aaron _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Thu, 26 Aug 1999 10:25:53 -0500 From: Brian Pinto <[EMAIL PROTECTED]> Subject: RE: Mersenne: Compaq dumps NT-on-Alpha This is the link that was posted on slashdot: http://www.zdnet.com/pcweek/stories/news/0,4153,2318096,00.html Slashdot's story was "Ixnay WinNT on Alpha" and can be found here: (http://slashdot.org/article.pl?sid=99/08/20/204212&mode=thread) > -----Original Message----- > From: John R Pierce [SMTP:[EMAIL PROTECTED]] > Sent: Wednesday, August 25, 1999 10:13 PM > To: Mersenne discussion list > Subject: Re: Mersenne: Compaq dumps NT-on-Alpha > > > This confuses me... slashdot carried this story in the last 48 hours > (at > > least I think it was them; I can't find the link now), but I thought > it > > was very clear that CompaQ was dropping only the 32-Bit Windows > dvelopment > > from Alpha, and KEEPING 64-Bit. The CNN story definitely leads you > to > > believe that ALL NT support is being dropped. > > That was the spin of the various stories a few days ago. And yes, > news.com > has a similar story today saying its apparently ALL nt/alpha > development. > > > As much as I applaud forcing Microsoft to evolve by threatening them > with > > rival operating systems, I can't imagine Compaq could completely > drop NT > > support from the Alpha and not lost major revenue. Anyone closer to > this > > know the story? > > It appears Compaq is bleeding red ink, and Alpha/NT is not very > profitable > for them short term. Further, in effect, by putting Alpha 64 bit > development resources, they are really helping out Intel's IA-64 > Merced by > insuring there will be a 64 bit NT/Win2000 ready when Merced actually > rolls > out. > > -jrp > > _________________________________________________________________ > Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm > Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Sat, 28 Aug 1999 18:24:42 EDT From: [EMAIL PROTECTED] Subject: Mersenne: How about some C code? Ok, I've cleaned up the Mlucas 2.6 source code and I think we're ready to take a crack at a C version. The revised source (which also includes a fix for a bug which David Willmore brought to my attention - don't worry, it can only cause the code to quit unexpectedly when it starts a new exponent as part of a multi-exponent range test - any actual test results should be fine) is at ftp://209.133.33.182/pub/mayer/Mlucas_2.6c.f90.gz and Alpha and MIPS executables are also available in the .../pub/mayer/bin directory at the above site. 2.6c also has some slightly improved memory management - it's 5-10% faster on my Alpha 21064 and 21164 than 2.6b was. With v2.5, Gord Palameta was able (using an automated translator and subsequent manual work) to get a decent C version in a remarkably short time; Brian Beuning also created one, but we never did any head-to-head comparisons of the two resulting codes. This time, I'd like to keep some coherency to any such effort, so once I see who offers to help, we can discuss how distribute the labor. Of course anyone is free to go it alone, but I will ask Conrad Curry, who maintains the GIMPS "other available source code" page to only post a link to a C version if it gets my stamp of approval. Potential benefits of an Mlucas.c: 1) Will run on any Unix platform; no special compiler should be needed (though we need to find the best C compiler for each platform); 2) Will make it easier to interface with the eventual PrimeNet for Unix; 3) Will make it easier to interface with Peter Montgomery's Mfactor code; 4) Will allow us to try neat tricks like inlining cache-bypassing stores (if the architecture supports such an operation) which might really help speed the code on systems with small caches. With that said, are there any volunteers? I'll of course be around to help explain what-this-and-that in the F90 code means, but will be putting most of my near-term effort into a P-1 factorer geared for large exponents. Cheers, Ernst _________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ End of Mersenne Digest V1 #619 ******************************