5. References. I am indebted to the people who prepared the following references:
"Matters Computational: ideas, algorithms and source code", by J org Arndt, see http://www.jjj.de/fxt/fxtbook.pdf Specifically I used chapter 3. Note that I took very great care to not refer to the source code. I found the notation used in equations such as 19.2-6 very helpful, and this book was also the resource I used for a description of the Matrix Fourier Algorithm and the split radix algorithm (which did not actually end up being used by my integer multiplication implementation). "Primes numbers: a computational perspective", by Richard Crandall and Carl Pomerance, 2nd ed., 2005, Springer. Specifically I used this reference to figure out why my negacyclic convolution didn't work (I had forgotten that the coefficients would be signed). Two other references which I referred to, but which did not end up featuring in the implementation I settled on (so far), but to which I am grateful to the authors for providing are: "A GMP-based implementation of Schonhage-Strassen's Large Integer Multiplication Algorithm" by Pierrick Gaudry, Alexander Kruppa and Paul Zimmermann, ISAAC 2007 proceedings, pp 167-174. See http://www.loria.fr/~gaudry/publis/issac07.pdf and "Multidigit multiplication for mathematicians" by Dan Bernstein (to appear). see http://cr.yp.to/papers/m3.pdf Specifically this paper helped me sort out why my radix 2/3 implementation didn't work. I would also say that the algebraic way of thinking about the FFT which is advocated in this paper also helped when it came to thinking up a truncation strategy. I had been looking at this paper to work out why my radix 2/3 thing didn't work and went for a walk to town in frustration. By the time I got back I had basically figured out how to do truncation without radix 3. The initial thought that set it off was to think about the FFT as breaking into a cyclic and negacyclic half, as described in section 2 of these notes. Bill. 2010/1/3 Bill Hart <goodwillh...@googlemail.com>: > Hmm, so the only reason MPIR can be beating me for small > multiplications is general coding efficiency. My butterfly strategy > probably introduces overheads when the coefficients are small, i.e. > only a few limbs. > > The overheads could be lowered by having an iterative FFT for short > lengths. But we should worry about that later. It will complicate the > code too much and the existing code needs consolidation first. There's > a lot of code duplication currently. > > Bill. > > 2010/1/3 Bill Hart <goodwillh...@googlemail.com>: >> 2010/1/3 David Harvey <dmhar...@cims.nyu.edu>: >>> >>> >>> On Jan 3, 9:03 am, Bill Hart <goodwillh...@googlemail.com> wrote: >>> >>>> I figured I probably did misunderstand. What I think they are trying >>>> to get at is that one can use coefficients modulo 2^N+1 except at >>>> certain levels because the roots of unity are actually the same. But >>>> then the discussion becomes one of combining a Fermat and Mersenne >>>> transform, which is the basic strategy that the implementation uses. I >>>> got confused. I still am. >>> >>> I think they are saying that to multiply two integers whose product >>> has N bits, choose a and b (as in Lemma 1 of their paper) such that a >>> + b ~ N and then multiply modulo 2^a - 1 (use Mersenne transform) and >>> modulo 2^b + 1 (use Fermat transform). The smoothness comes from the >>> freedom to choose a and b. >>> >> >> OK, I think I see. Their big N corresponds to an FFT for the original >> integers, which are multiplied mod 2^N+1 in the case of a Fermat >> transform or 2^N-1 in the case of a Mersenne transform. I think I >> would call these negacyclic and cyclic convolutions respectively. >> >> But then a Mersenne transform is nothing but an ordinary FFT, no? >> >> The idea of combining with a negacyclic FFT seems to be one of the >> strategies I discussed and discarded. >> >> I still don't understand what they mean by "power-of-2 roots of unity >> are needed only at the "lower level"". I understand that by the "lower >> level" they seem to mean the pointwise mults. But what is a power of 2 >> root of unity? Aren't they all powers of 2, except when using the >> sqrt2 trick. >> >> Maybe I don't understand the Schonhage-Strassen algorithm correctly. >> Perhaps they not only worked with coefficients mod 2^B+1 but also did >> the convolution mod x^m+1. They were worried about asymptotic >> complexity, not speed, so they wanted their algorithm to recurse so >> that the pointwise mults could be done using the same algorithm. Gosh >> that is exactly what the authors of this paper say! >> >> So my misunderstanding came from the fact that I assumed SS advocated >> a Mersenne transform, i.e. mod x^m-1, not a Fermat transform, mod >> x^m+1. >> >> Well, just to be clear. I worked mod x^m-1 because it is more >> efficient, and in a ring mod p where p = 2^B+1, for which one must >> perform a negacyclic convolution if one wishes to recurse, which >> involves twiddling to get an ordinary cyclic convolution. >> >> Anyhow, this is all a non-strategy once you have truncation. All that >> is required is to implement a single transform which works with >> coefficients mod 2^B-1, which I was formerly calling a Mersenne ring, >> when the large integers being multiplied are "small". Then you don't >> want to recurse for doing the pointwise mults and so the pointwise >> mults can be done mod 2^B/2-1 and 2^B/2+1 using our mpn_mulmod_2expp1 >> and mpn_mulmod_2expm1, which can do whatever is most efficient. >> >> Bill. >> > -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-de...@googlegroups.com. To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.