5. References.

I am indebted to the people who prepared the following references:

"Matters Computational: ideas, algorithms and source code", by Jörg
Arndt, see http://www.jjj.de/fxt/fxtbook.pdf

Specifically I used chapter 3. Note that I took great care not to
refer to the source code. I found the notation used in equations such
as 19.2-6 very helpful, and this book was also the resource I used for
a description of the Matrix Fourier Algorithm and the split-radix
algorithm (which did not actually end up being used by my integer
multiplication implementation).

"Prime Numbers: A Computational Perspective", by Richard Crandall and
Carl Pomerance, 2nd ed., 2005, Springer.

Specifically I used this reference to figure out why my negacyclic
convolution didn't work (I had forgotten that the coefficients would
be signed).
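
To illustrate the point about signs (a toy Python sketch, not the FFT
code itself; the function name is just for the sketch): a negacyclic
convolution is the polynomial product reduced mod x^m + 1, so terms
that wrap around pick up a minus sign, and the output coefficients can
be negative even when all inputs are non-negative.

    # Toy sketch (Python, schoolbook): negacyclic convolution is the
    # polynomial product reduced mod x^m + 1, so wrapped-around terms
    # pick up a minus sign and the result coefficients are signed even
    # for non-negative inputs.
    def negacyclic(a, b):
        m = len(a)
        c = [0] * m
        for i in range(m):
            for j in range(m):
                if i + j < m:
                    c[i + j] += a[i] * b[j]
                else:
                    c[i + j - m] -= a[i] * b[j]   # x^m = -1
        return c

    print(negacyclic([1, 2], [3, 4]))   # [-5, 10]: note the negative coefficient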

Two other references which I referred to, but which did not end up
featuring in the implementation I settled on (so far), and to whose
authors I am grateful, are:

"A GMP-based implementation of Schönhage-Strassen's Large Integer
Multiplication Algorithm" by Pierrick Gaudry, Alexander Kruppa and
Paul Zimmermann, ISSAC 2007 proceedings, pp 167-174. See
http://www.loria.fr/~gaudry/publis/issac07.pdf

and

"Multidigit multiplication for mathematicians" by Dan Bernstein (to
appear). See http://cr.yp.to/papers/m3.pdf

Specifically, this paper helped me sort out why my radix-2/3
implementation didn't work. The algebraic way of thinking about the
FFT advocated in this paper also helped when it came to devising a
truncation strategy. I had been poring over the paper trying to work
out why my radix-2/3 code didn't work and went for a walk to town in
frustration. By the time I got back I had basically figured out how to
do truncation without radix 3. The initial thought that set it off was
to think of the FFT as breaking into a cyclic half and a negacyclic
half, as described in section 2 of these notes.
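
For concreteness, here is a toy Python sketch of that
cyclic/negacyclic split (illustrative only, not the code these notes
describe; function names are just for the sketch): since
x^(2m) - 1 = (x^m - 1)(x^m + 1), a length-2m cyclic convolution can be
computed as a length-m cyclic half plus a length-m negacyclic half,
glued back together by a small CRT step.

    # Toy sketch (Python): compute a length-2m cyclic convolution via a
    # cyclic half (mod x^m - 1) and a negacyclic half (mod x^m + 1),
    # using x^(2m) - 1 = (x^m - 1)(x^m + 1).
    def cyclic(a, b):                # schoolbook product mod x^m - 1
        m = len(a)
        c = [0] * m
        for i in range(m):
            for j in range(m):
                c[(i + j) % m] += a[i] * b[j]
        return c

    def negacyclic(a, b):            # schoolbook product mod x^m + 1
        m = len(a)
        c = [0] * m
        for i in range(m):
            for j in range(m):
                c[(i + j) % m] += a[i] * b[j] if i + j < m else -a[i] * b[j]
        return c

    def cyclic_via_halves(a, b):     # a, b have even length 2m
        m = len(a) // 2
        u = cyclic([x + y for x, y in zip(a[:m], a[m:])],
                   [x + y for x, y in zip(b[:m], b[m:])])
        v = negacyclic([x - y for x, y in zip(a[:m], a[m:])],
                       [x - y for x, y in zip(b[:m], b[m:])])
        # CRT: result = (u + v)/2  +  x^m * (u - v)/2
        return ([(x + y) // 2 for x, y in zip(u, v)] +
                [(x - y) // 2 for x, y in zip(u, v)])

    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(cyclic_via_halves(a, b))   # [66, 68, 66, 60], same as cyclic(a, b)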

Bill.

2010/1/3 Bill Hart <goodwillh...@googlemail.com>:
> Hmm, so the only reason MPIR can be beating me for small
> multiplications is general coding efficiency. My butterfly strategy
> probably introduces overheads when the coefficients are small, i.e.
> only a few limbs.
>
> The overheads could be lowered by having an iterative FFT for short
> lengths. But we should worry about that later. It will complicate the
> code too much and the existing code needs consolidation first. There's
> a lot of code duplication currently.
>
> Bill.
>
> 2010/1/3 Bill Hart <goodwillh...@googlemail.com>:
>> 2010/1/3 David Harvey <dmhar...@cims.nyu.edu>:
>>>
>>>
>>> On Jan 3, 9:03 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>>>
>>>> I figured I probably did misunderstand. What I think they are trying
>>>> to get at is that one can use coefficients modulo 2^N+1 except at
>>>> certain levels because the roots of unity are actually the same. But
>>>> then the discussion becomes one of combining a Fermat and Mersenne
>>>> transform, which is the basic strategy that the implementation uses. I
>>>> got confused. I still am.
>>>
>>> I think they are saying that to multiply two integers whose product
>>> has N bits, choose a and b (as in Lemma 1 of their paper) such that a
>>> + b ~ N and then multiply modulo 2^a - 1 (use Mersenne transform) and
>>> modulo 2^b + 1 (use Fermat transform). The smoothness comes from the
>>> freedom to choose a and b.
>>>
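
Purely for illustration (a Python sketch, not the paper's GMP code;
the function name and the plain '%' reductions are stand-ins for the
Mersenne and Fermat transforms), the splitting described above
recombines the two modular products by CRT, with a and b chosen so
that 2^a - 1 and 2^b + 1 are coprime and their product exceeds the
integer product:

    # Illustrative Python sketch of the a/b splitting (not the GMP code):
    # the two reductions stand in for the Mersenne and Fermat transforms.
    def mul_via_mersenne_and_fermat(x, y, a, b):
        m1 = (1 << a) - 1            # Mersenne modulus 2^a - 1
        m2 = (1 << b) + 1            # Fermat modulus 2^b + 1
        r1 = (x % m1) * (y % m1) % m1
        r2 = (x % m2) * (y % m2) % m2
        # CRT: recover c with c = r1 (mod m1) and c = r2 (mod m2);
        # needs gcd(m1, m2) == 1 and m1*m2 > x*y.
        t = (r2 - r1) * pow(m1, -1, m2) % m2
        return r1 + m1 * t

    x, y = 123456789, 987654321
    print(mul_via_mersenne_and_fermat(x, y, a=31, b=31), x * y)
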
>>
>> OK, I think I see. Their big N corresponds to an FFT for the original
>> integers, which are multiplied mod 2^N+1 in the case of a Fermat
>> transform or 2^N-1 in the case of a Mersenne transform. I think I
>> would call these negacyclic and cyclic convolutions respectively.
>>
>> But then a Mersenne transform is nothing but an ordinary FFT, no?
>>
>> The idea of combining with a negacyclic FFT seems to be one of the
>> strategies I discussed and discarded.
>>
>> I still don't understand what they mean by "power-of-2 roots of unity
>> are needed only at the "lower level"". I understand that by the "lower
>> level" they seem to mean the pointwise mults. But what is a power of 2
>> root of unity? Aren't they all powers of 2, except when using the
>> sqrt2 trick?
>>
>> Maybe I don't understand the Schonhage-Strassen algorithm correctly.
>> Perhaps they not only worked with coefficients mod 2^B+1 but also did
>> the convolution mod x^m+1. They were worried about asymptotic
>> complexity, not speed, so they wanted their algorithm to recurse so
>> that the pointwise mults could be done using the same algorithm. Gosh
>> that is exactly what the authors of this paper say!
>>
>> So my misunderstanding came from the fact that I assumed SS advocated
>> a Mersenne transform, i.e. mod x^m-1, not a Fermat transform, mod
>> x^m+1.
>>
>> Well, just to be clear: I worked mod x^m-1 because it is more
>> efficient, over a ring mod p where p = 2^B+1. To recurse in that
>> ring one must perform a negacyclic convolution, which involves
>> twiddling to get an ordinary cyclic convolution.
>>
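
As a toy illustration of that twiddling (a Python sketch; the
schoolbook loop stands in for the FFTs, and the names are just for the
sketch): in Z/(2^B + 1) the element theta = 2^(B/m) satisfies
theta^m = -1, so weighting the inputs by powers of theta turns the
negacyclic convolution into an ordinary cyclic one, and unweighting
the output by theta^(-k) undoes it.

    # Toy Python sketch of the twiddling: weight by powers of theta,
    # where theta^m = -1 in Z/(2^B + 1), to reduce a negacyclic
    # convolution to a cyclic one.  A schoolbook loop stands in for FFTs.
    B, m = 32, 4
    p = (1 << B) + 1
    theta = 1 << (B // m)            # theta^m = 2^B = -1 (mod p)

    def cyclic_mod_p(a, b):
        c = [0] * m
        for i in range(m):
            for j in range(m):
                c[(i + j) % m] = (c[(i + j) % m] + a[i] * b[j]) % p
        return c

    def negacyclic_via_twist(a, b):
        aw = [a[i] * pow(theta, i, p) % p for i in range(m)]
        bw = [b[i] * pow(theta, i, p) % p for i in range(m)]
        cw = cyclic_mod_p(aw, bw)
        ti = pow(theta, -1, p)
        return [cw[k] * pow(ti, k, p) % p for k in range(m)]

    res = negacyclic_via_twist([1, 2, 3, 4], [1, 2, 3, 4])
    print([x - p if x > p // 2 else x for x in res])   # [-24, -20, -6, 20]
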
>> Anyhow, this is all a non-strategy once you have truncation. All that
>> is required is to implement a single transform which works with
>> coefficients mod 2^B-1, which I was formerly calling a Mersenne ring,
>> when the large integers being multiplied are "small". Then you don't
>> want to recurse for doing the pointwise mults and so the pointwise
>> mults can be done mod 2^(B/2)-1 and 2^(B/2)+1 using our mpn_mulmod_2expp1
>> and mpn_mulmod_2expm1, which can do whatever is most efficient.
>>
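
A toy Python sketch of that pointwise-mult idea (the function name and
the plain '%' reductions below are stand-ins, not MPIR's mpn
routines): since 2^B - 1 = (2^(B/2) - 1)(2^(B/2) + 1) and the two
factors are coprime, a product mod 2^B - 1 can be recovered by CRT
from the two half-size products.

    # Toy Python sketch: multiply mod 2^B - 1 via the two half-size
    # moduli 2^(B/2) - 1 and 2^(B/2) + 1, whose product is 2^B - 1.
    # The '%' reductions stand in for the half-size mulmod routines.
    def mulmod_2expm1_sketch(x, y, B):
        h = B // 2
        m1 = (1 << h) - 1
        m2 = (1 << h) + 1            # m1 * m2 == 2^B - 1, gcd(m1, m2) == 1
        r1 = (x % m1) * (y % m1) % m1
        r2 = (x % m2) * (y % m2) % m2
        t = (r2 - r1) * pow(m1, -1, m2) % m2   # CRT recombination
        return r1 + m1 * t

    B = 64
    x, y = 0x123456789ABCDEF0, 0x0FEDCBA987654321
    assert mulmod_2expm1_sketch(x, y, B) == (x * y) % ((1 << B) - 1)
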
>> Bill.
>>
>
