I've done some more timings. This time I turned off MPIR's sqrt2 trick, which I haven't implemented yet.
For some multiplications this causes MPIR to take far too long due to tuning issues; for example, there are points where larger numbers take less time to multiply. I've made a comparison showing how much faster the new code is than MPIR, with a negative percentage where MPIR is faster. I've omitted all percentages where the two times are within 5% of each other, which is probably below the error threshold for these timings anyway. For the MPIR times, where a later time is lower I used that when computing the percentages, as technically this effect could be tuned away. As you can see, the new code is up to 100% faster, and over quite a large range it has a big advantage over the MPIR FFT. Eventually MPIR catches up. For smaller multiplications MPIR has a distinct advantage, which I will discuss later, but even so it is rarely more than a few percent faster in this range!

The format is bits/iterations: new code / MPIR (speedup).

519168/10000: 29.5s / 33.9s (14.9%)
584064/10000: 35.5s / 34.1s
648960/1000: 4.0s / 3.9s
713856/1000: 4.4s / 4.3s (-11.4%)
778752/1000: 4.7s / 5.5s (17.0%)
843648/1000: 5.4s / 5.3s
908544/1000: 5.7s / 7.6s
973440/1000: 6.1s / 5.9s
1038336/1000: 6.4s / 6.7s
1297920/1000: 8.7s / 9.0s
1557504/1000: 10.1s / 12.7s (25.7%)
1817088/1000: 12.5s / 15.8s (26.4%)
2084864/1000: 13.9s / 15.2s (9.4%)
2345472/1000: 16.5s / 16.6s
2606080/1000: 18.5s / 18.2s
2866688/1000: 20.1s / 19.9s
3127296/1000: 21.6s / 26.8s (24%)
3387904/1000: 23.7s / 28.2s (15.2%)
3648512/1000: 25.3s / 30.5s (7.9%)
3909120/1000: 26.9s / 27.3s
4169728/1000: 28.4s / 35.4s (24.6%)
5212160/1000: 43.0s / 40.6s (-5.6%)
6254592/100: 5.0s / 6.5s (30%)
7297024/100: 6.1s / 7.0s (14.8%)
8364032/100: 6.6s / 8.7s (31.8%)
9409536/100: 9.4s / 17.8s (60.6%)
10455040/100: 10.3s / 19.6s (46.6%)
11500544/100: 11.3s / 20.5s (33.6%)
12546048/100: 12.8s / 15.1s (18.0%)
13591552/100: 13.9s / 16.6s (10.1%)
14637056/100: 14.8s / 15.3s
15682560/100: 15.9s / 24.7s (35.8%)
16728064/100: 16.0s / 21.6s (35%)
20910080/100: 21.7s / 23.5s (8.3%)
25092096/100: 25.5s / 37.8s (48.2%)
29274112/100: 29.6s / 45.9s (55.1%)
33456128/100: 35.2s / 50.7s (44.0%)
37684224/10: 4.7s / 8.7s (48.9%)
41871360/10: 5.1s / 9.4s (37.3%)
46058496/10: 5.6s / 7.0s (25.0%)
50245632/10: 6.0s / 7.9s (31.7%)
54432768/10: 6.6s / 8.2s (24.2%)
58619904/10: 7.0s / 8.0s (14.3%)
62807040/10: 7.5s / 11.6s (54.7%)
66994176/10: 8.3s / 16.7s (101.2%)
83742720/10: 10.4s / 16.9s (62.5%)
100491264/10: 12.3s / 16.4s (33.3%)
117239808/10: 14.8s / 17.2s (16.2%)
134103040/10: 16.9s / 32.2s (90.5%)
150865920/10: 23.3s / 32.2s (38.2%)
167628800/10: 26.3s / 33.8s (28.5%)
184391680/10: 28.7s / 34.0s (18.5%)
201154560/10: 31.1s / 40.6s (30.5%)
217917440/10: 32.9s / 37.3s (13.4%)
234680320/10: 35.0s / 37.8s (8.0%)
251443200/10: 38.8s / 39.8s
268206080/1: 4.2s / 5.7s (35.7%)
335257600/1: 5.5s / 6.4s (16.4%)
402309120/1: 6.3s / 7.9s (25.4%)
469360640/1: 7.5s / 8.7s (16.0%)
536608768/1: 8.4s / 9.9s (17.9%)
603684864/1: 11.7s / 10.5s (-10.3%)
670760960/1: 13.3s / 12.6s (-5.3%)
737837056/1: 14.1s / 13.3s (-5.7%)
804913152/1: 15.7s / 15.9s
871989248/1: 17.2s / 16.6s
939065344/1: 17.9s / 17.5s
1006141440/1: 19.0s / 17.8s (-6.3%)
1073217536/1: 20.7s / 20.6s
1341521920/1: 26.8s / 25.3s (-5.6%)
1609826304/1: 30.8s / 31.2s
1878130688/1: 36.8s / 37.4s
2146959360/1: 41.1s / 41.9s

Bill.

2010/1/1 Bill Hart <goodwillh...@googlemail.com>:
> 2. FFT's for nitwits (like me)
> ====================
>
> I will explain why I am a nitwit in a later section. But to understand that, you need to understand some FFT basics first. So I describe them here.
>
> 2a. Description of the FFT
> ===================
>
> An FFT is not an integer multiplication routine per se, but it can be used to construct one.
>
> The first step in multiplying huge integers A and B is to view the problem as a polynomial multiplication. Write your large integers in some base D (usually D = 2^r for some r), i.e. write A = f1(D), B = f2(D) for polynomials f1 and f2 whose coefficients are in the range [0, D).
>
> Next, multiply the polynomials: g(x) = f1(x)*f2(x).
>
> Finally, evaluate the product polynomial at D, i.e. C = A*B = g(D) = f1(D)*f2(D).
>
> So from now on I'll talk about large integer multiplication as though it is polynomial multiplication.
>
> Polynomial multiplication is computed using convolution of the two polynomials. (Cyclic) convolution of polynomials is defined (up to scaling) as multiplication of the polynomials modulo x^n - 1 for some n. Note that if the degree of g is less than n then g(x) can be recovered precisely from the convolution.
>
> The FFT can be used to efficiently compute the convolution. Specifically, the convolution of two polynomials f1 and f2 can be computed as FFT^(-1)(FFT(f1).FFT(f2)), where the dot is the coordinate-wise product of vectors. The inverse of an FFT can be computed (up to scaling) by an IFFT. Thus for convolution modulo x^n - 1 where n = 2^k, the convolution can be computed as 2^(-k)*IFFT(FFT(f1).FFT(f2)), where all the operations are on vectors of coefficients. In particular the coordinate-wise product involves multiplying the transformed coefficients in pairs. These products are known as pointwise multiplications.
>
> You should view the FFT as evaluating the polynomial at n = 2^k different roots of unity, 1, z, z^2, ..., z^(2^k - 1), where z is a primitive 2^k-th root of unity. The pointwise products multiply such values, effectively giving you the product polynomial g (modulo x^n - 1) evaluated at each of these roots of unity. The coefficients of the polynomial g are retrieved by inverting the FFT process, i.e. by interpolating the vector of pointwise products.
>
> So how does the FFT actually work?
>
> We start with a polynomial modulo x^n - 1 where n = 2^k. To save on notation, let's write x^(2m) - 1 where m = 2^(k-1).
>
> We write x^(2m) - 1 = (x^m - 1)(x^m + 1). If we wish to multiply polynomials modulo x^(2m) - 1, our first step is to express our polynomials modulo x^m - 1 and x^m + 1. We then compute the products mod x^m - 1 and mod x^m + 1 and recombine using CRT.
>
> Let f(x) = b(x)*x^m + a(x), where the degrees of a(x) and b(x) are less than m. Then f(x) modulo x^m - 1 is just a(x) + b(x), i.e. we simply add coefficients in pairs. Similarly, modulo x^m + 1 the polynomial would be a(x) - b(x).
>
> The product of polynomials modulo x^m - 1 can then be computed recursively using the same algorithm, i.e. by a convolution modulo x^m - 1.
>
> To compute the product modulo x^m + 1 we make a substitution y = z*x, where z is a primitive 2^k-th root of unity. This transforms the problem into one modulo y^m - 1, which can again be done recursively using the same algorithm. Note that substituting y = z*x is equivalent to multiplying the coefficient of x^i in f(x) by z^i. So we twiddle the coefficients by roots of unity.
>
> A convolution mod x^m - 1 is called a cyclic convolution, and a convolution mod x^m + 1 is called a negacyclic convolution.
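[To make the polynomial view concrete, here is a toy C illustration of my own for these notes - it has nothing to do with the actual code in new_fft.c, and all the names in it are made up for the sketch. It splits two word-sized integers into base D = 2^16 coefficients, multiplies the coefficient polynomials with a naive O(n^2) convolution, and evaluates the product back at D. An FFT-based multiply does exactly this, except the convolution loop is replaced by the IFFT(FFT(f1).FFT(f2)) computation on zero-padded vectors; since the product has degree less than 2N, a cyclic convolution of length 2N recovers it exactly.

#include <stdio.h>
#include <stdint.h>

#define N 4               /* number of base-D coefficients per input */

int main(void)
{
    const uint64_t D = 1ULL << 16;

    /* A = f1(D), B = f2(D), with coefficients in [0, D) */
    uint64_t A = 0x123456789ABCDEFULL;
    uint64_t B = 0xFEDCBA987654321ULL;

    uint64_t f1[N], f2[N], g[2*N] = {0};
    for (int i = 0; i < N; i++) {
        f1[i] = (A >> (16*i)) & (D - 1);
        f2[i] = (B >> (16*i)) & (D - 1);
    }

    /* naive convolution: g(x) = f1(x)*f2(x) */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            g[i + j] += f1[i] * f2[j];

    /* evaluate g at D by Horner's rule; the arithmetic wraps mod 2^64,
       so this yields the low 64 bits of the product */
    uint64_t lo = 0;
    for (int i = 2*N - 1; i >= 0; i--)
        lo = (lo << 16) + g[i];

    printf("%016llx\n", (unsigned long long)lo);      /* these two lines */
    printf("%016llx\n", (unsigned long long)(A * B)); /* should agree    */
    return 0;
}

Note that the coefficients of g overflow the base: carry propagation when evaluating at D is what turns the polynomial product back into an integer.]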
> So the FFT algorithm works by breaking the cyclic convolution into a cyclic convolution of half the size and a negacyclic convolution of half the size, and transforming the negacyclic convolution into a cyclic convolution by twiddling by roots of unity.
>
> In practice the evaluation step (adding or subtracting coefficients in pairs) and the twiddle step for the negacyclic half are combined. This leads to the familiar FFT "butterfly", which replaces the left half of a vector of coefficients of f(x), [a0, a1, a2, ..., a{m-1}, b0, b1, ..., b{m-1}], with [a0+b0, a1+b1, ..., a{m-1}+b{m-1}] and the right half with [a0-b0, z*(a1-b1), z^2*(a2-b2), ..., z^{m-1}*(a{m-1}-b{m-1})]. The butterfly step can be conveniently written [a{i}, b{i}] => [a{i}+b{i}, z^i*(a{i}-b{i})].
>
> One then recursively computes the FFT butterfly of the left and right halves of the vector. Eventually one gets down to vectors of length 1, which represent polynomials modulo x - 1 (or really x - z^i for i = 0, ..., 2m-1). Note that taking a polynomial f(x) modulo x - z^i is the same thing as evaluating it at z^i. So what the FFT really computes is [f(1), f(z), f(z^2), ..., f(z^{2m-1})] (well, perhaps not in that order, but in some order - probably bit reversed order, i.e. z^i is swapped with z^j where i is the reverse of j in binary, when they are considered as binary numbers of log_2(2m) bits).
>
> This left-right FFT recursion scheme is known as a decimation in frequency (DIF) FFT.
>
> Note that multiplication of polynomials modulo x - 1 is just multiplication of single coefficients, i.e. integer multiplication. These are the pointwise multiplications we spoke of earlier. We simply multiply our vectors of coefficients pointwise.
>
> Now suppose I have a polynomial known modulo x^m - 1 and x^m + 1; how can I invert this process and obtain it modulo x^(2m) - 1? Specifically, if I have c{i} = a{i}+b{i} and d{i} = z^i*(a{i}-b{i}), how can I retrieve a{i} and b{i}? Clearly the answer is to multiply d{i} by z^{-i}; then adding it to c{i} yields 2*a{i} and subtracting it from c{i} yields 2*b{i}. Notice the extra factor of 2 which has crept in.
>
> So the IFFT butterfly step replaces [c0, d0, c1, d1, ..., c{m-1}, d{m-1}] with [c0+d0, c0-d0, c1+z^(-1)*d1, c1-z^(-1)*d1, ..., c{m-1}+z^{-(m-1)}*d{m-1}, c{m-1}-z^{-(m-1)}*d{m-1}]. Or we can write the butterfly conveniently as [c{i}, d{i}] => [c{i}+z^-i*d{i}, c{i}-z^-i*d{i}].
>
> Note we have paired the odd indexed entries in our vector with the even indexed entries. An FFT which has the butterfly step [c{i}, d{i}] => [c{i}+z^i*d{i}, c{i}-z^i*d{i}] (note the change in sign of the exponents on the twiddle factors to convert from an IFFT to an FFT) is called a decimation in time (DIT) FFT.
>
> So our IFFT is simply a DIT FFT with the sign of the exponent of the twiddle factors changed.
>
> Now, an important note for later is that the reason this all worked is that we can invert the FFT process. Specifically c{i}+z^-i*d{i} = (a{i}+b{i}) + (a{i}-b{i}) = 2*a{i}, and similarly we can retrieve 2*b{i}.
>
> One final note. These factors of 2 that creep in can be removed at any stage. They are sometimes removed at each butterfly step, sometimes all at once at the end of the IFFT.
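[Since it is easy to get the two butterflies muddled, here is a minimal self-contained C sketch of the pair - again only an illustration for these notes, using complex doubles rather than the ring arithmetic the new code uses, with function names invented for the sketch. It runs a recursive DIF FFT and then the matching DIT IFFT with negated twiddle exponents, recovering the input multiplied by n. Compile with gcc file.c -lm.

#include <stdio.h>
#include <math.h>
#include <complex.h>

#define N 8  /* transform length, a power of 2 */

/* DIF FFT: butterfly [a_i, b_i] -> [a_i + b_i, z^i*(a_i - b_i)],
   then recurse on the two halves with z^2. Output is bit-reversed. */
static void fft_dif(double complex *v, long n, double complex z)
{
    if (n == 1) return;
    long m = n / 2;
    double complex t = 1.0;  /* t = z^i */
    for (long i = 0; i < m; i++) {
        double complex a = v[i], b = v[m + i];
        v[i] = a + b;
        v[m + i] = t * (a - b);
        t *= z;
    }
    fft_dif(v, m, z * z);
    fft_dif(v + m, m, z * z);
}

/* DIT IFFT: recurse first, then butterfly
   [c_i, d_i] -> [c_i + z^-i*d_i, c_i - z^-i*d_i].
   Applied to the output of fft_dif it returns n times the input. */
static void ifft_dit(double complex *v, long n, double complex z)
{
    if (n == 1) return;
    long m = n / 2;
    ifft_dit(v, m, z * z);
    ifft_dit(v + m, m, z * z);
    double complex t = 1.0;  /* t = z^-i, as conj(z) = 1/z for |z| = 1 */
    for (long i = 0; i < m; i++) {
        double complex c = v[i], d = t * v[m + i];
        v[i] = c + d;
        v[m + i] = c - d;
        t *= conj(z);
    }
}

int main(void)
{
    double complex v[N];
    for (long i = 0; i < N; i++) v[i] = i + 1;  /* some test data */

    const double pi = acos(-1.0);
    double complex z = cexp(2 * pi * I / N);  /* primitive N-th root of 1 */
    fft_dif(v, N, z);
    ifft_dit(v, N, z);

    /* each entry should now be N times the original value */
    for (long i = 0; i < N; i++)
        printf("%.1f ", creal(v[i]) / N);
    printf("\n");
    return 0;
}

Note the structure: the DIF FFT butterflies before recursing, while the DIT IFFT recurses before butterflying, which is what lets it consume the bit-reversed output of the DIF transform directly.]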
>
> 2b. FFT algorithms
> ==============
>
> There are multiple different ways of implementing the FFT algorithm in practice, and a number of important additional algorithms used in multiplying integers using an FFT. I will describe them here.
>
> a) Complex FFT. One can use complex numbers and floating point arithmetic to multiply integers using the FFT. One has to be careful that at each step not too much precision is lost. For example, one could work with double precision floating point numbers.
>
> Naturally it is inefficient to use complex numbers, as the polynomials we are trying to multiply have only real coefficients. There are some strategies for getting around this. One can exploit the symmetries of the FFT to get FFT's which work on real values and take roughly half the time/space. For example the Fast Hartley Transform works with only real values, and there are various other variations on this theme.
>
> Note that the roots of unity here are complex values, or for the Fast Hartley Transform, real values which are linear combinations of sines and cosines. So computing the twiddle factors can be expensive.
>
> The biggest issue with real/complex FFT's is the loss of precision. They are really fast, as they can use the native floating point arithmetic of the processor; they just usually don't guarantee a correct result, and those that do are either not so fast or don't support very long transforms.
>
> b) Schonhage-Strassen FFTs. Schonhage and Strassen suggested working in the ring Z/pZ where p is of the form 2^m + 1. Here 2 is a 2m-th (2^k-th) root of unity: raise 2 to the power m and you see that you get -1 in this ring, thus 2^(2m) must be 1 in this ring.
>
> Note that multiplication by a root of unity then becomes multiplication by a power of 2 (modulo 2^m + 1), or a bitshift on a binary computer. Working in this ring is also convenient, as reduction modulo p can be done with a subtraction. In practice we can work in a ring Z/pZ where p = 2^(a*m) + 1 for some value a. Here 2^a is a 2m-th (2^k-th) root of unity. A ring of this form is called a Fermat ring and p a generalised Fermat number.
>
> Usually p is chosen to be a large number that takes up numerous machine limbs (it need not be prime for the algorithm to work). For performance reasons a and m are usually chosen so that reduction modulo p does not involve any bitshifts, i.e. so that a*m is a multiple of the number of bits in a limb.
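[To see the bitshift trick in action, here is a small standalone C demo of my own - an illustration, not MPIR code - working modulo the Fermat prime p = 2^16 + 1, where 2 is a 32nd root of unity. Multiplying by a power of 2 mod p needs only shifts, masks and adds/subtracts, because each 16-bit chunk of the shifted value contributes with alternating sign (2^16 = -1 mod p).

#include <stdio.h>
#include <stdint.h>

#define P ((1u << 16) + 1)  /* 2^16 + 1 = 65537, the Fermat prime F4 */

/* multiply x by 2^j modulo P without any multiplication:
   2^16 = -1 (mod P), so each 16-bit chunk of the shifted value
   contributes with alternating sign */
static uint32_t mul_pow2_mod(uint32_t x, unsigned j)
{
    uint64_t y = (uint64_t)x << j;  /* at most 17 + 32 = 49 bits here */
    int64_t acc = 0;
    int sign = 1;
    while (y) {
        acc += sign * (int64_t)(y & 0xFFFF);
        y >>= 16;
        sign = -sign;
    }
    acc %= P;
    if (acc < 0) acc += P;
    return (uint32_t)acc;
}

int main(void)
{
    /* 2 is a 32nd root of unity mod P: 2^16 = -1, 2^32 = 1 */
    printf("2^16 mod %u = %u\n", P, mul_pow2_mod(1, 16));  /* P - 1 */
    printf("2^32 mod %u = %u\n", P, mul_pow2_mod(1, 32));  /* 1 */

    /* agrees with an ordinary modular reduction */
    uint32_t x = 12345;
    unsigned j = 13;
    printf("%u\n", mul_pow2_mod(x, j));
    printf("%llu\n", (unsigned long long)(((uint64_t)x << j) % P));
    return 0;
}

The real thing applies the same identity limb-by-limb with mpn arithmetic, so a twiddle costs a shift and a subtraction rather than a multiplication.]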
> c) Mersenne Ring FFT. Work in a ring modulo p = 2^(b*m) - 1, again for m a power of 2 and b arbitrary. Here 2^b is an m-th root of unity, and reduction modulo p is an addition. Numbers p of this form are called generalised Mersenne numbers.
>
> d) Number Theoretic Transforms (NTT's). Work in a ring Z/pZ where p is a prime such that Z/pZ has lots of roots of unity. Usually p is chosen to be a small prime that fits in a machine word. We usually start with a prime p of the form p = a*n + 1 for some small value a and some larger value n (for example a power of 2, so that reduction mod p has a chance of being cheap). We find a primitive root r mod p, and then by Fermat's Little Theorem s = r^a (mod p) is a primitive n-th root of unity. Thus we can support a transform of length n on the ring Z/pZ.
>
> Many of the historically large integer products that have been computed, e.g. in computation of many digits of pi, involved NTT's. One can use a few of them, modulo different primes p (usually 2-5 of them), and combine using CRT.
>
> The problems with NTT's are twofold. Firstly, once one gets beyond about 5 primes, the CRT step becomes a significant proportion of the runtime. Also, in practice, NTT implementations don't compare favourably with optimised Schonhage-Strassen implementations, being an order of magnitude slower. On a 32 bit machine, the maximum transform length for a word sized prime can also be limiting for very large problems.
>
> The main drawback with NTT's is that multiplication by a root of unity is no longer a bit shift. One must also compute all the roots of unity, which usually means keeping some table of them handy. For very long transforms, the table itself may exceed the size of cache. Or one can partially compute the table and then perform some additional computations on-the-fly.
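[As a concrete instance of the p = a*n + 1 recipe - a standard textbook example, nothing specific to this code: the word-sized prime p = 998244353 = 119*2^23 + 1 has primitive root r = 3, so s = 3^119 mod p is a primitive 2^23-rd root of unity, supporting transforms of length up to 2^23. A quick C check:

#include <stdio.h>
#include <stdint.h>

/* x^e mod m by binary exponentiation */
static uint64_t powmod(uint64_t x, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    x %= m;
    while (e) {
        if (e & 1) r = r * x % m;
        x = x * x % m;
        e >>= 1;
    }
    return r;
}

int main(void)
{
    /* p = a*n + 1 with a = 119, n = 2^23; r = 3 is a primitive root */
    uint64_t p = 998244353, a = 119, n = 1u << 23, r = 3;

    uint64_t s = powmod(r, a, p);  /* s = r^a, a primitive n-th root of 1 */

    printf("s       = %llu\n", (unsigned long long)s);
    printf("s^n     = %llu (should be 1)\n",
           (unsigned long long)powmod(s, n, p));
    printf("s^(n/2) = %llu (should be p-1, i.e. -1)\n",
           (unsigned long long)powmod(s, n / 2, p));
    return 0;
}

Since the order of s divides n = 2^23, getting s^(n/2) = -1 rather than 1 confirms the order is exactly n.]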
>
> -----------------
>
> Now I will discuss some additional algorithms which one needs to know about.
>
> e) Multiplication of integers modulo B^m + 1, using the negacyclic convolution. Recall that the negacyclic convolution can be used to multiply polynomials modulo x^m + 1. If we evaluate such a polynomial at some point x = B then we can use a negacyclic convolution to multiply integers modulo B^m + 1. There's nothing special about using this strategy to multiply ordinary integers: a cyclic convolution will generally be faster, as one doesn't need the extra twiddle factors to turn the negacyclic convolution into a cyclic convolution (so that it can be computed using an FFT). But for multiplying integers modulo a number of the form B^m + 1 it is very useful.
>
> Note that the pointwise multiplications in our convolution algorithm are precisely multiplications modulo a number of the form B^m + 1, when we are working in a Fermat ring. Thus we can effect our pointwise multiplications using a negacyclic convolution. This allows the FFT to recurse down to an FFT for the pointwise multiplications (when they become sufficiently large). This is more efficient than multiplying using an ordinary cyclic convolution and then reducing modulo p, as a convolution of half the length can be used (you actually *want* it to wrap around). This can save a significant amount of time once the pointwise multiplications become so large that the use of an FFT is indicated. In fact, because the convolution will be half the size, a negacyclic convolution can be used *before* one would consider using an ordinary FFT routine to multiply the integers.
>
> That's all I need to discuss for this section, as I will explain some other algorithms in later sections as they are needed.
>
> Bill.
>
> 2010/1/1 Bill Hart <goodwillh...@googlemail.com>:
>> 1. Formal release of code.
>>
>> I have included two new files in the top source directory of mpir in svn. The first is new_fft_with_flint.c, the second new_fft.c. They are isomorphic except that the second does not need to be linked against FLINT to run. There will be no difference in timings or results; I just removed all the timing and test code which relied on FLINT.
>>
>> Here is the command I use to build the code once you have built MPIR:
>>
>> gcc -O2 new_fft.c -o fft -I/home/wbhart/mpir-trunk -L/home/wbhart/mpir-trunk/.libs -lmpir
>>
>> Modify as desired for your own system.
>>
>> Please note the usual disclaimer:
>>
>> This code is INCOMPLETE, INEFFICIENT, INCONSISTENT, often INCORRECT, ILL-TESTED and the original author is ILL-TEMPERED.
>>
>> It is LGPL v2+ licensed and I have taken extreme care to ensure that all the code was written from scratch by me without following someone else's work. The four functions FFT_split_* and FFT_combine_* were taken from FLINT, but to my knowledge were written entirely by me. I hereby release them under the LGPL v2+.
>>
>> The timing code does not print times; it merely completes a certain number of iterations and exits. Use the linux time command to get times.
>>
>> I have run very few tests on the multiplication routine, perhaps on the order of a few thousand integer multiplications, at very few sizes. This code is extremely new and should be treated as quite likely broken.
>>
>> Note that the negacyclic convolution is currently disabled. I don't even know if it passes its test code when switched on at present. I modified a number of the FFT's since writing it, so probably not. But it did pass its test code originally. However, note it does not deal with a couple of (trivial) special cases, so is incomplete and formally incorrect because of that.
>>
>> There is an implementation of the split radix FFT contained in the file too. I found the times were *identical* to those I got from the radix 2 code, so I abandoned it. There is also an old implementation of the radix 2 FFT/IFFT which I also abandoned. The basic radix 2 FFT function is called FFT_radix2_new.
>>
>> All the truncate, mfa and twiddle functions are new in that they don't rely on the old radix 2 implementation, I believe.
>>
>> Many of the code comments have been cut and pasted from earlier code they were derived from, so they will be incorrect. This is especially true of the comments for the FFT and IFFT functions (including the truncate and mfa functions), except for the radix2_new and split_radix functions, which I hope are about right.
>>
>> Bill.
>>
>> 2010/1/1 Bill Hart <goodwillh...@googlemail.com>:
>>> As the code for this FFT is quite involved, I plan to incrementally release a set of notes about it, starting tomorrow morning (today already), which may take hours to days for me to complete. I got the code working for the first time this afternoon, so please be patient with me.
>>>
>>> Here is the rough list of topics I hope to cover in the notes:
>>>
>>> Table of Contents
>>> =============
>>>
>>> 1) The formal code release. I need to check all the test code still passes and make it easy for people to build and run. That will take an hour or so for me to do.
>>>
>>> 2) FFT's for nitwits (like me). How does an FFT help in multiplying numbers, basically, and what are the important algorithms and techniques.
>>>
>>> 3) Why a new FFT? This is an important question which needs answering. I will discuss what led me to work on a new FFT implementation, rather than improve or port another implementation.
>>>
>>> 4) The strategy. What does this FFT actually do? Which algorithms does it use? I tried many things, most of which completely failed. In section 3 I will have already discussed other strategies I tried which did not work. Here I will discuss what did work. Whilst many of the details came from my head, I'm pretty sure the general outline will contain few surprises.
>>>
>>> 5) References. Few. I purposely tried to be creative.
>>>
>>> 8) Implementors notes. A long list of things that can be improved. In releasing the code publicly it is my hope that contributors will help make this the best FFT implementation it can be. The list will include many trivial items which anyone could work on, all the way through to fairly advanced topics which would probably require original research to do really well. I'll rank the items according to how hard I think they'd be to do.
>>>
>>> The code will be under revision control, so it is my hope that people will be willing to contribute to it. I believe I spent between 150 and 200 hours writing it, all told.
>>>
>>> Till tomorrow.
>>>
>>> Bill.
>>>
>>> 2010/1/1 Bill Hart <goodwillh...@googlemail.com>:
>>>> Ha! I forgot to mention, those numbers represent the size of the integers in bits. I multiply two different integers with the same number of bits, though that is not a limitation of the code. It will accept integers of different lengths and adjust accordingly.
>>>>
>>>> Bill.
>>>>
>>>> 2010/1/1 Bill Hart <goodwillh...@googlemail.com>:
>>>>> I re-timed that point and I believe I may have just mistranscribed it. Here it is with the corrected time. Thanks for pointing it out!
>>>>>
>>>>> I've also included some times from the FLINT FFT, though only non-truncated ones (i.e. where FLINT uses a convolution which is an exact power of 2 length - not a limitation of FLINT, I just didn't get around to timing it at the other points yet).
>>>>>
>>>>> 8364032: 6.8s / 8.0s
>>>>> 9409536: 9.8s / 8.7s
>>>>> 10455040: 10.8s / 9.6s
>>>>> 11500544: 12.3s / 12.6s
>>>>> 12546048: 12.7s / 15.3s
>>>>> 13591552: 14.0s / 14.2s
>>>>> 14637056: 14.9s / 15.3s
>>>>> 15682560: 16.0s / 14.5s
>>>>> 16728064: 16.8s / 19.5s
>>>>>
>>>>> 20910080: 24.3s / 23.2s
>>>>> 25092096: 28.5s / 33.1s
>>>>> 29274112: 33.1s / 31.8s
>>>>> 33456128: 37.4s / 47.7s
>>>>>
>>>>> 519168: iters 10000: 30.7s / 32.2s / 32.3s
>>>>> 2084864: iters 1000: 13.9s / 15.2s / 13.0s
>>>>> 8364032: iters 100: 6.8s / 8.0s / 7.6s
>>>>> 33497088: iters 100: 37.4s / 47.7s / 37.5s
>>>>> 134103040: iters 10: 17.0s / 24.8s / 19.6s
>>>>> 536608768: iters 1: 8.6s / 9.4s / 8.6s
>>>>> 2146959360: iters 1: 42.3s / 40.3s / 38.9s
>>>>>
>>>>> Bill.
>>>>>
>>>>> 2009/12/31 Bill Hart <goodwillh...@googlemail.com>:
>>>>>> It's either a tuning issue or timing instability on the machine.
>>>>>>
>>>>>> Bill.
>>>>>>
>>>>>> On 31/12/2009, Robert Gerbicz <robert.gerb...@gmail.com> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> How is it possible that multiplying longer numbers takes less time, for example:
>>>>>>>
>>>>>>> 12546048: 14.7s / 15.3s
>>>>>>> 13591552: 14.0s / 14.2s