I wrote a little program to compute the "exact" product of two
polynomials with uniformly random unsigned coefficients < 2^prec, for
some working precision prec. It then computes the number of bits that
are correct in each coefficient of the approximate product and
subtracts that from the precision, which gives a measure of how many
bits are "lost". I take the maximum number of bits lost over the
coefficients of the entire product polynomial. The results below are
listed by the length of the polynomials multiplied; I give the
minimum, average and maximum number of bits lost over 30 iterations
in each case.
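
For concreteness, here is a rough Python/mpmath sketch of the kind of
measurement I mean (the actual test program is C code in flint2; all
names below are made up for illustration, and the mul argument is
where an FHT-based product would be plugged in):

import random
from mpmath import mp, mpf

def classical_mul(f, g):
    # Schoolbook product, evaluated at the current working precision.
    res = [mpf(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] += a * b
    return res

def max_bits_lost(length, prec, mul=classical_mul):
    fs = [random.getrandbits(prec) for _ in range(length)]
    gs = [random.getrandbits(prec) for _ in range(length)]
    # "Exact" reference product: integer arithmetic is exact.
    exact = [0] * (2 * length - 1)
    for i in range(length):
        for j in range(length):
            exact[i + j] += fs[i] * gs[j]
    mp.prec = prec  # approximate product at the working precision
    approx = mul([mpf(a) for a in fs], [mpf(b) for b in gs])
    mp.prec = 4 * prec  # compare at higher precision, so the
                        # comparison adds no rounding of its own
    lost = 0
    for e, a in zip(exact, approx):
        err = abs(mpf(e) - mpf(a))
        if e == 0 or err == 0:
            continue
        correct = int(mp.log(mpf(e) / err, 2))  # correct leading bits
        lost = max(lost, prec - correct)
    return lost

Each line of the table below then corresponds to something like:

losses = [max_bits_lost(512, 106) for _ in range(30)]
print(min(losses), sum(losses) // len(losses), max(losses))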

Here are the results for the Fast Hartley Transform multiplication.
The prec column just shows the working precision I used (changing
prec doesn't significantly alter the results - the same number of
bits are lost regardless of the initial working precision):

len = 1, min = 0, av = 0, max = 1, prec = 106
len = 2, min = 0, av = 0, max = 3, prec = 106
len = 3, min = 0, av = 0, max = 8, prec = 106
len = 4, min = 0, av = 1, max = 10, prec = 106
len = 6, min = 0, av = 0, max = 4, prec = 106
len = 8, min = 0, av = 1, max = 8, prec = 106
len = 11, min = 0, av = 1, max = 4, prec = 106
len = 15, min = 0, av = 2, max = 7, prec = 106
len = 20, min = 0, av = 2, max = 8, prec = 106
len = 26, min = 0, av = 2, max = 7, prec = 106
len = 34, min = 0, av = 3, max = 7, prec = 106
len = 45, min = 0, av = 3, max = 11, prec = 106
len = 59, min = 0, av = 4, max = 15, prec = 106
len = 77, min = 0, av = 4, max = 10, prec = 106
len = 101, min = 1, av = 4, max = 9, prec = 106
len = 132, min = 1, av = 4, max = 8, prec = 106
len = 172, min = 3, av = 5, max = 9, prec = 106
len = 224, min = 0, av = 5, max = 10, prec = 106
len = 292, min = 3, av = 5, max = 12, prec = 106
len = 380, min = 3, av = 7, max = 16, prec = 106
len = 494, min = 3, av = 6, max = 12, prec = 106
len = 643, min = 4, av = 7, max = 14, prec = 106
len = 836, min = 4, av = 7, max = 10, prec = 106
len = 1087, min = 2, av = 6, max = 13, prec = 106
len = 1414, min = 4, av = 8, max = 20, prec = 106
len = 1839, min = 4, av = 9, max = 19, prec = 106
len = 2391, min = 6, av = 9, max = 17, prec = 106
len = 3109, min = 6, av = 9, max = 17, prec = 106
len = 4042, min = 7, av = 9, max = 19, prec = 106
len = 5255, min = 7, av = 10, max = 18, prec = 106
len = 6832, min = 5, av = 11, max = 19, prec = 106

By this same measure, you almost always get min = 0, av = 0, max = 1
for classical multiplication.

Of course this is the least interesting case (uniform random unsigned
coefficients), and we have no other algorithms to compare with (yet).

Bill.




On Apr 29, 7:51 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> Should this happen? :
>
> sage: f._mul_fateman(f)
> ERROR: An unexpected error occurred while tokenizing input
> The following traceback may be corrupted or invalid
> The error message is: ('EOF in multi-line statement', (22841, 0))
>
> ERROR: An unexpected error occurred while tokenizing input
> The following traceback may be corrupted or invalid
> The error message is: ('EOF in multi-line statement', (7, 0))
>
> ---------------------------------------------------------------------------
> OverflowError                             Traceback (most recent call last)
>
> /home/wbhart/<ipython console> in <module>()
>
> /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_element.so in sage.rings.polynomial.polynomial_element.Polynomial._mul_fateman (sage/rings/polynomial/polynomial_element.c:15643)()
>
> /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/
> polynomial/polynomial_fateman.pyc in _mul_fateman_mul(f, g)
>      81     z_poly_f_list = z_poly_f.coeffs()
>      82     z_poly_g_list = z_poly_g.coeffs()
> ---> 83     padding = _mul_fateman_to_int2(z_poly_f_list,z_poly_g_list)
>      84
>      85     n_f = z_poly_f(1<<padding)
>
> /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_fateman.pyc in _mul_fateman_to_int2(f_list, g_list)
>      24     max_coeff_g = max([abs(i) for i in g_list])
>      25     b = (1+min(len(f_list),len(g_list)))*max_coeff_f*max_coeff_g
> ---> 26     return int(pyceil(pylog(b,2)))
>      27
>      28 def _mul_fateman_to_poly(number,padding):
>
> OverflowError: cannot convert float infinity to integer
>
> Bill.
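
A guess at the failure mode above, for what it's worth: an exact,
arbitrary precision coefficient bound gets coerced to a machine
double on its way into math.log, overflows to infinity, and int()
then refuses. Here is a plain Python illustration; the bit_length()
version is a hypothetical replacement, not the actual Sage fix:

import math

b = float('inf')  # what a huge exact bound becomes once coerced to a
                  # machine double
try:
    int(math.ceil(math.log(b, 2)))
except OverflowError as e:
    print(e)      # cannot convert float infinity to integer

# Staying in exact integer arithmetic sidesteps the coercion:
bound = (1 + 1000) * (1 << 10000) * (1 << 10000)
print(bound.bit_length())  # 20010, equal to ceil(log2(bound)) here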
>
> On Apr 29, 7:47 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>
>
> > To be a little more precise, here are the Sage timings, performed
> > on the same machine as the FLINT timings:
>
> > sage: f=(x+1)^1000
> > sage: timeit("f*f")
> > 5 loops, best of 3: 87.5 ms per loop
> > sage: timeit("f._mul_karatsuba(f)")
> > 5 loops, best of 3: 370 ms per loop
>
> > flint2 (with FHT): 5.5 ms
>
> > sage: R.<x> = RR[]
> > sage: f=(x+1)^10000
> > sage: timeit("f*f")
> > 5 loops, best of 3: 8.74 s per loop
> > sage: timeit("f._mul_karatsuba(f)")
> > 5 loops, best of 3: 17.3 s per loop
>
> > flint2 (with FHT): 0.255s
>
> > Bill.
>
> > On Apr 29, 5:45 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > I made the speedup I mentioned to the Fast Hartley Transform code in
> > > flint2.
>
> > > Again I pick on the benchmarks Robert gave:
>
> > > > sage: f = (x+1)^1000
> > > > sage: timeit("f*f")
> > > > 5 loops, best of 3: 142 ms per loop
> > > > sage: timeit("f._mul_karatsuba(f)")
> > > > 5 loops, best of 3: 655 ms per loop
>
> > > That time is down to 5.5 ms with the FHT code in FLINT.
>
> > > (If we change that to f = (x+1)^10000 then the time to compute f*f is
> > > now about 255 ms.)
>
> > > Bill.
>
> > > On Apr 28, 8:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > I coded up a basic classical multiplication (no attempt whatsoever to
> > > > prevent cancellation). Fortunately the result of the classical and
> > > > Hartley multiplication routines agree (up to some precision loss).
>
> > > > I picked on the timings Robert gave:
>
> > > > sage: f = (x+1)^1000
> > > > sage: timeit("f*f")
> > > > 5 loops, best of 3: 142 ms per loop
> > > > sage: timeit("f._mul_karatsuba(f)")
> > > > 5 loops, best of 3: 655 ms per loop
>
> > > > So with the classical algorithm this multiplication takes about 16 ms.
> > > > With the Fast Hartley Transform it currently takes 26 ms.
>
> > > > If we change that to (1+x)^10000 then the times are 1400 ms and 610 ms
> > > > respectively.
>
> > > > The FHT should be much faster after I make an important speedup some
> > > > time today or tomorrow. It should comfortably win in that first
> > > > benchmark.
>
> > > > Bill.
>
> > > > On Apr 28, 6:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > Rader-Brenner is less stable because it uses purely imaginary twiddle
> > > > > factors whose absolute value is considerably bigger than 1. It has
> > > > > been largely discarded in favour of the split-radix method,
> > > > > which uses fewer real multiplications and additions anyhow.
>
> > > > > It's very interesting to see an implementation of it, because it is
> > > > > not described in very many places!
>
> > > > > Bill.
>
> > > > > On Apr 28, 5:10 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > A very nice looking library, but it uses Karatsuba and Toom
> > > > > > Cook, the very things we decided were unstable. I also read
> > > > > > that the Rader-Brenner FFT is very unstable.
>
> > > > > > Bill.
>
> > > > > > On Apr 28, 2:14 pm, YannLC <yannlaiglecha...@gmail.com> wrote:
>
> > > > > > > This might be of interest:
>
> > > > > > > from the people of MPFR,
>
> > > > > > > "Mpfrcx is a library for the arithmetic of univariate polynomials 
> > > > > > > over
> > > > > > > arbitrary precision real (Mpfr) or complex (Mpc) numbers, without
> > > > > > > control on the rounding. For the time being, only the few 
> > > > > > > functions
> > > > > > > needed to implement the floating point approach to complex
> > > > > > > multiplication are implemented. On the other hand, these comprise
> > > > > > > asymptotically fast multiplication routines such as Toom–Cook and 
> > > > > > > the
> > > > > > > FFT."
>
> > > > > > >http://www.multiprecision.org/index.php?prog=mpfrcx
>
> > > > > > > On Apr 28, 3:53 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > You can access the code here:
>
> > > > > > > >http://selmer.warwick.ac.uk/gitweb/flint2.git?a=shortlog;h=refs/heads...
>
> > > > > > > > You need the latest MPIR release candidate, the latest MPFR
> > > > > > > > and either an x86 or x86_64 machine to run it.
>
> > > > > > > > Set the lib and include paths in the top level makefile, set
> > > > > > > > the LD_LIBRARY_PATH's in flint_env, then do:
>
> > > > > > > > source flint_env
> > > > > > > > make library
> > > > > > > > make check MOD=mpfr_poly
>
> > > > > > > > Most of the relevant code is in the mpfr_poly directory.
>
> > > > > > > > If it doesn't work for you, please understand this is
> > > > > > > > development code only, therefore probably broken in many
> > > > > > > > ways.
>
> > > > > > > > Information on the Fast Hartley Transform can be found in Joerg
> > > > > > > > Arndt's fxtbook:
>
> > > > > > > >http://www.jjj.de/fxt/fxtbook.pdf
>
> > > > > > > > Bill.
>
> > > > > > > > On Apr 28, 2:45 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > Well I got the polynomial convolution working with the
> > > > > > > > > Fast Hartley Transform. It seems to pass my primitive test
> > > > > > > > > code.
>
> > > > > > > > > As an example of a first timing, 1000 iterations of
> > > > > > > > > multiplying polynomials of length 512 with 100 bits of
> > > > > > > > > floating point precision per coefficient takes 17s.
>
> > > > > > > > > I think I can make it take about half that time (at
> > > > > > > > > least), but that is a job for tomorrow.
>
> > > > > > > > > I haven't looked yet to see how much precision is lost 
> > > > > > > > > heuristically.
>
> > > > > > > > > Bill.
>
> > > > > > > > > On Apr 27, 11:49 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > > Well, I coded up a mini mpfr_poly module and Fast
> > > > > > > > > > Hartley Transform in flint2 using mpfr's as coefficients.
>
> > > > > > > > > > It's hard to know if it is doing the right thing, as
> > > > > > > > > > there's no test code yet. But it compiles and doesn't
> > > > > > > > > > segfault. :-)
>
> > > > > > > > > > A little bit more work is required to turn it into a
> > > > > > > > > > convolution, but as far as I know, no inverse transform
> > > > > > > > > > is required, as it is its own inverse. Maybe by tomorrow
> > > > > > > > > > I'll have a fast convolution and we'll be able to answer
> > > > > > > > > > some of these questions heuristically.
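
Since the convolution relies on the transform being its own inverse,
here is a minimal Python/numpy sketch of the idea (my own
illustration, using the standard Hartley convolution theorem and an
FFT-based DHT - this is not the flint2 code):

import numpy as np

def dht(x):
    # Discrete Hartley transform via the FFT: H[k] = Re X[k] - Im X[k].
    X = np.fft.fft(x)
    return X.real - X.imag

def dht_cyclic_convolution(x, y):
    # Hartley convolution theorem (see e.g. the fxtbook):
    #   Z[k] = (X[k](Y[k]+Y[N-k]) + X[N-k](Y[k]-Y[N-k])) / 2,
    # indices taken mod N.
    N = len(x)
    X, Y = dht(x), dht(y)
    Xr = np.roll(X[::-1], 1)  # X[(N-k) mod N]
    Yr = np.roll(Y[::-1], 1)  # Y[(N-k) mod N]
    Z = (X * (Y + Yr) + Xr * (Y - Yr)) / 2
    return dht(Z) / N  # the DHT is its own inverse up to a factor of N

To multiply two length-n polynomials, zero-pad both coefficient
vectors to length at least 2n - 1 (a power of two for a radix-2
transform) so that the cyclic convolution agrees with the acyclic one.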
>
> > > > > > > > > > It won't be super efficient: no truncation, no radix 4,
> > > > > > > > > > no special case for double precision coefficients, no
> > > > > > > > > > cache friendly matrix Fourier stuff, etc., but I did
> > > > > > > > > > reuse the twiddle factors as much as possible, I think.
>
> > > > > > > > > > Bill.
>
> > > > > > > > > > On Apr 27, 5:18 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > > > Also see pages 252 and following of Polynomial and
> > > > > > > > > > > Matrix Computations, Volume 1 by Dario Bini and Victor
> > > > > > > > > > > Pan, which seems to answer your question in detail.
>
> > > > > > > > > > > Bill.
>
> > > > > > > > > > > On Apr 27, 4:57 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > > > > Numerical stability is not something I have any
> > > > > > > > > > > > experience with. I am not sure if it is equivalent in
> > > > > > > > > > > > some sense to the loss of precision which occurs when
> > > > > > > > > > > > doing FFT arithmetic using a floating point FFT.
>
> > > > > > > > > > > > The issue seems to be accumulation of "numerical
> > > > > > > > > > > > noise". There are proven bounds on how many bits are
> > > > > > > > > > > > lost to this as the computation proceeds, but the
> > > > > > > > > > > > general consensus has been that for "random
> > > > > > > > > > > > coefficients" the numerical noise is much, much less
> > > > > > > > > > > > than the proven bounds would indicate.
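
For the record, the proven bounds referred to have, as far as I
recall from the numerical analysis literature (e.g. Higham, Accuracy
and Stability of Numerical Algorithms - this is the generic floating
point FFT result, not anything specific to the code here), roughly
the shape

  \frac{\lVert \mathrm{fl}(\hat{x}) - \hat{x} \rVert_2}
       {\lVert \hat{x} \rVert_2} \le c \, u \log_2 N

for a transform of length N computed with unit roundoff u and a
modest constant c, while for random inputs the root mean square error
is reported to grow only like O(u \sqrt{\log_2 N}), consistent with
the small averages in the table at the top.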
>
> > > > > > > > > > > > Zimmermann, Gaudry, Kruppa suggest that the
> > > > > > > > > > > > following paper contains information about proven
> > > > > > > > > > > > bounds for the floating point FFT:
>
> > > > > > > > > > > > Percival, C. Rapid multiplication modulo the sum and
> > > > > > > > > > > > difference of highly composite numbers. Math. Comp.
> > > > > > > > > > > > 72, 241 (2003), 387–395.
>
> > > > > > > > > > > > Paul Zimmermann would almost certainly know what the
> > > > > > > > > > > > current state of the art is.
>
> > > > > > > > > > > > Similar stability issues arise in matrix arithmetic
> > > > > > > > > > > > with floating point coefficients. It seems that one
> > > > > > > > > > > > can always prove something when you
>
> ...
>

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
