If the coefficients are allowed to vary in sign and vary wildly in the number of bits, then nothing will save you. Here are the figures for classical multiplication:
len = 1, min = 0, av = 18, max = 79, prec = 106
len = 2, min = 0, av = 23, max = 101, prec = 106
len = 3, min = 0, av = 19, max = 98, prec = 106
len = 4, min = 0, av = 16, max = 96, prec = 106
len = 6, min = 0, av = 21, max = 85, prec = 106
len = 8, min = 0, av = 26, max = 87, prec = 106
len = 11, min = 0, av = 24, max = 80, prec = 106
len = 15, min = 0, av = 15, max = 85, prec = 106
len = 20, min = 0, av = 25, max = 91, prec = 106
len = 26, min = 0, av = 16, max = 75, prec = 106
len = 34, min = 0, av = 20, max = 104, prec = 106
len = 45, min = 0, av = 20, max = 96, prec = 106
len = 59, min = 0, av = 16, max = 71, prec = 106
len = 77, min = 0, av = 26, max = 86, prec = 106
len = 101, min = 0, av = 24, max = 93, prec = 106
len = 132, min = 0, av = 10, max = 61, prec = 106
len = 172, min = 0, av = 19, max = 88, prec = 106
len = 224, min = 0, av = 21, max = 93, prec = 106
len = 292, min = 0, av = 16, max = 93, prec = 106
len = 380, min = 0, av = 23, max = 98, prec = 106
len = 494, min = 0, av = 25, max = 106, prec = 106
len = 643, min = 0, av = 31, max = 103, prec = 106
len = 836, min = 0, av = 28, max = 89, prec = 106
len = 1087, min = 0, av = 13, max = 90, prec = 106
len = 1414, min = 0, av = 22, max = 89, prec = 106
len = 1839, min = 0, av = 15, max = 83, prec = 106
len = 2391, min = 0, av = 21, max = 82, prec = 106

And also for FHT:

len = 1, min = 0, av = 12, max = 79, prec = 106
len = 2, min = 0, av = 32, max = 108, prec = 106
len = 3, min = 0, av = 39, max = 123, prec = 106
len = 4, min = 0, av = 59, max = 145, prec = 106
len = 6, min = 10, av = 76, max = 136, prec = 106
len = 8, min = 2, av = 90, max = 175, prec = 106
len = 11, min = 5, av = 91, max = 162, prec = 106
len = 15, min = 29, av = 85, max = 160, prec = 106
len = 20, min = 14, av = 102, max = 186, prec = 106
len = 26, min = 20, av = 96, max = 162, prec = 106
len = 34, min = 11, av = 95, max = 191, prec = 106
len = 45, min = 0, av = 92, max = 189, prec = 106
len = 59, min = 10, av = 93, max = 160, prec = 106
len = 77, min = 40, av = 111, max = 179, prec = 106
len = 101, min = 44, av = 110, max = 181, prec = 106
len = 132, min = 47, av = 93, max = 156, prec = 106
len = 172, min = 19, av = 100, max = 184, prec = 106
len = 224, min = 38, av = 104, max = 186, prec = 106
len = 292, min = 13, av = 97, max = 187, prec = 106
len = 380, min = 31, av = 105, max = 192, prec = 106
len = 494, min = 27, av = 109, max = 198, prec = 106
len = 643, min = 47, av = 120, max = 198, prec = 106
len = 836, min = 13, av = 111, max = 185, prec = 106
len = 1087, min = 51, av = 96, max = 184, prec = 106
len = 1414, min = 6, av = 110, max = 184, prec = 106
len = 1839, min = 30, av = 97, max = 180, prec = 106
len = 2391, min = 30, av = 102, max = 177, prec = 106

Clearly, in this situation, one wants to at least double the working precision.

Bill.

On Apr 29, 11:02 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> I wrote a little program to compute the "exact" product of two
> polynomials with uniformly random unsigned coefficients < 2^prec for
> some precision prec. I then compute the number of bits that are
> correct in each coefficient and subtract from the precision. That
> gives a measure of how many bits are "lost". I take the maximum number
> of bits lost over the coefficients of the entire product polynomial. I
> printed the results by length of polynomials multiplied. I give the
> minimum number of bits lost, the average and the maximum, for 30
> iterations in each case.
>
> Here are the results for the Fast Hartley Transform multiplication.
> The column prec just shows the working precision I used (of course it
> doesn't significantly change the results if I change prec - the same
> number of bits are lost regardless of the initial working precision):
>
> len = 1, min = 0, av = 0, max = 1, prec = 106
> len = 2, min = 0, av = 0, max = 3, prec = 106
> len = 3, min = 0, av = 0, max = 8, prec = 106
> len = 4, min = 0, av = 1, max = 10, prec = 106
> len = 6, min = 0, av = 0, max = 4, prec = 106
> len = 8, min = 0, av = 1, max = 8, prec = 106
> len = 11, min = 0, av = 1, max = 4, prec = 106
> len = 15, min = 0, av = 2, max = 7, prec = 106
> len = 20, min = 0, av = 2, max = 8, prec = 106
> len = 26, min = 0, av = 2, max = 7, prec = 106
> len = 34, min = 0, av = 3, max = 7, prec = 106
> len = 45, min = 0, av = 3, max = 11, prec = 106
> len = 59, min = 0, av = 4, max = 15, prec = 106
> len = 77, min = 0, av = 4, max = 10, prec = 106
> len = 101, min = 1, av = 4, max = 9, prec = 106
> len = 132, min = 1, av = 4, max = 8, prec = 106
> len = 172, min = 3, av = 5, max = 9, prec = 106
> len = 224, min = 0, av = 5, max = 10, prec = 106
> len = 292, min = 3, av = 5, max = 12, prec = 106
> len = 380, min = 3, av = 7, max = 16, prec = 106
> len = 494, min = 3, av = 6, max = 12, prec = 106
> len = 643, min = 4, av = 7, max = 14, prec = 106
> len = 836, min = 4, av = 7, max = 10, prec = 106
> len = 1087, min = 2, av = 6, max = 13, prec = 106
> len = 1414, min = 4, av = 8, max = 20, prec = 106
> len = 1839, min = 4, av = 9, max = 19, prec = 106
> len = 2391, min = 6, av = 9, max = 17, prec = 106
> len = 3109, min = 6, av = 9, max = 17, prec = 106
> len = 4042, min = 7, av = 9, max = 19, prec = 106
> len = 5255, min = 7, av = 10, max = 18, prec = 106
> len = 6832, min = 5, av = 11, max = 19, prec = 106
>
> By this same measure, you almost always get min = 0, av = 0, max = 1
> for classical multiplication.
> Of course this is the least interesting case (uniform random unsigned
> coefficients), and we have no other algorithms to compare with (yet).
>
> Bill.
>
> On Apr 29, 7:51 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > Should this happen? :
> >
> > sage: f._mul_fateman(f)
> > ERROR: An unexpected error occurred while tokenizing input
> > The following traceback may be corrupted or invalid
> > The error message is: ('EOF in multi-line statement', (22841, 0))
> >
> > ERROR: An unexpected error occurred while tokenizing input
> > The following traceback may be corrupted or invalid
> > The error message is: ('EOF in multi-line statement', (7, 0))
> >
> > ---------------------------------------------------------------------------
> > OverflowError                             Traceback (most recent call last)
> >
> > /home/wbhart/<ipython console> in <module>()
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_element.so
> > in sage.rings.polynomial.polynomial_element.Polynomial._mul_fateman
> > (sage/rings/polynomial/polynomial_element.c:15643)()
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_fateman.pyc
> > in _mul_fateman_mul(f, g)
> >      81     z_poly_f_list = z_poly_f.coeffs()
> >      82     z_poly_g_list = z_poly_g.coeffs()
> > ---> 83     padding = _mul_fateman_to_int2(z_poly_f_list,z_poly_g_list)
> >      84
> >      85     n_f = z_poly_f(1<<padding)
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_fateman.pyc
> > in _mul_fateman_to_int2(f_list, g_list)
> >      24     max_coeff_g = max([abs(i) for i in g_list])
> >      25     b = (1+min(len(f_list),len(g_list)))*max_coeff_f*max_coeff_g
> > ---> 26     return int(pyceil(pylog(b,2)))
> >      27
> >      28 def _mul_fateman_to_poly(number,padding):
> >
> > OverflowError: cannot convert float infinity to integer
> >
> > Bill.
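[Editor's note: the traceback above comes from computing ceil(log2(b)) through floating point; once the coefficients are large enough, converting b to a float overflows to infinity before the log is taken. A hypothetical fix (my sketch, not Sage's actual patch) computes the same bound with exact integer bit_length instead:]

```python
def padding_bits(f_list, g_list):
    # Compute ceil(log2(b)) for the Fateman padding bound
    #   b = (1 + min(len(f), len(g))) * max|f_i| * max|g_j|
    # entirely in integer arithmetic, so it cannot overflow the way
    # int(pyceil(pylog(b, 2))) does when float(b) becomes infinity.
    max_f = max(abs(c) for c in f_list)
    max_g = max(abs(c) for c in g_list)
    b = (1 + min(len(f_list), len(g_list))) * max_f * max_g
    # For b > 0: ceil(log2(b)) == (b - 1).bit_length()
    return (b - 1).bit_length()
```

The identity ceil(log2(b)) == (b - 1).bit_length() holds for all positive integers b, and bit_length works on arbitrarily large Python integers, so coefficients with tens of thousands of bits are no problem.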
> > On Apr 29, 7:47 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > Being a little more precise about it. Here are the Sage timings
> > > performed on the same machine as I did the FLINT timings:
> > >
> > > sage: f=(x+1)^1000
> > > sage: timeit("f*f")
> > > 5 loops, best of 3: 87.5 ms per loop
> > > sage: timeit("f._mul_karatsuba(f)")
> > > 5 loops, best of 3: 370 ms per loop
> > >
> > > flint2 (with FHT): 5.5 ms
> > >
> > > sage: R.<x> = RR[]
> > > sage: f=(x+1)^10000
> > > sage: timeit("f*f")
> > > 5 loops, best of 3: 8.74 s per loop
> > > sage: timeit("f._mul_karatsuba(f)")
> > > 5 loops, best of 3: 17.3 s per loop
> > >
> > > flint2 (with FHT): 0.255s
> > >
> > > Bill.
> > >
> > > On Apr 29, 5:45 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > I made the speedup I mentioned to the Fast Hartley Transform code
> > > > in flint2.
> > > >
> > > > Again I pick on the benchmarks Robert gave:
> > > >
> > > > > sage: f = (x+1)^1000
> > > > > sage: timeit("f*f")
> > > > > 5 loops, best of 3: 142 ms per loop
> > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > 5 loops, best of 3: 655 ms per loop
> > > >
> > > > That time is down to 5.5 ms with the FHT code in FLINT.
> > > >
> > > > (If we change that to f = (x+1)^10000 then the time to compute f*f
> > > > is now about 255 ms.)
> > > >
> > > > Bill.
> > > >
> > > > On Apr 28, 8:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > I coded up a basic classical multiplication (no attempt whatsoever
> > > > > to prevent cancellation). Fortunately the results of the classical
> > > > > and Hartley multiplication routines agree (up to some precision
> > > > > loss).
> > > > >
> > > > > I picked on the timings Robert gave:
> > > > >
> > > > > sage: f = (x+1)^1000
> > > > > sage: timeit("f*f")
> > > > > 5 loops, best of 3: 142 ms per loop
> > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > 5 loops, best of 3: 655 ms per loop
> > > > >
> > > > > So with the classical algorithm this multiplication takes about 16 ms.
> > > > > With the Fast Hartley Transform it currently takes 26 ms.
> > > > >
> > > > > If we change that to (1+x)^10000 then the times are 1400 ms and
> > > > > 610 ms respectively.
> > > > >
> > > > > The FHT should be much faster after I make an important speedup
> > > > > some time today or tomorrow. It should comfortably win in that
> > > > > first benchmark.
> > > > >
> > > > > Bill.
> > > > >
> > > > > On Apr 28, 6:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > Rader-Brenner is less stable because it uses purely imaginary
> > > > > > twiddle factors whose absolute value is considerably bigger
> > > > > > than 1. It has been largely discarded in favour of the
> > > > > > split-radix method, which uses fewer real multiplications and
> > > > > > additions anyhow.
> > > > > >
> > > > > > It's very interesting to see an implementation of it, because
> > > > > > it is not described in very many places!
> > > > > >
> > > > > > Bill.
> > > > > >
> > > > > > On Apr 28, 5:10 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > A very nice looking library, but it uses Karatsuba and
> > > > > > > Toom-Cook, the very things we decided were unstable. I also
> > > > > > > read that the Rader-Brenner FFT is very unstable.
> > > > > > >
> > > > > > > Bill.
> > > > > > >
> > > > > > > On Apr 28, 2:14 pm, YannLC <yannlaiglecha...@gmail.com> wrote:
> > > > > > > > This might be of interest:
> > > > > > > >
> > > > > > > > from the people of MPFR,
> > > > > > > >
> > > > > > > > "Mpfrcx is a library for the arithmetic of univariate
> > > > > > > > polynomials over arbitrary precision real (Mpfr) or complex
> > > > > > > > (Mpc) numbers, without control on the rounding. For the time
> > > > > > > > being, only the few functions needed to implement the
> > > > > > > > floating point approach to complex multiplication are
> > > > > > > > implemented.
> > > > > > > > On the other hand, these comprise asymptotically fast
> > > > > > > > multiplication routines such as Toom–Cook and the FFT."
> > > > > > > >
> > > > > > > > http://www.multiprecision.org/index.php?prog=mpfrcx
> > > > > > > >
> > > > > > > > On Apr 28, 3:53 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > > > You can access the code here:
> > > > > > > > >
> > > > > > > > > http://selmer.warwick.ac.uk/gitweb/flint2.git?a=shortlog;h=refs/heads...
> > > > > > > > >
> > > > > > > > > You need the latest MPIR release candidate, the latest MPFR
> > > > > > > > > and either an x86 or x86_64 machine to run it.
> > > > > > > > >
> > > > > > > > > Set the lib and include paths in the top level makefile,
> > > > > > > > > set the LD_LIBRARY_PATH's in flint_env, then do:
> > > > > > > > >
> > > > > > > > > source flint_env
> > > > > > > > > make library
> > > > > > > > > make check MOD=mpfr_poly
> > > > > > > > >
> > > > > > > > > Most of the relevant code is in the mpfr_poly directory.
> > > > > > > > >
> > > > > > > > > If it doesn't work for you, please understand this is
> > > > > > > > > development code only, therefore probably broken in many
> > > > > > > > > ways.
> > > > > > > > >
> > > > > > > > > Information on the Fast Hartley Transform can be found in
> > > > > > > > > Joerg Arndt's fxtbook:
> > > > > > > > >
> > > > > > > > > http://www.jjj.de/fxt/fxtbook.pdf
> > > > > > > > >
> > > > > > > > > Bill.
> > > > > > > > >
> > > > > > > > > On Apr 28, 2:45 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > > > > Well I got the polynomial convolution working with the
> > > > > > > > > > Fast Hartley Transform. It seems to pass my primitive
> > > > > > > > > > test code.
> > > > > > > > > >
> > > > > > > > > > As an example of a first timing, 1000 iterations of
> > > > > > > > > > multiplying polynomials of length 512 with 100 floating
> > > > > > > > > > point bits of precision per coefficient takes 17s.
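[Editor's note: the FLINT code itself isn't reproduced in the thread, but the underlying idea (multiplying polynomials via the Hartley convolution theorem) can be sketched briefly. This is my own illustrative version: it uses a naive O(n^2) real transform in plain Python doubles, whereas FLINT's FHT is an O(n log n) transform over mpfr coefficients.]

```python
import math

def dht(x):
    # Naive O(n^2) discrete Hartley transform:
    #   H_k = sum_j x_j * cas(2*pi*j*k/n),   cas(t) = cos(t) + sin(t).
    # A real FHT computes this in O(n log n); naive keeps the sketch short.
    n = len(x)
    return [sum(xj * (math.cos(2 * math.pi * j * k / n) +
                      math.sin(2 * math.pi * j * k / n))
                for j, xj in enumerate(x))
            for k in range(n)]

def poly_mul_hartley(a, b):
    # Zero-pad to a power of two so the cyclic convolution equals
    # the linear (polynomial) convolution.
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    A = dht(list(a) + [0.0] * (n - len(a)))
    B = dht(list(b) + [0.0] * (n - len(b)))
    # Hartley convolution theorem (indices taken mod n):
    #   Z_k = (A_k*(B_k + B_{n-k}) + A_{n-k}*(B_k - B_{n-k})) / 2
    Z = [(A[k] * (B[k] + B[-k % n]) + A[-k % n] * (B[k] - B[-k % n])) / 2
         for k in range(n)]
    # The DHT is its own inverse up to a factor of n.
    return [c / n for c in dht(Z)[:len(a) + len(b) - 1]]
```

The pointwise step is a little more involved than for the FFT (it mixes the k and n-k bins), but everything stays in real arithmetic, which is what makes the Hartley transform attractive for real-coefficient polynomials.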
> > > > > > > > > > I think I can make it take about half that time (at
> > > > > > > > > > least), but that ...

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org