I installed Andreas Enge's mpfrcx package:

http://www.multiprecision.org/index.php?prog=mpfrcx&page=html

which just worked for me.

I took a closer look: the Rader-Brennan FFT is a complex FFT taking
mpcx polynomials as input. When multiplying polynomials over the
reals, it simply converts them to complex polynomials and uses the
complex RB FFT.
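
Just to illustrate the idea (this is emphatically not mpfrcx's code: NumPy in
double precision stands in for the multiprecision mpc arithmetic, and the
function name is made up), a toy version of that real-via-complex
multiplication looks like this:

import numpy as np

def mul_real_via_complex_fft(f, g):
    # Embed the real coefficient lists into complex vectors, multiply
    # pointwise in the complex FFT domain, and keep only the real parts.
    n = len(f) + len(g) - 1                       # length of the product
    F = np.fft.fft(np.asarray(f, dtype=complex), n)
    G = np.fft.fft(np.asarray(g, dtype=complex), n)
    h = np.fft.ifft(F * G)
    return h.real                                 # imaginary parts are pure round-off

print(mul_real_via_complex_fft([1, 2, 3], [4, 5]))   # approx. [4, 13, 22, 15]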

Anyhow, the wrapper was very nice to work with: it plugged straight
into my existing profiling code with no modifications whatsoever!
That sort of thing almost never happens.

Here are the precision-loss figures for unsigned coefficients, all of
about the same magnitude. It looks like about 2.5-3 times the
precision loss of the Fast Hartley Transform:

len = 1, min = 0, av = 0, max = 1, prec = 106
len = 2, min = 0, av = 1, max = 3, prec = 106
len = 3, min = 0, av = 2, max = 12, prec = 106
len = 4, min = 0, av = 2, max = 8, prec = 106
len = 6, min = 1, av = 4, max = 9, prec = 106
len = 8, min = 2, av = 5, max = 12, prec = 106
len = 11, min = 2, av = 5, max = 9, prec = 106
len = 15, min = 4, av = 6, max = 12, prec = 106
len = 20, min = 6, av = 8, max = 13, prec = 106
len = 26, min = 6, av = 8, max = 12, prec = 106
len = 34, min = 8, av = 11, max = 18, prec = 106
len = 45, min = 9, av = 12, max = 20, prec = 106
len = 59, min = 10, av = 12, max = 22, prec = 106
len = 77, min = 10, av = 12, max = 17, prec = 106
len = 101, min = 10, av = 13, max = 17, prec = 106
len = 132, min = 13, av = 15, max = 20, prec = 106
len = 172, min = 13, av = 16, max = 21, prec = 106
len = 224, min = 14, av = 16, max = 22, prec = 106
len = 292, min = 15, av = 17, max = 26, prec = 106
len = 380, min = 16, av = 19, max = 27, prec = 106
len = 494, min = 16, av = 18, max = 24, prec = 106
len = 643, min = 16, av = 18, max = 24, prec = 106
len = 836, min = 16, av = 18, max = 22, prec = 106
len = 1087, min = 18, av = 20, max = 25, prec = 106
len = 1414, min = 19, av = 21, max = 33, prec = 106
len = 1839, min = 19, av = 21, max = 30, prec = 106
len = 2391, min = 22, av = 24, max = 31, prec = 106
len = 3109, min = 23, av = 25, max = 30, prec = 106
len = 4042, min = 23, av = 25, max = 30, prec = 106
len = 5255, min = 24, av = 26, max = 32, prec = 106
len = 6832, min = 25, av = 27, max = 31, prec = 106
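
For reference, here is a rough Sage sketch of the kind of measurement behind
these tables (not my actual harness; the names are made up, and whatever
multiplication RR['x'] happens to use stands in for the code being profiled).
The inputs have unsigned integer coefficients < 2^prec, so the exact product
can be computed over ZZ and compared coefficient by coefficient; the min/av/max
columns are then this per-product maximum over 30 random trials:

from random import randrange
from sage.all import ZZ, RealField, PolynomialRing

def max_bits_lost(length, prec=106):
    R = RealField(prec)
    R2 = RealField(2*prec + 20)            # extra precision for the comparison
    Zx = PolynomialRing(ZZ, 'x')
    Rx = PolynomialRing(R, 'x')
    f = Zx([randrange(2**prec) for _ in range(length)])
    g = Zx([randrange(2**prec) for _ in range(length)])
    exact = f * g                          # exact product over ZZ
    approx = Rx(f) * Rx(g)                 # product at the working precision
    worst = 0
    for e, a in zip(exact.list(), approx.list()):
        err = (R2(e) - R2(a)).abs()
        if e != 0 and err != 0:
            # bits of this coefficient that are still correct
            correct = max(0, (R2(e).abs() / err).log(2).floor())
            worst = max(worst, prec - correct)
    return worst

print([max_bits_lost(n) for n in [1, 2, 4, 8, 16]])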


Bill.

On Apr 30, 12:12 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> If the coefficients are allowed to vary in sign and vary wildly in the
> number of bits, then nothing will save you. Here are the figures for
> classical multiplication:
>
> len = 1, min = 0, av = 18, max = 79, prec = 106
> len = 2, min = 0, av = 23, max = 101, prec = 106
> len = 3, min = 0, av = 19, max = 98, prec = 106
> len = 4, min = 0, av = 16, max = 96, prec = 106
> len = 6, min = 0, av = 21, max = 85, prec = 106
> len = 8, min = 0, av = 26, max = 87, prec = 106
> len = 11, min = 0, av = 24, max = 80, prec = 106
> len = 15, min = 0, av = 15, max = 85, prec = 106
> len = 20, min = 0, av = 25, max = 91, prec = 106
> len = 26, min = 0, av = 16, max = 75, prec = 106
> len = 34, min = 0, av = 20, max = 104, prec = 106
> len = 45, min = 0, av = 20, max = 96, prec = 106
> len = 59, min = 0, av = 16, max = 71, prec = 106
> len = 77, min = 0, av = 26, max = 86, prec = 106
> len = 101, min = 0, av = 24, max = 93, prec = 106
> len = 132, min = 0, av = 10, max = 61, prec = 106
> len = 172, min = 0, av = 19, max = 88, prec = 106
> len = 224, min = 0, av = 21, max = 93, prec = 106
> len = 292, min = 0, av = 16, max = 93, prec = 106
> len = 380, min = 0, av = 23, max = 98, prec = 106
> len = 494, min = 0, av = 25, max = 106, prec = 106
> len = 643, min = 0, av = 31, max = 103, prec = 106
> len = 836, min = 0, av = 28, max = 89, prec = 106
> len = 1087, min = 0, av = 13, max = 90, prec = 106
> len = 1414, min = 0, av = 22, max = 89, prec = 106
> len = 1839, min = 0, av = 15, max = 83, prec = 106
> len = 2391, min = 0, av = 21, max = 82, prec = 106
>
> And also for FHT:
>
> len = 1, min = 0, av = 12, max = 79, prec = 106
> len = 2, min = 0, av = 32, max = 108, prec = 106
> len = 3, min = 0, av = 39, max = 123, prec = 106
> len = 4, min = 0, av = 59, max = 145, prec = 106
> len = 6, min = 10, av = 76, max = 136, prec = 106
> len = 8, min = 2, av = 90, max = 175, prec = 106
> len = 11, min = 5, av = 91, max = 162, prec = 106
> len = 15, min = 29, av = 85, max = 160, prec = 106
> len = 20, min = 14, av = 102, max = 186, prec = 106
> len = 26, min = 20, av = 96, max = 162, prec = 106
> len = 34, min = 11, av = 95, max = 191, prec = 106
> len = 45, min = 0, av = 92, max = 189, prec = 106
> len = 59, min = 10, av = 93, max = 160, prec = 106
> len = 77, min = 40, av = 111, max = 179, prec = 106
> len = 101, min = 44, av = 110, max = 181, prec = 106
> len = 132, min = 47, av = 93, max = 156, prec = 106
> len = 172, min = 19, av = 100, max = 184, prec = 106
> len = 224, min = 38, av = 104, max = 186, prec = 106
> len = 292, min = 13, av = 97, max = 187, prec = 106
> len = 380, min = 31, av = 105, max = 192, prec = 106
> len = 494, min = 27, av = 109, max = 198, prec = 106
> len = 643, min = 47, av = 120, max = 198, prec = 106
> len = 836, min = 13, av = 111, max = 185, prec = 106
> len = 1087, min = 51, av = 96, max = 184, prec = 106
> len = 1414, min = 6, av = 110, max = 184, prec = 106
> len = 1839, min = 30, av = 97, max = 180, prec = 106
> len = 2391, min = 30, av = 102, max = 177, prec = 106
>
> Clearly, in this situation, one wants to at least double the working
> precision.
>
> Bill.
>
> On Apr 29, 11:02 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>
>
> > I wrote a little program to compute the "exact" product of two
> > polynomials with uniformly random unsigned coefficients < 2^prec for
> > some precision prec. I then compute the number of bits that are
> > correct in each coefficient and subtract from the precision. That
> > gives a measure of how many bits are "lost". I take the maximum number
> > of bits lost over the coefficients of the entire product polynomial. I
> > printed the results by length of polynomials multiplied. I give the
> > minimum number of bits lost, the average and the maximum, for 30
> > iterations in each case.
>
> > Here are the results for the Fast Hartley Transform multiplication.
> > The column prec just shows the working precision I used (of course it
> > doesn't significantly change the results if I change prec - the same
> > number of bits are lost regardless of the initial working precision):
>
> > len = 1, min = 0, av = 0, max = 1, prec = 106
> > len = 2, min = 0, av = 0, max = 3, prec = 106
> > len = 3, min = 0, av = 0, max = 8, prec = 106
> > len = 4, min = 0, av = 1, max = 10, prec = 106
> > len = 6, min = 0, av = 0, max = 4, prec = 106
> > len = 8, min = 0, av = 1, max = 8, prec = 106
> > len = 11, min = 0, av = 1, max = 4, prec = 106
> > len = 15, min = 0, av = 2, max = 7, prec = 106
> > len = 20, min = 0, av = 2, max = 8, prec = 106
> > len = 26, min = 0, av = 2, max = 7, prec = 106
> > len = 34, min = 0, av = 3, max = 7, prec = 106
> > len = 45, min = 0, av = 3, max = 11, prec = 106
> > len = 59, min = 0, av = 4, max = 15, prec = 106
> > len = 77, min = 0, av = 4, max = 10, prec = 106
> > len = 101, min = 1, av = 4, max = 9, prec = 106
> > len = 132, min = 1, av = 4, max = 8, prec = 106
> > len = 172, min = 3, av = 5, max = 9, prec = 106
> > len = 224, min = 0, av = 5, max = 10, prec = 106
> > len = 292, min = 3, av = 5, max = 12, prec = 106
> > len = 380, min = 3, av = 7, max = 16, prec = 106
> > len = 494, min = 3, av = 6, max = 12, prec = 106
> > len = 643, min = 4, av = 7, max = 14, prec = 106
> > len = 836, min = 4, av = 7, max = 10, prec = 106
> > len = 1087, min = 2, av = 6, max = 13, prec = 106
> > len = 1414, min = 4, av = 8, max = 20, prec = 106
> > len = 1839, min = 4, av = 9, max = 19, prec = 106
> > len = 2391, min = 6, av = 9, max = 17, prec = 106
> > len = 3109, min = 6, av = 9, max = 17, prec = 106
> > len = 4042, min = 7, av = 9, max = 19, prec = 106
> > len = 5255, min = 7, av = 10, max = 18, prec = 106
> > len = 6832, min = 5, av = 11, max = 19, prec = 106
>
> > By this same measure, you almost always get min = 0, av = 0, max = 1
> > for classical multiplication.
>
> > Of course this is the least interesting case (uniform random unsigned
> > coefficients), and we have no other algorithms to compare with (yet).
>
> > Bill.
>
> > On Apr 29, 7:51 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > Should this happen? :
>
> > > sage: f._mul_fateman(f)
> > > ERROR: An unexpected error occurred while tokenizing input
> > > The following traceback may be corrupted or invalid
> > > The error message is: ('EOF in multi-line statement', (22841, 0))
>
> > > ERROR: An unexpected error occurred while tokenizing input
> > > The following traceback may be corrupted or invalid
> > > The error message is: ('EOF in multi-line statement', (7, 0))
>
> > > ---------------------------------------------------------------------------
> > > OverflowError                             Traceback (most recent call last)
>
> > > /home/wbhart/<ipython console> in <module>()
>
> > > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/
> > > polynomial/polynomial_element.so in
> > > sage.rings.polynomial.polynomial_element.Polynomial._mul_fateman (sage/
> > > rings/polynomial/polynomial_element.c:15643)()
>
> > > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/
> > > polynomial/polynomial_fateman.pyc in _mul_fateman_mul(f, g)
> > >      81     z_poly_f_list = z_poly_f.coeffs()
> > >      82     z_poly_g_list = z_poly_g.coeffs()
> > > ---> 83     padding = _mul_fateman_to_int2(z_poly_f_list,z_poly_g_list)
> > >      84
> > >      85     n_f = z_poly_f(1<<padding)
>
> > > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/
> > > polynomial/polynomial_fateman.pyc in _mul_fateman_to_int2(f_list,
> > > g_list)
> > >      24     max_coeff_g = max([abs(i) for i in g_list])
> > >      25     b = (1+min(len(f_list),len(g_list)))*max_coeff_f*max_coeff_g
> > > ---> 26     return int(pyceil(pylog(b,2)))
> > >      27
> > >      28 def _mul_fateman_to_poly(number,padding):
>
> > > OverflowError: cannot convert float infinity to integer
>
> > > Bill.
>
> > > On Apr 29, 7:47 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > To be a little more precise about it, here are the Sage timings,
> > > > performed on the same machine as the FLINT timings:
>
> > > > sage: f=(x+1)^1000
> > > > sage: timeit("f*f")
> > > > 5 loops, best of 3: 87.5 ms per loop
> > > > sage: timeit("f._mul_karatsuba(f)")
> > > > 5 loops, best of 3: 370 ms per loop
>
> > > > flint2 (with FHT): 5.5 ms
>
> > > > sage: R.<x> = RR[]
> > > > sage: f=(x+1)^10000
> > > > sage: timeit("f*f")
> > > > 5 loops, best of 3: 8.74 s per loop
> > > > sage: timeit("f._mul_karatsuba(f)")
> > > > 5 loops, best of 3: 17.3 s per loop
>
> > > > flint2 (with FHT): 0.255s
>
> > > > Bill.
>
> > > > On Apr 29, 5:45 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > I made the speedup I mentioned to the Fast Hartley Transform code in
> > > > > flint2.
>
> > > > > Again I pick on the benchmarks Robert gave:
>
> > > > > > sage: f = (x+1)^1000
> > > > > > sage: timeit("f*f")
> > > > > > 5 loops, best of 3: 142 ms per loop
> > > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > > 5 loops, best of 3: 655 ms per loop
>
> > > > > That time is down to 5.5 ms with the FHT code in FLINT.
>
> > > > > (If we change that to f = (x+1)^10000 then the time to compute f*f is
> > > > > now about 255 ms.)
>
> > > > > Bill.
>
> > > > > On Apr 28, 8:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > I coded up a basic classical multiplication (no attempt whatsoever to
> > > > > > prevent cancellation). Fortunately the results of the classical and
> > > > > > Hartley multiplication routines agree (up to some precision loss).
>
> > > > > > I picked on the timings Robert gave:
>
> > > > > > sage: f = (x+1)^1000
> > > > > > sage: timeit("f*f")
> > > > > > 5 loops, best of 3: 142 ms per loop
> > > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > > 5 loops, best of 3: 655 ms per loop
>
> > > > > > So with the classical algorithm this multiplication takes about 16 ms.
> > > > > > With the Fast Hartley Transform it currently takes 26 ms.
>
> > > > > > If we change that to (1+x)^10000 then the times are 1400 ms and 610 ms
>
> ...
