If the coefficients are allowed to vary in sign and vary wildly in the number of bits, then nothing will save you. Here are the figures for classical multiplication:
len = 1, min = 0, av = 18, max = 79, prec = 106
len = 2, min = 0, av = 23, max = 101, prec = 106
len = 3, min = 0, av = 19, max = 98, prec = 106
len = 4, min = 0, av = 16, max = 96, prec = 106
len = 6, min = 0, av = 21, max = 85, prec = 106
len = 8, min = 0, av = 26, max = 87, prec = 106
len = 11, min = 0, av = 24, max = 80, prec = 106
len = 15, min = 0, av = 15, max = 85, prec = 106
len = 20, min = 0, av = 25, max = 91, prec = 106
len = 26, min = 0, av = 16, max = 75, prec = 106
len = 34, min = 0, av = 20, max = 104, prec = 106
len = 45, min = 0, av = 20, max = 96, prec = 106
len = 59, min = 0, av = 16, max = 71, prec = 106
len = 77, min = 0, av = 26, max = 86, prec = 106
len = 101, min = 0, av = 24, max = 93, prec = 106
len = 132, min = 0, av = 10, max = 61, prec = 106
len = 172, min = 0, av = 19, max = 88, prec = 106
len = 224, min = 0, av = 21, max = 93, prec = 106
len = 292, min = 0, av = 16, max = 93, prec = 106
len = 380, min = 0, av = 23, max = 98, prec = 106
len = 494, min = 0, av = 25, max = 106, prec = 106
len = 643, min = 0, av = 31, max = 103, prec = 106
len = 836, min = 0, av = 28, max = 89, prec = 106
len = 1087, min = 0, av = 13, max = 90, prec = 106
len = 1414, min = 0, av = 22, max = 89, prec = 106
len = 1839, min = 0, av = 15, max = 83, prec = 106
len = 2391, min = 0, av = 21, max = 82, prec = 106

And also for FHT:

len = 1, min = 0, av = 12, max = 79, prec = 106
len = 2, min = 0, av = 32, max = 108, prec = 106
len = 3, min = 0, av = 39, max = 123, prec = 106
len = 4, min = 0, av = 59, max = 145, prec = 106
len = 6, min = 10, av = 76, max = 136, prec = 106
len = 8, min = 2, av = 90, max = 175, prec = 106
len = 11, min = 5, av = 91, max = 162, prec = 106
len = 15, min = 29, av = 85, max = 160, prec = 106
len = 20, min = 14, av = 102, max = 186, prec = 106
len = 26, min = 20, av = 96, max = 162, prec = 106
len = 34, min = 11, av = 95, max = 191, prec = 106
len = 45, min = 0, av = 92, max = 189, prec = 106
len = 59, min = 10, av = 93, max = 160, prec = 106
len = 77, min = 40, av = 111, max = 179, prec = 106
len = 101, min = 44, av = 110, max = 181, prec = 106
len = 132, min = 47, av = 93, max = 156, prec = 106
len = 172, min = 19, av = 100, max = 184, prec = 106
len = 224, min = 38, av = 104, max = 186, prec = 106
len = 292, min = 13, av = 97, max = 187, prec = 106
len = 380, min = 31, av = 105, max = 192, prec = 106
len = 494, min = 27, av = 109, max = 198, prec = 106
len = 643, min = 47, av = 120, max = 198, prec = 106
len = 836, min = 13, av = 111, max = 185, prec = 106
len = 1087, min = 51, av = 96, max = 184, prec = 106
len = 1414, min = 6, av = 110, max = 184, prec = 106
len = 1839, min = 30, av = 97, max = 180, prec = 106
len = 2391, min = 30, av = 102, max = 177, prec = 106

Clearly, in this situation, one wants to at least double the working precision.

Bill.

On Apr 29, 11:02 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> I wrote a little program to compute the "exact" product of two
> polynomials with uniformly random unsigned coefficients < 2^prec for
> some precision prec. I then compute the number of bits that are
> correct in each coefficient and subtract from the precision. That
> gives a measure of how many bits are "lost". I take the maximum number
> of bits lost over the coefficients of the entire product polynomial. I
> printed the results by length of polynomials multiplied. I give the
> minimum number of bits lost, the average and the maximum, for 30
> iterations in each case.
>
> Here are the results for the Fast Hartley Transform multiplication.
> The column prec just shows the working precision I used (of course it
> doesn't significantly change the results if I change prec - the same
> number of bits are lost regardless of the initial working precision):
>
> len = 1, min = 0, av = 0, max = 1, prec = 106
> len = 2, min = 0, av = 0, max = 3, prec = 106
> len = 3, min = 0, av = 0, max = 8, prec = 106
> len = 4, min = 0, av = 1, max = 10, prec = 106
> len = 6, min = 0, av = 0, max = 4, prec = 106
> len = 8, min = 0, av = 1, max = 8, prec = 106
> len = 11, min = 0, av = 1, max = 4, prec = 106
> len = 15, min = 0, av = 2, max = 7, prec = 106
> len = 20, min = 0, av = 2, max = 8, prec = 106
> len = 26, min = 0, av = 2, max = 7, prec = 106
> len = 34, min = 0, av = 3, max = 7, prec = 106
> len = 45, min = 0, av = 3, max = 11, prec = 106
> len = 59, min = 0, av = 4, max = 15, prec = 106
> len = 77, min = 0, av = 4, max = 10, prec = 106
> len = 101, min = 1, av = 4, max = 9, prec = 106
> len = 132, min = 1, av = 4, max = 8, prec = 106
> len = 172, min = 3, av = 5, max = 9, prec = 106
> len = 224, min = 0, av = 5, max = 10, prec = 106
> len = 292, min = 3, av = 5, max = 12, prec = 106
> len = 380, min = 3, av = 7, max = 16, prec = 106
> len = 494, min = 3, av = 6, max = 12, prec = 106
> len = 643, min = 4, av = 7, max = 14, prec = 106
> len = 836, min = 4, av = 7, max = 10, prec = 106
> len = 1087, min = 2, av = 6, max = 13, prec = 106
> len = 1414, min = 4, av = 8, max = 20, prec = 106
> len = 1839, min = 4, av = 9, max = 19, prec = 106
> len = 2391, min = 6, av = 9, max = 17, prec = 106
> len = 3109, min = 6, av = 9, max = 17, prec = 106
> len = 4042, min = 7, av = 9, max = 19, prec = 106
> len = 5255, min = 7, av = 10, max = 18, prec = 106
> len = 6832, min = 5, av = 11, max = 19, prec = 106
>
> By this same measure, you almost always get min = 0, av = 0, max = 1
> for classical multiplication.
> Of course this is the least interesting case (uniform random unsigned
> coefficients), and we have no other algorithms to compare with (yet).
>
> Bill.
>
> On Apr 29, 7:51 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > Should this happen? :
> >
> > sage: f._mul_fateman(f)
> > ERROR: An unexpected error occurred while tokenizing input
> > The following traceback may be corrupted or invalid
> > The error message is: ('EOF in multi-line statement', (22841, 0))
> >
> > ERROR: An unexpected error occurred while tokenizing input
> > The following traceback may be corrupted or invalid
> > The error message is: ('EOF in multi-line statement', (7, 0))
> >
> > ---------------------------------------------------------------------------
> > OverflowError                             Traceback (most recent call last)
> >
> > /home/wbhart/<ipython console> in <module>()
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_element.so
> > in sage.rings.polynomial.polynomial_element.Polynomial._mul_fateman
> > (sage/rings/polynomial/polynomial_element.c:15643)()
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_fateman.pyc
> > in _mul_fateman_mul(f, g)
> >      81     z_poly_f_list = z_poly_f.coeffs()
> >      82     z_poly_g_list = z_poly_g.coeffs()
> > ---> 83     padding = _mul_fateman_to_int2(z_poly_f_list,z_poly_g_list)
> >      84
> >      85     n_f = z_poly_f(1<<padding)
> >
> > /usr/local/sage/sage-4.4/local/lib/python2.6/site-packages/sage/rings/polynomial/polynomial_fateman.pyc
> > in _mul_fateman_to_int2(f_list, g_list)
> >      24     max_coeff_g = max([abs(i) for i in g_list])
> >      25     b = (1+min(len(f_list),len(g_list)))*max_coeff_f*max_coeff_g
> > ---> 26     return int(pyceil(pylog(b,2)))
> >      27
> >      28 def _mul_fateman_to_poly(number,padding):
> >
> > OverflowError: cannot convert float infinity to integer
> >
> > Bill.
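[Editor's note: the traceback above comes from computing ceil(log2(b)) through floating point; once the coefficients are large enough, converting b to a float overflows to infinity before the log is taken. A hypothetical fix (my sketch, not Sage's actual patch) computes the same bound with exact integer bit_length instead:]

```python
def padding_bits(f_list, g_list):
    # Compute ceil(log2(b)) for the Fateman padding bound
    #   b = (1 + min(len(f), len(g))) * max|f_i| * max|g_j|
    # entirely in integer arithmetic, so it cannot overflow the way
    # int(pyceil(pylog(b, 2))) does when float(b) becomes infinity.
    max_f = max(abs(c) for c in f_list)
    max_g = max(abs(c) for c in g_list)
    b = (1 + min(len(f_list), len(g_list))) * max_f * max_g
    # For b > 0: ceil(log2(b)) == (b - 1).bit_length()
    return (b - 1).bit_length()
```

The identity ceil(log2(b)) == (b - 1).bit_length() holds for all positive integers b, and bit_length works on arbitrarily large Python integers, so coefficients with tens of thousands of bits are no problem.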
> > On Apr 29, 7:47 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > Being a little more precise about it. Here are the Sage timings
> > > performed on the same machine as I did the FLINT timings:
> > >
> > > sage: f=(x+1)^1000
> > > sage: timeit("f*f")
> > > 5 loops, best of 3: 87.5 ms per loop
> > > sage: timeit("f._mul_karatsuba(f)")
> > > 5 loops, best of 3: 370 ms per loop
> > >
> > > flint2 (with FHT): 5.5 ms
> > >
> > > sage: R.<x> = RR[]
> > > sage: f=(x+1)^10000
> > > sage: timeit("f*f")
> > > 5 loops, best of 3: 8.74 s per loop
> > > sage: timeit("f._mul_karatsuba(f)")
> > > 5 loops, best of 3: 17.3 s per loop
> > >
> > > flint2 (with FHT): 0.255s
> > >
> > > Bill.
> > >
> > > On Apr 29, 5:45 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > I made the speedup I mentioned to the Fast Hartley Transform code
> > > > in flint2.
> > > >
> > > > Again I pick on the benchmarks Robert gave:
> > > >
> > > > > sage: f = (x+1)^1000
> > > > > sage: timeit("f*f")
> > > > > 5 loops, best of 3: 142 ms per loop
> > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > 5 loops, best of 3: 655 ms per loop
> > > >
> > > > That time is down to 5.5 ms with the FHT code in FLINT.
> > > >
> > > > (If we change that to f = (x+1)^10000 then the time to compute f*f
> > > > is now about 255 ms.)
> > > >
> > > > Bill.
> > > >
> > > > On Apr 28, 8:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > I coded up a basic classical multiplication (no attempt whatsoever
> > > > > to prevent cancellation). Fortunately the results of the classical
> > > > > and Hartley multiplication routines agree (up to some precision
> > > > > loss).
> > > > >
> > > > > I picked on the timings Robert gave:
> > > > >
> > > > > sage: f = (x+1)^1000
> > > > > sage: timeit("f*f")
> > > > > 5 loops, best of 3: 142 ms per loop
> > > > > sage: timeit("f._mul_karatsuba(f)")
> > > > > 5 loops, best of 3: 655 ms per loop
> > > > >
> > > > > So with the classical algorithm this multiplication takes about 16 ms.
> > > > > With the Fast Hartley Transform it currently takes 26 ms.
> > > > >
> > > > > If we change that to (1+x)^10000 then the times are 1400 ms and
> > > > > 610 ms respectively.
> > > > >
> > > > > The FHT should be much faster after I make an important speedup
> > > > > some time today or tomorrow. It should comfortably win in that
> > > > > first benchmark.
> > > > >
> > > > > Bill.
> > > > >
> > > > > On Apr 28, 6:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > Rader-Brenner is less stable because it uses purely imaginary
> > > > > > twiddle factors whose absolute value is considerably bigger
> > > > > > than 1. It has been largely discarded in favour of the
> > > > > > split-radix method, which uses fewer real multiplications and
> > > > > > additions anyhow.
> > > > > >
> > > > > > It's very interesting to see an implementation of it, because
> > > > > > it is not described in very many places!
> > > > > >
> > > > > > Bill.
> > > > > >
> > > > > > On Apr 28, 5:10 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > A very nice looking library, but it uses Karatsuba and
> > > > > > > Toom-Cook, the very things we decided were unstable. I also
> > > > > > > read that the Rader-Brenner FFT is very unstable.
> > > > > > >
> > > > > > > Bill.
> > > > > > >
> > > > > > > On Apr 28, 2:14 pm, YannLC <yannlaiglecha...@gmail.com> wrote:
> > > > > > > > This might be of interest:
> > > > > > > >
> > > > > > > > from the people of MPFR,
> > > > > > > >
> > > > > > > > "Mpfrcx is a library for the arithmetic of univariate
> > > > > > > > polynomials over arbitrary precision real (Mpfr) or complex
> > > > > > > > (Mpc) numbers, without control on the rounding. For the time
> > > > > > > > being, only the few functions needed to implement the
> > > > > > > > floating point approach to complex multiplication are
> > > > > > > > implemented.
> > > > > > > > On the other hand, these comprise asymptotically fast
> > > > > > > > multiplication routines such as Toom–Cook and the FFT."
> > > > > > > >
> > > > > > > > http://www.multiprecision.org/index.php?prog=mpfrcx
> > > > > > > >
> > > > > > > > On Apr 28, 3:53 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > > > You can access the code here:
> > > > > > > > >
> > > > > > > > > http://selmer.warwick.ac.uk/gitweb/flint2.git?a=shortlog;h=refs/heads...
> > > > > > > > >
> > > > > > > > > You need the latest MPIR release candidate, the latest MPFR
> > > > > > > > > and either an x86 or x86_64 machine to run it.
> > > > > > > > >
> > > > > > > > > Set the lib and include paths in the top level makefile,
> > > > > > > > > set the LD_LIBRARY_PATH's in flint_env, then do:
> > > > > > > > >
> > > > > > > > > source flint_env
> > > > > > > > > make library
> > > > > > > > > make check MOD=mpfr_poly
> > > > > > > > >
> > > > > > > > > Most of the relevant code is in the mpfr_poly directory.
> > > > > > > > >
> > > > > > > > > If it doesn't work for you, please understand this is
> > > > > > > > > development code only, therefore probably broken in many
> > > > > > > > > ways.
> > > > > > > > >
> > > > > > > > > Information on the Fast Hartley Transform can be found in
> > > > > > > > > Joerg Arndt's fxtbook:
> > > > > > > > >
> > > > > > > > > http://www.jjj.de/fxt/fxtbook.pdf
> > > > > > > > >
> > > > > > > > > Bill.
> > > > > > > > >
> > > > > > > > > On Apr 28, 2:45 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> > > > > > > > > > Well I got the polynomial convolution working with the
> > > > > > > > > > Fast Hartley Transform. It seems to pass my primitive
> > > > > > > > > > test code.
> > > > > > > > > >
> > > > > > > > > > As an example of a first timing, 1000 iterations of
> > > > > > > > > > multiplying polynomials of length 512 with 100 floating
> > > > > > > > > > point bits of precision per coefficient takes 17s.
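[Editor's note: the FLINT code itself isn't reproduced in the thread, but the underlying idea (multiplying polynomials via the Hartley convolution theorem) can be sketched briefly. This is my own illustrative version: it uses a naive O(n^2) real transform in plain Python doubles, whereas FLINT's FHT is an O(n log n) transform over mpfr coefficients.]

```python
import math

def dht(x):
    # Naive O(n^2) discrete Hartley transform:
    #   H_k = sum_j x_j * cas(2*pi*j*k/n),   cas(t) = cos(t) + sin(t).
    # A real FHT computes this in O(n log n); naive keeps the sketch short.
    n = len(x)
    return [sum(xj * (math.cos(2 * math.pi * j * k / n) +
                      math.sin(2 * math.pi * j * k / n))
                for j, xj in enumerate(x))
            for k in range(n)]

def poly_mul_hartley(a, b):
    # Zero-pad to a power of two so the cyclic convolution equals
    # the linear (polynomial) convolution.
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2
    A = dht(list(a) + [0.0] * (n - len(a)))
    B = dht(list(b) + [0.0] * (n - len(b)))
    # Hartley convolution theorem (indices taken mod n):
    #   Z_k = (A_k*(B_k + B_{n-k}) + A_{n-k}*(B_k - B_{n-k})) / 2
    Z = [(A[k] * (B[k] + B[-k % n]) + A[-k % n] * (B[k] - B[-k % n])) / 2
         for k in range(n)]
    # The DHT is its own inverse up to a factor of n.
    return [c / n for c in dht(Z)[:len(a) + len(b) - 1]]
```

The pointwise step is a little more involved than for the FFT (it mixes the k and n-k bins), but everything stays in real arithmetic, which is what makes the Hartley transform attractive for real-coefficient polynomials.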
> > > > > > > > > > I think I can make it take about half that time (at
> > > > > > > > > > least), but that ...

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org