To be a little more precise about it, here are the Sage timings,
performed on the same machine as the FLINT timings:

sage: f=(x+1)^1000
sage: timeit("f*f")
5 loops, best of 3: 87.5 ms per loop
sage: timeit("f._mul_karatsuba(f)")
5 loops, best of 3: 370 ms per loop

flint2 (with FHT): 5.5 ms

sage: R.<x> = RR[]
sage: f=(x+1)^10000
sage: timeit("f*f")
5 loops, best of 3: 8.74 s per loop
sage: timeit("f._mul_karatsuba(f)")
5 loops, best of 3: 17.3 s per loop

flint2 (with FHT): 0.255s

Bill.


On Apr 29, 5:45 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> I made the speedup I mentioned to the Fast Hartley Transform code in
> flint2.
>
> Again I pick on the benchmarks Robert gave:
>
> > sage: f = (x+1)^1000
> > sage: timeit("f*f")
> > 5 loops, best of 3: 142 ms per loop
> > sage: timeit("f._mul_karatsuba(f)")
> > 5 loops, best of 3: 655 ms per loop
>
> That time is down to 5.5 ms with the FHT code in FLINT.
>
> (If we change that to f = (x+1)^10000 then the time to compute f*f is
> now about 255 ms.)
>
> Bill.
>
> On Apr 28, 8:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
>
>
> > I coded up a basic classical multiplication (no attempt whatsoever to
> > prevent cancellation). Fortunately the results of the classical and
> > Hartley multiplication routines agree (up to some precision loss).
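
For the record, the classical algorithm here is just the schoolbook
double loop. A minimal Python sketch, with plain floats standing in for
mpfr coefficients and an illustrative name rather than flint2's API:

def classical_mul(f, g):
    # f, g are coefficient lists, lowest degree first
    res = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] += a * b  # each rounded product can lose or cancel bits
    return res
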
>
> > I picked on the timings Robert gave:
>
> > sage: f = (x+1)^1000
> > sage: timeit("f*f")
> > 5 loops, best of 3: 142 ms per loop
> > sage: timeit("f._mul_karatsuba(f)")
> > 5 loops, best of 3: 655 ms per loop
>
> > So with the classical algorithm this multiplication takes about 16 ms.
> > With the Fast Hartley Transform it currently takes 26 ms.
>
> > If we change that to (1+x)^10000 then the times are 1400 ms and 610 ms
> > respectively.
>
> > The FHT should be much faster after I make an important speedup some
> > time today or tomorrow. It should comfortably win in that first
> > benchmark.
>
> > Bill.
>
> > On Apr 28, 6:01 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > Rader-Brenner is less stable because it uses purely imaginary twiddle
> > > factors whose absolute value is considerably bigger than 1. It has
> > > been largely discarded in favour of the split-radix method, which uses
> > > fewer real multiplications and additions anyhow.
>
> > > It's very interesting to see an implementation of it, because it is
> > > not described in very many places!
>
> > > Bill.
>
> > > On Apr 28, 5:10 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > A very nice looking library, but it uses Karatsuba and Toom Cook, the
> > > > very things we decided were unstable. I also read that the Rader-Brenner
> > > > FFT is very unstable.
>
> > > > Bill.
>
> > > > On Apr 28, 2:14 pm, YannLC <yannlaiglecha...@gmail.com> wrote:
>
> > > > > This might be of interest:
>
> > > > > from the people of MPFR,
>
> > > > > "Mpfrcx is a library for the arithmetic of univariate polynomials over
> > > > > arbitrary precision real (Mpfr) or complex (Mpc) numbers, without
> > > > > control on the rounding. For the time being, only the few functions
> > > > > needed to implement the floating point approach to complex
> > > > > multiplication are implemented. On the other hand, these comprise
> > > > > asymptotically fast multiplication routines such as Toom–Cook and the
> > > > > FFT."
>
> > > > > http://www.multiprecision.org/index.php?prog=mpfrcx
>
> > > > > On Apr 28, 3:53 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > You can access the code here:
>
> > > > > > http://selmer.warwick.ac.uk/gitweb/flint2.git?a=shortlog;h=refs/heads...
>
> > > > > > You need the latest MPIR release candidate, the latest MPFR and
> > > > > > either an x86 or x86_64 machine to run it.
>
> > > > > > Set the lib and include paths in the top level makefile, set the
> > > > > > LD_LIBRARY_PATH's in flint_env, then do:
>
> > > > > > source flint_env
> > > > > > make library
> > > > > > make check MOD=mpfr_poly
>
> > > > > > Most of the relevant code is in the mpfr_poly directory.
>
> > > > > > If it doesn't work for you, please understand this is development
> > > > > > code only, therefore probably broken in many ways.
>
> > > > > > Information on the Fast Hartley Transform can be found in Joerg
> > > > > > Arndt's fxtbook:
>
> > > > > > http://www.jjj.de/fxt/fxtbook.pdf
>
> > > > > > Bill.
>
> > > > > > On Apr 28, 2:45 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > Well, I got the polynomial convolution working with the Fast
> > > > > > > Hartley Transform. It seems to pass my primitive test code.
>
> > > > > > > As an example of a first timing, 1000 iterations of multiplying
> > > > > > > polynomials of length 512, with 100 bits of floating point
> > > > > > > precision per coefficient, takes 17s.
>
> > > > > > > I think I can make it take about half that time (at least), but
> > > > > > > that is a job for tomorrow.
>
> > > > > > > I haven't looked yet to see how much precision is lost
> > > > > > > heuristically.
>
> > > > > > > Bill.
>
> > > > > > > On Apr 27, 11:49 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > Well, I coded up a mini mpfr_poly module and Fast Hartley
> > > > > > > > Transform in flint2 using mpfrs as coefficients.
>
> > > > > > > > It's hard to know if it is doing the right thing; there's no
> > > > > > > > test code yet. But it compiles and doesn't segfault. :-)
>
> > > > > > > > A little bit more work is required to turn it into a
> > > > > > > > convolution, but as far as I know no inverse transform is
> > > > > > > > required, as the Hartley transform is its own inverse (up to
> > > > > > > > scaling). Maybe by tomorrow I'll have a fast convolution and
> > > > > > > > we'll be able to answer some of these questions heuristically.
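
For the curious, a minimal Python sketch of a cyclic convolution via the
discrete Hartley transform; this uses a naive O(N^2) transform rather
than a real FHT, and the names are illustrative, not flint2's API. It
relies on the DHT being its own inverse up to a factor of 1/N, and on
the Hartley convolution theorem, whose pointwise step mixes H[k] and
H[N-k]:

from math import cos, sin, pi

def dht(x):
    # naive discrete Hartley transform: cas(t) = cos(t) + sin(t)
    N = len(x)
    return [sum(x[n] * (cos(2 * pi * n * k / N) + sin(2 * pi * n * k / N))
                for n in range(N))
            for k in range(N)]

def cyclic_mul(x, y):
    N = len(x)
    X, Y = dht(x), dht(y)
    # Hartley convolution theorem: combine H[k] with H[N-k] pointwise
    Z = [(X[k] * (Y[k] + Y[(N - k) % N])
          + X[(N - k) % N] * (Y[k] - Y[(N - k) % N])) / 2
         for k in range(N)]
    # no separate inverse transform: apply dht again and divide by N
    return [z / N for z in dht(Z)]

For polynomial multiplication the inputs would be zero-padded to length
at least deg(f) + deg(g) + 1 (a power of two for a radix-2 FHT) so the
cyclic wraparound doesn't fold coefficients together.
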
>
> > > > > > > > It won't be super efficient: no truncation, no radix 4, no
> > > > > > > > special case for double precision coefficients, no cache
> > > > > > > > friendly matrix Fourier stuff, etc., but I did reuse the
> > > > > > > > twiddle factors as much as possible, I think.
>
> > > > > > > > Bill.
>
> > > > > > > > On Apr 27, 5:18 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > Also see pages 252 and following of Polynomial and Matrix
> > > > > > > > > Computations, Volume 1 by Dario Bini and Victor Pan, which
> > > > > > > > > seems to answer your question in detail.
>
> > > > > > > > > Bill.
>
> > > > > > > > > On Apr 27, 4:57 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > > > > > Numerical stability is not something I have any experience
> > > > > > > > > > with. I am not sure if it is equivalent in some sense to the
> > > > > > > > > > loss of precision which occurs when doing FFT arithmetic
> > > > > > > > > > using a floating point FFT.
>
> > > > > > > > > > The issue seems to be accumulation of "numerical noise".
> > > > > > > > > > There are proven bounds on how many bits are lost to this as
> > > > > > > > > > the computation proceeds, but the general consensus has been
> > > > > > > > > > that for "random coefficients" the numerical noise is much,
> > > > > > > > > > much less than the proven bounds would indicate.
>
> > > > > > > > > > Zimmermann, Gaudry and Kruppa suggest that the following
> > > > > > > > > > paper contains information about proven bounds for the
> > > > > > > > > > floating point FFT:
>
> > > > > > > > > > Percival, C. Rapid multiplication modulo the sum and
> > > > > > > > > > difference of highly composite numbers. Math. Comp. 72, 241
> > > > > > > > > > (2003), 387–395.
>
> > > > > > > > > > Paul Zimmermann would almost certainly know what the
> > > > > > > > > > current state of the art is.
>
> > > > > > > > > > Similar stability issues arise in matrix arithmetic with
> > > > > > > > > > floating point coefficients. It seems that one can always
> > > > > > > > > > prove something when you know something about the successive
> > > > > > > > > > minima, but I am not too sure what the equivalent for an FFT
> > > > > > > > > > would be. You might be in the realm of research rather than
> > > > > > > > > > something that is known, i.e. it might just be conventional
> > > > > > > > > > wisdom that the FFT behaves well.
>
> > > > > > > > > > The people to ask about this are definitely Joris van der
> > > > > > > > > > Hoeven and Andreas Enge. If you find a very nice algorithm,
> > > > > > > > > > let me know and we'll implement it in flint2, which should
> > > > > > > > > > have a type for polynomials with floating point coefficients
> > > > > > > > > > (using MPFR).
>
> > > > > > > > > > Bill.
>
> > > > > > > > > > On Apr 27, 2:45 pm, Jason Grout <jason-s...@creativetrax.com> wrote:
>
> > > > > > > > > > > On 04/26/2010 10:54 PM, Robert Bradshaw wrote:
>
> > > > > > > > > > > > I should comment on this, as I wrote the code and
> > > > > > > > > > > > comments in question. There actually is a fair amount of
> > > > > > > > > > > > research out there on stable multiplication of
> > > > > > > > > > > > polynomials over the real numbers, but (if I remember
> > > > > > > > > > > > right, it was a while ago) there were some results to
> > > > > > > > > > > > the effect that no asymptotically fast algorithm has
> > > > > > > > > > > > good stability properties over all possible
> > > > > > > > > > > > coefficients.
>
> > > > > > > > > > > If you have a moment and if you remember, could you point
> > > > > > > > > > > out a reference or two? I searched for several hours
> > > > > > > > > > > before posting, and couldn't find anything that seemed to
> > > > > > > > > > > address the question of stability satisfactorily.
>
> > > > > > > > > > > > Of course, in practice one often runs into "well
> > > > > > > > > > > > behaved" polynomials in R[x]. (In particular, most of
> > > > > > > > > > > > the interest is, understandably, about truncated power
> > > > > > > > > > > > series, though there are other uses.) The obvious thing
> > > > > > > > > > > > to do is a (non-discrete) FFT. What I was thinking about
> > > > > > > > > > > > implementing (though I never got back to it) is
> > > > > > > > > > > > something that works like this:
>
> > > > > > > > > > > > 1) Choose some alpha and rescale f(x) -> f(alpha*x) so
> > > > > > > > > > > > all coefficients have roughly the same size.
> > > > > > > > > > > > 2) Multiply the resulting polynomial over Z using the
> > > > > > > > > > > > insanely fast code in FLINT. The resulting answer will
> > > > > > > > > > > > be the exact product of the truncated input, and the
> > > > > > > > > > > > precision loss (conversely, the required precision for
> > > > > > > > > > > > no loss) was fairly easy to work out if I remember
> > > > > > > > > > > > right.
> > > > > > > > > > > > 3) Cast the result back into R[x].
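
A minimal Python sketch of steps 2) and 3), assuming step 1) has
already been done; Python ints and a schoolbook loop stand in for
FLINT's fast exact arithmetic over Z, and prec is an illustrative
scaling, not a worked-out bound for lossless recovery:

def mul_via_Z(f, g, prec=53):
    scale = 1 << prec
    F = [int(round(c * scale)) for c in f]  # lift truncated reals to Z
    G = [int(round(c * scale)) for c in g]
    H = [0] * (len(F) + len(G) - 1)
    for i, a in enumerate(F):               # exact product over Z
        for j, b in enumerate(G):
            H[i + j] += a * b
    return [h / scale**2 for h in H]        # cast back into R[x]
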
>
> > > > > > > > > > > Thanks!
>
> > > > > > > > > > > Jason
