At 04:53 PM 6/28/02 -0500, Jeremy Blosser wrote:

>The Fastest Fourier Transform in the West (FFTW) http://www.fftw.org/
>has an implementation of a DWT using NTT that you can check out. The NTT
>going to be slower, but I can't remember exactly why. Perhaps it was the
>overhead of the Chinese-remainder theorem you have to apply when you go
>over boundaries or something like that... Someone on the list can point
>you in a better direction I'm sure...

NTTs are slower for two major reasons: first, small floating point FFTs have
simple "twiddle factors", and building a big FFT from these small FFTs
will reduce the amount of arithmetic you have to do. NTTs are different;
the prime you pick for an NTT determines what the twiddle factors look
like, and they're almost never simple, so you can't reduce the amount of
arithmetic like you can with an FFT.

The other big reason is technology based. 99% of the computers in the
world suck at integer multiplication; integer muls have never been a 
priority to computer architects, and NTTs require huge numbers of them.
While most desktop computers can complete one floating point multiply 
every clock cycle, integer multiplication is almost always slower, and
sometimes tremendously slower. It's worse that NTTs use *modular* arithmetic,
and that each modular multiply requires several integer multiplies. And
because NTTs can't be simplified like FFTs can, you can't reduce the number
of modular multiplies by modifying the NTT structure.

Basically, unless you use an Itanium or an Alpha ev6, an NTT is doomed to
be at best 3-5x slower than a corresponding FFT.

SPEC is accepting submissions for their next generation of benchmark suite...
does anyone have an idea for a benchmark that will reward manufacturers who
pay attention to multiplies?

jasonp
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to