On Thu, Jun 17, 1999 at 11:09:12PM +0100, Brian J. Beesley wrote:
> When you do your NTT, you're going to need at least twice as many
> bits in the elements of the transform as there are bits in the number
> you're testing (because you're going to want to square the values in
> the elements, without any bits falling off the more significant end).
> If you're working into millions of bits, I think this forces you to
> use (at least) 64-bit elements. That scuppers any plans to use MMX
> instructions.
That is correct: for an FFT or NTT of any reasonable length you will
need at least 64-bit elements.
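Brian's rule of thumb can be made slightly more precise. Suppose each transform element carries a b-bit slice ("digit") of the number and the transform has N points. Each coefficient of the squared result is then a sum of N products of two digits. That sum can reach N * (2^b - 1)^2, which is roughly 2b + log2(N) bits. The sketch below is a minimal Python version of that bound. It assumes unsigned digits (no balanced representation), and the function names are hypothetical.

```python
def digit_bits(number_bits, n):
    """Bits of the input number packed into each of the n transform
    elements (ceiling of number_bits / n)."""
    return -(-number_bits // n)


def required_element_bits(b, n):
    """Bits needed to hold one coefficient of a length-n cyclic convolution
    of unsigned b-bit digits without overflow.  The worst case is
    n * (2**b - 1)**2, i.e. roughly 2*b + log2(n) bits."""
    return (n * (2**b - 1) ** 2).bit_length()


# Hypothetical example: a 3,000,000-bit number on a 2**18-point transform.
b = digit_bits(3_000_000, 2**18)         # 12 bits per element
need = required_element_bits(b, 2**18)   # 42 bits: more than a 32-bit element holds
```

Shrinking b lowers the bound, but it also means a longer transform to cover the same number. That trade-off is why practical digit sizes push the elements past 32 bits.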
You can synthesise these elements by doing two or three 32-bit
transforms and combining the results with the Chinese remainder
theorem. I experimented with this on the ARM and concluded that
doing it this way was slower because the operation count was larger.
However, if the MMX instructions give a really significant speed-up,
it may be practical to combine these single-precision NTTs.
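The CRT approach can be sketched roughly as follows. This is a minimal Python illustration, not the ARM code. It runs the same cyclic convolution modulo two NTT-friendly primes below 2^32 and then recombines each coefficient with the CRT (Garner's form). The two primes, 3221225473 = 3*2^30+1 and 2013265921 = 15*2^27+1, are assumptions chosen for illustration. With them, results are exact for any coefficient below their product, a little over 2^62. All function names are hypothetical. `pow(x, -1, m)` needs Python 3.8 or later.

```python
# Sketch: emulate a wide-element NTT by running the same cyclic convolution
# modulo two NTT-friendly primes below 2**32 and recombining with the CRT.
# The primes and all function names are illustrative assumptions.

P1 = 3221225473  # 3 * 2**30 + 1, supports power-of-two lengths up to 2**30
P2 = 2013265921  # 15 * 2**27 + 1, supports power-of-two lengths up to 2**27


def root_of_unity(p, n):
    """Return an element of exact order n (n a power of two dividing p - 1)."""
    assert (p - 1) % n == 0
    for x in range(2, p):
        w = pow(x, (p - 1) // n, p)
        # w**n == 1 always; the order is exactly n iff w**(n // 2) != 1
        if n == 1 or pow(w, n // 2, p) != 1:
            return w
    raise ValueError("no primitive root of unity found")


def ntt(a, p, invert=False):
    """In-place iterative radix-2 NTT mod p; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    w_n = root_of_unity(p, n)
    length = 2
    while length <= n:
        w = pow(w_n, n // length, p)  # stage root derived from one n-th root
        if invert:
            w = pow(w, p - 2, p)
        half = length // 2
        for start in range(0, n, length):
            wk = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * wk % p
                a[start + k] = (u + v) % p
                a[start + k + half] = (u - v) % p
                wk = wk * w % p
        length <<= 1
    if invert:
        n_inv = pow(n, p - 2, p)
        for i in range(n):
            a[i] = a[i] * n_inv % p


def cyclic_convolution_mod(a, b, p):
    """Cyclic convolution of a and b, reduced mod the single prime p."""
    fa, fb = list(a), list(b)
    ntt(fa, p)
    ntt(fb, p)
    prod = [x * y % p for x, y in zip(fa, fb)]
    ntt(prod, p, invert=True)
    return prod


def crt2(r1, r2, m1=P1, m2=P2):
    """Garner/CRT: the unique x in [0, m1*m2) with x = r1 mod m1, x = r2 mod m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t


def wide_cyclic_convolution(a, b):
    """Exact result as long as every true coefficient is below P1 * P2."""
    c1 = cyclic_convolution_mod(a, b, P1)
    c2 = cyclic_convolution_mod(a, b, P2)
    return [crt2(x, y) for x, y in zip(c1, c2)]


def naive_cyclic_convolution(a, b):
    """Direct O(n^2) reference used to check the transform version."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += a[i] * b[j]
    return out
```

For example, squaring eight 16-bit digits of value 65535 gives coefficients of 34358689800. That is more than a single 32-bit prime can represent, but the two-prime result recovers it exactly. The doubled set of transforms, plus the recombination step per coefficient, is where the larger operation count comes from.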
--
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/