Bill Daly wrote:
> This is very interesting. I did some experiments using q = 2^64 - 2^32 +
> 1
Yes that is exactly the prime I am using.
> to see how a multiply mod q would stack up against a complex floating
> point multiply, and I found that on a 266Mhz Pentium II the
> multiplication mod q was about 15% faster. Unfortunately, I have lost
> both the code and the results (my laptop was stolen), but they are
> easily reconstructed. I didn't follow up on it because I couldn't see
> how to integrate the DWT with an integer FFT, but if you have found a
> way, then maybe it is in fact possible to implement an integer version
> of LL which is competitive with prime95.
It is all to do with the properties of p. You've discovered already that you
can only fourier transform with lengths that divide into p-1.
Factors of p-1 are 2^32 * 3 * 5 * 17 * 257 * 65537
This means that practically an FFT can be defined of length up to 2^32 with
optional factors of 3 and 5.
If our transform length is n then to do a Discrete Weighted Transform we need
an n-th root of 2. 2 has order 192 mod p (ie 2^192 mod p = 1) so we can have
(p-1)/192 = (2^58 - 2^26) / 3 = 2^26 * 5 * 17 * 257 * 65537 th root of 2.
This means that we can do the DWT for lengths up to 2^26 with an optional
factor of 5
This is extremely fortunate, I did extensive searches looking for primes
which satisfy both the FFT and the DWT constraints and this was the only one
with decent properties (for FFT, DWT and mod p with no division). It would
have been nice to have a factor of 3 but you can't have everything!
Some other random facts :-
7 is a primitive root mod p
An n-th root of unity can be generated by 7^(5*(p-1)/n) mod p.
An n-th root of two can be generated by 7^(5*(p-1)/192/n) mod p
So a suitable 5 * 2^26-th root of 1 is 0xED41D05B78D6E286 and the 5 *
2^26-th root of 2 is 0xC47FC73D33F80E14
Note that a 64-th root of unity is 8 mod p.
As for how you do the DWT, once you've got your n-th roots of 1 and 2 you
just proceed exactly as per the floating point version just in the mod p
field rather than the complex field.
Peter-Lawrence Montgomery helped me a great deal working this all out. He
also worked out some great short cuts for doing 128 bit and 64 bit reductions
mod p. I turned all that into lean ARM assembler!
> The particular form of q is what makes reduction mod q efficient.
[snip]
Note that if you choose your root of unity so that the 64-th root of unity is
8 then you can convert lots of the primitive operations into shifts rather
than multiplies. This is a big win on the StrongARM.
> It is of course fortuitous that q is prime.
Fortuitous isn't the word I would have used, it took many days of CPU time
for me to fix on this prime!
Hopefully I will have the time to tidy up the code and release it to the
world soon...
--
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/