Bill Daly wrote:

> This is very interesting. I did some experiments using q = 2^64 - 2^32 +
> 1 

Yes that is exactly the prime I am using.

> to see how a multiply mod q would stack up against a complex floating
> point multiply, and I found that on a 266Mhz Pentium II the
> multiplication mod q was about 15% faster. Unfortunately, I have lost
> both the code and the results (my laptop was stolen), but they are
> easily reconstructed. I didn't follow up on it because I couldn't see
> how to integrate the DWT with an integer FFT, but if you have found a
> way, then maybe it is in fact possible to implement an integer version
> of LL which is competitive with prime95.

It is all to do with the properties of p.  You've discovered already that you
can only fourier transform with lengths that divide into p-1.

Factors of p-1 are 2^32 * 3 * 5 * 17 * 257 * 65537

This means that practically an FFT can be defined of length up to 2^32 with
optional factors of 3 and 5.

If our transform length is n then to do a Discrete Weighted Transform we need
an n-th root of 2.  2 has order 192 mod p (ie 2^192 mod p = 1) so we can have

(p-1)/192 = (2^58 - 2^26) / 3 = 2^26 * 5 * 17 * 257 * 65537 th root of 2.

This means that we can do the DWT for lengths up to 2^26 with an optional
factor of 5

This is extremely fortunate, I did extensive searches looking for primes
which satisfy both the FFT and the DWT constraints and this was the only one
with decent properties (for FFT, DWT and mod p with no division).  It would
have been nice to have a factor of 3 but you can't have everything!

Some other random facts :-

  7 is a primitive root mod p
  
  An n-th root of unity can be generated by 7^(5*(p-1)/n) mod p.
  
  An n-th root of two can be generated by 7^(5*(p-1)/192/n) mod p
  
  So a suitable 5 * 2^26-th root of 1 is 0xED41D05B78D6E286 and the 5 *
  2^26-th root of 2 is 0xC47FC73D33F80E14
  
  Note that a 64-th root of unity is 8 mod p.

As for how you do the DWT, once you've got your n-th roots of 1 and 2 you
just proceed exactly as per the floating point version just in the mod p
field rather than the complex field.

Peter-Lawrence Montgomery helped me a great deal working this all out.  He
also worked out some great short cuts for doing 128 bit and 64 bit reductions
mod p.  I turned all that into lean ARM assembler!

> The particular form of q is what makes reduction mod q efficient.
[snip]

Note that if you choose your root of unity so that the 64-th root of unity is
8 then you can convert lots of the primitive operations into shifts rather
than multiplies.  This is a big win on the StrongARM.

> It is of course fortuitous that q is prime.

Fortuitous isn't the word I would have used, it took many days of CPU time
for me to fix on this prime!

Hopefully I will have the time to tidy up the code and release it to the
world soon...

-- 
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/

Reply via email to