On 8 Jul 99, at 6:19, Lucas Wiman wrote:

> In the book _Primes and Programming_ Head's method of multiplying
> two numbers mod n is mentioned.  Is this actually more efficient
> than simply multiplying the two numbers and taking the modulus?

Look at it this way. Head's method is essentially binary long 
multiplication, with the "wrinkle" that you take the remainder 
modulo n after each shift & add operation. The remaindering 
operation itself is very simple: all you have to do is subtract 
n if the result is >= n. Nevertheless you're into a _lot_ of 
operations, on badly-aligned data. That means lots of 
multi-precision shifts, which are slow because of dependency 
chains; the "subtract n if >= n" step is also slow, because the 
branch is hard to predict.
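As a rough illustration (a minimal sketch in Python, not the 
book's code; Python's big integers stand in for whatever 
fixed-width registers a real implementation would use):

```python
def head_mulmod(a, b, n):
    """Compute (a * b) % n by binary long multiplication,
    reducing modulo n after every shift and add (Head's method).
    No intermediate ever exceeds 2*n, so each reduction is just
    a single conditional subtract."""
    a %= n
    result = 0
    while b:
        if b & 1:
            result += a          # result < 2n
            if result >= n:      # the "subtract n if >= n" step
                result -= n
        a <<= 1                  # shift the multiplicand; a < 2n
        if a >= n:
            a -= n
        b >>= 1
    return result
```

The hard-to-predict branches mentioned above are the `>= n` 
tests, which go either way essentially at random.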

Conversely, multiplying the whole number out & then taking the 
modulus once _may_ be inefficient, because you will probably be 
forced out of registers & have to do a lot of operations in memory.

My factoring code compromises. It does long multiplication 32 
bits at a time, creating a 128-bit product from a 96-bit by 
32-bit multiplication, then reduces this modulo n, leaving a 
result with at most 95 significant bits. The shifts involved are 
32-bit shifts, i.e. just an address offset, costing no CPU time. 
I _do_ have to reduce modulo n 3 times, but the multiplications 
actually dominate the time, and I can get away with using only 
the limited registers in the x86 processor plus four cache 
lines' worth of memory (including storage for the factor, 
remainder, etc.).
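A sketch of that scheme (hypothetical Python, using big integers 
where the real code keeps 32-bit limbs in x86 registers; the 
function name is mine):

```python
def square_mod_96(x, n):
    """Square an x of at most 95 bits modulo n by long
    multiplication 32 bits at a time: each step is a 96-bit by
    32-bit multiply giving a product of at most 128 bits, which
    is reduced modulo n immediately, so no intermediate ever
    grows past 128 bits."""
    MASK32 = (1 << 32) - 1
    # the three 32-bit limbs of x, most significant first
    limbs = [(x >> shift) & MASK32 for shift in (64, 32, 0)]
    result = 0
    for limb in limbs:
        partial = (result << 32) + x * limb  # at most 128 bits
        result = partial % n                 # one of the 3 reductions
    return result
```

After the loop the three partial results have been combined as 
x * (l2 * 2^64 + l1 * 2^32 + l0) mod n, i.e. x^2 mod n.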

I suppose you could call this a variant of Head's method. Note 
that I get only 95 bits from a 96-bit field in the same way that 
Head gets only (n-1) bits from an n-bit field: you lose one bit 
of precision because you have to add together the partial 
results of the multiplication after taking the modulus. 
Nevertheless Head's method is useful, especially if you're 
working in a language like Java that has limited support for 
multiprecision arithmetic, since it allows you to do arithmetic 
modulo n for n up to 2^30 even if you have only 32-bit signed 
arithmetic available.
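The bound works out as follows (a Python sketch of my own; the 
asserts only document why n up to 2^30 is safe when every value 
must stay below 2^31, as with 32-bit signed arithmetic - they 
never fire):

```python
def head_mulmod_signed32(a, b, n):
    """Head's method arranged so that, for n <= 2**30 and
    0 <= a, b < n, every intermediate value stays below 2**31
    and so fits in a 32-bit signed int."""
    assert n <= 1 << 30 and 0 <= a < n and 0 <= b < n
    result = 0
    while b:
        if b & 1:
            result += a              # result + a <= 2n - 2 < 2**31
            assert result < 1 << 31
            if result >= n:
                result -= n
        a <<= 1                      # a <= 2n - 2 < 2**31
        assert a < 1 << 31
        if a >= n:
            a -= n
        b >>= 1
    return result
```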

There are more time-saving tricks employed, like an "early exit" 
from the whole stage when a partial multiplier is zero, and 
saving the partial products, which reduces the number of 32-bit 
by 32-bit multiplies required from 9 to 6.
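The 9-to-6 saving comes from squaring: each cross product of the 
three limbs occurs twice, so it is computed once and doubled by 
a shift. An illustrative Python sketch (not the actual x86 code, 
which interleaves this with the modular reductions):

```python
def square_96_six_muls(x):
    """Square a 96-bit x split into three 32-bit limbs using
    only 6 distinct 32x32 multiplies (a*a, b*b, c*c, a*b, a*c,
    b*c) instead of the 9 a full 3x3 long multiplication would
    need; each cross product is doubled by shifting one extra bit."""
    M = (1 << 32) - 1
    a, b, c = (x >> 64) & M, (x >> 32) & M, x & M
    aa, bb, cc = a * a, b * b, c * c   # 3 square terms
    ab, ac, bc = a * b, a * c, b * c   # 3 cross terms, each used twice
    # (a*2^64 + b*2^32 + c)^2
    #   = a^2*2^128 + 2ab*2^96 + (b^2 + 2ac)*2^64 + 2bc*2^32 + c^2
    return ((aa << 128) + (ab << 97) +
            (((ac << 1) + bb) << 64) + (bc << 33) + cc)
```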

> 
> If so, is it implemented in the various mersenne factoring programs
> in use?

I caught onto this idea when I was trying to do 2^p mod f as fast as 
possible for 2^64 < f < 2^80 - and got the extension to 2^95 "for 
free". I found it to be 10% - 20% faster than multiplying out the 
whole product (i.e. calculating x^2) then taking the modulus, and at 
least 10 times as fast as a straightforward implementation of Head's 
method. It will probably not be optimal for other factor sizes 
or for other processor architectures. I did, however, find that 
I got the best performance doing the job this way on Intel 
Pentium / PPro type CPUs for factors a little larger than 2^64 - 
which was the task as briefed.


Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
