Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-09 Thread Jud McCranie

At 06:51 PM 7/9/99 +0100, Brian J. Beesley wrote:

>For reasonably small multi-precision numbers, Head's method is 
>actually very good, if you're working on a true RISC processor with 
>no integer multiply instruction.

I started using Head's algorithm in 1981 on my Apple II.  It was better
than triple- and quad-precision routines.  It has propagated through my
software, and I've assumed that it is still preferable.  I probably need to
reevaluate it on a Pentium.  
+----------------------------------------------+
| Jud "program first and think later" McCranie |
+----------------------------------------------+



Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm



Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-09 Thread Brian J. Beesley

On 8 Jul 99, at 23:33, Lucas Wiman wrote:

> >That is going to be a *lot* slower than FFT convolution, for numbers the size
> >of the Mersenne numbers we're testing! FFT is O(n*log(n)) where n is the 
> >number of limbs in the numbers being multiplied. Head's method is O(n^2), 
> >with O being slightly smaller than for straight long multiplication. A 
> >recursive method splitting the number in the middle, then doing a subtraction 
> >trick to make three instead of four submultiplies, is O(n^y), where y is 
> >log(3)/log(2), which is about 19/12.

This is all true for numbers with many times the number of bits that 
can be stored in a computer register. However, find a recursive 
method for triple-precision arithmetic that beats the "classic" long 
multiplication method, and you'll definitely have earned whatever 
carrot may be dangled!
> 
> Well, I wasn't talking about the massive scale of numbers used in LL tests.
> I was speaking of the < 95 bit integers used in factoring, and using Head's
> algorithm in the calculation of 2^p mod f.  I doubt that FFT would be
> worthwhile here, and would probably slow it down considerably.

Too right!

> Chris Nash writes:
> >But enough already. One thing I'm surprised nobody has mentioned yet - is it
> >even in Lucas' FAQ? - is that modular reduction in the Mersenne case is
> >sinfully inexpensive. Even with a pure integer, no FFT implementation,
> >reduction mod 2^p-1 is very quick - take the lower p bits, take the rest of
> >the number shifted down by p bits, add them together. The size of the
> >original number is usually around 2p bits, so one iteration of the
> >shift-and-add will get the result correct to within 1 extra subtraction at
> >most.

This is also true, but we don't (often) want to check if 2^p-1 is a 
factor of anything ... in this case, we're trying to find factors of 
2^p-1.

For reasonably small multi-precision numbers, Head's method is 
actually very good, if you're working on a true RISC processor with 
no integer multiply instruction.

Regards
Brian Beesley




Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Lucas Wiman

Pierre Abbat writes:
>>> In the book _Primes and Programming_ Head's method of multiplying
>>> two numbers mod n is mentioned.  Is this actually more efficient
>>> than simply multiplying the two numbers and taking the modulus?
>>
>> Look at it this way. Head's method is essentially binary long
>> multiplication, with the "wrinkle" that you take the remainder modulo
>> n after each shift & add operation.

>That is going to be a *lot* slower than FFT convolution, for numbers the size
>of the Mersenne numbers we're testing! FFT is O(n*log(n)) where n is the 
>number of limbs in the numbers being multiplied. Head's method is O(n^2), 
>with O being slightly smaller than for straight long multiplication. A 
>recursive method splitting the number in the middle, then doing a subtraction 
>trick to make three instead of four submultiplies, is O(n^y), where y is 
>log(3)/log(2), which is about 19/12.

Well, I wasn't talking about the massive scale of numbers used in LL tests.
I was speaking of the < 95 bit integers used in factoring, and using Head's
algorithm in the calculation of 2^p mod f.  I doubt that FFT would be
worthwhile here, and would probably slow it down considerably.
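
The 2^p mod f computation described here is just binary exponentiation with a reduction after every step. A minimal Python sketch (the function name pow2_mod and the pure-Python arithmetic are illustrative; real trial-factoring code does this with fixed-size limbs):

```python
def pow2_mod(p, f):
    # Left-to-right binary exponentiation: square (and optionally double)
    # once per bit of p, reducing mod f each time so intermediates stay
    # below f^2.
    result = 1
    for bit in bin(p)[2:]:
        result = (result * result) % f
        if bit == '1':
            result = (result * 2) % f
    return result
```

For example, pow2_mod(11, 23) returns 1, reflecting the classic fact that 23 divides 2^11 - 1.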

Chris Nash writes:
>But enough already. One thing I'm surprised nobody has mentioned yet - is it
>even in Lucas' FAQ? - is that modular reduction in the Mersenne case is
>sinfully inexpensive. Even with a pure integer, no FFT implementation,
>reduction mod 2^p-1 is very quick - take the lower p bits, take the rest of
>the number shifted down by p bits, add them together. The size of the
>original number is usually around 2p bits, so one iteration of the
>shift-and-add will get the result correct to within 1 extra subtraction at
>most.

I think I might split off the parts about modular arithmetic into their
own section, or in the beginning stuff section.  I'll probably add this
question sometime tomorrow.

-Lucas Wiman




Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Pierre Abbat

> Limbs? It is good to know that the world has many different literal meanings
> in many languages for "bits" - variety is good for us all. (The French word
> for "buffer" is also, I seem to remember, rather amusing).

"Limb" is the term used in gmp for a digit in a large base (such as 2147483648)
in which a number is represented in a multiple-precision package.

phma




Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Chris Nash

Hi folks

> > > In the book _Primes and Programming_ Head's method of multiplying
> > > two numbers mod n is mentioned.  Is this actually more efficient
> > > than simply multiplying the two numbers and taking the modulus?
> > Look at it this way. Head's method is essentially binary long
> > multiplication, with the "wrinkle" that you take the remainder modulo
> > n after each shift & add operation.
> That is going to be a *lot* slower than FFT convolution, for numbers the
> size of the Mersenne numbers we're testing! FFT is O(n*log(n)) where n is
> the number of limbs in the numbers being multiplied.

Limbs? It is good to know that the world has many different literal meanings
in many languages for "bits" - variety is good for us all. (The French word
for "buffer" is also, I seem to remember, rather amusing).

But enough already. One thing I'm surprised nobody has mentioned yet - is it
even in Lucas' FAQ? - is that modular reduction in the Mersenne case is
sinfully inexpensive. Even with a pure integer, no FFT implementation,
reduction mod 2^p-1 is very quick - take the lower p bits, take the rest of
the number shifted down by p bits, add them together. The size of the
original number is usually around 2p bits, so one iteration of the
shift-and-add will get the result correct to within 1 extra subtraction at
most.
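
That shift-and-add reduction is a one-liner in practice. A sketch (hypothetical helper name; assumes the input is at most about 2p bits, as after a squaring):

```python
def reduce_mod_mersenne(x, p):
    # Reduce x modulo m = 2^p - 1: low p bits plus the rest shifted down
    # by p.  For x of about 2p bits, one pass leaves the result correct
    # to within one extra subtraction, exactly as described above.
    # (A result equal to m itself still represents 0 mod m.)
    m = (1 << p) - 1
    x = (x & m) + (x >> p)
    if x >= m:
        x -= m
    return x
```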

As for the FFT implementation, modular reduction is less than inexpensive -
it is virtually free - in fact, it makes use of a fact that means the
algorithm can be a large factor faster than "arbitrary" arithmetic in the
Mersenne case. In a standard FFT multiplication, the numbers have to be
buffered with a large "dead zone" of zeroes - this is due to the fact that
the FFT algorithm works with a periodic signal, so data bits "wrap around"
from either end unless you have a large enough zero-buffer. The Mersenne
method uses it to its advantage - the bits 2^p and above *should* wrap onto
the least-significant bits as part of the modular reduction. With a little
adjustment (using a non-rational arithmetic base so that 2^p is in fact a
whole power of the base), the modular reduction then becomes a desirable, and
free, side-effect of the multiply algorithm - not to mention, the FFT buffer
is half as large, and so calculations are implicitly faster.
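
The wrap-around can be seen in miniature without any FFT: multiplication mod 2^p - 1 is exactly a cyclic convolution of the limb vectors, which is what an FFT of the un-padded signals computes. A toy Python sketch with four 4-bit limbs, so p = 16 (all names illustrative):

```python
K, BITS = 4, 4                     # four limbs of 4 bits each: p = 16
P = K * BITS
M = (1 << P) - 1                   # the Mersenne-style modulus 2^p - 1

def to_limbs(x):
    return [(x >> (BITS * i)) & ((1 << BITS) - 1) for i in range(K)]

def cyclic_convolve(u, v):
    # Naive O(K^2) cyclic convolution; an FFT computes the same vector in
    # O(K log K), and the wrap-around performs the reduction mod 2^p - 1.
    w = [0] * K
    for i in range(K):
        for j in range(K):
            w[(i + j) % K] += u[i] * v[j]
    return w

def from_limbs(w):
    # Carry propagation: evaluate the limb polynomial at 2^BITS, mod M.
    return sum(c << (BITS * i) for i, c in enumerate(w)) % M

a, b = 51234, 60321                # both below 2^16 - 1
assert from_limbs(cyclic_convolve(to_limbs(a), to_limbs(b))) == (a * b) % M
```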

Chris Nash
Lexington KY
UNITED STATES
==============================================
Still a co-discoverer of the largest known *non*-Mersenne prime.






Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Jud McCranie

At 08:11 PM 7/8/99 -0400, Pierre Abbat wrote:

>That is going to be a *lot* slower than FFT convolution, for numbers the size
>of the Mersenne numbers we're testing! 

Head's algorithm is for getting x*y mod n when 0 <= x, y < n < M, where M
is the largest integer you can store in a format native to the computer.
I don't think it applies for Mersenne testing, but it helps a lot with
Fermat-type tests (pseudoprimes, etc.).

+----------------------------------------------+
| Jud "program first and think later" McCranie |
+----------------------------------------------+






Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Pierre Abbat

On Thu, 08 Jul 1999, Brian J. Beesley wrote:
> On 8 Jul 99, at 6:19, Lucas Wiman wrote:
> 
> > In the book _Primes and Programming_ Head's method of multiplying
> > two numbers mod n is mentioned.  Is this actually more efficient
> > than simply multiplying the two numbers and taking the modulus?
> 
> Look at it this way. Head's method is essentially binary long 
> multiplication, with the "wrinkle" that you take the remainder modulo 
> n after each shift & add operation. 

That is going to be a *lot* slower than FFT convolution, for numbers the size
of the Mersenne numbers we're testing! FFT is O(n*log(n)) where n is the number
of limbs in the numbers being multiplied. Head's method is O(n^2), with O
being slightly smaller than for straight long multiplication. A recursive method
splitting the number in the middle, then doing a subtraction trick to make
three instead of four submultiplies, is O(n^y), where y is log(3)/log(2), which
is about 19/12.
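
The "subtraction trick" here is Karatsuba's method: split each operand in the middle and trade four submultiplies for three. A recursive Python sketch (illustrative only; production code switches to machine multiplies at a carefully tuned threshold):

```python
def karatsuba(x, y, bits):
    # Multiply x * y, where both fit in `bits` bits, with three recursive
    # half-size multiplies instead of four: O(n^(log 3 / log 2)).
    if bits <= 32:
        return x * y                       # base case: a machine multiply
    half = bits // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask
    y1, y0 = y >> half, y & mask
    a = karatsuba(x1, y1, bits - half)     # high * high
    c = karatsuba(x0, y0, half)            # low * low
    # Middle term via the subtraction trick: x1*y0 + x0*y1.
    b = karatsuba(x1 + x0, y1 + y0, bits - half + 1) - a - c
    return (a << (2 * half)) + (b << half) + c
```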

phma




Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Brian J. Beesley

On 8 Jul 99, at 6:19, Lucas Wiman wrote:

> In the book _Primes and Programming_ Head's method of multiplying
> two numbers mod n is mentioned.  Is this actually more efficient
> than simply multiplying the two numbers and taking the modulus?

Look at it this way. Head's method is essentially binary long 
multiplication, with the "wrinkle" that you take the remainder modulo 
n after each shift & add operation. The remaindering operation here 
is very simple, as all you have to do is to subtract n if the result 
is >= n. Nevertheless you're into a _lot_ of operations, involving 
badly-aligned data (which means doing lots of multi-precision shifts - 
which are slow because of dependency chains - also the "subtract n 
if >= n" is slow because the branch is hard to predict).
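
That shift & add loop looks like this in outline (a Python sketch of the idea, not the exact algorithm from Head's paper; assumes 0 <= x, y < n):

```python
def mulmod(x, y, n):
    # Binary long multiplication with the "wrinkle": reduce mod n after
    # every shift and every add, so no intermediate ever reaches 2n.
    result = 0
    while y:
        if y & 1:
            result += x
            if result >= n:       # the hard-to-predict branch noted above
                result -= n
        x += x                    # shift left one bit ...
        if x >= n:                # ... and reduce immediately
            x -= n
        y >>= 1
    return result
```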

Conversely, multiplying the whole number out & then taking the 
modulus once _may_ be inefficient, because you will probably be 
forced out of registers & have to do a lot of operations in memory.

My factoring code compromises. It does long multiplication 32 bits at 
a time, creating a 128-bit product from a 96-bit by 32-bit 
multiplication operation, then reduces this modulo n leaving a result 
with at most 95 significant bits. The shifts involved are 32-bit 
shifts i.e. an address offset costing no CPU time. I _do_ have to 
reduce modulo n 3 times, but the multiplications actually dominate  
the time, and I can get away with using only the limited registers in 
the x86 processor plus four cache lines worth of memory (including 
storage for factor, remainder, etc.).

I suppose you could call this a variant of Head's method ... note 
that I get only 95 bits from a 96 bit field in the same way that Head 
gets only (n-1) bits from an n-bit field; you lose one bit of precision 
because you have to add together the partial results of the 
multiplication after taking the modulus. Nevertheless Head's method 
is useful, especially if you're working in a language like Java that 
has limited support for multiprecision arithmetic, since it allows 
you to do arithmetic modulo n for n up to 2^30 even if you have only 
32-bit signed arithmetic available.

There are more time-saving tricks employed, like "early exit" from 
the whole stage when a partial multiplier is zero, and saving the 
partial products thereby reducing the number of 32-bit by 32-bit 
multiplies required from 9 to 6. 
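
In outline, the 32-bits-at-a-time compromise works like a base-2^32 Horner evaluation with one reduction per limb. A generic Python sketch (names and structure are illustrative; the actual x86 code described above also caches partial products and exits early on zero limbs):

```python
M32 = (1 << 32) - 1

def mulmod96(x, y, n):
    # x, y, n < 2^96.  Consume y one 32-bit limb at a time, most
    # significant first; each step is a 96-bit by 32-bit multiply followed
    # by one reduction mod n, and the "shifts" are whole-limb offsets.
    result = 0
    for shift in (64, 32, 0):
        limb = (y >> shift) & M32
        result = ((result << 32) + x * limb) % n
    return result
```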

> 
> If so, is it implemented in the various mersenne factoring programs
> in use?

I caught onto this idea when I was trying to do 2^p mod f as fast as 
possible for 2^64 < f < 2^80 - and got the extension to 2^95 "for 
free". I found it to be 10% - 20% faster than multiplying out the 
whole product (i.e. calculating x^2) then taking the modulus, and at 
least 10 times as fast as a straightforward implementation of Head's 
method. It will probably not be optimal for other size factors or for 
other processor architectures. I did, however, find that I got the 
best performance doing the job this way on Intel Pentium / PPro type 
CPUs for factors a little larger than 2^64 - which was the task 
briefed.


Regards
Brian Beesley




Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Jud McCranie

At 06:19 AM 7/8/99 -0400, you wrote:
>All,
>In the book _Primes and Programming_ Head's method of multiplying
>two numbers mod n is mentioned.  Is this actually more efficient
>than simply multiplying the two numbers and taking the modulus?
>
Yes, because it keeps the numbers smaller.  The original reference is:
A. K. Head, "Multiplication Modulo N", BIT 20 (1980), 115-116.



+----------------------------------------------+
| Jud "program first and think later" McCranie |
+----------------------------------------------+






Re: Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Alan Powell

At 06:19 AM 7/8/99 -0400, Lucas Wiman wrote:
>In the book _Primes and Programming_ Head's method of multiplying
>two numbers mod n is mentioned.  Is this actually more efficient
>than simply multiplying the two numbers and taking the modulus?

No, it is actually much less efficient, but you are missing
the point.  In his book Peter Giblin explains why on p. 93.

Essentially if a and b are say 62-bit integers, to compute

 a * b (mod m)

you generate intermediate values that are up to 124-bits long
which are not generally supported in Basic/Pascal/C/C++.  One
would have to resort to Ubasic or a Large Integer Package.

The beauty of Head's algorithm is that it allows you to compute
the modular product using only one or two extra bits.  The example
in the book would allow you to compute the modular product of two
62-bit integers without intermediate values exceeding 64-bits.
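
That headroom property is easy to check empirically. A Python sketch that instruments the shift-and-add loop (illustrative names; this is a property check, not Head's published algorithm verbatim):

```python
def head_mulmod(a, b, n):
    # Shift-and-add modular multiply that records its largest intermediate.
    # Every value stays below 2*n, so for a 62-bit modulus nothing ever
    # needs more than 63 bits - one or two bits of headroom, as above.
    result, peak = 0, 0
    while b:
        if b & 1:
            result += a                  # < 2*n before reduction
            peak = max(peak, result)
            if result >= n:
                result -= n
        a += a                           # < 2*n before reduction
        peak = max(peak, a)
        if a >= n:
            a -= n
        b >>= 1
    return result, peak

n = (1 << 62) - 57                       # an arbitrary 62-bit modulus
r, peak = head_mulmod(n - 1, n - 2, n)
assert r == ((n - 1) * (n - 2)) % n
assert peak < 1 << 63                    # fits in a 64-bit register
```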

As far as I know most Mersenne factoring programs use some form
of Large Integer Package (LIP) to handle the size of numbers involved.

Regards

Alan Powell






Mersenne: Head's algorithm for multiplying mod n

1999-07-08 Thread Lucas Wiman

All,
In the book _Primes and Programming_ Head's method of multiplying
two numbers mod n is mentioned.  Is this actually more efficient
than simply multiplying the two numbers and taking the modulus?

If so, is it implemented in the various mersenne factoring programs
in use?

Thank you,
Lucas Wiman

