On 7 Feb 2002, at 15:02, Bruce Leenstra wrote:

> All,
> 
> What this list needs right now is a nice juicy math debate, so here
> goes: I was reading the faq about P-1 factoring, and it talks about
> constructing a 'q' that is the product of all primes less than B1
> (with some multiples?) And then I read the "skip 2, test 2" comment
> about trial factoring and wondered:
> 
> Would it save any time to literally "skip 2, test 2" in trial
> factoring, by multiplying the two factors? (Does trial factoring use
> the FFT to do its squaring? The comment about breaking it into 32 bit
> chunks implies not. But if so, the modulo is free and this question is
> questionable : )

Ummm - there is a _big_ difference here.

The candidate factors for trial factoring are of the order of 2^64. For 
numbers of this sort of size, FFT is actually very expensive in 
comparison with "schoolboy long multiplication". The fact that we're 
always squaring allows us to get away with 3 inner products (the 
expensive operation) on a double-length calculation. Also, the 
modulus operation for trial factoring is modulo the candidate factor - 
not modulo 2^p-1 - this makes the "free modulus operation" totally 
irrelevant.

On a 32-bit system, when candidate factors are < 2^64, testing two 
together would make each squaring cost 10 inner products instead 
of 2 x 3, so it would clearly be expensive. For candidate factors up 
to 2^80, the cost would be 15 inner products instead of 2 x 6, so 
again testing two at once would be more expensive. This doesn't 
even start to take into account pressure on registers, which would 
slow the longer method down even more by forcing intermediate 
results out to memory. The other operations involved are at best 
proportional to the working length, so there's nothing to get back 
from those.

The other point is that it's rare that we'd want to test both 
candidates in a pair. Usually one or the other would be sieved out 
rather than tested; the proper factors we're looking for are 
themselves prime, so we can straight away exclude any 
candidates which are themselves divisible by a small prime.
> 
> Right now Prime95 constructs a list of possible factors, filters it
> (before hand, or on-the-fly) and then uses the powering algorithm to
> calculate 2^p - 1 (mod q) for each factor.  If p is 24 bits, the
> powering algorithm requires 24 squarings, from 1 to 24 doublings, and
> 24 comparisons -- resulting in up to 8 to 10 modulo's (not sure what
> the average is).

Eh?

We can straight away "write down" 2^n mod q when 2^n < q. This 
saves about 6 whole loops through the code.

However we need to do a full modulo operation after each squaring.
(Unless we're prepared to let the numbers double in length, which  
is inefficient, as I've demonstrated above). Doubling doesn't need a 
full modulo operation, compare magnitude & subtract is sufficient. 
Comparison of two multi-precision numbers is _very_ cheap 
compared with multiplying, taking modulus or even adding - in fact, 
just one single length compare is often sufficient.

One thing that did occur to me is that, at 2^64 where we're forced 
from double to triple word operations, with a consequent efficiency 
loss, it might be worth deepening the sieveing so that more 
candidates are rejected without having to be explicitly tried as 
factors. The deeper sieveing would cost more, but rejecting more 
candidates would regain that expense, up to a certain point.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to