On 7 Feb 2002, at 15:02, Bruce Leenstra wrote:

> All,
>
> What this list needs right now is a nice juicy math debate, so here
> goes: I was reading the faq about P-1 factoring, and it talks about
> constructing a 'q' that is the product of all primes less than B1
> (with some multiples?) And then I read the "skip 2, test 2" comment
> about trial factoring and wondered:
>
> Would it save any time to literally "skip 2, test 2" in trial
> factoring, by multiplying the two factors? (Does trial factoring use
> the FFT to do its squaring? The comment about breaking it into 32 bit
> chunks implies not. But if so, the modulo is free and this question is
> questionable : )
Ummm - there is a _big_ difference here. The candidate factors for trial factoring are of the order of 2^64. For numbers of this sort of size, FFT is actually very expensive in comparison with "schoolboy long multiplication". The fact that we're always squaring allows us to get away with 3 inner products (the expensive operation) on a double-length calculation. Also, the modulus operation for trial factoring is modulo the candidate factor - not modulo 2^p-1 - so the "free modulus operation" is totally irrelevant here.

In general, squaring an n-word number by the schoolboy method costs n(n+1)/2 inner products, since the cross terms come in symmetric pairs. On a 32-bit system, when candidate factors are < 2^64 (two words each), testing two together means squaring a four-word number: 10 inner products instead of 2 x 3, so it would clearly be more expensive. For candidate factors up to 2^80 (three words), the cost would be 15 inner products instead of 2 x 6, so again testing two at once loses. This doesn't even start to take into account pressure on registers, which would slow the longer method down even more by forcing intermediate results out to memory. The other operations involved are at best proportional to the working length, so there's nothing to win back from those.

The other point is that it's rare that we'd want to test both candidates in a pair. Usually one or the other would be sieved out rather than tested: the proper factors we're looking for are themselves prime, so we can straight away exclude any candidate which is itself divisible by a small prime.

> Right now Prime95 constructs a list of possible factors, filters it
> (before hand, or on-the-fly) and then uses the powering algorithm to
> calculate 2^p - 1 (mod q) for each factor. If p is 24 bits, the
> powering algorithm requires 24 squarings, from 1 to 24 doublings, and
> 24 comparisons -- resulting in up to 8 to 10 modulo's (not sure what
> the average is).

Eh? We can straight away "write down" 2^n mod q whenever 2^n < q, which saves about 6 whole loops through the code. However, we do need a full modulo operation after each squaring.
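To make that powering loop concrete, here is a minimal Python sketch. Python's arbitrary-precision integers stand in for the 32-bit-chunk arithmetic, and `pow2_mod` is an illustrative name, not Prime95's actual routine. It "writes down" 2^t mod q directly while 2^t < q, takes a full reduction after every squaring, and handles each doubling by compare-and-subtract:

```python
def pow2_mod(p, q):
    """Compute 2^p mod q by left-to-right binary exponentiation.

    Illustrative sketch of the trial-factoring loop discussed above:
    leading bits of p are consumed for free while 2^t < q (we can just
    write the power down), then each remaining bit costs one squaring
    with a full reduction, plus a cheap compare-and-subtract doubling
    when the bit is set.
    """
    bits = bin(p)[2:]          # binary digits of p, most significant first

    # "Write down" phase: absorb leading bits while 2^t still fits below q.
    t = 0
    i = 0
    while i < len(bits) and (1 << (2 * t + int(bits[i]))) < q:
        t = 2 * t + int(bits[i])
        i += 1
    r = 1 << t                 # r = 2^t, already reduced since 2^t < q

    # Main loop: one full modulo per squaring, compare-and-subtract per doubling.
    for b in bits[i:]:
        r = (r * r) % q        # full reduction after each squaring
        if b == '1':
            r <<= 1            # doubling: magnitude compare and subtract suffice
            if r >= q:
                r -= q
    return r

# A candidate q divides 2^p - 1 exactly when 2^p mod q == 1:
# q = 23 divides 2^11 - 1 = 2047, so pow2_mod(11, 23) == 1.
```

The compare-and-subtract in the doubling step is the cheap path: the doubled value is always below 2q, so at most one subtraction restores it to the range [0, q).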
(Unless we're prepared to let the numbers double in length, which is inefficient, as I've demonstrated above.) Doubling doesn't need a full modulo operation: comparing magnitudes and subtracting is sufficient. Comparison of two multi-precision numbers is _very_ cheap compared with multiplying, taking a modulus, or even adding - in fact, just one single-length compare is often sufficient.

One thing that did occur to me is that, above 2^64, where we're forced from double-word to triple-word operations with a consequent efficiency loss, it might be worth deepening the sieving so that more candidates are rejected without having to be explicitly tried as factors. The deeper sieving would cost more, but rejecting more candidates would win that expense back, up to a certain point.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers