Recently I've been translating Prime95's P-1 limit-choosing algorithm
into another language for a P-1 utility of mine. Let me offer some
further explanation. (George, please point out any errors!)
Inputs to the algorithm: the exponent of the Mersenne number to be
factored; how far (as a power of 2) trial factoring of that Mnumber has
reached; whether the P-1 effort precedes a first-time L-L test or a
double-check; how much memory is to be available during P-1 factoring;
plus empirically derived constants and fudge factors in various
formulas.
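For concreteness, those inputs might be bundled like this (a Python
sketch of my own; the field names are mine, not Prime95's):

    from dataclasses import dataclass

    @dataclass
    class P1Inputs:
        exponent: int      # p, for the Mersenne number 2^p - 1
        tf_bits: int       # trial factoring completed through 2^tf_bits
        first_time: bool   # True before a first-time L-L, False before a double-check
        memory_mb: int     # memory available during P-1 (matters in stage 2)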
The algorithm is independent of specific systems or CPUs. That is, it
does not consider or estimate actual time needed to perform an
operation, or consider a given CPU's relative performance on integer vs.
floating-point instructions. Its basic unit of measure for the cost of
a procedure is the FFT "squaring" (= a forward transform, a pointwise
multiplication that effects the convolution, then an inverse
transform). For GCDs, which do FFT multiplications that are not
squarings, the algorithm estimates the number of FFT transforms needed
by the GCD, then divides by two to get the equivalent number of
squarings.
If a P-1 factoring run with particular B1 and B2 bounds will require
175,000 FFT squarings in stage 1, a GCD that will take as much time as
3,000 FFT squarings, and 122,000 FFT squarings in stage 2, the algorithm
considers that P-1 run's total cost to be 300,000 -- it is assumed that
the time required for other parts of the run can be neglected in
comparison.
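In code, that bookkeeping amounts to nothing more than the following
(figures from the example; the divide-by-two rule is the one just
described):

    def gcd_cost_in_squarings(gcd_transforms):
        # A GCD's multiplications are not squarings, so the algorithm
        # estimates the transforms the GCD needs and halves that count.
        return gcd_transforms / 2

    def p1_total_cost(stage1_squarings, stage2_squarings, gcd_squarings):
        # All other parts of the run are assumed to take negligible time.
        return stage1_squarings + stage2_squarings + gcd_squarings

    p1_total_cost(175_000, 122_000, 3_000)   # 300,000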
(George -- Doesn't a two-stage P-1 require two GCDs, not just one? And
it looks like some of the cost comparisons don't include even one GCD
... was that a deliberate omission, or am I overlooking something?)
Dickman's function (see Knuth Vol. 2, or previous discussion on this
list) is used to calculate the probability of the P-1 procedure's
finding a factor.
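For the curious, here is a quick-and-dirty numerical version of
Dickman's function (a trapezoid integration of the defining equation
rho'(u) = -rho(u-1)/u with rho = 1 on [0,1]; my sketch, not Prime95's
code):

    def dickman_rho(u, h=0.001):
        # rho(u) approximates the probability that a random integer n
        # has no prime factor larger than n^(1/u).
        if u <= 1:
            return 1.0
        n = int(round(u / h))
        one = int(round(1 / h))      # grid index of u = 1
        rho = [1.0] * (n + 1)        # rho[i] ~ rho(i * h)
        for i in range(one + 1, n + 1):
            f_prev = rho[i - 1 - one] / ((i - 1) * h)
            f_curr = rho[i - one] / (i * h)
            rho[i] = rho[i - 1] - h * (f_prev + f_curr) / 2.0
        return rho[n]

As a check, dickman_rho(2.0) comes out near 1 - ln 2 = 0.3069, the
exact value of rho(2).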
Perhaps you've wondered what role the how-far-trial-factored input
plays. Well, if a Mnumber has been trial-factored to 2^63 (with no
success, or else we wouldn't be considering P-1 now), then P-1 will not
find any factors of 63 bits or less. But if the Mnumber had been
trial-factored to only 2^57, then there would still be some chance of
finding a factor between 2^57 and 2^63 during a P-1 run, in addition to
the chance of finding a factor above 2^63. The algorithm takes this
into account.
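Roughly, the accounting could look like the sketch below, which sums
over factor bit sizes beyond the trial-factoring depth. The ~1/b
chance of a b-bit factor is the usual Mersenne heuristic; the
smoothness estimate uses dickman_rho from above. Prime95's actual
formula is more refined -- among other things it credits stage 2
(factors that are B1-smooth except for one prime up to B2) and the
guaranteed factors of q-1 = 2kp, both ignored here:

    import math

    def prob_stage1_finds_factor(tf_bits, b1, max_bits=120):
        # Chance that some factor beyond 2^tf_bits is b1-smooth enough
        # for stage 1 of P-1 to find.  Deliberately crude; see above.
        total = 0.0
        for b in range(tf_bits + 1, max_bits + 1):
            u = b * math.log(2) / math.log(b1)   # factor ~ 2^b, bound b1
            total += dickman_rho(u) / b
        return total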
Continuing the example from above -- If that 300,000-squaring P-1 run
has a 1.25% probability of finding a factor, then it would balance an
L-L test costing 300,000/.0125 = 24,000,000 squarings. L-L tests are
considered to cost 1.015 times their number of FFT squarings, to account
for the historically observed 1.5% rate of erroneous L-L results (a bad
result eventually forces a retest), so an L-L test
on an exponent of, say, 22,000,000 would be considered to cost
22,000,000 * 1.015 = 22,330,000 squarings. In this case, the P-1
limit-choosing algorithm would decide that a P-1 cost of 300,000 with
probability of 0.0125 would not be worthwhile on an exponent of
22,000,000, but would be worth a try on an exponent of, say, 25,000,000
because 300,000/.0125 is less than 25,000,000 * 1.015.
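That comparison is a one-liner (the 1.015 multiplier is the error-rate
adjustment just described; the numbers reproduce the example):

    def p1_worth_doing(p1_cost, p1_prob, exponent, error_factor=1.015):
        # P-1 pays off when its cost per unit of success probability is
        # below the cost of the L-L test it might save (double-check
        # case; the first-time case, described below, doubles the L-L
        # cost).
        return p1_cost / p1_prob < exponent * error_factor

    p1_worth_doing(300_000, 0.0125, 22_000_000)  # False: 24,000,000 > 22,330,000
    p1_worth_doing(300_000, 0.0125, 25_000_000)  # True:  24,000,000 < 25,375,000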
The above example is somewhat backwards, though, because the algorithm
actually considers only one given exponent's L-L cost versus the costs
and probabilities for several combinations of B1 and B2 bounds, and
picks out the best combination of P-1 cost and probability for that
exponent.
Prime95 considers B1 values starting at 10,000 then going up in
increments of 5,000. For each B1, Prime95 considers B2 values starting
with B2 = B1, then going up in increments of B1*0.25. So, for B1 =
10,000, Prime95 calculates costs and probabilities for B2 = 10,000,
12,500, 15,000, 17,500, and so on. Then for B1 = 15,000, it considers
B2 = 15,000, 18,750, 22,500, 26,250, 30,000, and so on.
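A generator for that grid of candidate bounds (the caps are mine,
purely to keep the illustration finite -- the real stopping criteria
are described next):

    def candidate_bounds(max_steps_per_b1=5, max_b1=30_000):
        # B1: 10,000, 15,000, 20,000, ...; for each B1, B2 runs from B1
        # upward in steps of B1/4.
        for b1 in range(10_000, max_b1 + 1, 5_000):
            for k in range(max_steps_per_b1):
                yield b1, b1 + k * (b1 // 4)

    # Begins (10000, 10000), (10000, 12500), (10000, 15000), ...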
When does the algorithm stop incrementing the B1 and B2 values it examines?
Either (1) when the B2 stage would require too much memory (one of the
algorithm inputs was how much memory is to be available during P-1
factoring), or (2) when the relative worth of the P-1 run falls below
90% of the most worthy P-1 bounds combination so far considered. By
"relative worth" and "worthy", I mean the amount by which [L-L cost
multiplied by probability of P-1 success] exceeds [P-1 cost]. The shape
of Dickman's function guarantees that criterion (2) is eventually
satisfied at some increment of both B1 and B2.
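Putting the pieces together, the search might be sketched as below.
The cost_of, prob_of, and memory_ok callables stand in for Prime95's
internal formulas, and I've flattened the stopping logic into a single
loop for brevity (the real logic stops the B2 increments within each
B1 and the B1 increments separately):

    def choose_bounds(ll_cost, memory_ok, cost_of, prob_of):
        best, best_worth = None, 0.0
        for b1, b2 in candidate_bounds():
            if not memory_ok(b2):        # criterion (1): stage 2 memory
                continue
            worth = ll_cost * prob_of(b1, b2) - cost_of(b1, b2)
            if worth > best_worth:
                best, best_worth = (b1, b2), worth
            elif worth < 0.9 * best_worth:
                break                    # criterion (2): worth fell below 90%
        return best                      # None means no worthwhile bounds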
Where the first-time/double-check input matters: For a double-check, the L-L
cost is as illustrated in the example above. For a first-time L-L, the
L-L cost against which P-1 cost is compared is twice that value -- i.e.,
the algorithm multiplies the exponent by 1.015 * 2 = 2.03 to get the
number of L-L squarings. That's because a successful P-1 factoring
before a first-time test will eliminate the need for both the first-time
and double-check L-L tests rather than only the double-check L-L test.
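So the L-L cost side of the comparison is simply:

    def ll_cost(exponent, first_time, error_factor=1.015):
        # A factor found before a first-time test saves two L-L tests
        # (the first-time test and its eventual double-check), hence the
        # doubling: 1.015 * 2 = 2.03 times the exponent.
        return exponent * error_factor * (2 if first_time else 1)

    ll_cost(22_000_000, False)  # 22,330,000, the double-check case above
    ll_cost(22_000_000, True)   # 44,660,000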
Formulas for calculating the numbers of FFT squarings required