Mersenne: Optimal choice of E in P-1 computations

2003-03-09 Thread Daran
Some time ago I raised on this list the question of whether the client's
choice of the E parameter in P-1 calculations is optimal.  I gave a
somewhat handwavy argument in support of the claim that IF it is worthwhile
choosing E=4 over E=2, i.e., if the benefits in additional factors found
outweigh the cost in extra processing time[1], THEN it should also be
worthwhile choosing E=6, and maybe even E=8 or E=12.  I also argued on
empirical grounds that, for D=420 at least, E=4 and E=12 would need to yield
roughly 2% and 10% respectively more stage 2 factors than E=2 for this to be
worthwhile.[2]
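
To make the handwavy argument slightly less handwavy (this is my understanding
of the Brent-Suyama extension, so treat the details with caution): stage 2
accumulates differences of E-th powers, and a^E - b^E splits into one
cyclotomic factor for each divisor of E.  It is the algebraic factors beyond
the trivial (a-b)(a+b) of E=2 that can pick up a prime lying above B2, so
counting them gives a crude measure of what each E buys.  A few lines of
Python/sympy display them:

# How the extra algebraic factors grow with E (sympy used only for display).
from sympy import symbols, factor

a, b = symbols('a b')
for E in (2, 4, 6, 12):
    print(E, factor(a**E - b**E))

# E=2:  (a - b)*(a + b)
# E=4:  adds a**2 + b**2                             -- one extra quadratic
# E=6:  adds a**2 - a*b + b**2 and a**2 + a*b + b**2 -- two extra quadratics
# E=12: adds all of the above, plus a**4 - a**2*b**2 + b**4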

From a theoretical point of view, Peter Montgomery's thesis, as suggested by
Alex Kruppa, is clearly relevant.  (I don't have the URL to hand, but
someone is sure to post it.) Unfortunately it's somewhat beyond my
mathematical ability to grasp.  Therefore, I have concentrated on attempting
to collect empirical evidence, as follows.

I have done a great many P-1s of doublecheck assignments with E=4.  Out of
the 35 stage 2 factors found[3], 1 was 'extended', i.e., would not have been
found using E=2.

In the hope of collecting data more quickly, I have also redone, to 'first
time test' limits, every entry in pminus1.txt which had previously been taken
to B1=B2=1000, 2000, or 3000.  For these exponents, all in the 1M-3M range,
the client was able to choose a plan with E=12.  Unfortunately, I found far
fewer factors in either stage 1 or stage 2 than I would expect, which
suggests to me that exponents in this range have had additional factoring
work (possibly ECM) not recorded in the file.  Of particular concern is the
possibility that in addition to reducing the number of factors available for
me to find, it may have upset the balance between 'normal' and 'extended'
P-1 factors - the very ratio I am trying to measure.  Consequently I am
inclined to exclude these results, though I report them for completeness:

Of the 10 stage 2 factors found, 2 were extended.  They are:-

P-1 found a factor in stage #2, B1=2, B2=395000.
UID: daran/1, M1231753 has a factor: 591108149622595096537
591108149622595096537-1 = 2^3*3*11*743*2689*909829*1231753

P-1 found a factor in stage #2, B1=3, B2=547500.
UID: daran/1, M2008553 has a factor: 9050052090266148529
9050052090266148529-1 = 2^4*3^2*7*71*79*796933*2008553
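
For the record, both decompositions multiply out correctly, and in each case
it is a prime lying above B2 (the exponent itself aside) that makes the factor
'extended'.  A few lines of Python confirm this; I am taking 'extended' to
mean that the largest non-exponent prime of f-1 exceeds B2:

# Sanity-check the two reported factorisations.
cases = [
    # (factor f, exponent p, B2, prime factors of f-1)
    (591108149622595096537, 1231753, 395000,
     [2, 2, 2, 3, 11, 743, 2689, 909829, 1231753]),
    (9050052090266148529, 2008553, 547500,
     [2, 2, 2, 2, 3, 3, 7, 71, 79, 796933, 2008553]),
]
for f, p, B2, primes in cases:
    product = 1
    for q in primes:
        product *= q
    assert product == f - 1
    largest = max(q for q in primes if q != p)
    print(f"M{p}: largest non-exponent prime {largest}, B2 {B2}, "
          f"extended: {largest > B2}")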

Finally, with George's permission, I have done a small number of P-1s of
doublechecking assignments with a client modified to use D=420, E=12 - a
plan not available with the standard clients.  So far, I have found only one
stage 2 factor, which was not extended.  I will continue to search for more.

Of particular interest with E=12 extended factors is whether they would
have been found with a lower value of E.  E=12 will find all factors that
E=4 and E=6 would have found, and some not found by any lower E.  My
handwavy argument predicted that E=6 should yield on average twice as many
extended factors as E=4.  I'm hoping that someone (Alex Kruppa?) might
have a tool to analyse extended factors to determine their minimal E.  If
not, I will write one.
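
In case it helps anyone thinking along the same lines, here is the sort of
analysis I have in mind.  It is a minimal sketch only: I am assuming that
stage 2 evaluates a term x^((m*D)^E) - x^(r^E) for each pair (m, r), with r
coprime to D, that covers a prime m*D +/- r in (B1, B2], and that an extended
prime q > B2 is therefore picked up exactly when q divides (m*D)^E - r^E for
one of those evaluated pairs.  Someone who knows the stage 2 code should
correct the pair selection if I have it wrong.

from math import gcd
from sympy import isprime

def covered(m, D, r, B1, B2):
    """True if the pair (m, r) covers a prime in (B1, B2], i.e. would be
    evaluated during stage 2 (my assumption about the client's behaviour)."""
    return any(B1 < q <= B2 and isprime(q) for q in (m * D - r, m * D + r))

def minimal_E(q, B1, B2, D, candidates=(2, 4, 6, 12)):
    """Smallest candidate E for which q | (m*D)^E - r^E for some evaluated
    pair, or None if no candidate E reaches q under these assumptions."""
    residues = [r for r in range(1, D // 2) if gcd(r, D) == 1]
    blocks = range(B1 // D + 1, B2 // D + 2)
    pairs = [(m, r) for m in blocks for r in residues
             if covered(m, D, r, B1, B2)]
    for E in sorted(candidates):
        if any(pow(m * D, E, q) == pow(r, E, q) for m, r in pairs):
            return E
    return None

# e.g. minimal_E(909829, <that run's B1>, 395000, <that run's D>) for the
# first of the two extended factors reported above.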

In conclusion, the evidence I have been able to gather, though statistically
insignificant, does not tend to exclude the hypothesis that a higher E would
be worthwhile.

[1]There is also a memory cost, but this is low in comparison with the costs
associated with the D parameter.  For example, for an exponent in the
7779000-9071000 range, in which I am working, D=420, E=4 consumes 446MB, and
because of the client's conservative programming, 465MB must be 'allowed'
before it will choose this plan.  The next level down is D=210, E=4 which
requires 299MB.  Using the modified client with E=12 adds an extra 37MB to
these requirements; that memory is available but would otherwise go spare
whenever the amount allowed is between about 350MB and 465MB.

Another way to look at this is to say that there is no memory cost
associated with increasing E for a given value of D.  The memory is either
available, or it is not.
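
(Back-of-envelope arithmetic, resting on my guess that the extension keeps
roughly one extra temporary per unit of E: the 37MB increment for E=12 over
E=4 would then be 8 temporaries of about 4.6MB each, which is around the size
of one FFT-length residue for an exponent in this range.)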

[2]Assuming that the current algorithm for determining optimal B1 and B2
values is accurate, and that this routine would be modified to make it
aware of the costs and benefits of differing values of E.

[3]This total includes both prime components of a composite factor found in
a single P-1 run, since neither was extended.

Regards

Daran


Re: Mersenne: Optimal choice of E in P-1 computations

2003-03-09 Thread Brian J. Beesley
On Sunday 09 March 2003 12:24, Daran wrote:

 In the hope of collecting data more quickly, I have also redone, to 'first
 time test' limits, every entry in pminus1.txt which had previously been taken
 to B1=B2=1000, 2000, or 3000.  For these exponents, all in the 1M-3M range,
 the client was able to choose a plan with E=12.  Unfortunately, I found far
 fewer factors in either stage 1 or stage 2 than I would expect, which
 suggests to me that exponents in this range have had additional factoring
 work (possibly ECM) not recorded in the file.

1) What about factors which would be found with your P-1 limits but happened 
to fall out in trial factoring? (In fact a lot of the smaller exponents - 
completed before P-1 was incorporated in the client - seem to have been trial 
factored beyond the economic depth.) In any case, if you're using very 
small values of B1 and B2, I would _expect_ that a very high percentage of the 
accessible factors will be found during normal trial factoring.

2) It would not surprise me at all to find that there is a substantial amount 
of P-1 work being done which is not recorded in the database file. I've also 
had very bad luck when extending P-1 beyond limits recorded in the database 
file for exponents under 1 million. Eventually I gave up.

3) ECM stage 2 for exponents over 1 million takes a serious amount of memory 
(many times what P-1 can usefully employ), whilst running ECM stage 1 only is 
not very efficient at finding factors - lots of the power of ECM comes from 
the fact that stage 2 is very efficient (assuming you can find memory!)

 Of particular concern is the
 possibility that in addition to reducing the number of factors available
 for me to find, it may have upset the balance between 'normal' and
 'extended' P-1 factors - the very ratio I am trying to measure. 

One way to deal with this would be to deliberately forget previously reported 
work, i.e. take _all_ the prime exponents in the range you're interested in, 
trial factor to taste, then run P-1. This way you can be sure that, though the 
vast majority of the factors you will find are rediscoveries, the 
distribution of the factors you find is not distorted by unreported negative 
results.
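
If it helps, generating that exponent list is only a couple of lines, e.g. in
Python with sympy (I am not assuming anything about how the resulting P-1
assignments would then be fed to the client):

# Every prime exponent in the range of interest, e.g. 1M-3M.
from sympy import primerange

exponents = list(primerange(1_000_000, 3_000_000))
print(len(exponents), "candidate exponents")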

Regards
Brian Beesley