Brian J. Beesley writes:

   I put a development version on my anon ftp server about three days 
   ago. ftp://lettuce.edsc.ulst.ac.uk/gimps/DecaMega/factor95.c

Um, that's an unfortunate choice of name; George's P-1 factorer is
"Factor95" (though there's a newer version, renamed to Factor98).  Of
course, George's "95" was for the year he wrote it, I think, rather
than the number of bits, which I'd guess is your reason for that
name.:)

   This is the 95-bit-capable code I gave George about early April. It
   isn't fully optimised, it needs to be turned into MASM source,
   properly aligned & have removed the horrendous & slow kludge used
   to work around a serious MS VC++ inline assembler bug.

   The 63-bit-capable code I'm using is simply hacked down from this
   in an obvious way. I didn't even bother to look for any more
   optimization which would be possible as a result of the various
   temporary results being one 32-bit word shorter.

And here I thought it was quite fast enough already.:)

   The sieving I'm messing about with is embedded in the "engine" that 
   generates possible factors, rather than the factoring code itself.

Interesting.  That is also, as I recall, how Peter-Lawrence
Montgomery's trial factoring program does this.  Mersfacgmp does the
sieving and factoring in the same loop, which probably costs it quite
a bit in terms of locality in comparison.

   Well, against my code is that it stops as soon as it finds one factor 

So does George's (in each sieve pass), but not mersfacgmp (unless it's
told to on the command line) nor (I think) PLM's program, triald.

   Quite frequently one finds that the first factor one tries works
   (if p = 3 mod 4 and 2p+1 is prime, this is _guaranteed_, but
   obviously isn't worth coding as a special case).

Yup.

   Also my program reads a list of exponents from a file rather than
   generating them,

So does mersfacgmp, George's, and PLM's.  Though mersfacgmp can also
read from standard input; it doesn't have to be a file.  Mersfacgmp
can also read its own output, to continue from earlier work, whether
a factor was found or not.

   however I don't think there's much speed penalty involved in this
   either way. My code is written so that various multiple-precision
   operations will "early out" is the high word is zero, this happens
   a lot when the factors are < 2^32,

Yup.  Libgmp also has this, to an extent, since numbers are stored in
a variable length array.

   mind you I could write a one-stage version for factors < 2^31 which
   would be faster still.

Somehow I don't think we need that.:)  In fact, I could already
generate a list of all primes up to 2^32 and the Mersenne exponents,
composite or not, that they factor, using my "reverse" method.  That
might even be less than a day on my new P233 MMX.  The only reason I
haven't done it, actually, is that I _still_ don't have enough disk
space ... unless I save the data in some compact, binary format.  But
I may run it anyway, and then keep only the data for prime exponents
under 100 million or so.

   The code I'm using to output factors is inefficient, but I don't 
   think that's a major factor.

Almost certainly correct.  Though running it with profiling would tell
us for sure.

   The PII-350 timings are actually for one thread on a dual PII-350
   system, there were two instances of Prime95 running LL tests in the
   background. For this reason I think a fairer PII-350/P90 ratio
   would be 4.5 rather than the usual 5. (4 is definitely low...)

Probably, but recall that I fudged some the other way because there
was probably another program running on my machine as well.  And, if
there was, it was at the same priority, not (relatively) nice'd.  Not
that it really matters; your code is obviously faster.

   Will change the "engine" to keep going to 2^33 after finding the 
   first factor & report the results from that.

Thanks; I've already got and unpacked the data and it looks fine.
I'll compare to what mersfacgmp has done - it continued from your
first factors to the next power of two as part of my update scripts -
to double check everything.

   Meanwhile, ran again to 2^40 with sieveing of first 10 primes & got
   266,072,661 calls in 3530 secs. Bearing in mind sieving overheads
   etc. it looks like I'm getting better than 10 uS per pass of the
   factoring routine. I don't know how this relates to Prime95's
   internal reporting, the same system gets "0.036 sec per iteration"
   factoring (using the routine that works to 2^62) running Prime95,
   but I don't know what that's measuring.

I'd guess George will let you know.

   There's definitely a few percent to be wrung out of my code, when I 
   get a chance.

Even better.:)

                                                        Will
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm

Reply via email to