Brian J. Beesley writes:
I put a development version on my anon ftp server about three days
ago. ftp://lettuce.edsc.ulst.ac.uk/gimps/DecaMega/factor95.c
Um, that's an unfortunate choice of name; George's P-1 factorer is
"Factor95" (though there's a newer version, renamed to Factor98). Of
course, George's "95" was for the year he wrote it, I think, rather
than the number of bits, which I'd guess is your reason for that
name.:)
This is the 95-bit-capable code I gave George about early April. It
isn't fully optimised, it needs to be turned into MASM source,
properly aligned & have removed the horrendous & slow kludge used
to work around a serious MS VC++ inline assembler bug.
The 63-bit-capable code I'm using is simply hacked down from this
in an obvious way. I didn't even bother to look for any more
optimization which would be possible as a result of the various
temporary results being one 32-bit word shorter.
And here I thought it was quite fast enough already.:)
The sieving I'm messing about with is embedded in the "engine" that
generates possible factors, rather than the factoring code itself.
Interesting. That is also, as I recall, how Peter-Lawrence
Montgomery's trial factoring program does this. Mersfacgmp does the
sieving and factoring in the same loop, which probably costs it quite
a bit in terms of locality in comparison.
Well, against my code is that it stops as soon as it finds one factor
So does George's (in each sieve pass), but not mersfacgmp (unless it's
told to on the command line) nor (I think) PLM's program, triald.
Quite frequently one finds that the first factor one tries works
(if p = 3 mod 4 and 2p+1 is prime, this is _guaranteed_, but
obviously isn't worth coding as a special case).
Yup.
Also my program reads a list of exponents from a file rather than
generating them,
So does mersfacgmp, George's, and PLM's. Though mersfacgmp can also
read from standard input; it doesn't have to be a file. Mersfacgmp
can also read its own output, to continue from earlier work, whether
a factor was found or not.
however I don't think there's much speed penalty involved in this
either way. My code is written so that various multiple-precision
operations will "early out" is the high word is zero, this happens
a lot when the factors are < 2^32,
Yup. Libgmp also has this, to an extent, since numbers are stored in
a variable length array.
mind you I could write a one-stage version for factors < 2^31 which
would be faster still.
Somehow I don't think we need that.:) In fact, I could already
generate a list of all primes up to 2^32 and the Mersenne exponents,
composite or not, that they factor, using my "reverse" method. That
might even be less than a day on my new P233 MMX. The only reason I
haven't done it, actually, is that I _still_ don't have enough disk
space ... unless I save the data in some compact, binary format. But
I may run it anyway, and then keep only the data for prime exponents
under 100 million or so.
The code I'm using to output factors is inefficient, but I don't
think that's a major factor.
Almost certainly correct. Though running it with profiling would tell
us for sure.
The PII-350 timings are actually for one thread on a dual PII-350
system, there were two instances of Prime95 running LL tests in the
background. For this reason I think a fairer PII-350/P90 ratio
would be 4.5 rather than the usual 5. (4 is definitely low...)
Probably, but recall that I fudged some the other way because there
was probably another program running on my machine as well. And, if
there was, it was at the same priority, not (relatively) nice'd. Not
that it really matters; your code is obviously faster.
Will change the "engine" to keep going to 2^33 after finding the
first factor & report the results from that.
Thanks; I've already got and unpacked the data and it looks fine.
I'll compare to what mersfacgmp has done - it continued from your
first factors to the next power of two as part of my update scripts -
to double check everything.
Meanwhile, ran again to 2^40 with sieveing of first 10 primes & got
266,072,661 calls in 3530 secs. Bearing in mind sieving overheads
etc. it looks like I'm getting better than 10 uS per pass of the
factoring routine. I don't know how this relates to Prime95's
internal reporting, the same system gets "0.036 sec per iteration"
factoring (using the routine that works to 2^62) running Prime95,
but I don't know what that's measuring.
I'd guess George will let you know.
There's definitely a few percent to be wrung out of my code, when I
get a chance.
Even better.:)
Will
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm