Brian J. Beesley writes:
1) I don't see any particular need to make available files in human-
readable format, provided that we provide a specification for the
file, and also binary executables for common platforms for converting
a machine-readable file to human-readable format.
I, personally, have no way of producing executables except for Intel
CPUs, and presently only for Linux (just yesterday I got an old P100
machine up and running Win98 (and Prime95, of course:)).
I have thought about a binary data format off and on for a while now,
but mostly in terms of making updates easier by designing a format
that can be updated in place (at least most of the time), which is a
different problem.
On the other hand, it could be that "zipping" a human-readable
format file provides near-optimal compression.
Most likely, and this also avoids having to provide executables to
read it: unzip programs are already out there.
2) Since factors are of the form 2kp+1, it is only necessary to list
p and k. This saves a significant amount of storage (compared with
listing p and 2kp+1), and the proper factor is easily recovered.
Unfortunately, it's only this simple when p is odd. For even p,
factors can be one more than any multiple of p, not just one more than
even multiples of p. And I'd really rather not have a different
format for even exponents ...
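For the odd-p case, recovering and checking a factor from a stored (p, k)
pair is a one-liner each; here is a small sketch of my own (not from any
official GIMPS tool), using the M(47) factor mentioned below:

```python
def factor_from_k(p, k):
    """Reconstruct the trial factor 2*k*p + 1 from exponent p and multiplier k."""
    return 2 * k * p + 1

def divides_mersenne(p, q):
    """Check whether q divides M(p) = 2^p - 1, via modular exponentiation."""
    return pow(2, p, q) == 1

# Example: 2351 = 2*25*47 + 1 (i.e. k = 25) is a known factor of M(47).
q = factor_from_k(47, 25)
assert divides_mersenne(47, q)
```

The modular-exponentiation check is cheap even for huge p, so a verifier
over a whole (p, k) file is quite practical.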
3) If the data was held in variable-length records in a format
similar to
p,n,k1,k2,k3...
where n is the number of known factors and k1, k2, k3,... are the
multipliers for each of the known factors in turn, this would save
much duplication compared with the current format (of factor.txt).
This is also a good idea, but a few of the lines would be _very_ long,
which some text editors and other programs will have problems with.
I'd also suggest dropping n; including it doesn't really add anything
and it can be computed quickly from the other data. Perhaps something
like:
n
p,pk1
,pk2
,pk3
Note that M(n) has no known factors.
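To make the layout concrete, here is a hypothetical parser for it; the
treatment of a leading comma as a continuation line, and the sample k
values (real multipliers for M(47)'s factors; M(11213) is prime), are my
own illustration:

```python
def parse_records(lines):
    """Parse the continuation-line format: "p" alone = no known factors,
    "p,k1" starts a record, and a line beginning with "," adds another k."""
    records = {}          # exponent -> list of multipliers k
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(","):
            # continuation line: another k for the most recent exponent
            records[current].append(int(line[1:]))
        elif "," in line:
            p_str, k_str = line.split(",", 1)
            current = int(p_str)
            records[current] = [int(k_str)]
        else:
            current = int(line)
            records[current] = []   # exponent with no known factors
    return records

sample = ["11213", "47,25", ",48", ",141112"]
print(parse_records(sample))        # {11213: [], 47: [25, 48, 141112]}
```

Each continuation line stays short no matter how many factors an
exponent accumulates, which sidesteps the long-line problem.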
4) For values of p which fit into an (integer) CPU register and
"small" values of k (up to approx. 64K), trial factoring is so fast
that I doubt it would be any faster to search a file than to repeat
the calculation - even if the file is already available locally.
Yes, but then there's no way to be sure whether the factor is already
known or not. There was a bug in an early post-GIMPS-start factorer
of mine that caused it to miss _exactly_ one factor, of M(47). If I
hadn't had an explicit list of factors to compare with, I would never
have found the bug.
Perhaps the file should contain a "flag" denoting that factors are
known, but have been omitted for this reason.
This might be enough, though. But note that this can already be done
using the DATABASE format, if you're willing to omit all factors. Hm;
so perhaps a DATABASE file with all the exponents and trial factoring
extents and a second, human readable, file, as above, is a possibility.
And I've long thought about a small program (or function) to do fast,
binary lookups in large Mersenne data files like these (human readable
or binary), akin to the common UNIX "look" program for dictionary
files, but, of course, doing a numeric search rather than a dictionary
search.
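The "numeric look" idea fits in a few lines. This is my own toy
implementation, assuming one record per line, the exponent as the
leading comma-separated field, and the file sorted numerically (unlike
"look", which assumes dictionary order):

```python
import os
import tempfile

def find_exponent(path, p):
    """Binary-search a sorted "p,..." text file by byte offset; return
    the matching line for exponent p, or None."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()              # skip the partial line we landed in
            line = f.readline()
            if not line or int(line.split(b",")[0]) >= p:
                hi = mid
            else:
                lo = mid + 1
        f.seek(lo)
        if lo:
            f.readline()              # realign to the next line start
        for line in f:                # scan the one or two candidate lines
            key = int(line.split(b",")[0])
            if key == p:
                return line.decode().rstrip()
            if key > p:
                break
    return None

# Tiny demonstration file: 47 = 2*1*23 + 1 divides M(23); M(11213) is prime.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"23,1\n47,25\n11213\n")
    name = tmp.name
result = find_exponent(name, 47)
os.unlink(name)
print(result)                         # prints "47,25"
```

Seeking by byte offset means even a multi-megabyte master file needs
only a few dozen reads per lookup.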
We should still have a large master file somewhere, but this need not
be uploaded frequently.
I update my local copy about twice a week usually; it takes only a
couple of hours on my new PIII/450MHz machine and was well under a day
even on my old P233MMX system. These times include automatically
downloading the available data, at intervals from every few days to
once a month, as appropriate for the particular file.
5) Following on from (4), it seems to make sense to record the
highest value of k tested by trial factoring, rather than the maximum
number of bits.
This is lacking in the DATABASE format, but my format implies this
info.
And, for a year or more, I've actually been treating the "distance" in
k's to get to the next power of two as a "gap" in the trial factoring
data and thus most of it is just above powers of two. Note also that
exponents with known factors can be worked on by Prime95 again, after
mersfacgmp or some other factorer pushes the extent past the next
power of two above a factor.
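The arithmetic relating the two bookkeeping units is simple; this
little sketch (my own, purely illustrative, and the exponent is just a
sample value near 10^7) converts a bit limit into the corresponding k
limit, which shows why k is the finer-grained record:

```python
def k_limit(p, bits):
    """Largest k such that the candidate factor 2*k*p + 1 fits in `bits` bits,
    i.e. 2*k*p + 1 <= 2**bits - 1."""
    return (2**bits - 1) // (2 * p)

p = 10000019                  # a sample exponent in the 10 million range
print(k_limit(p, 64))         # k's covered by trial factoring to 64 bits
print(k_limit(p, 65))         # one more bit doubles the k range
```

Recording "bits" rounds the search extent down to the nearest such
limit, whereas recording the highest k tested loses nothing.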
6) PrimeNet trial factoring has moved well into the 10 millions;
however, George's factor.zip file contains data only up to 10 million.
Yup. This also means that my factors and his for exponents above 10
million have not been compared for some time.
I know there is a problem with uploading the large files concerned;
hopefully, the suggestions above will help to reduce the size of
the file sufficiently to make it feasible to expand the exponent
range to cover (at least) the active Primenet assignments for long
enough for cheap, high-speed Internet connectivity to become
widespread.
Or perhaps George should simply shift to the factors of the exponents