Re: Mersenne: Re: splitting up 10m digit primes

1999-10-19 Thread Ken Kriesel

At 07:49 AM 1999/10/19 +0100, Nick Craig-Wood [EMAIL PROTECTED] wrote:
On Tue, Oct 19, 1999 at 12:28:52AM -0500, Ken Kriesel wrote:
 How would you extend this concept to P-1, ECM, and factoring save files?

As far as I'm aware prime95 is the only program to do P-1, ECM or
factoring so I wasn't intending that the save files should include
these.  Maybe they should?

Surely there are other programs.  There must be various factoring efforts
running on non-Intel processors under non-Microsoft operating systems.

 Would revisions to the standard format be handled within the 8-byte
 Ascii file identifier, or in a separate field?

There was a 4 byte number for revisions at the start of the header.

I took the 4-byte number present in your first recent post to be the
version information of the program that generated the file, not the 
file standard version.  After rereading Brian's comments and your reply
to him, I think I get it; Brian's modification adds the program version
levels, and the original had the standard's version.

(Feeling fuzzy from the flu, please pardon the increased noise level.)


Ken
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-17 Thread Nick Craig-Wood

On Sun, Oct 17, 1999 at 02:13:33AM +0200, Steinar H. Gunderson wrote:
 On Sat, Oct 16, 1999 at 03:46:30PM +0100, Chris Jefferson wrote:
 However, proving that they has used the
 interim file generated by this particular program might be difficult...
 
 Not very. Save files are incompatible from program to program. Or at least
 so I think...

Some time ago I proposed a standard format for save files.  It also
had the advantage that it only saved the actual bits used so would
make the save files 1MByte rather than 1.8 MByte for an 8,000,000
exponent in prime95.

 Converting a savefile is also going to be a lot of work :-)

It needs to be done exactly right - hence by the program which
generated it probably.

Here is what I wrote :-



Reading in other threads about save files it seems to me that it would be
worth while standardising the format of save files so that they can be ported
across platforms.

At the moment I think both Prime95 and the unix programs write the floating
point values of the FFT array to a file which is inherently non-portable
across machine architectures.  It also makes the save files quite a bit
larger than necessary (an important consideration if we ever want to hold a
central repository of these files).

I'd like to propose for discussion a simple cross-platform save file format
as below.

Width   Value   Description

8   "MersSave"  in ASCII as a file identifier
4   int32   version number of file
16  ASCII   space filled name of creating program
4   int32   exponent being tested (n)
4   int32   iteration that this residue represents  
4   int32   length of platform specific area x words
4x  int32   platform specific
len int32   binary data for the residue
len = ((n + 0x1F)  ~ 0x1F)  3
4   int32   CCITT CRC-32 of the entire file 

All integers should be represented as little-endian in the file (the majority
of the clients are little-endian).  The residue should be little endian with
the unused 1-31 bits at the end set to 0.

A file format like this would have the following advantages

  1) You could take a save file from one platform/program to another (via a
 central repository at primenet maybe)
  2) You could take a save file from one FFT size to another (imagine a
 software upgrade where the FFT limits have changed)
  3) They are much smaller (eg 600k vs 2M)

It would have the disadvantages

  1) Slightly slower and more complex to read and write.  However this is an
 infrequent operation.  (That said time to write a save file may be IO
 limited in which case a shorter save file will be quicker.)
  2) We've all got to agree on it ;-)

Discuss!

-- 
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-17 Thread Brian J. Beesley

On 17 Oct 99, at 8:22, Nick Craig-Wood wrote:

 Some time ago I proposed a standard format for save files.  It also
 had the advantage that it only saved the actual bits used so would
 make the save files 1MByte rather than 1.8 MByte for an 8,000,000
 exponent in prime95.

Yes - this idea is useful ...

  Converting a savefile is also going to be a lot of work :-)
 
 It needs to be done exactly right - hence by the program which
 generated it probably.
 
If it's worthwhile, there is of course source for George's programs 
(which represent the vast majority of clients). So there's existing 
source code for reading Prime95/mprime format save files.

 Here is what I wrote :-
 
 I'd like to propose for discussion a simple cross-platform save file format
 as below.
 
 Width   Value   Description
 
 8   "MersSave"  in ASCII as a file identifier
 4   int32   version number of file
 16  ASCII   space filled name of creating program

I want this longer,  to include version major/minor/release  build 
numbers as well. 32 bytes?

 4   int32   exponent being tested (n)
 4   int32   iteration that this residue represents  
 4   int32   length of platform specific area x words
 4x  int32   platform specific

e.g. George's programs would want to store the offset here.

Would it be too much to ask to have an int32 count of "platform 
specific" bytes following in a variable length field here? Some 
platforms will require none (just a zero count), others might want to 
embed a (currently unknown) vector of arbitary length at this point.

 len int32   binary data for the residue
 len = ((n + 0x1F)  ~ 0x1F)  3
 4   int32   CCITT CRC-32 of the entire file 
 
 All integers should be represented as little-endian in the file (the majority
 of the clients are little-endian).  The residue should be little endian with
 the unused 1-31 bits at the end set to 0.

Since the "majority of clients" which are "little endian" are 
overwhelmingly IA32 processors, which have byte reversal instructions 
specifically designed to do the job, I'd like to propose that the 
integers be stored in "big-endian" format. Or, alternatively, that we 
devote _two_ bits of the file version number - both the MS and LS 
bits - to flags, set to 0 for little-endian format or 1 for big-
endian format. Still leaves 30 bits for file version number info, and 
allows clients to write whatever format is easiest. Write operations 
outnumber read operations by a big margin!


Regards
Brian Beesley
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-17 Thread Nick Craig-Wood

On Sun, Oct 17, 1999 at 03:21:49PM +0100, Brian J. Beesley wrote:
 On 17 Oct 99, at 8:22, Nick Craig-Wood wrote:
  I'd like to propose for discussion a simple cross-platform save file format
  as below.
  
  Width   Value   Description
  
  8   "MersSave"  in ASCII as a file identifier
  4   int32   version number of file
  16  ASCII   space filled name of creating program
 
 I want this longer,  to include version major/minor/release  build 
 numbers as well. 32 bytes?

Yes sounds like a good idea.  Perhaps we should formalise the version
number like this?

1int8Major
1int8Minor
1int8Micro
1int8Patch level

  4   int32   exponent being tested (n)
  4   int32   iteration that this residue represents  
  4   int32   length of platform specific area x words
  4x  int32   platform specific
 
 e.g. George's programs would want to store the offset here.

Ah, offset!  I wrote this before offset was introduced. This would
need to be part of the save file I think because different offset =
different save file data.  I think offset would fit into an int32?

 Would it be too much to ask to have an int32 count of "platform 
 specific" bytes following in a variable length field here? Some 
 platforms will require none (just a zero count), others might want to 
 embed a (currently unknown) vector of arbitary length at this point.

That was what I intended by the last two entries above, but I may not
have been clear enough.

4   int32   length of platform specific area x words
4x  int32   platform specific (x words of data may be 0 length)

  len int32   binary data for the residue
  len = ((n + 0x1F)  ~ 0x1F)  3
  4   int32   CCITT CRC-32 of the entire file 
  
  All integers should be represented as little-endian in the file (the majority
  of the clients are little-endian).  The residue should be little endian with
  the unused 1-31 bits at the end set to 0.
 
 Since the "majority of clients" which are "little endian" are 
 overwhelmingly IA32 processors, which have byte reversal instructions 
 specifically designed to do the job, I'd like to propose that the 
 integers be stored in "big-endian" format. Or, alternatively, that we 
 devote _two_ bits of the file version number - both the MS and LS 
 bits - to flags, set to 0 for little-endian format or 1 for big-
 endian format. Still leaves 30 bits for file version number info, and 
 allows clients to write whatever format is easiest. Write operations 
 outnumber read operations by a big margin!

This is sensible.  I like the big/little endian flag idea.  I've tried
to design the header so that it can be represented by a C struct and
just (f)write()-en straight to the file.  Having big/little endian
flag is an excellent idea here.

Also it might just be better to store the actual numeric data as
int8's rather than int32s - this way the endian ness of the processors
doesn't matter.

-- 
Nick Craig-Wood
[EMAIL PROTECTED]
http://www.axis.demon.co.uk/
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-16 Thread Chris Jefferson

On Sat, 16 Oct 1999, Jukka Santala wrote:

 "Brian J. Beesley" wrote:
 
  On 14 Oct 99, at 18:15, Chris Jefferson wrote:
  Surely this isn't really an issue. PrimeNet would surely recognise a
  result submitted by a "poacher" as such  either disqualify it
  automatically, or credit the actual owner of the assignment instead
  of the "poacher".
 
 Except that PrimeNet doesn't control the prize. This is the error everybody is doing.
 EFF is adminstrating the competition and prize, given by anonymous donaters to
 advance distributed computing / mathemathical algorithms on computers. PrimeNet is
 just one of the organizations (With largest changes known!) to get that prize, but it
 doesn't decide upon who gets it. The first person to present a prime filling the
 requirements will - and that's why result-files "few iterations short" will be worth
 more than their weight in gold.
 

Yes, but the licence for Prim95/NT specifies that anyone who uses this
program must obey it's licence. Someone using a partly completed file
would also be liable under this. However, proving that they has used the
interim file generated by this particular program might be difficult...



_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-16 Thread Brian J. Beesley

On 16 Oct 99, at 7:35, Jukka Santala wrote:

 [ ... snip ... ]
 Except that PrimeNet doesn't control the prize. This is the error everybody is doing.
 EFF is adminstrating the competition and prize, given by anonymous donaters to
 advance distributed computing / mathemathical algorithms on computers. PrimeNet is
 just one of the organizations (With largest changes known!) to get that prize, but it
 doesn't decide upon who gets it. The first person to present a prime filling the
 requirements will - and that's why result-files "few iterations short" will be worth
 more than their weight in gold.

I think, if someone tried to pull this particular stunt, PrimeNet 
could inform EFF of the claimant's theft of the right to the 
discovery (which is what it amounts to). Another unfortunate example 
of the way in which large ca$h prizes can lead to unpleasantries, I'm 
afraid.
 
 You said... about making the repository write-only. And you seem to completely miss
 the point; the discussion here has centered around how to solve the problem that
 PrimeNet doesn't have the bandwidth or storage space to backup the intermediary
 files! So my suggestion was "distributed storage". This will solve the backup
 problem, but yes the above-mentioned prize-claim and double-check problem remains.
 Unfortunately, I think in that respect we're stuck.

Well, I sort of _did_ miss that particular point. But I have a 
solution. If you have a distributed filestore, controlled by a number 
of different people, then you encrypt  hack up the file you're 
saving and scatter pieces amongst the remote filestores. If you (the 
assignment owner) use a method based on a key otherwise known only to 
PrimeNet to scramble  scatter the file, the small chunk of data 
stored on any other individual's system is useless to them for the 
purposes of hijacking your work.

I think PrimeNet should know users' keys so that data can be 
recovered in the event of the user defaulting or accidentally 
destroying their own key (so that work done isn't lost). Presumably 
we trust the people running PrimeNet!

An obvious extension to this idea is to use a RAID type technique 
e.g. if you store each bit of a byte on a seperate host, so that the 
file is scattered over 8 hosts, you could send the parity bit to a 
ninth host. Then the distributed filestore is immune to loss of one 
of its hosts (by reason of network failure or unsecured loss of its 
filestore).

As for bandwidth and total filestore space needed - the first is, and 
will probably remain for some time, a problem, at least for most home 
users. 10,000 x 8 Mbyte save files is a lot of data by today's 
standards; nevertheless, 4 20GB drives at $200 each would just about 
cover it. (A random dip into a UK magazine shows Samsung 20.3GB 
UDMA66 IDE at £144 (sterling) each, so US$10 per GB should be 
achievable)


Regards
Brian Beesley
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-15 Thread Lucas Wiman

  Personally, I think the big problem with regards to this is not people
  quitting so much as the possibility of major hard-drive failure etc. on the
  testers. I doubt many of them keep good backups 
 
 NO EXCUSE! A Zip drive or a CD-R is inexpensive and effective; one 
 will service many networked systems, they don't all have to be 
 running the same OS.

True, also Pkzip 2.04g (I don't know about WinZip) had the ability
to span a zip archive across floppy disks.  Even on a LL test of 
33mil would take up under 3 floppies.  (This wouldn't be reasonable
for large networks, but it would be easily achievable for the home
user...)

-Lucas
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Re: splitting up 10m digit primes

1999-10-14 Thread Chris Jefferson

 Also one would have to ask what would be the incencitive for someone to act
 as a backup server... or prevent them from "stealing the work" as it were, by
 using high-speed computers to finish the test a month before the main person
 does in hopes of getting the prize.

I think that the new licence could stop people from doing this, or could
quickly be altered to say so, as I assume that a new version would need to
be bought out to support it possibly... or at least make it easier...

In my personal opinion, the best way of doing this would be to set up 3
computers in a 'loop' all doing the same exponent. Then they could
communicate at regular intervals. Although it would mean working at the
slowest speed, it might be a good idea to, along with backing up, get the
to cmpare exponents at the same time. This would fix the problem of one of
them quitting, another could be 'bought in', and also would allow
double=checking as the check went along. However, I suspect the
programming would be quite difficult.

Chris


 
  -Donwulff
 
 _
 Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
 Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers
 

_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers