Re: Mersenne: Re: splitting up 10m digit primes
At 07:49 AM 1999/10/19 +0100, Nick Craig-Wood [EMAIL PROTECTED] wrote: On Tue, Oct 19, 1999 at 12:28:52AM -0500, Ken Kriesel wrote: How would you extend this concept to P-1, ECM, and factoring save files? As far as I'm aware prime95 is the only program to do P-1, ECM or factoring so I wasn't intending that the save files should include these. Maybe they should? Surely there are other programs. There must be various factoring efforts running on non-Intel processors under non-Microsoft operating systems. Would revisions to the standard format be handled within the 8-byte Ascii file identifier, or in a separate field? There was a 4 byte number for revisions at the start of the header. I took the 4-byte number present in your first recent post to be the version information of the program that generated the file, not the file standard version. After rereading Brian's comments and your reply to him, I think I get it; Brian's modification adds the program version levels, and the original had the standard's version. (Feeling fuzzy from the flu, please pardon the increased noise level.) Ken _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
On Sun, Oct 17, 1999 at 02:13:33AM +0200, Steinar H. Gunderson wrote: On Sat, Oct 16, 1999 at 03:46:30PM +0100, Chris Jefferson wrote: However, proving that they has used the interim file generated by this particular program might be difficult... Not very. Save files are incompatible from program to program. Or at least so I think... Some time ago I proposed a standard format for save files. It also had the advantage that it only saved the actual bits used so would make the save files 1MByte rather than 1.8 MByte for an 8,000,000 exponent in prime95. Converting a savefile is also going to be a lot of work :-) It needs to be done exactly right - hence by the program which generated it probably. Here is what I wrote :- Reading in other threads about save files it seems to me that it would be worth while standardising the format of save files so that they can be ported across platforms. At the moment I think both Prime95 and the unix programs write the floating point values of the FFT array to a file which is inherently non-portable across machine architectures. It also makes the save files quite a bit larger than necessary (an important consideration if we ever want to hold a central repository of these files). I'd like to propose for discussion a simple cross-platform save file format as below. Width Value Description 8 "MersSave" in ASCII as a file identifier 4 int32 version number of file 16 ASCII space filled name of creating program 4 int32 exponent being tested (n) 4 int32 iteration that this residue represents 4 int32 length of platform specific area x words 4x int32 platform specific len int32 binary data for the residue len = ((n + 0x1F) ~ 0x1F) 3 4 int32 CCITT CRC-32 of the entire file All integers should be represented as little-endian in the file (the majority of the clients are little-endian). The residue should be little endian with the unused 1-31 bits at the end set to 0. A file format like this would have the following advantages 1) You could take a save file from one platform/program to another (via a central repository at primenet maybe) 2) You could take a save file from one FFT size to another (imagine a software upgrade where the FFT limits have changed) 3) They are much smaller (eg 600k vs 2M) It would have the disadvantages 1) Slightly slower and more complex to read and write. However this is an infrequent operation. (That said time to write a save file may be IO limited in which case a shorter save file will be quicker.) 2) We've all got to agree on it ;-) Discuss! -- Nick Craig-Wood [EMAIL PROTECTED] http://www.axis.demon.co.uk/ _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
On 17 Oct 99, at 8:22, Nick Craig-Wood wrote: Some time ago I proposed a standard format for save files. It also had the advantage that it only saved the actual bits used so would make the save files 1MByte rather than 1.8 MByte for an 8,000,000 exponent in prime95. Yes - this idea is useful ... Converting a savefile is also going to be a lot of work :-) It needs to be done exactly right - hence by the program which generated it probably. If it's worthwhile, there is of course source for George's programs (which represent the vast majority of clients). So there's existing source code for reading Prime95/mprime format save files. Here is what I wrote :- I'd like to propose for discussion a simple cross-platform save file format as below. Width Value Description 8 "MersSave" in ASCII as a file identifier 4 int32 version number of file 16 ASCII space filled name of creating program I want this longer, to include version major/minor/release build numbers as well. 32 bytes? 4 int32 exponent being tested (n) 4 int32 iteration that this residue represents 4 int32 length of platform specific area x words 4x int32 platform specific e.g. George's programs would want to store the offset here. Would it be too much to ask to have an int32 count of "platform specific" bytes following in a variable length field here? Some platforms will require none (just a zero count), others might want to embed a (currently unknown) vector of arbitary length at this point. len int32 binary data for the residue len = ((n + 0x1F) ~ 0x1F) 3 4 int32 CCITT CRC-32 of the entire file All integers should be represented as little-endian in the file (the majority of the clients are little-endian). The residue should be little endian with the unused 1-31 bits at the end set to 0. Since the "majority of clients" which are "little endian" are overwhelmingly IA32 processors, which have byte reversal instructions specifically designed to do the job, I'd like to propose that the integers be stored in "big-endian" format. Or, alternatively, that we devote _two_ bits of the file version number - both the MS and LS bits - to flags, set to 0 for little-endian format or 1 for big- endian format. Still leaves 30 bits for file version number info, and allows clients to write whatever format is easiest. Write operations outnumber read operations by a big margin! Regards Brian Beesley _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
On Sun, Oct 17, 1999 at 03:21:49PM +0100, Brian J. Beesley wrote: On 17 Oct 99, at 8:22, Nick Craig-Wood wrote: I'd like to propose for discussion a simple cross-platform save file format as below. Width Value Description 8 "MersSave" in ASCII as a file identifier 4 int32 version number of file 16 ASCII space filled name of creating program I want this longer, to include version major/minor/release build numbers as well. 32 bytes? Yes sounds like a good idea. Perhaps we should formalise the version number like this? 1int8Major 1int8Minor 1int8Micro 1int8Patch level 4 int32 exponent being tested (n) 4 int32 iteration that this residue represents 4 int32 length of platform specific area x words 4x int32 platform specific e.g. George's programs would want to store the offset here. Ah, offset! I wrote this before offset was introduced. This would need to be part of the save file I think because different offset = different save file data. I think offset would fit into an int32? Would it be too much to ask to have an int32 count of "platform specific" bytes following in a variable length field here? Some platforms will require none (just a zero count), others might want to embed a (currently unknown) vector of arbitary length at this point. That was what I intended by the last two entries above, but I may not have been clear enough. 4 int32 length of platform specific area x words 4x int32 platform specific (x words of data may be 0 length) len int32 binary data for the residue len = ((n + 0x1F) ~ 0x1F) 3 4 int32 CCITT CRC-32 of the entire file All integers should be represented as little-endian in the file (the majority of the clients are little-endian). The residue should be little endian with the unused 1-31 bits at the end set to 0. Since the "majority of clients" which are "little endian" are overwhelmingly IA32 processors, which have byte reversal instructions specifically designed to do the job, I'd like to propose that the integers be stored in "big-endian" format. Or, alternatively, that we devote _two_ bits of the file version number - both the MS and LS bits - to flags, set to 0 for little-endian format or 1 for big- endian format. Still leaves 30 bits for file version number info, and allows clients to write whatever format is easiest. Write operations outnumber read operations by a big margin! This is sensible. I like the big/little endian flag idea. I've tried to design the header so that it can be represented by a C struct and just (f)write()-en straight to the file. Having big/little endian flag is an excellent idea here. Also it might just be better to store the actual numeric data as int8's rather than int32s - this way the endian ness of the processors doesn't matter. -- Nick Craig-Wood [EMAIL PROTECTED] http://www.axis.demon.co.uk/ _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
On Sat, 16 Oct 1999, Jukka Santala wrote: "Brian J. Beesley" wrote: On 14 Oct 99, at 18:15, Chris Jefferson wrote: Surely this isn't really an issue. PrimeNet would surely recognise a result submitted by a "poacher" as such either disqualify it automatically, or credit the actual owner of the assignment instead of the "poacher". Except that PrimeNet doesn't control the prize. This is the error everybody is doing. EFF is adminstrating the competition and prize, given by anonymous donaters to advance distributed computing / mathemathical algorithms on computers. PrimeNet is just one of the organizations (With largest changes known!) to get that prize, but it doesn't decide upon who gets it. The first person to present a prime filling the requirements will - and that's why result-files "few iterations short" will be worth more than their weight in gold. Yes, but the licence for Prim95/NT specifies that anyone who uses this program must obey it's licence. Someone using a partly completed file would also be liable under this. However, proving that they has used the interim file generated by this particular program might be difficult... _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
On 16 Oct 99, at 7:35, Jukka Santala wrote: [ ... snip ... ] Except that PrimeNet doesn't control the prize. This is the error everybody is doing. EFF is adminstrating the competition and prize, given by anonymous donaters to advance distributed computing / mathemathical algorithms on computers. PrimeNet is just one of the organizations (With largest changes known!) to get that prize, but it doesn't decide upon who gets it. The first person to present a prime filling the requirements will - and that's why result-files "few iterations short" will be worth more than their weight in gold. I think, if someone tried to pull this particular stunt, PrimeNet could inform EFF of the claimant's theft of the right to the discovery (which is what it amounts to). Another unfortunate example of the way in which large ca$h prizes can lead to unpleasantries, I'm afraid. You said... about making the repository write-only. And you seem to completely miss the point; the discussion here has centered around how to solve the problem that PrimeNet doesn't have the bandwidth or storage space to backup the intermediary files! So my suggestion was "distributed storage". This will solve the backup problem, but yes the above-mentioned prize-claim and double-check problem remains. Unfortunately, I think in that respect we're stuck. Well, I sort of _did_ miss that particular point. But I have a solution. If you have a distributed filestore, controlled by a number of different people, then you encrypt hack up the file you're saving and scatter pieces amongst the remote filestores. If you (the assignment owner) use a method based on a key otherwise known only to PrimeNet to scramble scatter the file, the small chunk of data stored on any other individual's system is useless to them for the purposes of hijacking your work. I think PrimeNet should know users' keys so that data can be recovered in the event of the user defaulting or accidentally destroying their own key (so that work done isn't lost). Presumably we trust the people running PrimeNet! An obvious extension to this idea is to use a RAID type technique e.g. if you store each bit of a byte on a seperate host, so that the file is scattered over 8 hosts, you could send the parity bit to a ninth host. Then the distributed filestore is immune to loss of one of its hosts (by reason of network failure or unsecured loss of its filestore). As for bandwidth and total filestore space needed - the first is, and will probably remain for some time, a problem, at least for most home users. 10,000 x 8 Mbyte save files is a lot of data by today's standards; nevertheless, 4 20GB drives at $200 each would just about cover it. (A random dip into a UK magazine shows Samsung 20.3GB UDMA66 IDE at £144 (sterling) each, so US$10 per GB should be achievable) Regards Brian Beesley _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
Personally, I think the big problem with regards to this is not people quitting so much as the possibility of major hard-drive failure etc. on the testers. I doubt many of them keep good backups NO EXCUSE! A Zip drive or a CD-R is inexpensive and effective; one will service many networked systems, they don't all have to be running the same OS. True, also Pkzip 2.04g (I don't know about WinZip) had the ability to span a zip archive across floppy disks. Even on a LL test of 33mil would take up under 3 floppies. (This wouldn't be reasonable for large networks, but it would be easily achievable for the home user...) -Lucas _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
Re: Mersenne: Re: splitting up 10m digit primes
Also one would have to ask what would be the incencitive for someone to act as a backup server... or prevent them from "stealing the work" as it were, by using high-speed computers to finish the test a month before the main person does in hopes of getting the prize. I think that the new licence could stop people from doing this, or could quickly be altered to say so, as I assume that a new version would need to be bought out to support it possibly... or at least make it easier... In my personal opinion, the best way of doing this would be to set up 3 computers in a 'loop' all doing the same exponent. Then they could communicate at regular intervals. Although it would mean working at the slowest speed, it might be a good idea to, along with backing up, get the to cmpare exponents at the same time. This would fix the problem of one of them quitting, another could be 'bought in', and also would allow double=checking as the check went along. However, I suspect the programming would be quite difficult. Chris -Donwulff _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers _ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers