RE: Mersenne: Multiple residues - enhancing double-checking
> > Yes, they can and do. In the field of computational number theory, several
> > participants in the Cunningham project have been active much longer than my
> > decade. Sam Wagstaff has been co-ordinating it longer than that. To be
> > honest, I don't know when he started. Perhaps Peter Montgomery (another
> > long-time participant) can inform us, as I know he's also on the Mersenne
> > list.

I have been working on a PhD to prove primes for ten years of Sundays, Paul :-)

Preda

> > In other fields, it's not unusual to commit to a decade or more. Consider
> > the deep space missions such as Voyager, or the astronomers who monitor a
> > particular class of objects for most of their working lives. I know some
> > amateurs who have been observing particular variable stars since the 1960's;
> > George Alcock has been searching for comets and novae since the late
> > fifties.
> >
> > Paul

_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
RE: Mersenne: Multiple residues - enhancing double-checking
Paul Leyland (whom I'll meet for the second time later this month) writes:

> > From: Brian J. Beesley [mailto:[EMAIL PROTECTED]]
> >
> > A couple of points here:
> > (1) Can anyone honestly commit to a project for a whole decade?
>
> Yes, they can and do. In the field of computational number theory, several
> participants in the Cunningham project have been active much longer than my
> decade. Sam Wagstaff has been co-ordinating it longer than that. To be
> honest, I don't know when he started. Perhaps Peter Montgomery (another
> long-time participant) can inform us, as I know he's also on the Mersenne
> list.
>
> In other fields, it's not unusual to commit to a decade or more. Consider
> the deep space missions such as Voyager, or the astronomers who monitor a
> particular class of objects for most of their working lives. I know some
> amateurs who have been observing particular variable stars since the 1960's;
> George Alcock has been searching for comets and novae since the late
> fifties.

Ten years for one computation seems a long time. Yes, I have been
contributing to the Cunningham project for a long time, using machines at
multiple institutions, but we can hardly commit for ten years. Jobs change
today, at least in the USA. A home machine may be stolen or damaged by fire
or flood. My family wouldn't know what to do if I died.

For the very long LL computations, it is reasonable for two sites A and B to
start the computation, taking a checkpoint perhaps every 1000 iterations.
When A reaches its 7000th iteration (or other multiple of 1000), it sends
the checkpoint file to a central site, and can continue for another 1000
iterations if B has submitted a consistent checkpoint for the 6000th
iteration, with different programs used for iterations 5001-6000. Otherwise
A is blocked (i.e., told to do something else until B catches up).
If A and B differ at the 6000th iteration checkpoint, or if one times out
(say six months with no activity), a third site can restart from the
(agreed) 5000th iteration checkpoint.

It is important that the different programs (e.g., Prime95, MacLucas,
LucasMayer) agree on a common checkpoint format. Not only must the central
server be able to compare them, but a contributor might migrate from a
Pentium in 1996 to an Alpha today to a Merced in 2002.

Of course this data at the central site will be voluminous, if each exponent
around 100 M averages 2.5 checkpoint files of 13 MB each (for iterations
5000 and 6000 and possibly 7000 in the example). If there are 10^5 LL tests
in progress, then the central site will need 3250 GB. A terabyte is huge
today but may be common in 15 years. In the 1970's much data was on
seven-track tapes; I recall each held 1.9 M 60-bit words (14 MB). Today the
NFS data for M619 is being processed on a filesystem with 16 GB, an
1100-fold increase over 25 years.

Optimizations can reduce the size stored. For example, store the full 5000th
iteration checkpoint, but only hash(6000th iteration) and hash(7000th
iteration), where the hash function might return only 1024 bits. When the
second site sends a 6000th iteration checkpoint, verify that it gives the
same hash value. If yes, then we assume B's full file agrees with A's, and
store the 6000th iteration checkpoint while deleting the 5000th. If no, the
full 5000th remains available for a third site to use.

Peter
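A minimal sketch of the hash-based storage optimization just described. The
class and function names here are mine, not from any real server, and
SHA-512 merely stands in for the unspecified ~1024-bit hash:

```python
import hashlib

def checkpoint_hash(data: bytes) -> bytes:
    # Stand-in for the ~1024-bit hash suggested above (SHA-512 gives 512 bits).
    return hashlib.sha512(data).digest()

class CentralStore:
    """Keep one full checkpoint file plus hashes of later checkpoints;
    advance the base only when the partner's full file matches a hash."""

    def __init__(self, base_iter, base_file, later_hashes):
        self.base_iter = base_iter       # e.g. iteration 5000
        self.base_file = base_file       # full file, kept so a third site can restart
        self.later = dict(later_hashes)  # e.g. {6000: hash, 7000: hash}

    def submit(self, iteration, data):
        """Site B submits its full checkpoint file for `iteration`."""
        if checkpoint_hash(data) == self.later.get(iteration):
            # Verified: B agrees with A, so this file becomes the new base
            # and the older full checkpoint can be deleted.
            self.base_iter, self.base_file = iteration, data
            del self.later[iteration]
            return True
        # Mismatch (or unknown iteration): keep the old base for a third site.
        return False
```

Under this model only one full-size file per exponent lives at the central
site at a time, which is the point of the optimization.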
RE: Mersenne: Multiple residues - enhancing double-checking
> From: Brian J. Beesley [mailto:[EMAIL PROTECTED]]
>
> A couple of points here:
> (1) Can anyone honestly commit to a project for a whole decade?

Yes, they can and do. In the field of computational number theory, several
participants in the Cunningham project have been active much longer than my
decade. Sam Wagstaff has been co-ordinating it longer than that. To be
honest, I don't know when he started. Perhaps Peter Montgomery (another
long-time participant) can inform us, as I know he's also on the Mersenne
list.

In other fields, it's not unusual to commit to a decade or more. Consider
the deep space missions such as Voyager, or the astronomers who monitor a
particular class of objects for most of their working lives. I know some
amateurs who have been observing particular variable stars since the 1960's;
George Alcock has been searching for comets and novae since the late
fifties.

Paul
RE: Mersenne: Multiple residues - enhancing double-checking
On 6 Aug 99, at 2:47, Lucas Wiman wrote:

> The main way to speed this up is to have system C continue from the last
> residue that it is known that system A and B agree upon.

The way I looked at it, the PrimeNet server only has to store _one_ (64-bit)
residue for each exponent in progress. As others have pointed out, shifting
files >> 10 MB in size across slow dial-up links isn't going to be popular,
but you can't have C continue from the last residue agreed by A and B unless
the agreed residue is known to the PrimeNet server. 10,000 users by 16 MB
per file = 160 GB of disk space - I think Scott might have to invest in a
new disk array!

> Or, possibly even better (though more error prone) would be to have A and
> B recalculate from the last agreed upon residue, and see if they now
> agree.

Now, that's a possibility. But suppose they don't, again. This could happen
if one of the systems was suffering from an esoteric hardware bug (possibly
common to both systems, but they'd be running different offsets, so the FFT
data would differ). My "inefficient" scheme also allows reasonably quick
recovery from one of the systems disconnecting from the project.

> Won't this mean systems constantly getting interrupted in the middle of an
> exponent segment? Or would the systems only continue on one segment when
> they finish the current one?

No - if I've exponent x suspended & I'm working on exponent y (> x), then I
just wait until I'm checking in y anyway, when I may get a message to
continue x (or some other exponent(s)). If I don't, and y is now suspended
too, I need another exponent to start work on. If I find I have several
exponents all of which are now OK to continue, I just pick the numerically
lowest one to work on next.

> This makes sense, though it might not be needed. This scheme would work
> best on people willing to stay with GIMPS for the decade required to
> finish the range. Many of these people have (fast) networks with lots of
> computers on them. The check/double check computers might be right next
> to each other, owned, and run by the same person.

A couple of points here:

(1) Can anyone honestly commit to a project for a whole decade?

(2) I honestly think it's better if the two computers working on a
particular exponent are _not_ owned/operated by the same person. This
precaution removes the possibility of one deranged individual
"contaminating" the project results by "forging" results.

Regards
Brian Beesley
RE: Mersenne: Multiple residues - enhancing double-checking
> If B's checkpoint info doesn't match A's, the exponent is issued to
> another system C which calculates to the first checkpoint & checks in,
> as before. If we now have two matching residues, the two systems
> supplying them are told to continue and the system with the "wrong"
> result is told to abandon that exponent. If all three differ, issue
> the exponent to system D ...

The main way to speed this up is to have system C continue from the last
residue that systems A and B are known to agree upon. Or, possibly even
better (though more error-prone), would be to have A and B recalculate from
the last agreed-upon residue, and see if they now agree. This would waste at
most 2 P90-months, without having to transfer anything over the network, or
have a system D doing half the agreed-upon work. I think it's safe to say
that if systems A and B can agree on the residue at that point, then it is
the correct residue.

> As well as minimising wasted work, this scheme results in very little
> extra server traffic, since the systems all need to check in
> occasionally anyway. If Prime95 is altered so that it always works on
> the _smallest_ exponent it has "in hand", work should be completed
> reasonably quickly and in reasonable order. (Especially bearing in
> mind that this scheme completes double-checking at the same time as
> first tests!)

Won't this mean systems constantly getting interrupted in the middle of an
exponent segment? Or would the systems only continue on one segment when
they finish the current one?

> When a prime is found, the discovery credit (& any prize money)
> should obviously be split between the two users who ran the last
> iteration on the exponent in question.

This makes sense, though it might not be needed. This scheme would work best
with people willing to stay with GIMPS for the decade required to finish the
range. Many of these people have (fast) networks with lots of computers on
them.
The check/double-check computers might be right next to each other, owned
and run by the same person. This will not matter for some time. Does anyone
have current data for the percentage of mistakes?

-Lucas
RE: Mersenne: Multiple residues - enhancing double-checking
On 5 Aug 99, at 9:05, Eric Hahn wrote:

> Based on previous messages in this thread, I'm going
> to throw in my 2 cents... (and avoid a lot of quoting!)

And I'm doing the same. I'd probably have reacted sooner if I hadn't been
offline (getting a kidney stone fixed).

Several months ago I suggested a method which does require some work on
both client and server, but which really does reduce wasted work to a
minimum. It works as follows:

Assign each exponent initially to two systems A and B. The software has a
table of checkpoints built in (these can be very sparse for smaller
exponents, becoming denser as the exponent increases, so as to maintain
about 1 P90-month between checkpoints).

Suppose A reaches the first checkpoint before B. A stops working on the
exponent & moves on to some other work already queued. It reports the
iteration number and the low 64 bits of the residue to the PrimeNet server.
When B reaches the first checkpoint, it also stops working on the exponent
& gets on with something else. It too reports the iteration number & the
low 64 bits of the residue to the PrimeNet server.

The PrimeNet server then checks whether the checkpoint information matches
that sent in by the other system. If it does, it immediately sends a
message to B instructing B that it can continue with the checked exponent
up to the next checkpoint. It also queues a message to A instructing A to
proceed with the checked exponent, which is communicated to A the next time
A checks in to the server.

If B's checkpoint info doesn't match A's, the exponent is issued to another
system C, which calculates to the first checkpoint & checks in, as before.
If we now have two matching residues, the two systems supplying them are
told to continue and the system with the "wrong" result is told to abandon
that exponent. If all three differ, issue the exponent to system D ...
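The matching rule the server would apply at each checkpoint can be sketched
as follows. This is a toy model with invented names, not the actual
PrimeNet protocol:

```python
def report_checkpoint(pending, exponent, system, iteration, residue64):
    """Record a system's 64-bit checkpoint residue and decide its next step.

    `pending` maps (exponent, iteration) -> {system: residue}. Returns
    'wait' if the partner hasn't reached this checkpoint yet, 'continue'
    if the two reported residues match, and 'mismatch' if they differ
    (at which point the exponent would be issued to a third system C).
    """
    reports = pending.setdefault((exponent, iteration), {})
    reports[system] = residue64
    if len(reports) < 2:
        return "wait"
    return "continue" if len(set(reports.values())) == 1 else "mismatch"
```

For example, if system A reports first it is told to wait (and, per the
scheme above, works on other queued exponents); when B reports a matching
residue, both are cleared to continue to the next checkpoint.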
The last iteration (completing the run) is treated in the same way as an
intermediate checkpoint, except that there is no need for the PrimeNet
server to send a "continue" or "abort" command - though this might provide
useful feedback to the user if it was done anyway.

As well as minimising wasted work, this scheme results in very little extra
server traffic, since the systems all need to check in occasionally anyway.
If Prime95 is altered so that it always works on the _smallest_ exponent it
has "in hand", work should be completed reasonably quickly and in
reasonable order. (Especially bearing in mind that this scheme completes
double-checking at the same time as first tests!)

When a prime is found, the discovery credit (& any prize money) should
obviously be split between the two users who ran the last iteration on the
exponent in question.

> I think as the time increases for each LL test, there
> would be much more time savings in attempting to do
> higher trial-factoring.

Yes, trial factoring to a higher limit does help, but, unfortunately, as
well as each extra bit of factoring depth costing as much time as all the
previous work, the chance of finding a factor (in a "window" of any
particular fixed size) decreases steadily as the factoring depth increases.
There is also going to be a jump in factoring time somewhere around
62/63/64 bits due to hardware limitations of the Intel architecture. Don't
forget, factoring time is O(2^d/p) whereas LL testing time is "only"
O(p^2 * log p), where p is the exponent and d is the factoring depth in
bits.

Finally, although save files for exponents > 33 million are going to be a
considerable size (14 MBytes for exponents just big enough to have 10
million decimal digits), it might be worth considering an _option_ where
users (with fast network connections) could deposit save files for "safe
keeping" (as a backup). As has been pointed out elsewhere, this would save
time in the event that a user goes AWOL.
Clearly it isn't reasonable to expect users with dial-in connectivity to
transmit files > 10 MBytes long with any frequency, but for users with
high-speed connections it doesn't represent much of a problem. The other
issue is finding enough disk space to store a large number of these files
... they don't compress very well, either!

Regards
Brian Beesley
Re: Mersenne: Multiple residues - enhancing double-checking
I agree with everything that Lucas said. For the project to benefit, the
participants would have to AGREE to restructure the work. Savings would
result from stopping the diverging work earlier. (And, if "passing around"
intermediate files was accepted, by starting any triple-check work later.)

In essence, the idea is to take that 1_year_to_complete LL test of an
exponent and BREAK IT DOWN into (n) discrete serial work units.

mikus

Proposed procedural steps:

(b) When 'Tester 1' requests work, assign ONLY work unit (1) to him.
However, the project RESERVES exponent () to 'Tester 1' - only *he* will
(in the future) be assigned initial testing of further work units within
exponent ().

(c) Eventually, Tester 1 finishes that work unit, and begins to work on
some other work unit of some other exponent. HOWEVER, 'Tester 1' saves the
intermediate file (of how far he got with exponent ()). Somewhere along the
line, exponent () work unit (1) becomes available to be assigned to a
double-checker.

(d) Now, when 'Checker 2' requests work, assign exponent () work unit (1)
to him for double-checking. Since this is work unit (1), there is as yet no
intermediate file for exponent (). If, however, this were a subsequent work
unit, and Checker 2 was *not* the person who performed the preceding work
unit's test/check, then Checker 2 might OBTAIN the intermediate file from
whoever did that preceding work unit.

(e) When Checker 2 finishes work unit (1) of exponent (), his (work unit
"final") residue is compared to the residue reported for work unit (1) by
Tester 1. HOWEVER, 'Checker 2' saves the intermediate file (of how far he
got with exponent ()).

(f) If the residues match, exponent () work unit (2) becomes available to
be assigned to 'Tester 1'. Now, 'Tester 1' and 'Checker X' repeat steps
(b) - (e), until the residues of the FINAL work unit of exponent () agree.
If the residues of the most recently tested AND checked work unit of
exponent () do not match, then that work unit becomes available to be
assigned to a triple-checker.

(g) Now, when 'Checker 3' requests work, assign the diverging work unit
within exponent () to him for re-checking. Checker 3 might OBTAIN from
Tester 1 the intermediate file for exponent () as of two work units back.
Checker 3 will perform the first of those two work units (i.e., the one
previous to the divergence) to verify that his (work unit "final") residue
matches the (known good) one from Tester 1.

(h) Checker 3 will then go on to triple-check the diverging work unit. When
Checker 3 finishes that work unit of exponent (), his (work unit "final")
residue is compared to the residue reported by Tester 1.

(i) If the residues match, the next work unit of exponent () becomes
available to be assigned to 'Tester 1', and testing and checking continue
the same as with a successful step (f).

If Checker 3's (work unit "final") residue does not match Tester 1's, but
does match Checker 2's, then exponent () is un-reserved from 'Tester 1',
and is instead RESERVED to 'Checker 2'. Testing and checking of the work
units of () continue with step (i), where 'Checker 2' replaces 'Tester 1'.

(k) If Checker 3's (work unit "final") residue on the diverging work unit
of exponent () matches neither Tester 1's nor Checker 2's, maybe the best
thing to do is to start all over with a new set of participants.

Comment: If the breaking down of a long LL computation into discrete units
of work is adopted, thought might be given to having Prime95 generate a
"universal interchange" intermediate file at the end of a unit of work - in
other words, special output that would be acceptable as input across a
number of operating systems and/or hardware chip types.

Note: For the sake of "fairness", becoming the 'reserved owner' of a
10-unit exponent might require 10 'credits' earned through doing checking.
In article <[EMAIL PROTECTED]>, Lucas Wiman <[EMAIL PROTECTED]> wrote:

> > This idea is rather obvious, and no, I don't remember seeing it either.
>
> This had been discussed earlier. Brian and I talked about it for a little
> while; he came up with the original idea.
>
> > I think the idea has definite merit. If an error does occur, it's
> > equally likely to happen at any step along the way, statistically.
> > Errors are every bit as likely to happen on the very first iteration as
> > they are at the 50% mark, or the 32.6% mark, or on the very last
> > iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.
>
> > Especially as the exponents get larger and larger, I see a *definite*
> > possibility to reduce double check t
Re: Mersenne: Multiple residues - enhancing double-checking, test factoring
----- Original Message -----
From: Eric Hahn <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, August 05, 1999 9:05 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> The only thing this scheme would accomplish is to
> allow a third test to begin (if a mismatch is found
> somewhere along the way) before the second one
> finishes. If the third test did match the first,
> there would be savings in discarding the second.

This thought may sound awfully like poaching, but here goes anyway...

Some folks want to do first-time LL tests. Some folks ask for best-suited
tasks -- I do. If I have a machine doing double-checks I really do not know
or care if it happens to start the double-check the same day as the
first-time LL test starts. PrimeNet can track whether I am the first LL or
the check LL. If the check finishes before the first LL on a prime Mp,
George will need to set the rules. Perhaps both parties get co-discoverer
status rather than discoverer status. This could make concurrent
double-checking a lot more attractive. Concurrent double-checking might
speed up some of the activities, especially finding the need for triple
checks.

> Current trial-factoring saves 12.5% of the total time
> that would be spent without any trial-factoring.
>
> If we were to trial-factor to say, 2^70 (which should
> cause factor time to increase to 1 month), without any
> additional improvement to exponents being factored this
> will cause a drop to about 8.8% overall time saved.

Another thought: does the factor test already use Goldbach's conjecture?
Someone with more cleverness and time than I have right now might think
about using this fact:

Given p, q integers and r = p+q. Then

  Mr = Mp.Mq + Mp + Mq

This is just:

  Mp.Mq = (2^p - 1)(2^q - 1) = 2^(p+q) - 2^p - 2^q + 1 = Mr - Mp - Mq

So what, right? Suppose we want to factor-test Mr. We can (by trial and
error) find primes p and q with r = p+q. Usually there are lots.
(Hey, if we don't, then we just found a counterexample to Goldbach's
conjecture. That would be big news, too.)

Mr, Mp and Mq are relatively prime in pairs, so none of the factors of Mp
or Mq divide Mr. Now, if F is a test factor in the range, suppose we keep a
database of Fp = Mp mod F. This is big, but once we have r = p+q, we just
calculate Fp.Fq + Fp + Fq mod F. If it is 0, then F divides Mr.

One practical question is how bloody big is this collection of Fp's? Could
this make test factoring run faster?

Joth
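The identity is easy to check numerically, along with the proposed shortcut
of testing a candidate factor F of Mr using only the cached residues
Fp = Mp mod F. A small illustration (function names mine, not a tuned
factoring routine):

```python
def M(n):
    """The Mersenne number 2^n - 1."""
    return (1 << n) - 1

def mr_mod(F, p, q):
    """M(p+q) mod F computed from the cached residues of M(p) and M(q),
    using the identity M(p+q) = M(p)*M(q) + M(p) + M(q)."""
    Fp, Fq = M(p) % F, M(q) % F
    return (Fp * Fq + Fp + Fq) % F

# The identity holds for any split r = p + q, prime or not:
assert all(mr_mod(F, p, q) == M(p + q) % F
           for F in (23, 47, 233) for p in range(2, 12) for q in range(2, 12))

# 47 divides M(23), and the shortcut detects this from the residues
# of M(11) and M(12) alone:
assert mr_mod(47, 11, 12) == 0
```

Note that the identity itself needs only r = p + q; the primality of p and
q matters to the scheme only because it keeps Mp, Mq, and Mr pairwise
relatively prime.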
RE: Mersenne: Multiple residues - enhancing double-checking
Based on previous messages in this thread, I'm going to throw in my 2
cents... (and avoid a lot of quoting!)

While the reporting of partial residues at specific points, say every 10%
(or 5% over 20M), isn't a bad idea, it won't necessarily save a whole lot
of time. The only thing this scheme would accomplish is to allow a third
test to begin (if a mismatch is found somewhere along the way) before the
second one finishes. If the third test did match the first, there would be
savings in discarding the second. What if, however, the third test matched
the second? Both tests would have to complete, and some safeguard would
have to be in place should the third test complete before the second.

This will increase the logistics a lot (but not make it impossible). The
client would be required to report most of the information to the server,
without receiving much from the server (a stop, go, or discard maybe). A
checksum figure as part of the work file, as somebody suggested (shown
below), wouldn't work well if the latter case above occurred (2nd and 3rd
tests match).

> Double-Check=
> M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

Much more than the logistics, however, would be the confusion factor. This
would likely increase exponentially, especially when attempting to
determine the current overall status of the project. We might even have to
have George committed eventually, when he starts running his fingers up and
down across his lips or rubs the side of his head to the bone with his
fingers or palm of his hand... :-(

I think as the time increases for each LL test, there would be much more
time savings in attempting to do higher trial-factoring. Let's look at some
figures using the following assumptions:

1) We are using a PII 233MHz.
2) An LL test takes a year (around 19.3M). (AND HEY! This is 4 to 5 years
away!!)
3) Current trial-factoring to 2^64 takes 2 days if no factor is found.
4) Currently, only about 13% of exponents are actually factored, with the
rest requiring an LL test.
5) 2 LL tests are done for each exponent (an original and a double-check).

Current trial-factoring saves 12.5% of the total time that would be spent
without any trial-factoring.

If we were to trial-factor to, say, 2^70 (which should cause factoring time
to increase to 1 month), without any additional improvement in the fraction
of exponents being factored, this will cause a drop to about 8.8% overall
time saved. However, for each % improvement in exponents being factored,
overall time saved would increase by a little over 1%. As a result, if we
were to improve to 20% of the exponents being factored, we would save close
to 16% of the overall time spent without any factoring.

Of course, even more time would be saved by factoring if an exponent that
would have been factored had it been trial-factored to 2^70 was determined
to need a triple-check during LL testing because it wasn't factored during
a trial-factoring process to 2^64.

In conclusion, even with a new algorithm that halves the time for an LL
test, 1 month to factor an exponent still beats a year to perform an LL
test and its corresponding double-check. In addition, it should be easier
to speed up the factoring process than the LL testing process.
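The percentages above follow directly from the stated assumptions (one-year
LL test, two LL tests per exponent, 2 days of trial factoring to 2^64 or
about a month to 2^70). A sketch of the arithmetic, with a function name of
my own choosing:

```python
def overall_saving(factored_frac, factor_days, ll_days=365.0, tests=2):
    """Fraction of total time saved by trial factoring, given the fraction
    of exponents it eliminates and the per-exponent factoring cost."""
    without_tf = tests * ll_days
    with_tf = factor_days + (1 - factored_frac) * tests * ll_days
    return (without_tf - with_tf) / without_tf

print(f"{overall_saving(0.13, 2):.1%}")   # to 2^64: about 12.7% ("12.5%" above)
print(f"{overall_saving(0.13, 30):.1%}")  # to 2^70, same 13%: about 8.9% ("8.8%")
print(f"{overall_saving(0.20, 30):.1%}")  # to 2^70, 20% factored: about 15.9%
```

Each extra percentage point of exponents factored adds 2 LL tests' worth of
savings per 100 exponents, i.e. a little over 1% of the total, matching the
claim above.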
RE: Mersenne: Multiple residues - enhancing double-checking
From: Lucas Wiman <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking
To: [EMAIL PROTECTED]
Date sent: Wed, 4 Aug 1999 19:19:56 -0400 (EDT)

> > > Double-Check:
> > > M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.
>
> This scheme makes almost no sense for normal double checking. This is
> because it would save *no* time at all. Think about it, even if you
> identify that an error occurred in the second week of a 3-month test,

Errors can occur with the same probability during the double-check. So,
when a mismatch is found, the double-check stops on this exponent and a
triple check is started on another computer (hopefully with another
program). The triple check decides who is wrong. If the double-check were
wrong, you saved a partial LL test. If the first check was wrong, the
double-check continues just where it stopped, this time as a first-time
check. In any case, the triple check turns into a double-check. Check it!
:-)

After all, maybe the major advantage of all this is the ability to catch
bugs early, when some people volunteer to run simultaneous LL tests with
different programs. As time passes, the coordinator can compare results
quickly. Note that some bugs can be very subtle. Nothing replaces testing
in the field. I would like to know how George noticed the v17 bug.

> you still have to run it to completion, and a third test must also be
> run. (So 3 LL tests must still be run if an error occurs.)

Correct.

> > This schema makes possible simultaneous checking, though. But the
> > start-stop mechanism you describe has little sense.

Sorry, I meant "has little sense given the individualism of certain GIMPS
members". A lot of people say "It's _my_ exponent. Dare to work on it
before I finish!" Practical or not, I respect this attitude.

Saludos
Oscar Fuentes
RE: Mersenne: Multiple residues - enhancing double-checking
At 03:01 PM 1999/08/04 -0600, "Aaron Blosser" <[EMAIL PROTECTED]> wrote:

> I think we could keep it simple by just saving every x% iteration's
> residue (in truncated form). Using a WAG of saving it every 10% along the
> way, you'd only have 9 partial and 1 full residue when all is said and
> done.
>
> So for an exponent like 8027219, you'd save the partial residue at the
> 10% point, or the 802722nd iteration (rounding up or down as normal). Of
> course, the number of iterations varies just slightly from the exponent,
> but whatever... you get the idea.
>
> These partial residues could be sent (for the first LL test) during the
> check-ins, or saved up and all sent at once when the test is done.

The 2^n least significant bits of the residue make a good check value,
since once the modulo operation begins to have an effect, the LSBs quickly
change.

It is essential to have agreement among different programs on exactly which
iterations' residue LSBs are to be saved. This encompasses both the save
interval in iterations and agreeing on whether the starting value of 4 is
iteration 0, 1, 2, etc., as well as agreeing on the starting value itself,
which can be other values besides 4 according to the math. It is more
efficient, but not necessary, to have agreement on how many bits to save.
Even n=4 (2 bytes) gives a fairly good error detection rate, but n=6 (8
bytes) is much better!

I suggest that the number of iterations between residue saves be made a
function of the FFT length, since those scale with the runtime for a given
CPU. As we move to exponents that require months even on fast machines, we
will see an increase in the rate of erroneous results per exponent on the
same hardware at the same small error rate per execution hour. We will
eventually reach exponents with runtimes long enough that the system
hardware reliability drops significantly before an exponent completes, due
to hardware aging!
My feeling is that a P90-month is about the right interval at which to kick
out intermediate residues of 64 bits or so.

In a case where two runs are taken to completion for a large exponent, such
that dozens of interim residues are recorded for the first LL test and for
the second, the third tie-breaker need only be run to the point where the
first two diverge and the third agrees with one but not the other. On
average this will save about half of the third test, which now we need
about 1% of the time, but in a few years we may need much more often. By
shortening the third-test duration before we can make a comparison, it
halves the probability of the third test being in error and so lessens the
amount of time spent on fourth tests. This saving will also become more
significant at higher exponents.

Total runtimes are going up drastically. When I joined GIMPS in mid-1996,
runtimes per exponent were 12 hours (572k on a P-120). Soon we will have
the capability to run exponents requiring more than 12 months on
PIII-500's.

> > Might the v.17 problem have been trapped with something like this? I do
> > not recall enough of the discussion to know, and the ensuing
> > belly-aching overshadowed the real content of finding/fixing/reworking.
> > (I know I am never going to rise high on the list, so I do not worry a
> > whole lot about how much my report shows.)

There is another trick for dealing with the v17 shift bug, which I've
suggested to George Woltman. Before the modulo kicks in, the rightmost 2n+2
bits should follow a set pattern that is calculable for n < log2(p) or so.
Math wizards, please be kind re errors in what follows.

In hexadecimal, the first several iterations for any value of p of current
interest follow a predictable pattern. The pattern persists until the
result of an iteration grows larger than Mp, and so is independent of p for
a time.
L(n+1) = L(n)^2 - 2 (using the convention here that we iterate from n=2,
with L(2) = 4):

  n   hex L(n)              binary L(n)
  2   4
  3   E
  4   C2                    1100 0010
  5   9302                  1001 0011 0000 0010
  6   546B 4C02             0101 0100 0110 1011 0100 1100 0000 0010
  7   1BD6 96D9 F03D 3002

least significant 64 bits only below:

  8   8cc8 8407 a9f4 c002
  9   5559 9f9d 37d3 0002
  10  f460 d65d df4c 0002
  11  e180 d807 7d30 0002
  12  9bdb 491d f4c0 0002
  13  4ceb b477 d300 0002

and so on for all n I found to check, wherever Mp > L(n). Since
L(n) <= 2^(2^(n-1)), the pattern holds while

  Mp > 2^(2^(n-1))
  2^p > 2^(2^(n-1)) + 1
  p > 2^(n-1)
  log2(p) + 1 > n

For p = 33,219,281, log2(p) = 24.985; for p = 5,000,000, log2(p) = 22.25.
(In practice I've found the limits at low n occur at slightly lower p than
expected.)

That is, the rightmost 2(n+1) bits begin with binary 0011, then in the
middle there are 2(n-2) zero bits, then a 1 and another 0.

For exponents 16,777,215
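The claimed trailing-bit pattern is easy to verify by computing the
iterates without any modulo (so this checks the pattern itself, not its
interaction with a particular Mp):

```python
# L(2) = 4, L(n+1) = L(n)^2 - 2, with no modular reduction applied.
L, values = 4, {2: 4}
for n in range(3, 14):
    L = L * L - 2
    values[n] = L

def trailing_pattern(n):
    # binary 0011, then 2(n-2) zero bits, then 10  ==  3*2^(2n-2) + 2
    return 3 * 2 ** (2 * n - 2) + 2

# For n >= 4 the rightmost 2(n+1) bits of L(n) match the pattern.
for n in range(4, 14):
    mask = (1 << (2 * (n + 1))) - 1
    assert values[n] & mask == trailing_pattern(n)
```

The same loop reproduces the hex table above, e.g. L(5) = 0x9302 and
L(7) = 0x1BD696D9F03D3002, so a client could compare these known low bits
against its own early iterations as a cheap self-test for shift bugs.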
RE: Mersenne: Multiple residues - enhancing double-checking
>> That is to say when
>> one computer finishes X%, it reports its 64-bit residue to primenet, and
>> waits for the second computer working on the same LL test to do the same.
>> Until the other (slower) computer reports in, the (faster) computer works on
>> another exponent.

> Not at all. The first-time check goes its way, but reporting
> partial residues to coordinator / primenet from time to time. Later,
> often when first LLTest was finished long time ago, somebody
> receives:
> Double-Check:
> M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

This scheme makes almost no sense for normal double-checking, because it would save *no* time at all. Think about it: even if you identify that an error occurred in the second week of a 3-month test, you still have to run the double check to completion, and a third test must also be run. (So 3 LL tests must still be run if an error occurs.)

> This schema makes possible simultaneous checking,
> though. But the start-stop mechanism you describe has little
> sense.

The method that you describe would only allow simultaneous checking if the computers were of equal speed; otherwise one would keep working on the same exponent while the other computer kept getting further behind. The scheme that I described (and Brian thought up) would allow the computers to run at the same exponent/time, while still keeping busy.

Sorry about my bad English, and it's even my first language!
-Lucas
RE: Mersenne: Multiple residues - enhancing double-checking
> I like the thread of saving multiple residues at various checkpoints along
> the way. George suggested a % completion series. I might suggest a
> specific series of points -- like every L(1000k). This might be simpler to
> track in a database although the number of entries grows linearly with p so
> the data storage might grow with p^2, depending. Another series like
> k*floor(p/s) would work just as well and keep the data needs smaller as it
> would have just s+1 checkpoints (s can be fixed for all p).

I think we could keep it simple by just saving every x% iteration's residue (in truncated form). Using a WAG of saving it every 10% along the way, you'd only have 9 partial and 1 full residue when all is said and done. So for an exponent like 8027219, you'd save the partial residue at the 10% mark, i.e. the 802,722nd iteration (rounding up or down as normal). Of course, the number of iterations (p - 2 of them) differs just slightly from the exponent, but whatever... you get the idea.

These partial residues could be sent (for the first LL test) during the check-ins, or saved up and all sent at once when the test is done.

> Might the v.17 problem have been trapped with something like this? I do not
> recall enough of the discussion to know and the ensuing belly-aching
> overshadowed the real content of finding/fixing/reworking. (I know I am
> never going to rise high on the list, so I do not worry a whole lot about
> how much my report shows.)

I don't think so, since it produced the wrong data right from the start, regardless. A double-check on a different platform (non-GIMPS code like MacLucas) would have caught it earlier, though, I suppose.
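For small exponents the whole every-x% idea can be sketched in a few lines of Python. This is a toy model only -- the function name and checkpoint layout are made up, and real clients use FFT-based squaring rather than Python bignums:

```python
# Toy model of the "save a truncated residue every 10%" idea above.
# Hypothetical helper, not prime95's actual interface.
def ll_test_with_checkpoints(p, num_checkpoints=10):
    """Lucas-Lehmer test of M(p) = 2^p - 1 for an odd prime p.
    Returns (is_prime, list of low-64-bit residues at each 10% mark)."""
    Mp = (1 << p) - 1
    total = p - 2                       # iterations required
    marks = {round(total * k / num_checkpoints)
             for k in range(1, num_checkpoints + 1)}
    L, residues = 4, []
    for i in range(1, total + 1):
        L = (L * L - 2) % Mp
        if i in marks:
            residues.append(L & 0xFFFFFFFFFFFFFFFF)  # truncated residue
    return L == 0, residues

# M(521) is a known Mersenne prime; the last checkpoint is the final residue 0.
is_prime, partials = ll_test_with_checkpoints(521)
assert is_prime and partials[-1] == 0
```

A double-check client would compare its own residue list against the first test's, point by point, instead of waiting for the final residue.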
Re: Mersenne: Multiple residues - enhancing double-checking
It seems like there may be three (or more) kinds of problem:

- round-off error gets too large and creates an error (and other unknown software issues)
- specific hardware failures producing errors
- random occurrences producing errors

mprime95 and the numerous ports and alternatives in GIMPS (I am not trying to be callous towards Mac or Unix/Linux users -- I am too ignorant to be callous) seem well able to catch several intermittent bugs in software. I know that I am frequently (well, several times a month on one machine or another) hit with SUMOUT errors. mprime95 just picks up at an earlier point and restarts. Occasionally the SUMOUT error repeats, but it is rarely at the same iteration. Something goes wrong there, but I cannot tell precisely what. Often the problem seems to crop up when some other piece of software is misbehaving -- perhaps there is some memory violation that Windows does not trap (ah, gee, ya think?).

It seems likely that each of these events is pretty much independent. Both expected specific hardware failures and random occurrences seem proportional to running time. Each machine and environment would have its own probabilities (hard to know in advance -- probably hard to know at all). The possible software glitches seem proportional to iterations, and as several comments observe, the machine (or processor) specific failure rates may increase with CPU speed.

Still, I thought I saw that the mprime95 algorithms for Mp run in something on the order of p^2 log p steps. When p doubles, the LL runtime should go up by a factor slightly higher than 4 (asymptotically 4, and less than 4.2 at p > 8 million).

Stirring this melange together, if we quadruple p (to get from a lot of current testing up to 33 million) and double processor/RAM speed (well, my fastest machine is a P-II 266 MHz with a 66 MHz bus -- moving to 550/100 might double the throughput with available parts), the runtime would increase by about a factor of 8.
I would naively expect the failure rate to increase by a factor between 8 and 16, although it could be higher or lower because those pesky probabilities all change. Also, if the probability of a failure in any step is r and there are n steps, then the probability of at least one failure in a run is really 1-(1-r)^n, which has leading term nr (and is less than nr, of course). If we expect a 1% failure rate at 8M and r stays fixed, then we expect about a 4% failure rate at around 33M (.99^4 is about .96 = 1 - .04).

I like the thread of saving multiple residues at various checkpoints along the way. George suggested a % completion series. I might suggest a specific series of points -- like every L(1000k). This might be simpler to track in a database, although the number of entries grows linearly with p, so the data storage might grow with p^2, depending. Another series like k*floor(p/s) would work just as well and keep the data needs smaller, as it would have just s+1 checkpoints (s can be fixed for all p).

All of the steps saved should be saved for both 1st and 2nd runs, as George suggested. There is no point in stopping a 2nd run at the first difference, although there may be great value in starting a 3rd run as soon as possible after the 2nd fails to match the first. If the third run pops up different from both the 1st and 2nd runs, primenet should send someone a cry for help: too many mismatches suggest something strangely wrong.

Might the v.17 problem have been trapped with something like this? I do not recall enough of the discussion to know and the ensuing belly-aching overshadowed the real content of finding/fixing/reworking. (I know I am never going to rise high on the list, so I do not worry a whole lot about how much my report shows.)

One way of testing a new version would be by double-checking current- and prior-version data. In fact, I would expect that the quality assurance group plans to use double-checking as a post-beta test stage.
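The compounding arithmetic in the paragraph above is easy to make concrete (plain Python; the 1%-at-8M figure is the assumption stated above, not a measured rate):

```python
def failure_prob(r, n):
    """Probability of at least one failure in n steps, each failing with prob r."""
    return 1 - (1 - r) ** n

# 1% per 8M-sized test, and a 33M test is roughly 4x the iterations:
p_33m = failure_prob(0.01, 4)
assert abs(p_33m - 0.0394) < 0.001      # about 4%, as claimed above
# The leading term n*r is always a slight overestimate:
assert p_33m < 4 * 0.01
```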
The database saves could let a lot of us help out on that last stage before a full release. I know I would be happy to let my double-checking machines do new version testing.

Joth

- Original Message -
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base.Com <[EMAIL PROTECTED]>
Sent: Wednesday, August 04, 1999 6:18 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> > This had been discussed earlier. Brian and I talked about it for a little
> > while, he came up with the original idea.
>
> Doh! Curse my memory! :-)
>
> > > I think the idea has definite merit. If an error does occur, it's equally
> > > likely to happen at any step along the way, statistically. Errors are every
> > > bit as likely to happen on the very first iteration as they are during the
> > > 50% mark, or the 32.6% mark, or on the very last iteration.
> >
> > True, but if the system is malfunctioning then the errors should start early.
RE: Mersenne: Multiple residues - enhancing double-checking
> This had been discussed earlier. Brian and I talked about it for a little
> while, he came up with the original idea.

Doh! Curse my memory! :-)

> > I think the idea has definite merit. If an error does occur, it's equally
> > likely to happen at any step along the way, statistically. Errors are every
> > bit as likely to happen on the very first iteration as they are during the
> > 50% mark, or the 32.6% mark, or on the very last iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.

Even more reason why it makes sense.

> > Just for example, every 10% along the way, it'll send its current residue
> > to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue. Sending in
> 10 2-meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Just a partial residue, like the one sent at the end of the test. Even a smaller one, like a 32-bit instead of a 64-bit residue, seems like it would do the job splendidly.

> > I forget the numbers being tossed around,
> > but you'd only save 50% of (the error rate) of the
> > checking time.
>
> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change). This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.

Frightening to think so. Are you sure the error rate increases? Errors seem like they'd show up more as a result of faulty hardware, to my thinking. I'd imagine that if a certain machine ran through about ten 10M exponents error-free, it has a very high likelihood of running a single 20M exponent error-free.
RE: Mersenne: Multiple residues - enhancing double-checking
From: Lucas Wiman <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking
Date sent: Wed, 4 Aug 1999 03:26:48 -0400 (EDT)

> True, but if the system is malfunctioning then the errors should start
> early.

If the program is buggy, too. (The v17 ghost is here.)

> > Just for example, every 10% along the way, it'll send its current residue
> > to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue. Sending in
> 10 2-meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Aaron has said the same as me, using better grammar :-). Obviously, only a few bytes are needed.

> Of course, this would only help if we were running more than one test for the
> same exponent at the same time [...]

The simultaneous checks are interesting only in the first stages of v19 or other "new" testers. By the time a complete double check of a new FFT size lasts 6 months, while the author waits for a result, a lot of people will already be using it. [...]

> That is to say when
> one computer finishes X%, it reports its 64-bit residue to primenet, and
> waits for the second computer working on the same LL test to do the same.
> Until the other (slower) computer reports in, the (faster) computer works on
> another exponent.

Not at all. The first-time check goes its way, but reports partial residues to the coordinator / primenet from time to time. Later, often long after the first LL test was finished, somebody receives:

Double-Check:
M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

The partial residues correspond to the first X iterations, then 2*X iterations, etc. X is fixed for all participants. Or use percentages instead, as Aaron said. The absence of a residue between two consecutive commas means that it is unknown. With such a long and sensitive data stream (200 bytes typical), the CRC is a must to detect accidental modifications of the Worktodo file.
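Oscar's record format plus CRC can be sketched as follows. The field layout and the use of CRC-32 here are illustrative assumptions -- the message only says "CRC", and the helper names are invented:

```python
import zlib

def make_record(exponent, bits, residues):
    """Build a 'Double-Check' line like the example above; None marks an
    unknown residue (an empty field between two commas)."""
    body = ",".join([f"M{exponent}", str(bits)]
                    + [(r if r is not None else "") for r in residues])
    return f"{body},CRC:{zlib.crc32(body.encode()) & 0xFFFFFFFF:08X}"

def check_record(line):
    """Verify a line's CRC suffix; catches accidental corruption."""
    body, _, crc = line.rpartition(",CRC:")
    return bool(body) and zlib.crc32(body.encode()) & 0xFFFFFFFF == int(crc, 16)

line = make_record(23780981, 64, ["863FF87", None, "FF637BC"])
assert check_record(line)
# Any single-character change in the body is caught (CRC-32 detects all
# error bursts shorter than 32 bits):
body, _, crc = line.rpartition(",CRC:")
bad = body[:-1] + ("0" if body[-1] != "0" else "1") + ",CRC:" + crc
assert not check_record(bad)
```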
This schema makes simultaneous checking possible, though. But the start-stop mechanism you describe makes little sense.

> This would speed up the entire project, but it would slow down the individual
> exponent, which would make people mad :(.

My schema would only speed up double-checking. The first-time LL test process is untouched, except for the reporting of intermediate partial residues.

> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change). This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.

That's the crux of the idea: an estimate of the time saved (on double-checking!) should be made. The authors of the tester apps should estimate the early bug-catching features of this schema, too. Then, decide if it's worthwhile or not.

I'm doing double-checking. I think about my computer running for 6 months when the error is in the second week. Argh!

Saludos
Oscar Fuentes.

P.D.: I promise to improve my English ;-)
RE: Mersenne: Multiple residues - enhancing double-checking
> This idea is rather obvious, and no, I don't remember seeing it either.

This had been discussed earlier. Brian and I talked about it for a little while; he came up with the original idea.

> I think the idea has definite merit. If an error does occur, it's equally
> likely to happen at any step along the way, statistically. Errors are every
> bit as likely to happen on the very first iteration as they are during the
> 50% mark, or the 32.6% mark, or on the very last iteration.

True, but if the system is malfunctioning then the errors should start early.

> Especially as the exponents get larger and larger, I see a *definite*
> possibility to reduce double check times by having first time LL tests
> report residues at certain "percentages" along the way.

Yeah. The error rate should be proportional to the runtime, which increases with the square of the exponent (ouch!).

> Just for example, every 10% along the way, it'll send its current residue
> to the Primenet server.

I'm guessing that you mean a certain amount of the residue. Sending in 10 2-meg files for *each* exponent in the 20,000,000 range would get very unwieldy, and inconvenient for people and primenet.

Of course, this would only help if we were running more than one test for the same exponent at the same time (otherwise, this would just be a pointless way to do a triple check). They would either have to be coordinated (running at the same time, a logistical nightmare), or (as Brian suggested) have a "pool" of exponents running on one computer. That is to say, when one computer finishes X%, it reports its 64-bit residue to primenet, and waits for the second computer working on the same LL test to do the same. Until the other (slower) computer reports in, the (faster) computer works on another exponent. This would speed up the entire project, but it would slow down the individual exponent, which would make people mad :(.
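The pool idea can be illustrated with a toy scheduling model (the function name and the numbers are invented for illustration): two machines share one exponent in checkpoint-sized intervals, and the faster one fills its waiting time with work from the pool.

```python
def simulate_pool(speed_a, speed_b, checkpoints=10):
    """Two machines run the same exponent in lockstep checkpoint intervals;
    whoever finishes an interval first does pool work until the other
    catches up.  Returns (elapsed time, pool work done, in intervals)."""
    t_a = t_b = pool = 0.0
    for _ in range(checkpoints):
        t_a += 1.0 / speed_a            # time for this interval on machine A
        t_b += 1.0 / speed_b
        if t_a < t_b:                   # A is ahead: pool work while waiting
            pool += (t_b - t_a) * speed_a
            t_a = t_b
        elif t_b < t_a:                 # B is ahead
            pool += (t_a - t_b) * speed_b
            t_b = t_a
    return t_a, pool

elapsed, pool = simulate_pool(2.0, 1.0)  # A is twice as fast as B
assert elapsed == 10.0                   # the shared test moves at B's pace
assert pool == 10.0                      # A completed a whole extra test's worth
```

Nothing sits idle, but the shared exponent itself finishes no faster than the slower machine -- which is exactly the "slow down the individual exponent" objection above.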
> I forget the numbers being tossed around,
> but you'd only save 50% of (the error rate) of the
> checking time.

As I pointed out above, the error rate should increase with the square of the exponent (plus change). This means that if 1% have errors at 7mil, 22% will have errors at 30mil.

-Lucas Wiman
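For what it's worth, the scaling Lucas describes can be checked numerically. This uses the p^2 log p runtime estimate from earlier in the thread and the assumed 1% rate at 7M; the exact figure depends on those assumptions:

```python
from math import log

# Error exposure assumed proportional to runtime ~ p^2 * log p:
scale = (30e6 ** 2 * log(30e6)) / (7e6 ** 2 * log(7e6))
# Compound the assumed 1% failure rate at 7M up by that factor:
est = 1 - (1 - 0.01) ** scale
assert 18 < scale < 22          # roughly 20x the exposure per test
assert 0.15 < est < 0.20        # on the order of 18%, near the 22% quoted
```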
RE: Mersenne: Multiple residues - enhancing double-checking
> This idea is rather obvious, but I don't remember anybody
> mentioning it.

This idea is rather obvious, and no, I don't remember seeing it either.

> The schema (exponent, residue) was good when a double
> check lasted a few days. But now, a lot of time can be saved on triple
> checks if the partial residue of every X iterations were logged. X
> could be a million or so. To avoid hard disk and network traffic
> bloating, take only 16 bits. The final residue would be as always, of
> course.
>
> When a discrepancy is found, the double check stops (and
> starts another exponent) while another machine (with different
> software?) begins the triple check up to the offending point, where we
> know (hopefully) who made the mistake. If it was the first check, the
> double-check machine continues the LL test where it stopped, using
> the intermediate files. If the double-check machine is wrong, the
> triple check now turns out to be a double check.
>
> Furthermore, it's reasonable to think that most errors on
> LL tests occur early, due to buggy hardware. So, the average time
> saving would be more than 50% for faulty results.

I think the idea has definite merit. If an error does occur, it's equally likely to happen at any step along the way, statistically. Errors are every bit as likely to happen on the very first iteration as they are during the 50% mark, or the 32.6% mark, or on the very last iteration.

Especially as the exponents get larger and larger, I see a *definite* possibility to reduce double-check times by having first-time LL tests report residues at certain "percentages" along the way.

Just for example, every 10% along the way, it'll send its current residue to the Primenet server. During the double check of this exponent, its residues along the way are compared to the residues from the first-time check (which could presumably be "checked out" along with the exponent itself). What happens when there's a mismatch?
Well, first of all you've saved yourself, I suppose on average, 50% of the time needed to run a full test. Sometimes you'll notice a mismatch in the first 10% of the iterations, saving a lot of time; sometimes you might not notice until the very last iteration, but you get the idea.

Now, the question is, what do you do when there is a mismatch? I'd guess that the current double check be put on hold, reported back to Primenet, and reassigned to a different person for a "triple check" (a different person ensures a different machine runs the 3rd test... maybe not necessary, but a darn good idea IMO). It will check up to the point of mismatch, see which one (the 1st or 2nd) it agrees with (perhaps neither... quadruple-check time), then continue on. If the 3rd check agrees with the 1st, then the 3rd machine should finish up and make sure there were no more errors in the 1st check. However, if the 3rd check agrees with the 2nd check instead, then both the 2nd and 3rd checkers should finish all iterations and check for total agreement.

Maybe I'm overcomplicating things (definitely), but that's a rough guess as to how it might work. Obviously, this needs thinking, and Primenet would need to handle this sort of stuff, along with the clients. But just think how much time could be saved when the first and second tests disagree. If all goes well, the 1st and 2nd tests will match and it took no longer than it takes now. But in those sort of rare cases where a 3rd test is needed, you've saved A LOT of time by finding the problem early on.

Like I said, as the exponents get larger, the payoff for doing this, in terms of CPU time, will MORE than make up for the hassles of reorganizing how Primenet and the clients work.

Agree? Disagree? Comments would be nice. I like the idea, personally. Good thinking, Oscar.

Aaron
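The triage above can be written out as a small decision routine (the labels and structure are my reading of the prose, nothing official):

```python
def triage(first, second, third):
    """Compare checkpoint residue lists from tests 1 and 2; use the
    tie-breaking test 3 at the first point where 1 and 2 diverge."""
    for r1, r2, r3 in zip(first, second, third):
        if r1 != r2:
            if r3 == r1:
                return "3rd agrees with 1st: 3rd finishes to re-verify the 1st"
            if r3 == r2:
                return "3rd agrees with 2nd: 2nd and 3rd run to completion"
            return "agrees with neither: quadruple-check time"
    return "no mismatch yet"

# The 3rd test only needs to reach the first divergence point:
assert triage([7, 2, 9], [7, 5, 9], [7, 2]).startswith("3rd agrees with 1st")
assert triage([7, 2, 9], [7, 5, 9], [7, 5]).startswith("3rd agrees with 2nd")
assert triage([7, 2, 9], [7, 5, 9], [7, 0]).startswith("agrees with neither")
```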