Mersenne Digest        Wednesday, August 4 1999        Volume 01 : Number 609

----------------------------------------------------------------------

Date: Tue, 03 Aug 1999 12:57:28 -0400
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Mersenne: error 12029

This may have been answered before, but I got Primenet error 12029 when
connecting manually. What is this?

+----------------------------------------------+
| Jud "program first and think later" McCranie |
+----------------------------------------------+

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 3 Aug 1999 11:36:29 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: error 12029

Same here, I think the messages are not getting through. This is
intermittent, so I think that any of the usual suspects could apply (server
down, server too busy, weather too nice....)

I think of this the way I thought of GPF's under Windows 3.0: "Oh, ###%^&#!!"

Found this in the FAQ (hey, I did not think to look either until I saw your
question):

<<What does PrimeNet error 12029 mean?

Can't find PrimeNet server. If you are using PrimeNet for the first time
through a proxy server or firewall, check your web browser settings for the
proxy address and port number, and make sure the primenet.ini file settings
match this. Most often the 'http://' part of the ProxyHost URL is omitted,
or the port # is wrong, or the wrong proxy server is named. If the error
still happens, there may be some other issue preventing your computer from
connecting to PrimeNet. E-mail us at [EMAIL PROTECTED] about it.>>

Right now I am trying to figure out why one computer (just a P5MMX-166)
locks up every 5 minutes in one room and works fine (10.5 hours since last
boot running mprime95) on my test bench. Not getting the primenet
connection to work reliably is really confusing.
(Is it real or is it memo-wrecked?)

Joth

------------------------------

Date: Tue, 3 Aug 1999 13:38:12 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: error 12029

> Same here, I think the messages are not getting through. This is
> intermittent, so I think that any of the usual suspects could apply (server
> down, server too busy, weather too nice....)

The entropia.com site seemed to be down some earlier today. Back up now,
for the time anyway. I think that's what done caused it :-)

> Right now I am trying to figure out why one computer (just a P5MMX-166)
> locks up every 5 minutes in one room and works fine (10.5 hours since last
> boot running mprime95) on my test bench. Not getting the primenet
> connection to work reliably is really confusing. (Is it real or is it
> memo-wrecked?)

Believe it or not, I've had countless problems with that and, remarkably, it
ended up being the fluorescent light fixtures near the desk. It had me
going there, wondering why it works okay in the lab, but when it's on that
guy's desk, no go.
I've even seen problems where some other guy's computer in the next cube was
leaking so much RF that it must have been interfering or something. All I
knew in that case was that moving or turning off the "suspicious" computer
made the other one more reliable. Weird.

What's that Oriental art of placing things in a room or house to achieve
harmony? Feng Shui or something like that? :-)

Aaron

------------------------------

Date: Tue, 3 Aug 1999 14:01:47 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: error 12029

----- Original Message -----
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base. Com <[EMAIL PROTECTED]>
Sent: Tuesday, August 03, 1999 12:38 PM
Subject: RE: Mersenne: error 12029

> > Same here, I think the messages are not getting through. This is
> > intermittent, so I think that any of the usual suspects could apply
> > (server down, server too busy, weather too nice....)
>
> The entropia.com site seemed to be down some earlier today. Back up now,
> for the time anyway. I think that's what done caused it :-)

Sounds good.

> > Right now I am trying to figure out why one computer (just a P5MMX-166)
> > locks up every 5 minutes in one room and works fine (10.5 hours since
> > last boot running mprime95) on my test bench. Not getting the primenet
> > connection to work reliably is really confusing. (Is it real or is it
> > memo-wrecked?)
>
> Believe it or not, I've had countless problems with that and, remarkably,
> it ended up being the fluorescent light fixtures near the desk. It had me
> going there, wondering why it works okay in the lab, but when it's on that
> guy's desk, no go.

Wild. For me, the P5 is working when it is near my fluorescent. But point
taken -- the interference could be anything.
I have been thinking that it might be either room temperature or bad line
filtering or a flaky power supply or boards separating (linked to room temp
again). All are possible, but the list is open-ended.

> I've even seen problems where some other guy's computer in the next cube
> was leaking so much RF that it must have been interfering or something.
> All I knew in that case was that moving or turning off the "suspicious"
> computer made the other one more reliable. Weird.

Maybe it is the monitor -- an old IBM 14" (or smaller), generally pretty
nice, but you never know.

The P5 parts and the system board in the nearby computer where the hangs
occur used to be in cases on top of each other in my office. They were my
main desktop units for about a year. The P5 had glitches turning on, but
once it turned on the video and HDD, all was generally well. I think the
video and HDD not turning on might be system board related, but the hanging
is new. Maybe they want to be closer together! (Symbiotic processing?)

Could also be as prosaic as some minor corrosion that maybe I have futzed
enough to chip.

Well, I guess I will just have to run it back to the "remote" location and
see if it takes up its old habits.

My wife is developing a sincere distaste for my interior decorating sense:
Oh, you want the case attached to the chassis? But it does so nicely as an
accent piece either alongside or upside-down on top...

> What's that Oriental art of placing things in a room or house to achieve
> harmony? Feng Shui or something like that? :-)
>
> Aaron

Sounds good. My old references describe feng-shui ("wind-water") as a belief
that we can calculate the good or evil in things by examining the topology
of the land.

Sounds like a good beginning to microwave circuitry to me.

Thanks. "Ah, sweet mystery of life..."

Joth

------------------------------

Date: Wed, 4 Aug 1999 00:03:04 +2
From: "Oscar Fuentes" <[EMAIL PROTECTED]>
Subject: Mersenne: Multiple residues - enhancing double-checking

Hello all.

This idea is rather obvious, but I don't remember anybody mentioning it.

The schema (exponent, residue) was good when a double check lasted a few
days. But now, a lot of time can be saved on triple checks if the partial
residue of every X iterations were logged. X could be a million or so. To
avoid bloating hard disk and network traffic, take only 16 bits. The final
residue would be as always, of course.

When a discrepancy is found, the double check stops (and starts another
exponent) while another machine (with different software?) begins the triple
check up to the offending point, where we know (hopefully) who made the
mistake. If it was the first check, the double-check machine continues the
LL test where it stopped, using the intermediate files. If the double-check
machine is wrong, the triple check now turns into a double check.

Furthermore, it's reasonable to think that most errors in LL tests occur
early, due to buggy hardware. So, the average time saving would be more than
50% for faulty results.
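
The checkpoint comparison described above might be sketched roughly as
follows. This is only an illustration under stated assumptions, not anything
Prime95 or PrimeNet is known to do: the names `first_mismatch` and
`fraction_saved`, the list-of-integers layout, and the use of `None` for a
residue that was never reported are all made up for the example. Entry k
stands for the low 16 bits of the residue after (k + 1) * X iterations.

```python
from typing import Optional, Sequence

MASK_16 = 0xFFFF  # keep only the low 16 bits of each partial residue


def first_mismatch(first_run: Sequence[Optional[int]],
                   second_run: Sequence[Optional[int]]) -> Optional[int]:
    """Return the index of the first checkpoint where the two runs disagree.

    Index k refers to the residue logged after (k + 1) * X iterations.
    A value of None means that checkpoint was never reported, so it is
    skipped. Returns None when every comparable checkpoint agrees.
    """
    for k, (a, b) in enumerate(zip(first_run, second_run)):
        if a is None or b is None:
            continue
        if (a & MASK_16) != (b & MASK_16):
            return k
    return None


def fraction_saved(mismatch_index: int, total_checkpoints: int) -> float:
    """Fraction of a full double check avoided by stopping at the mismatch."""
    return 1.0 - (mismatch_index + 1) / total_checkpoints


# Hypothetical data: ten checkpoints, and the runs diverge at the fourth one.
first = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081, 0x92A3,
         0xB4C5, 0xD6E7, 0xF809, 0x1B2C, 0x3D4E]
second = [0x1A2B, 0x3C4D, 0x5E6F, 0x7F81, 0x0000,
          0x1111, 0x2222, 0x3333, 0x4444, 0x5555]

k = first_mismatch(first, second)
print("first mismatch at checkpoint index", k)
print("fraction of the double check saved:", fraction_saved(k, len(first)))
```

One consequence of keeping only 16 bits: two different full residues agree
on those bits by chance about once in 65536 comparisons, so a single
matching checkpoint is weak evidence, while a mismatch is conclusive.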

Regards,
Oscar Fuentes

------------------------------

Date: Tue, 3 Aug 1999 17:31:15 -0600
From: "Blosser, Jeremy" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: error 12029

> > > Right now I am trying to figure out why one computer (just a
> > > P5MMX-166) locks up every 5 minutes in one room and works fine (10.5
> > > hours since last boot running mprime95) on my test bench. Not getting
> > > the primenet connection to work reliably is really confusing. (Is it
> > > real or is it memo-wrecked?)
> >
> > Believe it or not, I've had countless problems with that and,
> > remarkably, it ended up being the fluorescent light fixtures near the
> > desk. It had me going there, wondering why it works okay in the lab,
> > but when it's on that guy's desk, no go.
>
> Wild. For me, the P5 is working when it is near my fluorescent. But point
> taken -- the interference could be anything. I have been thinking that it
> might be either room temperature or bad line filtering or a flaky power
> supply or boards separating (linked to room temp again). All are possible,
> but the list is open-ended.

Well, if it's hot, I'd say room temp. Might want to check the fans etc. I
had a motherboard that refused to boot once, and it was because of a flaky
power supply, so that's always a possibility.

> My wife is developing a sincere distaste for my interior decorating sense:
> Oh, you want the case attached to the chassis? But it does so nicely as an
> accent piece either alongside or upside-down on top...

Heh, hey Aaron, remember when I had that old 486 mounted on the wall in my
old "office"? The entire computer was mounted on the wall, power supply,
disk drives, etc. It was kinda funny, I think we did it cuz we were bored
one day.
Oh well, that just reminded me of that...

> > What's that Oriental art of placing things in a room or house to achieve
> > harmony? Feng Shui or something like that? :-)
> >
> > Aaron
>
> Sounds good. My old references describe feng-shui ("wind-water") as a
> belief that we can calculate the good or evil in things by examining the
> topology of the land.
>
> Sounds like a good beginning to microwave circuitry to me.
>
> Thanks. "Ah, sweet mystery of life..."
>
> Joth

Yeh, Feng-Shui is the art of placement of furniture and such. Like having
your desk facing out across the building it's in, since it is a position of
"power" (you are overlooking the building that way). And other things like
hanging chimes in windows to ward off the bad spirits that get in thru the
windows, and the weird sand box thingies as well as the cool little
fountains and stuff.

From an aesthetic sense, I like Feng Shui...

------------------------------

Date: Tue, 3 Aug 1999 23:32:31 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> This idea is rather obvious, but I don't remember anybody mentioning it.

This idea is rather obvious, and no, I don't remember seeing it either.

> The schema (exponent, residue) was good when a double check lasted a few
> days. But now, a lot of time can be saved on triple checks if the partial
> residue of every X iterations were logged. X could be a million or so. To
> avoid bloating hard disk and network traffic, take only 16 bits. The final
> residue would be as always, of course.
>
> When a discrepancy is found, the double check stops (and starts another
> exponent) while another machine (with different software?)
> begins the triple check up to the offending point, where we know
> (hopefully) who made the mistake. If it was the first check, the
> double-check machine continues the LL test where it stopped, using the
> intermediate files. If the double-check machine is wrong, the triple check
> now turns into a double check.
>
> Furthermore, it's reasonable to think that most errors in LL tests occur
> early, due to buggy hardware. So, the average time saving would be more
> than 50% for faulty results.

I think the idea has definite merit. If an error does occur, it's equally
likely to happen at any step along the way, statistically. Errors are every
bit as likely to happen on the very first iteration as they are during the
50% mark, or the 32.6% mark, or on the very last iteration.

Especially as the exponents get larger and larger, I see a *definite*
possibility to reduce double check times by having first time LL tests
report residues at certain "percentages" along the way.

Just for example, every 10% along the way, it'll send its current residue
to the Primenet server.

During the double check of this exponent, its residues along the way are
compared to the residues from the first time check (which could presumably
be "checked out" along with the exponent itself).

What happens when there's a mismatch? Well, first of all you've saved
yourself, I suppose on average, 50% of the time needed to run a full test.
Sometimes you'll notice a mismatch in the first 10% of the iterations,
saving a lot of time, sometimes you might not notice until the very last
iteration, but you get the idea.

Now, the question is, what do you do when there is a mismatch? I'd guess
that the current double-check be put on hold, reported back to Primenet, and
reassigned to a different person for a "triple check" (different person
ensures a different machine runs the 3rd test...maybe not necessary, but a
darn good idea IMO).
It will check up to the point of mismatch, see which one (the 1st or 2nd) it
agrees with (perhaps neither...quadruple check time), then continue on. If
the 3rd check agrees with the 1st, then the 3rd machine should finish up and
make sure there were no more errors in the 1st check. However, if the 3rd
check agrees with the 2nd check instead, then both the 2nd and 3rd checkers
should finish all iterations and check for total agreement.

Maybe I'm overcomplicating things (definitely), but that's a rough guess as
to how it might work. Obviously, this needs thinking, and Primenet would
need to handle this sort of stuff, along with the clients.

But just think how much time could be saved when both first and second tests
disagree. If all goes well, the 1st and 2nd tests will match and it took no
longer than it takes now. But in those sort of rare cases where a 3rd test
is needed, you've saved A LOT of time by finding the problem early on.

Like I said, as the exponents get larger, the payoff for doing this, in
terms of CPU time, will MORE than make up for the hassles of reorganizing
how Primenet and the clients work.

Agree? Disagree? Comments would be nice. I like the idea, personally. Good
thinking Oscar.

Aaron

------------------------------

Date: Wed, 04 Aug 1999 02:18:57 EDT
From: "Ethan M. O'Connor" <[EMAIL PROTECTED]>
Subject: Mersenne: Intermediate Residues and doublechecking

It seems to me that recording intermediate residues for comparison during
double checking is not going to save much time; although you'll save on
average 50% of the doublechecking time for exponents that are mistested
(assuming even and uncorrelated distribution of the point in checking that
the error occurs), mistested exponents are only a small fraction of the
double checking work..
I forget the numbers being tossed around, but you'd only save 50% of (the
error rate) of the checking time.

HOWEVER, there is a potential use for the intermediate residues anyway... as
larger and larger exponents start getting tested, and algorithms start
having to be modified to support the testing of those big exponents, we may
run into a situation where a widespread subset of machines with some obscure
hardware bug test a range of exponents wrongly... using intermediate
residues during double checking would halve the time that would pass before
the widespread error became obvious, and would reduce the amount of wasted
work performed.

-Ethan O'Connor
[EMAIL PROTECTED]

------------------------------

Date: Wed, 4 Aug 1999 03:26:48 -0400 (EDT)
From: Lucas Wiman <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> This idea is rather obvious, and no, I don't remember seeing it either.

This had been discussed earlier. Brian and I talked about it for a little
while, he came up with the original idea.

> I think the idea has definite merit. If an error does occur, it's equally
> likely to happen at any step along the way, statistically. Errors are
> every bit as likely to happen on the very first iteration as they are
> during the 50% mark, or the 32.6% mark, or on the very last iteration.

True, but if the system is malfunctioning then the errors should start
early.

> Especially as the exponents get larger and larger, I see a *definite*
> possibility to reduce double check times by having first time LL tests
> report residues at certain "percentages" along the way.

Yeah. The error rate should be proportional to the runtime, which increases
with the square of the exponent (ouch!).
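
A rough back-of-the-envelope sketch of that scaling is below. It rests on
two simplifying assumptions that are not established anywhere in this
thread: that an LL test costs on the order of p^2 * log(p) operations, and
that errors strike independently at a constant rate per unit of runtime.
The function names are invented for the illustration.

```python
import math


def runtime_ratio(p_old: float, p_new: float) -> float:
    """Relative LL-test cost, assuming cost grows like p^2 * log(p)."""
    return (p_new / p_old) ** 2 * math.log(p_new) / math.log(p_old)


def scaled_error_rate(rate_old: float, ratio: float) -> float:
    """Error rate after the run becomes `ratio` times longer.

    Assumes errors strike independently at a constant rate per unit of
    runtime, so the chance of a clean run is (1 - rate_old) ** ratio.
    """
    return 1.0 - (1.0 - rate_old) ** ratio


# Hypothetical starting point: 1% of tests fail at exponents near 7 million.
ratio = runtime_ratio(7e6, 30e6)
print(f"runtime grows by a factor of about {ratio:.1f}")
print(f"a 1% error rate becomes about {scaled_error_rate(0.01, ratio):.1%}")
```

The output is only an order-of-magnitude guide: a faster machine shortens
the runtime, and a machine with a hard fault does not fail at a constant
rate, so real figures could land well above or below it.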

> Just for example, every 10% along the way, it'll send its current residue
> to the Primenet server.

I'm guessing that you mean a certain amount of the residue. Sending in
10 2meg files for *each* exponent in the 20,000,000 range would get very
unwieldy, and inconvenient for people and primenet.

Of course, this would only help if we were running more than one test for
the same exponent at the same time (otherwise, this would just be a
pointless way to do a triple check). They would either have to be
coordinated (running at the same time, logistical nightmare), or (as Brian
suggested) have a "pool" of exponents running on one computer. That is to
say, when one computer finishes to X%, it reports its 64-bit residue to
primenet, and waits for the second computer working on the same LL test to
do the same. Until the other (slower) computer reports in, the (faster)
computer works on another exponent.

This would speed up the entire project, but it would slow down the
individual exponent, which would make people mad :(.

> I forget the numbers being tossed around, but you'd only save 50% of (the
> error rate) of the checking time.

As I pointed out above, the error rate should increase with the square of
the exponent (plus change). This means that if 1% have errors at 7mil, 22%
will have errors at 30mil.

-Lucas Wiman

------------------------------

Date: Wed, 4 Aug 1999 14:31:24 +2
From: "Oscar Fuentes" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

From: Lucas Wiman <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking
Date sent: Wed, 4 Aug 1999 03:26:48 -0400 (EDT)

> True, but if the system is malfunctioning then the errors should start
> early.

If the program is buggy, too.
(v17 ghost is here)

> > Just for example, every 10% along the way, it'll send its current
> > residue to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue. Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Aaron has said the same as me, using better grammar :-) . Obviously, only a
few bytes are needed.

> Of course, this would only help if we were running more than one test for
> the same exponent at the same time [...]

Simultaneous checks are interesting only in the early stages of v19 or other
"new" testers. When a complete double check lasts 6 months at a new FFT
size, a lot of people will already be using the program while the author
waits for a result.

[...]
> That is to say, when one computer finishes to X%, it reports its 64-bit
> residue to primenet, and waits for the second computer working on the same
> LL test to do the same. Until the other (slower) computer reports in, the
> (faster) computer works on another exponent.

Not at all. The first-time check goes its own way, but reports partial
residues to the coordinator / primenet from time to time. Later, often long
after the first LL test has finished, somebody receives:

Double-Check: M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

The partial residues correspond to X iterations first, 2*X iterations
second, etc. X is fixed for all participants. Or use percentages instead, as
Aaron said. An absence of a residue between two commas means that it is
unknown. With such a long and sensitive data stream (200 bytes typical), the
CRC is a must to detect accidental modifications of the Worktodo file.

This schema makes simultaneous checking possible, though. But the start-stop
mechanism you describe makes little sense.

> This would speed up the entire project, but it would slow down the
> individual exponent, which would make people mad :(.

My schema would only speed up double checking.
The first-time LL test process is untouched, except for the reporting of
intermediate partial residues.

> As I pointed out above, the error rate should increase with the square of
> the exponent (plus change). This means that if 1% have errors at 7mil, 22%
> will have errors at 30mil.

That's the crux of the idea: an estimate of the time saved (on double
checking!) should be made. The authors of tester apps should estimate the
early bug-catching features of this schema, too. Then decide whether it's
worth it or not.

I'm doing double checking. I think about my computer running for 6 months
when the error is in the second week. Argh!

Regards,
Oscar Fuentes.

P.S.: I promise to improve my English ;-)

------------------------------

Date: Wed, 4 Aug 1999 07:18:02 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> This had been discussed earlier. Brian and I talked about it for a little
> while, he came up with the original idea.

Doh! Curse my memory! :-)

> > I think the idea has definite merit. If an error does occur, it's
> > equally likely to happen at any step along the way, statistically.
> > Errors are every bit as likely to happen on the very first iteration as
> > they are during the 50% mark, or the 32.6% mark, or on the very last
> > iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.

Even more reason why it makes sense.

> > Just for example, every 10% along the way, it'll send its current
> > residue to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue. Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.

Just a partial residue, like the one sent at the end of the test. Even
smaller ones, like a 32 bit instead of 64 bit residue, seem like they would
do the job splendidly.

> > I forget the numbers being tossed around, but you'd only save 50% of
> > (the error rate) of the checking time.
>
> As I pointed out above, the error rate should increase with the square of
> the exponent (plus change). This means that if 1% have errors at 7mil, 22%
> will have errors at 30mil.

Frightening to think so. Are you sure the error rate increases? Errors seem
like they'd show up more as a result of faulty hardware, to my thinking. I'd
imagine that if a certain machine ran through about ten 10M exponents error
free, it has a very high likelihood of running a single 20M exponent error
free.

------------------------------

Date: Wed, 4 Aug 1999 09:32:56 -0400
From: "R. Kevin Moore" <[EMAIL PROTECTED]>
Subject: Mersenne: intermediate checks

Here are some ideas:

1) You don't need to mail back all the intermediate residues to see if
   they are matching - you only need to send a checksum, which could be
   as small as a few hundred bytes!

2) Users could elect how often to save the residue, by % or by
   iteration #, depending on their free hard drive space.

3) Users would send an intermediate residue -if- a checksum mismatch is
   found during a double check. (This only adds one additional mailing of
   a residue to the total net-traffic, since work on that iteration would
   halt on the double check machine until a triple-checking of the
   exponent reaches the iteration in question. Then, if the triple-check
   residue matches either of the other two, the testing or double-checking
   would continue, depending on which residue matched which.)

4) In the future, local groups who manage exponents with local servers
   could be improved & optimized, assuming local groups are connected with
   a network, and the server has plenty of hard drive space. The idea
   would be for the local groups to self-coordinate the factoring, LL &
   double-checking of exponents or groups of exponents, making use of
   intermediate residues (which will be fraught with more & more errors as
   exponents get larger). The local server would assign work according to
   speed & reliability, and would be able to pull a machine off a job to
   expedite double or triple checking. It would also be able to take
   additional work on other exponents, and would know what type of work
   can best be added into its work group based on its total speed &
   reliability.

I believe the total average savings (in my 3rd point) would be substantial
- especially for large exponents. Note that this SLOWS the completion of an
exponent, since it requires a wait for an assignment of a triple check to
point X iteration of an exponent, and for results to be received back, until
it's known if double checking (or the original LL test - or neither) is able
to restart. However, the savings come from using people's machines
efficiently: machines would continue to work on other assignments in the
meantime.

The savings from my 4th point works into this idea - it would eliminate the
delay in assigning & starting the mid-point triple-check (assuming the work
group had a free / almost-free machine), so total time for completed
exponents wouldn't be hurt too much.

This process could help save work from mid-point gimp quitters, by letting
them at least send in their mid-point residue. (Or is this already done -
I've never quit!)

Comments?

-Kevin Moore

PS. someone just sent out a note with a similar comment, so I'll leave it at
that.
:)

------------------------------

Date: Wed, 4 Aug 1999 10:13:29 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Multiple residues - enhancing double-checking

It seems like there may be three (or more) kinds of problem:

- round-off error gets too large and creates an error (and other unknown
  software issues)
- specific hardware failures producing errors
- random occurrences producing errors

mprime95 and the numerous ports and alternatives in GIMPS (I am not trying
to be callous towards Mac or Unix/Linux users -- I am too ignorant to be
callous) seem well able to catch several intermittent bugs in software. I
know that I am frequently (well, several times a month on one machine or
another) hit with SUMOUT errors. mprime95 just picks up at an earlier point
and restarts. Occasionally the sumout error repeats but it is rarely at the
same iteration. Something goes wrong there, but I cannot tell precisely
what. Often the problem seems to crop up when some other piece of software
is misbehaving -- perhaps there is some memory violation that Windows does
not trap (ah, gee, ya think?).

It seems likely that each of these events is pretty much independent. Both
expected specific hardware failures and random occurrences seem proportional
to running time. Each machine and environment would have its own
probabilities (hard to know in advance -- probably hard to know at all).

The possible software glitches seem to be proportional to iterations and, as
several comments observe, the machine (or processor) specific failure rates
may increase with CPU speed.

Still, I thought I saw that mprime95 algorithms for Mp run in something on
the order of p^2 (log p) steps.
When p doubles, the LL runtime should go up by a factor slightly higher than
4 (asymptotically 4, and less than 4.2 at p>8 million).

Stirring this melange together, if we quadruple p (to get from a lot of
current testing up to 33 million) and double processor/RAM speed (well, my
fastest machine is a P-II 266MHz with a 66MHz bus -- moving to 550/100 might
double the throughput with available parts), the runtime would increase by
about a factor of 8.

I would naively expect the failure rate to increase by a factor between 8
and 16, although it could be higher or lower because those pesky
probabilities all change.

Also, if the probability of a failure in any step is r and there are n
steps, then the probability of at least one failure is really 1-(1-r)^n,
which has leading term nr (and the probability is less than nr, of course).
If we expect a 1% failure rate at 8M and r stays fixed, then we expect about
a 4% failure rate at around 33M. (.99^4 is about .96 = 1 - .04).

I like the thread of saving multiple residues at various checkpoints along
the way. George suggested a % completion series. I might suggest a specific
series of points -- like every L(1000k). This might be simpler to track in a
database, although the number of entries grows linearly with p so the data
storage might grow with p^2, depending. Another series like k*floor(p/s)
would work just as well and keep the data needs smaller, as it would have
just s+1 checkpoints (s can be fixed for all p).

All of the steps saved should be saved for both 1st and 2nd run, as George
suggested. There is no point to stopping a 2nd run at the first difference,
although there may be great value in starting a 3rd run as soon as possible
after the 2nd fails to match the first.

If the third run pops up different from both the 1st and 2nd run, primenet
should send someone a cry for help: too many mismatches suggest something
strangely wrong.

Might the v.17 problem have been trapped with something like this?
I do not recall enough of the discussion to know, and the ensuing belly-aching overshadowed the real content of finding/fixing/reworking. (I know I am never going to rise high on the list, so I do not worry a whole lot about how much my report shows.)

One way of testing a new version would be by double checking current and prior version data. In fact, I would expect that the quality assurance group plans to use double-checking as a post-beta test stage. The database saves could let a lot of us help out on that last stage before a full release. I know I would be happy to let my double-checking machines do new version testing.

Joth

----- Original Message -----
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base. Com <[EMAIL PROTECTED]>
Sent: Wednesday, August 04, 1999 6:18 AM
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> > This had been discussed earlier. Brian and I talked about it for a little
> > while, he came up with the original idea.
>
> Doh! Curse my memory! :-)
>
> > > I think the idea has definite merit. If an error does occur, it's equally
> > > likely to happen at any step along the way, statistically. Errors are every
> > > bit as likely to happen on the very first iteration as they are during the
> > > 50% mark, or the 32.6% mark, or on the very last iteration.
> >
> > True, but if the system is malfunctioning then the errors should start
> > early.
>
> Even more reason why it makes sense.
>
> > > Just for example, every 10% along the way, it'll send its current residue
> > > to the Primenet server.
> >
> > I'm guessing that you mean a certain amount of the residue. Sending in
> > 10 2meg files for *each* exponent in the 20,000,000 range would get very
> > unwieldy, and inconvenient for people and primenet.
>
> Just a partial residue, like the one sent at the end of the test. Even
> smaller ones, like a 32 bit instead of 64 bit residue, seem like they would do
> the job splendidly.
>
> > > I forget the numbers being tossed around,
> > > but you'd only save 50% of (the error rate) of the
> > > checking time.
> >
> > As I pointed out above, the error rate should increase with the square of the
> > exponent (plus change). This means that if 1% have errors at 7mil, 22% will
> > have errors at 30mil.
>
> Frightening to think so. Are you sure the error rate increases? Errors
> seem like they'd show up more as a result of faulty hardware, to my
> thinking. I'd imagine that if a certain machine ran through about 10 10M
> exponents error free, it has a very high likelihood of running a single 20M
> exponent error free.

------------------------------

Date: Wed, 4 Aug 1999 13:44:38 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: intermediate checks

> Here are some ideas:
>
> 1) You don't need to mail back all the intermediate residues to see if
> they are matching - you only need to send a checksum, which could be
> as small as a few hundred bytes!

or just 4 bytes for a 32 bit residue. Or why not 8 bytes for a nice, "safe", 64 bit residue.

> 2) Users could elect how often to save the residue, by % or by
> iteration #, depending on their free hard drive space.

Better not let the user pick how often. Better to just let the client software "hardcode" at what points it saves residues, just to make sure everyone is doing it the same way.

> This process could help save work from mid-point gimp quitters, by
> letting them at least send in their mid-point residue. (Or is this
> already done - I've never quit!)
For someone to be able to pick up where someone else left off, they'd have to send the whole intermediate file to some server. Those files are getting larger and larger...

------------------------------

Date: Wed, 4 Aug 1999 15:01:39 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

> I like the thread of saving multiple residues at various checkpoints along
> the way. George suggested a % completion series. I might suggest a
> specific series of points -- like every L(1000k). This might be simpler to
> track in a database although the number of entries grows linearly with p so
> the data storage might grow with p^2, depending. Another series like
> k*floor(p/s) would work just as well and keep the data needs smaller as it
> would have just s+1 checkpoints (s can be fixed for all p).

I think we could keep it simple by just saving every x% iteration's residue (in truncated form). Using a WAG of saving it every 10% along the way, you'd only have 9 partial and 1 full residue when all is said and done.

So for an exponent like 8027219, you'd save the partial residue at the 10% mark, or the 802722nd iteration (rounding up or down as normal). Of course, the number of iterations varies just slightly from the exponent, but whatever...you get the idea.

These partial residues could be sent (for the first LL test) during the check-ins, or saved up and all sent at once when the test is done.

> Might the v.17 problem have been trapped with something like
> this? I do not
> recall enough of the discussion to know and the ensuing belly-aching
> overshadowed the real content of finding/fixing/reworking. (I know I am
> never going to rise high on the list, so I do not worry a whole lot about
> how much my report shows.)
I don't think so, since it produced the wrong data right from the start, regardless. A double-check on a different platform (non GIMPS code like MacLucas) would have caught it earlier though, I suppose.

------------------------------

Date: Wed, 4 Aug 1999 19:19:56 -0400 (EDT)
From: Lucas Wiman <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Multiple residues - enhancing double-checking

>> That is to say when
>> one computer finishes to X%, it reports its 64-bit residue to primenet, and
>> waits for the second computer working on the same LL test to do the same.
>> Until the other (slower) computer reports in, the (faster) computer works on
>> another exponent.

> Not at all. The first-time check goes its way, but reporting
>partial residues to coordinator / primenet from time to time. Later,
>often when first LLTest was finished long time ago, somebody
>receives:

>Double-Check:
>M23780981,64,863FF87,678676AA,FF637BC,[...],CRC:9923FDA.

This scheme makes almost no sense for normal double checking. This is because it would save *no* time at all. Think about it: even if you identify that an error occurred in the second week of a 3-month test, you still have to run it to completion, and a third test must also be run. (So 3 LL tests must still be run if an error occurs.)

> This schema makes possible simultaneous checking,
> though. But the start-stop mechanism you describe has little
> sense.

The method that you describe would only allow simultaneous checking if the computers were of equal speed, or if one kept working on the same exponent, and the other computer kept getting further behind. The scheme that I described (and Brian thought up) would allow the computers to run at the same exponent/time, while still keeping busy.

Sorry about my bad English, and it's even my first language!
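The checkpoint bookkeeping discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: checkpoints fall at every 10% of the exponent (as in the 8027219 example, using p itself as the iteration count even though the true count differs slightly), residues are treated as opaque values, and both function names are hypothetical. Nothing here reflects how PrimeNet actually stores or compares residues.

```python
def checkpoint_iterations(p, parts=10):
    """Iterations at which a partial residue would be saved: every 1/parts of p.

    Assumption: p stands in for the iteration count, as in the thread.
    Integer round-half-up arithmetic avoids floating-point surprises.
    """
    return [(2 * p * k + parts) // (2 * parts) for k in range(1, parts + 1)]


def first_mismatch(run_a, run_b):
    """Index of the first checkpoint where two runs disagree, or None if they agree.

    run_a and run_b are lists of partial residues taken at the same checkpoints.
    """
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return index
    return None


if __name__ == "__main__":
    points = checkpoint_iterations(8027219)
    print(points[0], points[-1])

    # Made-up residues, for illustration only.
    first_run = [0x1A2B, 0x3C4D, 0x5E6F]
    second_run = [0x1A2B, 0x9999, 0x5E6F]
    print(first_mismatch(first_run, second_run))
```

Locating the first mismatching checkpoint does not remove the need for a third run; at most it tells you early that one will be needed.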
-Lucas

------------------------------

End of Mersenne Digest V1 #609
******************************