Mersenne: Overclocking - bad for project?
There are occasional announcements about overclocking various processors, and I know that some Mersenne contributors describe their clock speed as xxx@yyy where yyy > xxx, obviously. However, surely this project is one where overclockers do more harm than good? When you're running your favourite game, it doesn't matter if a couple of incorrect calculations creep in, but the Mersenne project involves very long calculations with basically a boolean answer at the end. One wrong result during this time could ruin the answer.

Now I know that the algorithms include a lot of error catching, but once the processor is run to the point of instability there could easily be errors in the error protection. (I'll try a probability analysis later... Basically we need the probability of one error occurring within a certain number of instructions of a previous error.)

My opinion is that it's better to have fewer correct results than to have the central database poisoned by loads of "don't think it's prime, but the user was overclocking" results, which of course cannot be distinguished from perfect answers. I'd trade two unreliable answers for one honest result. (What ends up happening is even worse: mismatching checksums mean that the tests must be repeated until a consensus is reached.)

A high score table is brilliant, and excites all contributors, but unfortunately a few seem more interested in climbing the table than in what the project is about. If people want to run overclocked, they should work on a project which isn't so sensitive to noise, such as SETI (okay, hardly an original suggestion here). SETI takes a noisy input to begin with, and introducing the odd bit of noise won't harm the results that much. People whose machines show any sign of instability at all should really stick to factoring, although these are just the sort of people who'll be issued with primality tests because of their apparently high performance.
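[Editor's note: the probability analysis hinted at above can be sketched in a few lines. This is a back-of-envelope model under invented assumptions, not measured figures: errors strike independently on each iteration with probability p, and the software's error checks catch any given error with probability d.]

```python
# Back-of-envelope model (all rates here are invented for illustration):
# each iteration suffers a hardware error with probability p, and the
# software's error checks catch any given error with probability d.
def p_undetected_bad_result(p: float, d: float, iterations: int) -> float:
    """Probability that at least one error slips through in a full run."""
    per_iter = p * (1.0 - d)  # undetected error per iteration
    return 1.0 - (1.0 - per_iter) ** iterations

# A ten-million-iteration LL test, one error per billion iterations,
# 99% of errors caught:
print(p_undetected_bad_result(1e-9, 0.99, 10_000_000))
```

Even under these optimistic assumptions the chance of a silently bad result is nonzero, and it grows roughly linearly with the length of the run — which is the worry expressed above about long LL tests on unstable hardware.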
I'm tempted to say: go and find another high score table to climb.

So after all that, here's a suggestion: how about an error counting system in mprime/prime95? (Okay, there might already be one, but I haven't seen it mentioned anywhere.) Every time an error is detected, a counter is incremented, and the final count is sent back to the server with the result. An answer coming back with 200 errors might be considered less reliable than one with no errors at all.

Yours,

=== Gareth Randall ===

_ Unsubscribe list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
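[Editor's note: the error-counting suggestion amounts to very little code. A hypothetical sketch follows — every name here is invented for illustration and does not match prime95's actual source or result format.]

```python
# Hypothetical sketch of the proposed per-result error counter.
# All names are invented for illustration; prime95's real code differs.
from dataclasses import dataclass


@dataclass
class LLResult:
    exponent: int
    residue: str
    error_count: int = 0  # incremented on every detected error

    def record_error(self) -> None:
        self.error_count += 1

    def report(self) -> dict:
        """What would be sent back to the server alongside the result."""
        return {"exponent": self.exponent,
                "residue": self.residue,
                "errors": self.error_count}


r = LLResult(exponent=6972593, residue="(64-bit residue)")
r.record_error()
print(r.report())
```

The server could then weight results by the reported error count when deciding which exponents need an early double-check.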
Re: Mersenne: Overclocking - bad for project?
At 10:31 AM 12/23/2000 +, Gareth Randall wrote:

> However, surely this project is one where overclockers do more harm than good? When you're running your favourite game, it doesn't matter if a couple of incorrect calculations creep in, but the Mersenne project involves very long calculations with basically a boolean answer at the end. ...

I agree. I've never overclocked my computers because I think it is more important to be confident in the results.

-- Jud McCranie | Programming Achieved with Structure, Clarity, And Logic
Re: Mersenne: Overclocking - bad for project?
> My opinion is that it's better to have fewer correct results than to have the central database poisoned by loads of "don't think it's prime, but the user was overclocking" results, which of course cannot be distinguished from perfect answers. I'd trade two unreliable answers for one honest result. (What ends up happening is even worse. Mismatching checksums mean that the tests must be repeated until a consensus is reached.)

At one time I had a number of those ILLEGAL SUMOUT errors. It turned out to be caused by an errant internet multimedia plugin (Crescendo MIDI) which was somehow interfering with the Pentium II's FPU. This problem was specific to Windows 95 too, I think, and went away with a later release of the kernel (Win98 or 98SE fixed it, I think... it definitely is not a problem in NT or Win2000). I think we all decided it was related to this plugin doing MMX processing on an interrupt basis without properly notifying the kernel, or something similar.

Anyway, I suspect the probability of a hardware error causing erroneous results without triggering MASSIVE numbers of check errors is slim to none. How many mismatched checksums does PrimeNet have to reconcile on an ongoing basis?

-jrp
Mersenne: Re: Overclocking - bad for project?
On Sat, Dec 23, 2000 at 11:08:47AM -0500, Jud McCranie wrote:

> I agree. I've never overclocked my computers because I think it is more important to be confident in the results.

As long as even George overclocks, I don't feel really guilty about my 400@448 machine (which successfully completed a 72-hour torture test before I put it into PrimeNet use)...

/* Steinar */
-- Homepage: http://members.xoom.com/sneeze/
Mersenne: P-1 factoring only
L.S.,

I would enjoy it if it were possible to have a prime95 version with a "P-1 factoring assignments only" option. My PC starts to crunch (literally, really: it starts to peep intermittently, as if it needs lubricant) if I switch to LL testing. It would save time for the people preferring LL tests, just like the regular factoring-only option does now.

YotN, Henk
Re: Mersenne: Overclocking - bad for project?
On 23 Dec 00, at 10:31, Gareth Randall wrote:

> My opinion is that it's better to have fewer correct results than to have the central database poisoned by loads of "don't think it's prime, but the user was overclocking" results, which of course cannot be distinguished from perfect answers. I'd trade two unreliable answers for one honest result. (What ends up happening is even worse. Mismatching checksums mean that the tests must be repeated until a consensus is reached.)

This is basically true. However, most systems are conservatively engineered; if care is taken (especially with regard to cooling), overclocking need not necessarily result in unreliable systems. Systems which are not overclocked can still be unreliable for a number of reasons. IMHO undercooling (perhaps because a fan has stopped running, possibly without the knowledge of the user) represents at least as serious a problem as overclocking. Damage by static discharge to components (especially memory, and usually due to mishandling during assembly) is another possible cause of systems running less than perfectly reliably.

On 23 Dec 00, at 10:15, John R Pierce wrote:

> At one time I had a number of those ILLEGAL SUMOUT errors. It turned out to be caused by an errant internet multimedia plugin (Crescendo MIDI) which was somehow interfering with the Pentium II's FPU. This problem was specific to Windows 95 too, I think, and went away with a later release of the kernel (Win98 or 98SE fixed it, I think... it definitely is not a problem in NT or Win2000). I think we all decided it was related to this plugin doing MMX processing on an interrupt basis without properly notifying the kernel, or something similar.

Yes, software problems are a real possibility, especially in Win 9x, because the memory model used does not protect process memory properly.

> Anyway, I suspect the probability of a hardware error causing erroneous results without triggering MASSIVE numbers of check errors is slim to none.
Unfortunately, lack of errors in normal operation of Prime95 is not a good indicator of a really reliable system. Because the self test runs the error check every iteration (instead of once every 128 iterations), and because the result is compared with a known value, the 16-hour self test or the torture test is a better hardware reliability tool than running LL tests in "production" mode.

A couple of years ago I had an instance of a P100 system with a blown CPU fan. It seems to have run for months with no detected errors. Eventually it did throw a couple of wobblies, but meanwhile it had submitted a couple of results which later turned out to be bad (mixed in with a pile of others which were OK).

My only other known error was during a QA test involving a run on a very large exponent. It turned out that the system had glitched, probably only once: rerunning in segments revealed a discrepancy between iterations 8.3 million and 8.4 million, but the other 16 of the 17 million-iteration segments were clean. This was on a well-cooled Athlon 650 system running Win 2K Professional, using ECC RAM. I don't know the cause. Possibly you can just get a hardware glitch something of the order of once a year. The point is that there was nothing in the log to show that the result might be suspect.

Also bear in mind that there is no certainty that there is no undetected bug in either the software or the CPU. There seems little point in excluding results from systems which may possibly be less than perfectly reliable so long as other sources of error may exist. The double-checking mechanism _does_ work!

> How many mismatched checksums does primenet have to reconcile on an ongoing basis?

George would need to answer this one, but the incidence of "bad" results submitted is something of the order of 1%. Maybe a bit higher, and maybe tending to rise with increasing exponent size (or increasing run times?)
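[Editor's note: the claim that double-checking works can be made quantitative with a rough estimate. The ~1% bad-result rate is the figure quoted above; the independence of runs and the 64-bit residue comparison are assumptions made for illustration.]

```python
# Rough estimate of how often an error survives double-checking.
# Assumptions (illustrative, not authoritative): bad results are
# independent between runs, the per-run bad-result rate is the ~1%
# quoted above, and two *wrong* runs only agree if their 64-bit
# residues collide by chance. A mismatch merely forces another run,
# so an error becomes final only if both runs are bad AND the two
# wrong residues happen to match.
p_bad = 0.01               # per-run bad-result rate (quoted figure)
p_collision = 2.0 ** -64   # chance two wrong 64-bit residues agree

p_error_survives = p_bad * p_bad * p_collision
print(p_error_survives)
```

Under these assumptions the surviving-error rate is on the order of 10^-24 per double-checked exponent — which is why a 1% raw error rate is tolerable in practice.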
I do have a feeling that current systems are less conservatively engineered than they used to be years ago: the market is more competitive, and there is more consumer pressure for ever higher "performance numbers" than there once was.

Seasonal felicitations,
Brian Beesley