Mersenne: Overclocking - bad for project?

2000-12-23 Thread Gareth Randall


There are occasional announcements about overclocking various processors, and I know 
that some Mersenne contributors describe their clock speed as xxx@yyy, where yyy > xxx,
obviously.

However, surely this project is one where overclockers do more harm than good? When 
you're running your favourite game, it doesn't matter if a couple of incorrect 
calculations creep in, but the Mersenne project involves very long calculations with 
basically a boolean answer at the end. One wrong result during this time could ruin 
the answer. Now I know that the algorithms include a lot of error catching, but once 
the processor is run to the point of instability there could easily be errors in the 
error protection. (I'll try a probability analysis later... Basically we need the 
probability of one error occurring within a certain number of instructions of a 
previous error.)
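Here is a back-of-the-envelope version of that analysis. It rests on one modelling simplification: errors arrive independently at a constant rate λ per instruction, i.e. as a Poisson process. Under that model, the chance that a given error is followed by another within w instructions is 1 - exp(-λw), and a run of N instructions sees about λN errors. The function names and the figures in the example are hypothetical, not measurements.

```python
import math


def p_followup_error(rate_per_instr: float, window: float) -> float:
    """Probability that another error occurs within `window` instructions
    of a given error, assuming errors form a Poisson process with a constant
    per-instruction rate (a modelling assumption, not a measured fact)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-rate_per_instr * window)


def expected_close_pairs(rate_per_instr: float, total_instr: float,
                         window: float) -> float:
    """Approximate expected number of errors in a run of `total_instr`
    instructions that are followed by a second error within `window`
    (edge effects at the end of the run are ignored)."""
    expected_errors = rate_per_instr * total_instr
    return expected_errors * p_followup_error(rate_per_instr, window)


if __name__ == "__main__":
    # Hypothetical figures: one error per 1e15 instructions, a 1e16-instruction
    # run, and a 1e6-instruction window in which a second error does harm.
    print(expected_close_pairs(1e-15, 1e16, 1e6))
```

For small λw the expected count is roughly λ²Nw. It therefore grows with the square of the raw error rate, which is the quantity an unstable overclock would push up.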


My opinion is that it's better to have fewer correct results than to have the central 
database poisoned by loads of "don't think it's prime, but the user was overclocking" 
results, which of course cannot be distinguished from perfect answers. I'd trade two 
unreliable answers for one honest result. (What ends up happening is even worse. 
Mismatching checksums mean that the tests must be repeated until a consensus is 
reached.)


A high score table is brilliant, and excites all contributors, but unfortunately a few 
seem more interested in climbing the table than in what the project is about. If 
people want to run overclocked, they should work on a project which isn't so sensitive 
to noise, such as SETI (okay, hardly an original suggestion here). SETI takes a noisy 
input to begin with, and introducing the odd bit of noise won't harm the results that 
much.


People whose machines show any sign of instability at all should really stick to 
factoring, although these are just the sort of people who'll be issued with primality 
tests because of the apparently high performance. I'm tempted to say: go and find 
another high score table to climb.


So after all that, here's a suggestion: how about an error counting system in
mprime/prime95? (Okay, there might already be one, but I haven't seen it mentioned
anywhere.) Every time an error is detected, a counter is incremented, and the final
count is sent back to the server along with the result. An answer coming back with
200 errors might be considered less reliable than one with no errors at all.
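The counting idea could look something like the sketch below. The record layout, the `record_error` hook and the threshold policy are all hypothetical. They do not describe how mprime/prime95 or the PrimeNet server actually store or judge results.

```python
from dataclasses import dataclass


@dataclass
class LLResult:
    """Hypothetical result record: exponent, final 64-bit residue, and the
    number of errors detected (and recovered from) during the run."""
    exponent: int
    res64: int
    error_count: int = 0

    def record_error(self) -> None:
        # Hypothetical hook: called each time a runtime check
        # (e.g. a SUMOUT or round-off check) fails.
        self.error_count += 1


def is_suspect(result: LLResult, threshold: int = 0) -> bool:
    """Hypothetical server-side policy: a result reporting more than
    `threshold` errors is treated as less reliable, e.g. queued for an
    early double-check."""
    return result.error_count > threshold


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    clean = LLResult(exponent=8000009, res64=0x0123456789ABCDEF)
    noisy = LLResult(exponent=8000009, res64=0x0123456789ABCDEF)
    for _ in range(200):
        noisy.record_error()
    print(is_suspect(clean), is_suspect(noisy))  # False True
```

A threshold of zero flags any run that reported even one error. A real policy would need to weigh how often errors occur on healthy machines.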


Yours,

=== Gareth Randall ===
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Overclocking - bad for project?

2000-12-23 Thread Jud McCranie

At 10:31 AM 12/23/2000 +, Gareth Randall wrote:

> However, surely this project is one where overclockers do more harm than
> good? When you're running your favourite game, it doesn't matter if a
> couple of incorrect calculations creep in, but the Mersenne project
> involves very long calculations with basically a boolean answer at the end.

...

I agree.  I've never overclocked my computers because I think it is more 
important to be confident in the results.


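+---------------------------------------------------------+
| Jud McCranie                                            |
|                                                         |
| Programming Achieved with Structure, Clarity, And Logic |
+---------------------------------------------------------+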





Re: Mersenne: Overclocking - bad for project?

2000-12-23 Thread John R Pierce

> My opinion is that it's better to have fewer correct results than to have
> the central database poisoned by loads of "don't think it's prime, but the
> user was overclocking" results, which of course cannot be distinguished from
> perfect answers. I'd trade two unreliable answers for one honest result.
> (What ends up happening is even worse. Mismatching checksums mean that the
> tests must be repeated until a consensus is reached.)

At one time I had a number of those ILLEGAL SUMOUT errors. It turned out to
be caused by an errant internet multimedia plugin (Crescendo MIDI), which was
somehow interfering with the Pentium II's FPU. This problem was specific to
Windows 95 too, I think, and went away with a later release of the kernel
(Win98 or 98SE fixed it, I think... it definitely is not a problem in NT or
Win2000). I think we all decided it was related to this plugin doing MMX
processing on an interrupt basis without properly notifying the kernel, or
something similar.

Anyway, I suspect the probability of a hardware error causing erroneous
results without triggering MASSIVE numbers of check errors is slim-to-none.

How many mismatched checksums does PrimeNet have to reconcile on an ongoing
basis?

-jrp





Mersenne: Re: Overclocking - bad for project?

2000-12-23 Thread Steinar H. Gunderson

On Sat, Dec 23, 2000 at 11:08:47AM -0500, Jud McCranie wrote:
> I agree.  I've never overclocked my computers because I think it is more
> important to be confident in the results.

As long as even George overclocks, I don't feel really guilty about my
400@448 machine (which successfully completed a 72-hour torture test
before I put it into PrimeNet use)...

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/



Mersenne: P-1 factoring only

2000-12-23 Thread Henk Stokhorst

L.S.,

I would enjoy it if it were possible to have a prime95 version with
a "P-1 factoring assignments only" option. My PC starts to crunch
(literally: it starts to peep intermittently, as if it needs
lubricant) if I switch to LL testing. It would save time for the people
preferring LL tests, just like the regular factoring-only option does
now.

YotN,

Henk






Re: Mersenne: Overclocking - bad for project?

2000-12-23 Thread Brian J. Beesley

On 23 Dec 00, at 10:31, Gareth Randall wrote:

> My opinion is that it's better to have fewer correct results than to have
> the central database poisoned by loads of "don't think it's prime, but the
> user was overclocking" results, which of course cannot be distinguished
> from perfect answers. I'd trade two unreliable answers for one honest
> result. (What ends up happening is even worse. Mismatching checksums mean
> that the tests must be repeated until a consensus is reached.)
 

This is basically true. However, most systems are conservatively
engineered; if care is taken (especially with regard to cooling),
overclocking need not necessarily result in unreliable systems.

Systems which are not overclocked can still be unreliable for a 
number of reasons.

IMHO undercooling (perhaps because a fan has stopped running, 
possibly without the knowledge of the user) represents at least as 
serious a problem as overclocking.

Damage by static discharge to components (especially memory, and 
usually due to mishandling during assembly) is another possible cause 
of systems running less than perfectly reliably.

On 23 Dec 00, at 10:15, John R Pierce wrote:

> At one time I had a number of those ILLEGAL SUMOUT errors. It turned out
> to be caused by an errant internet multimedia plugin (Crescendo MIDI),
> which was somehow interfering with the Pentium II's FPU. This problem was
> specific to Windows 95 too, I think, and went away with a later release of
> the kernel (Win98 or 98SE fixed it, I think... it definitely is not a
> problem in NT or Win2000). I think we all decided it was related to this
> plugin doing MMX processing on an interrupt basis without properly
> notifying the kernel, or something similar.

Yes, software problems are a real possibility, especially in Win 9x, 
because the memory model used does not protect process memory 
properly.
 
> Anyway, I suspect the probability of a hardware error causing erroneous
> results without triggering MASSIVE numbers of check errors is
> slim-to-none.

Unfortunately lack of errors in normal operation of Prime95 is not a 
good indicator of a really reliable system. Because the error check 
is run every iteration (instead of once every 128 iterations) and 
because the result is compared with a known value, the 16-hour self 
test, or the torture test, is better as a hardware reliability tool 
than running LL tests in "production" mode.
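A minimal self-test sketch shows why a check against a known answer is a stronger test than an unchecked production run. It runs the Lucas-Lehmer test on small odd prime exponents and compares each result with the known list of Mersenne prime exponents below 128. A production run has no such reference to compare against. This is a simplification: the actual Prime95 self-test and torture test work on much larger exponents, and the function names here are illustrative only.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test: for an odd prime p, 2**p - 1 is prime iff
    s_(p-2) == 0, where s_0 = 4 and s_(k+1) = s_k**2 - 2 (mod 2**p - 1)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def is_odd_prime(n: int) -> bool:
    """Trial-division primality check, adequate for small odd n."""
    if n < 3 or n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


# Known answers: the odd exponents p < 128 for which 2**p - 1 is prime.
KNOWN_PRIME_EXPONENTS = {3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127}


def self_test(limit: int = 128) -> list:
    """Return the exponents whose computed result disagrees with the known
    answer; an empty list means the self-test passed."""
    failures = []
    for p in range(3, limit, 2):
        if is_odd_prime(p) and lucas_lehmer(p) != (p in KNOWN_PRIME_EXPONENTS):
            failures.append(p)
    return failures


if __name__ == "__main__":
    print("self-test failures:", self_test())
```

On a healthy machine the failure list should be empty. A glitch during any of these runs would show up as a mismatch, whereas the same glitch in a production test would go unnoticed.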

A couple of years ago I had an instance of a P100 system with a blown 
CPU fan. It seems to have run for months with no detected errors. 
Eventually it did throw a couple of wobblies, but meanwhile it had 
submitted a couple of results which later turned out to be bad (mixed 
in with a pile of others which were OK).

My only other known error was during a QA test involving a run on a
very large exponent. It turned out that the system had glitched,
probably only once: rerunning in segments revealed a discrepancy
between iterations 8.3 million and 8.4 million, but the other 16 of the 17
million-iteration segments were clean. This was on a well-cooled
Athlon 650 system running Win 2K Professional, using ECC RAM.
I don't know the cause. Possibly you can just get a hardware glitch
on the order of once a year. The point is that there was
nothing in the log to show that the result might be suspect.
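Rerunning in segments can be sketched as follows, assuming the original run saved its residue at regular checkpoints. Each segment is recomputed from its saved starting residue, and the result is compared with the saved ending residue. A one-off glitch therefore shows up in exactly one segment, as in the case described above. The small exponent, segment length and injected bit flip (`glitch_at`) are hypothetical stand-ins for the real run.

```python
from typing import Dict, List, Optional, Tuple


def ll_step(s: int, m: int) -> int:
    """One Lucas-Lehmer iteration: s -> s**2 - 2 (mod m)."""
    return (s * s - 2) % m


def run_with_checkpoints(p: int, segment: int,
                         glitch_at: Optional[int] = None) -> Dict[int, int]:
    """Run the Lucas-Lehmer test for 2**p - 1, saving the residue at
    iteration 0, every `segment` iterations, and at the final iteration p - 2.
    `glitch_at` is a hypothetical fault injection: the low bit of the
    residue is flipped once, right after that iteration."""
    m = (1 << p) - 1
    s = 4
    saved = {0: s}
    for i in range(1, p - 1):  # iterations 1 .. p-2
        s = ll_step(s, m)
        if i == glitch_at:
            s ^= 1
        if i % segment == 0 or i == p - 2:
            saved[i] = s
    return saved


def bad_segments(p: int, saved: Dict[int, int]) -> List[Tuple[int, int]]:
    """Recompute each segment from its saved starting residue and return the
    (start, end) iteration ranges whose recomputed end residue disagrees
    with the saved one."""
    m = (1 << p) - 1
    marks = sorted(saved)
    bad = []
    for start, end in zip(marks, marks[1:]):
        s = saved[start]
        for _ in range(end - start):
            s = ll_step(s, m)
        if s != saved[end]:
            bad.append((start, end))
    return bad


if __name__ == "__main__":
    glitched = run_with_checkpoints(127, segment=10, glitch_at=53)
    print(bad_segments(127, glitched))
```

Segments after the glitch start from the already-corrupted saved residue, so they reproduce the saved values and look clean. Only the segment containing the glitch should be flagged.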

Also bear in mind that there is no certainty that there is no 
undetected bug in either the software or the CPU. There seems little 
point in excluding results from systems which may possibly be less 
than perfectly reliable so long as other sources of error may exist.

The double-checking mechanism _does_ work!
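Here is a minimal sketch of the double-check idea. It makes two simplifying assumptions: a result is accepted once two independent runs report the same 64-bit residue (taken here to be the low 64 bits of the final Lucas-Lehmer value), and any mismatch simply triggers another run. The helper names are hypothetical, and PrimeNet's real bookkeeping is not shown.

```python
from collections import Counter
from typing import List, Optional

MASK64 = (1 << 64) - 1


def res64(p: int) -> int:
    """Low 64 bits of the final Lucas-Lehmer residue for 2**p - 1 (odd prime p)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s & MASK64


def verified_residue(reported: List[int]) -> Optional[int]:
    """Return a residue reported by at least two runs, or None if no two
    runs agree yet (in which case another test would have to be assigned)."""
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    first = res64(23)
    bad = first ^ 1  # hypothetical faulty run differing in one bit
    print(verified_residue([first, bad]))         # no agreement yet
    print(verified_residue([first, bad, first]))  # a third run breaks the tie
```

A hardware fault is very unlikely to reproduce the same 64-bit residue twice. A matching pair is therefore strong evidence that the result is right, regardless of how reliable the machines that produced it were.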
 
> How many mismatched checksums does PrimeNet have to reconcile on an ongoing
> basis?
 
George would need to answer this one, but the incidence of "bad" 
results submitted is something of the order of 1%. Maybe a bit 
higher, and maybe tending to rise with increasing exponent size (or 
increasing run times?) I do have a feeling that current systems are 
less conservatively engineered than they used to be years ago - the 
market is more competitive, and there is more consumer pressure for 
ever higher "performance numbers" than there once was.

Seasonal felicitations
Brian Beesley