"Ethan Hansen" <[EMAIL PROTECTED]> writes:

> This is especially important if you are using a Xeon
> processor, as there are interesting cache functionality problems that only
> appear when a certain percentage of the die is used.

Eh?

256K FFT = 2 MByte work vector. Two copies are needed, plus odds
& sods of other memory (including executable code). So there are
going to be cache misses, even on a Xeon with a 2 MB L2 cache,
for any FFT size >= 128K - which covers all current LL test
assignments. IMHO if you have a Xeon (or a PII/PIII > 400 MHz,
for that matter) you would probably be better off running LL
test assignments and leaving the double-checking to people with
older processors.
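
To put rough numbers on that, here's a back-of-envelope sketch
(my assumptions: 8 bytes per double-precision FFT element and
two copies of the work vector, as above; the real Prime95 memory
layout differs in detail):

    # Rough working-set estimate for an FFT-based LL test.
    # Assumes 8-byte doubles and two copies of the work vector;
    # the actual Prime95 layout will differ.
    BYTES_PER_DOUBLE = 8

    def working_set_bytes(fft_size, copies=2):
        return fft_size * BYTES_PER_DOUBLE * copies

    for k in (128, 256, 512):
        mb = working_set_bytes(k * 1024) / (1024.0 * 1024.0)
        print("%dK FFT: ~%.0f MB working set" % (k, mb))

Even the 128K FFT needs ~2 MB before counting code and other
data, so it cannot sit entirely in a 2 MB L2 cache.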

> It is also possible to have an overclocked CPU pass the full self test
> suite, but later exhibit problems.  The likely culprit is simple wearout --
> the CPU initially was barely functional at the overclocked speed, but slowed
> enough that it no longer runs.  If this happens, you usually can still run
> the CPU at the rated speed.

Or, the errors are there all the time but at a low rate, e.g. on
average one per day. Such a system is very likely to pass the
1-hour "self test" but the errors will show up in actual use, or
on the continuous "torture test" (see the "Options" menu in
Prime95).

If you've overclocked your system at all (_not_ recommended, but
I know it can be successful in some cases) then I suggest you
let the torture test run for a couple of days before committing
yourself to doing "real" work. An overclocked system can appear
to run fine for "office" applications yet cause problems with
Prime95, because very few other applications use the floating-
point unit even half as intensively as Prime95 does.

Note that simple overheating can also cause problems, even if
your system _isn't_ overclocked. It might be an idea to check
that the case and processor cooling fans are operating - they
have been known to fail!

Finally (I think this is right - I'm sure George will chip in if
not), there is a small but non-zero chance that you could get a
very occasional "sum out error" even if your system is 100%
perfect. This happens when an abnormal combination of data in
the FFT trips the "sanity check" in the code; the result may
well still be OK. The program should check whether the "error"
is deterministic (repeatable) rather than random, and continue
automatically if it is, though there will be an error log entry.
PrimeNet uses this information to flag the result as "suspect";
the exponent should then be re-assigned for an early
double-check instead of waiting its turn as usual.
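
Purely as illustration, here's a hypothetical sketch of that
recovery logic - my reading of the behaviour described above,
not Prime95's actual code, and the names are invented:

    # Hypothetical sketch (invented names): tell a deterministic,
    # data-triggered "sum out error" apart from a random hardware
    # error by replaying the same iteration from the last save
    # file and seeing whether the sanity check trips again.
    def classify_sumout(rerun_iteration):
        # rerun_iteration() replays one iteration from the save
        # file and returns True if the SUMOUT check trips again.
        if rerun_iteration() and rerun_iteration():
            # Tripped every time: deterministic, caused by the
            # data itself. Log it, let PrimeNet flag the result
            # as "suspect", and carry on with the test.
            return "deterministic - log, flag, continue"
        # Did not repeat: probably a genuine random hardware
        # error, so stop and investigate.
        return "random - halt and check hardware"

    # A perfectly repeatable "error" is classified as
    # deterministic and the run carries on:
    print(classify_sumout(lambda: True))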


Regards
Brian Beesley