I've noticed one of the errors in your results file is:

> Iteration: 7752549/12962641, ERROR: SUM(INPUTS) != SUM(OUTPUTS),
> 1524171845291614 != 283103064441664.6

This is a definite hardware failure. If the PC is overclocked then you
probably need to increase the core or I/O voltage or reduce the speed. If
it's not overclocked then try swapping out the memory or the CPU.
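
For what it's worth, here is a toy sketch of why a sum mismatch points at
the hardware rather than the software. It assumes the check rests on the
identity that the sum of a cyclic self-convolution equals the square of the
sum of its inputs. Mprime's real code works on a weighted floating-point
transform and its exact check is not shown here, so the names cyclic_square
and sums_agree are purely illustrative.

```python
def cyclic_square(xs):
    """Cyclic self-convolution of xs, computed directly (no FFT)."""
    n = len(xs)
    out = [0] * n
    for i, a in enumerate(xs):
        for j, b in enumerate(xs):
            out[(i + j) % n] += a * b
    return out


def sums_agree(inputs, outputs):
    """True when sum(inputs) squared equals sum(outputs)."""
    return sum(inputs) ** 2 == sum(outputs)


digits = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary illustrative input
good = cyclic_square(digits)
print(sums_agree(digits, good))     # identity holds for a correct result

bad = list(good)
bad[2] ^= 1 << 4                    # simulate a single flipped bit
print(sums_agree(digits, bad))      # the mismatch exposes the corruption
```

Correct arithmetic always satisfies the identity, so when the two sums
disagree the usual suspects are the memory or the CPU.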

Of course, if your hardware has failed in some way, it won't come as a
surprise that your PC does other weird things, like Mprime crashing...

The fact that you say errors are common suggests your PC has always had some
problems. My work PC never shows any errors and after a bit of tweaking my
overclocked PC at home never shows any errors. Mprime is a pretty good
health test. As a long shot you might try removing any non-essential devices
to check it's not a driver problem. Presumably a badly written Linux driver
can cause the same problems as a badly written Windows driver.

Ideally you don't want any errors at all. If you're getting errors as
commonly as you say then the accuracy of your results will be questionable.
Because of double-checking this won't be a problem for the group, but it's
going to be a bit of a waste of time for you.
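
To see how common the errors really are, and to separate the disk-full
write failures from the arithmetic ones, something like the sketch below
could tally them. It only recognises the message texts quoted in this
thread; the category labels and the function names classify and
count_errors are my own, not anything mprime provides.

```python
from collections import Counter


def classify(line):
    """Map one results.txt line to an error category, or None."""
    text = line.lstrip("> ").strip()
    if "SUM(INPUTS) != SUM(OUTPUTS)" in text:
        return "sum mismatch"
    if "ILLEGAL SUMOUT" in text:
        return "illegal sumout"
    if text.startswith(("FATAL ERROR: Writing", "Error writing")):
        return "disk write"
    return None


def count_errors(lines):
    """Count error lines by category over any iterable of lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


# Lines copied from the excerpt quoted below.
sample = [
    "Iteration 7738000 / 12962641",
    "Error writing intermediate file: rC962641",
    "Iteration: 7752549/12962641, ERROR: SUM(INPUTS) != SUM(OUTPUTS),",
    "ERROR: ILLEGAL SUMOUT",
    "FATAL ERROR: Writing to temp file.",
    "FATAL ERROR: Writing to temp file.",
    "ERROR: ILLEGAL SUMOUT",
]
print(count_errors(sample))
```

On the real file you would pass an open file object instead of the sample
list, e.g. count_errors(open("results.txt")). The disk-write entries are
explained by the full partition; the other two categories are the ones that
point at the hardware.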


----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, October 25, 2001 7:11 PM
Subject: Mersenne: Mprime crash


> Last night, one of the 2 mprime jobs I run on my Linux
> PC at work died. It apparently died due to an illegal
> sumout error. Now these are quite common, and normally
> appear in my results.txt file in the form
>
> Iteration: 1019208/10199069, ERROR: ILLEGAL SUMOUT
> Possible hardware failure, consult the readme file.
> Continuing from last save file.
>
> The one last night, OTOH, was of the form
>
> ERROR: ILLEGAL SUMOUT
> Possible hardware failure, consult the readme file.
>
> i.e. no iteration number, and no "Continuing from last
> save file." message following it - it just died at this
> point. There was also a file write error about 2 hours
> before the crash, this turned out to be due to a full
> user partition on my hard drive (which I've since fixed)
> and I don't know if it has anything to do with the
> sumout errors (it seems they should not be related.)
>
> Here is the excerpt from the results file - you can see
> the first file write error on 10/24 around 16:35, then a
> checksum error at 17:45 which apparently was recovered
> from OK, then at 18:30 the ILLEGAL SUMOUT error which
> caused the crash. That is followed on 10/25 (i.e. after
> I came to work today) by two "FATAL ERROR: Writing to temp file."
> messages, as I twice tried to restart,
> before realizing my disk was full. After clearing out
> a couple hundred MB I again tried to restart at 9:47,
> but again got an "ERROR: ILLEGAL SUMOUT" message.
>
> Has one of my CPUs gone flaky on me? Is it possible
> that both of the 2 savefiles (they're both there, and
> both of the proper size) are corrupt?
>
> Any help would be welcome,
>
> -Ernst
>
> Excerpt from results.txt file:
>
> [Wed Oct 24 16:24:51 2001]
> Iteration 7736000 / 12962641
> [Wed Oct 24 16:34:54 2001]
> Iteration 7738000 / 12962641
> Error writing intermediate file: rC962641
> [Wed Oct 24 16:45:00 2001]
> Iteration 7740000 / 12962641
> [Wed Oct 24 16:55:04 2001]
> Iteration 7742000 / 12962641
> [Wed Oct 24 17:05:04 2001]
> Iteration 7744000 / 12962641
> [Wed Oct 24 17:15:04 2001]
> Iteration 7746000 / 12962641
> [Wed Oct 24 17:24:59 2001]
> Iteration 7748000 / 12962641
> [Wed Oct 24 17:34:53 2001]
> Iteration 7750000 / 12962641
> [Wed Oct 24 17:44:46 2001]
> Iteration 7752000 / 12962641
> Iteration: 7752549/12962641, ERROR: SUM(INPUTS) != SUM(OUTPUTS),
> 1524171845291614 != 283103064441664.6
> Possible hardware failure, consult the readme file.
> Continuing from last save file.
> [Wed Oct 24 18:29:37 2001]
> ERROR: ILLEGAL SUMOUT
> Possible hardware failure, consult the readme file.
> [Thu Oct 25 09:21:07 2001]
> FATAL ERROR: Writing to temp file.
> [Thu Oct 25 09:28:29 2001]
> FATAL ERROR: Writing to temp file.
> [Thu Oct 25 09:47:40 2001]
> ERROR: ILLEGAL SUMOUT
> Possible hardware failure, consult the readme file.
>
>
> _________________________________________________________________________
> Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
> Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
>
