It could also indicate a failing power supply.  I've seen this a number of 
times, and it often manifests as memory errors when testing the RAM.  
 

Any number of things in the computer can fail in ways that may not be so 
obvious.  Substitution troubleshooting may be needed, i.e. try a known good 
power supply with known good memory, or take half the RAM out to see if the 
problem persists, then check the other half of the RAM.
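
To make the halving idea concrete, here is a rough sketch of the procedure in 
Python.  It assumes exactly one bad module and a fault that shows up reliably 
whenever that module is installed, which real hardware doesn't always honor.  
The names find_bad_module and passes_test are made up for illustration; in 
practice passes_test is you running several memtest passes with only those 
sticks installed.

```python
def find_bad_module(modules, passes_test):
    """Narrow down a single faulty module by testing half at a time.

    modules     -- labels for the installed sticks, e.g. ["A", "B", "C", "D"]
    passes_test -- hypothetical callable: given the sticks currently
                   installed, return True if the machine tests clean
    Assumes exactly one module is bad and the fault is reproducible.
    """
    candidates = list(modules)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if passes_test(half):
            # This half is clean, so the fault is in the remaining sticks.
            candidates = candidates[len(half):]
        else:
            # The fault followed this half; keep narrowing inside it.
            candidates = half
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Simulated run: pretend stick "C" is the bad one.
    suspect = find_bad_module(["A", "B", "C", "D"], lambda sticks: "C" not in sticks)
    print("Suspect module:", suspect)
```

If the errors don't follow any particular stick, that points back at the 
power supply, the slots or the board rather than the RAM itself.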

It's also worth pulling and reseating the RAM and any cards in it.  I've got 
a big huge server that was having issues; it has a removable drawer for the 
CPU/memory.  I pulled it out about 1/2 inch and reseated it, and the errors 
stopped.  That was a couple of months ago.  It's probably a good idea to 
reseat the CPU as well.  Finally, you should also check the fans and how 
dusty the computer in question is; failing fans and dust buildup can both 
produce higher temps and random behavior.
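
For a quick look at temps on a Linux box, something like the following 
sketch reads the kernel's thermal zones from sysfs, where values are reported 
in millidegrees Celsius.  The function names are made up for illustration, 
and the 85 C limit is an arbitrary assumption, not a spec value for any 
particular CPU.  Not every machine exposes thermal zones there, in which case 
it simply prints nothing.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a sysfs temperature string such as '45000\\n' to degrees C."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone label: temperature in C} for every readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temp = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or not a number; skip it
        readings[f"{zone.name} ({kind})"] = temp
    return readings


def too_hot(readings, limit_c=85.0):
    """Pick out readings at or above limit_c (85 C is an assumed limit)."""
    return {name: t for name, t in readings.items() if t >= limit_c}


if __name__ == "__main__":
    zones = read_thermal_zones()
    for name, temp in zones.items():
        print(f"{name}: {temp:.1f} C")
    for name, temp in too_hot(zones).items():
        print(f"WARNING: {name} is at {temp:.1f} C")
```

Running it once at idle and again during a big build should show whether 
temps climb far enough to be suspicious.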

And yes, it's a pain to properly test large amounts of RAM, especially if you 
don't have a backup machine to work on while the other is testing.


--"Fascism begins the moment a ruling class, fearing the people may use their 
political democracy to gain economic democracy, begins to destroy political 
democracy in order to retain its power of exploitation and special privilege." 
Tommy Douglas




Jul 5, 2023, 11:50 by grant.b.edwa...@gmail.com:

> On 2023-07-05, Peter Humphrey <pe...@prh.myzen.co.uk> wrote:
>
>> This version of memtest86 ran to completion after going through the whole 
>> 64GB, and stopped with a success message.
>>
>
> That's a pretty good sign, but I have seen memory that made it through
> one complete test pass and failed on subsequent ones.
>
>> Over the last...oh, many months, I've noticed an occasional package in a 
>> large batch failing for no obvious reason, only to succeed on its own.
>>
>
> What sort of failure?  I've found that inconsistent/random gcc
> internal errors or gcc segfaults have usually been due to failing
> RAM. [Though in one case I remember, it was due to a failing SCSI disc
> controller card -- back when that was a thing.]
>
> It might also be due to a failing disk, but there are usually good
> indications of that in dmesg output and in SMART logs before it starts
> to affect other things.
>
> --
> Grant
>

