Re: bad RAM? prove it with a crash dump?

2010-05-06 Thread Sean

On 7/05/2010 2:50 AM, Sean C. Farley wrote:

On Thu, 6 May 2010, Atom Smasher wrote:


i suspect i've got bad RAM but memtest has run through several dozen
iterations without a problem. my (3 year old) laptop will run for a
few days or weeks and then crash/freeze/hang. i've enabled crash dumps
and i'm wondering if/how the dump might be able to (dis)prove that the
RAM is bad. any ideas?

thanks...


Do not discount other hardware problems: video cards, bad capacitors and
power supplies. Sadly, I mention these as a subset of my experience. :(
I have even had a faulty left mouse button that would lock my X server
(many years ago). While holding the button down (scrolling through a
menu), the mouse would release and acquire too quickly for the server.



And to add the most obscure one: a USB wireless keyboard/mouse adapter
(or the keyboard/mouse itself) that randomly crashed any operating
system.


I built a completely new PC, replacing every component except that
adapter, reinstalled the OS, and even changed OS, and it still crashed
randomly. Once I swapped the Logitech adapter and the wireless
keyboard/mouse that were paired to it, all stability problems
disappeared.




Unfortunately, it is harder to find the problem in a laptop where you
cannot easily (if at all) switch out pieces of hardware to find the
problem.

Have you investigated whether or not the laptop is overheating?

Sean


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: bad RAM? prove it with a crash dump?

2010-05-06 Thread Sean C. Farley

On Thu, 6 May 2010, Atom Smasher wrote:

i suspect i've got bad RAM but memtest has run through several dozen 
iterations without a problem. my (3 year old) laptop will run for a 
few days or weeks and then crash/freeze/hang. i've enabled crash dumps 
and i'm wondering if/how the dump might be able to (dis)prove that the 
RAM is bad.  any ideas?


thanks...


Do not discount other hardware problems:  video cards, bad capacitors 
and power supplies.  Sadly, I mention these as a subset of my 
experience.  :(  I have even had a faulty left mouse button that would 
lock my X server (many years ago).  While holding the button down 
(scrolling through a menu), the mouse would release and acquire too 
quickly for the server.


Unfortunately, it is harder to find the problem in a laptop where you 
cannot easily (if at all) switch out pieces of hardware to find the 
problem.


Have you investigated whether or not the laptop is overheating?

Sean
--
s...@freebsd.org


RE: bad RAM? prove it with a crash dump?

2010-05-06 Thread Nate Eldredge

On Thu, 6 May 2010, Andrew Duane wrote:

It is also useful to make sure that the garbage itself is different. As
mentioned before, a single-bit error in an otherwise valid value, or
maybe a missing/scrambled byte, is a good indication of memory
problems. If random places are often overwritten with something else,
that could just be another piece of misbehaving code writing someplace
it shouldn't. I've often found code that writes some buffer into, e.g.,
a piece of memory it no longer owns; it looks like memory corruption
until you realize the garbage is always something specific, like a
vnode structure.


There are trickier things too.  I once had a machine with bad cache memory 
where once in a while you would get a cache line that had come from 
somewhere else in memory.  This was particularly vexing when it happened 
to an I/O buffer, and I wound up with a large zip file that had 32 bytes 
of libc.so somewhere in the middle... :-(


And of course, swapping out the RAM wouldn't have fixed it.

--

Nate Eldredge
n...@thatsmathematics.com
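
Editorial sketch of how the kind of damage Nate describes could be
located, assuming a known-good copy of the file is available to compare
against (his message does not say he had one). The function names and the
sample data are hypothetical. The 32-byte line size comes from his
anecdote; many current CPUs use 64-byte lines, so treat it as a parameter.

```python
def differing_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common prefix length of the two inputs is compared.
    """
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(good, bad)):
        if a != b and start is None:
            start = i                      # a new run of differences begins
        elif a == b and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:                  # run extends to the end of the data
        runs.append((start, min(len(good), len(bad)) - start))
    return runs


def fits_in_one_line(offset: int, length: int, line_size: int = 32) -> bool:
    """True if the run lies entirely inside one aligned block of line_size bytes."""
    return offset // line_size == (offset + length - 1) // line_size


# Hypothetical data: a 256-byte "file" with one aligned 32-byte block replaced.
good = bytes(range(256))
bad = bytearray(good)
bad[64:96] = b"\xff" * 32
for offset, length in differing_runs(good, bytes(bad)):
    print(offset, length, fits_in_one_line(offset, length))
```

A run that stays inside one aligned line is consistent with a misplaced
cache line, but it does not prove it. A run can also look shorter than the
line when some foreign bytes happen to match the originals.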


RE: bad RAM? prove it with a crash dump?

2010-05-06 Thread Andrew Duane
owner-freebsd-hack...@freebsd.org wrote:
> On Thu, 6 May 2010, Boris Kochergin wrote:
> 
>> My experience with bad memory is that if it causes the machine to
>> crash, it won't always happen while the machine is running the same
>> process (or kernel thread)--so look for it crashing in a wide
>> variety of places--and upon inspection of the core dump, a pointer
>> somewhere will be pointing to garbage.
> 
> 
> so really i'd need to collect two or more crash dumps, and if they
> point to different addresses then i can reasonably say the RAM is bad?
> 
> thanks...

It's not just that they point to different addresses; it is that there is
garbage in many completely independent places. For example, pulling bad
registers/return addresses off the stack, or garbage in random unrelated
buffers/structures/pointers. On the other hand, if you often have garbage in
some structure's "foo" pointer, that indicates a problem (maybe locking) in how
your code manages setting that foo pointer. It's a subtle difference.

It is also useful to make sure that the garbage itself is different. As
mentioned before, a single-bit error in an otherwise valid value, or maybe a
missing/scrambled byte, is a good indication of memory problems. If random
places are often overwritten with something else, that could just be another
piece of misbehaving code writing someplace it shouldn't. I've often found
code that writes some buffer into, e.g., a piece of memory it no longer owns;
it looks like memory corruption until you realize the garbage is always
something specific, like a vnode structure.

/Andrew
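
Editorial sketch of the distinction Andrew draws, written as a rough
heuristic rather than a diagnostic tool. It assumes you have already
inspected several crash dumps by hand and noted, for each corrupted value,
where it was and what it should have been. The function name, labels,
thresholds, and sample values are all hypothetical.

```python
from collections import Counter


def summarize(observations):
    """Summarize corruption sightings collected from several crash dumps.

    Each observation is a (location, expected, observed) tuple: location is
    a free-form label such as "struct foo.ptr", the values are integers.
    """
    if not observations:
        return {"distinct_locations": 0, "single_bit": 0,
                "verdict": "inconclusive"}

    locations = Counter(loc for loc, _, _ in observations)
    # Count sightings whose expected and observed values differ in one bit.
    single_bit = sum(
        1 for _, exp, obs in observations if bin(exp ^ obs).count("1") == 1
    )
    _, hits = locations.most_common(1)[0]

    if hits > 1 and hits >= len(observations) / 2:
        verdict = "same place keeps getting hit: suspect a software bug"
    elif single_bit >= len(observations) / 2:
        verdict = "scattered single-bit errors: suspect hardware"
    else:
        verdict = "inconclusive"
    return {"distinct_locations": len(locations),
            "single_bit": single_bit, "verdict": verdict}


# Hypothetical sightings: unrelated places, each off by a single bit.
scattered = [
    ("saved return address", 0x80412340, 0x80412360),
    ("vnode pointer", 0xC2345000, 0xC2305000),
    ("buffer length", 0x200, 0x201),
]
# Hypothetical sightings: the same field trashed with unrelated values.
repeated = [
    ("struct foo.ptr", 0xC1000000, 0xDEADC0DE),
    ("struct foo.ptr", 0xC1000040, 0x0),
    ("struct foo.ptr", 0xC1000080, 0x41414141),
]
print(summarize(scattered)["verdict"])
print(summarize(repeated)["verdict"])
```

The "half of all sightings" thresholds are arbitrary. With only two or
three dumps, either verdict is a hint about where to look next, not a
conclusion.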



Re: bad RAM? prove it with a crash dump?

2010-05-06 Thread Atom Smasher

On Thu, 6 May 2010, Boris Kochergin wrote:

My experience with bad memory is that if it causes the machine to crash, 
it won't always happen while the machine is running the same process (or 
kernel thread)--so look for it crashing in a wide variety of places--and 
upon inspection of the core dump, a pointer somewhere will be pointing 
to garbage.



so really i'd need to collect two or more crash dumps, and if they point 
to different addresses then i can reasonably say the RAM is bad?


thanks...


--
...atom

 
 http://atom.smasher.org/
 762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
 -

"You don't get everything you want. A dictatorship would
 be a lot easier."
-- George "dubya" Bush, describing what it's like
to be governor of Texas. (Governing Magazine 7/98)

"If this were a dictatorship, it would be a heck of a lot
 easier, just so long as I'm the dictator."
-- George "dubya" Bush
 http://www.cnn.com/TRANSCRIPTS/0012/18/nd.01.html
18 Dec 2000 CNN.com

"A dictatorship would be a heck of a lot easier, there's
 no question about it."
George "dubya" Bush, 27 Jul 2001 Associated Press



Re: bad RAM? prove it with a crash dump?

2010-05-06 Thread John Baldwin
On Thursday 06 May 2010 4:57:05 am Atom Smasher wrote:
> i suspect i've got bad RAM but memtest has run through several dozen 
> iterations without a problem. my (3 year old) laptop will run for a few 
> days or weeks and then crash/freeze/hang. i've enabled crash dumps and i'm 
> wondering if/how the dump might be able to (dis)prove that the RAM is bad. 
> any ideas?

If you can find a bad pointer that has a single-bit error, that can certainly
point to bad memory.

-- 
John Baldwin
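
Editorial sketch of the check John suggests, assuming you can infer what
the pointer should have held (for example from a neighbouring structure
or a second reference to the same object). The function names and the
sample addresses are hypothetical.

```python
def flipped_bits(expected: int, observed: int) -> list:
    """Return the bit positions at which the two values differ."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def looks_like_single_bit_error(expected: int, observed: int) -> bool:
    """True when the two values differ in exactly one bit."""
    return len(flipped_bits(expected, observed)) == 1


# Hypothetical values: a plausible kernel pointer, and the same pointer
# with bit 21 flipped.
good = 0xFFFFFF0001234560
bad = good ^ (1 << 21)
print(flipped_bits(good, bad))                 # prints [21]
print(looks_like_single_bit_error(good, bad))  # prints True
```

If several dumps show flips at the same bit position, or at physical
addresses that map to the same module, that would strengthen the case for
a hardware fault.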


Re: bad RAM? prove it with a crash dump?

2010-05-06 Thread Boris Kochergin

Atom Smasher wrote:
i suspect i've got bad RAM but memtest has run through several dozen 
iterations without a problem. my (3 year old) laptop will run for a 
few days or weeks and then crash/freeze/hang. i've enabled crash dumps 
and i'm wondering if/how the dump might be able to (dis)prove that the 
RAM is bad. any ideas?


thanks...

My experience with bad memory is that if it causes the machine to crash, 
it won't always happen while the machine is running the same process (or 
kernel thread)--so look for it crashing in a wide variety of places--and 
upon inspection of the core dump, a pointer somewhere will be pointing 
to garbage.


-Boris