Re: bad RAM? prove it with a crash dump?
On 7/05/2010 2:50 AM, Sean C. Farley wrote:
> On Thu, 6 May 2010, Atom Smasher wrote:
>> i suspect i've got bad RAM but memtest has run through several dozen
>> iterations without a problem. my (3 year old) laptop will run for a
>> few days or weeks and then crash/freeze/hang. i've enabled crash
>> dumps and i'm wondering if/how the dump might be able to (dis)prove
>> that the RAM is bad. any ideas?
>>
>> thanks...
>
> Do not discount other hardware problems: video cards, bad capacitors
> and power supplies. Sadly, I mention these as a subset of my
> experience. :( I have even had a faulty left mouse button that would
> lock my X server (many years ago). While holding the button down
> (scrolling through a menu), the mouse would release and acquire too
> quickly for the server.

And to add the most obscure, the USB wireless kb/mouse adapter (or the
keyboard/mouse itself) randomly crashing any operating system.
Completely new PC with all components except that, reinstalled OS,
change of OS, and it still crashed randomly. Swap the logitech adapter
and the wireless keyboard/mouse that were locked to it, and all
stability problems disappeared.

> Unfortunately, it is harder to find the problem in a laptop where you
> cannot easily (if at all) switch out pieces of hardware to find the
> problem. Have you investigated whether or not the laptop is
> overheating?
>
> Sean

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: bad RAM? prove it with a crash dump?
On Thu, 6 May 2010, Atom Smasher wrote:
> i suspect i've got bad RAM but memtest has run through several dozen
> iterations without a problem. my (3 year old) laptop will run for a
> few days or weeks and then crash/freeze/hang. i've enabled crash
> dumps and i'm wondering if/how the dump might be able to (dis)prove
> that the RAM is bad. any ideas?
>
> thanks...

Do not discount other hardware problems: video cards, bad capacitors
and power supplies. Sadly, I mention these as a subset of my
experience. :( I have even had a faulty left mouse button that would
lock my X server (many years ago). While holding the button down
(scrolling through a menu), the mouse would release and acquire too
quickly for the server.

Unfortunately, it is harder to find the problem in a laptop where you
cannot easily (if at all) switch out pieces of hardware to find the
problem. Have you investigated whether or not the laptop is
overheating?

Sean
--
s...@freebsd.org
RE: bad RAM? prove it with a crash dump?
On Thu, 6 May 2010, Andrew Duane wrote:
> It is also useful to make sure that the garbage itself is different.
> As mentioned before, a single bit error in an otherwise valid value,
> or maybe a missing/scrambled byte, these are good indications of
> memory problems. If random places are often overwritten with
> something else, that could just be another piece of misbehaving code
> that is writing someplace it shouldn't. I've often found code that
> writes some buffer into e.g. a piece of memory it no longer owns that
> looks like memory corruption until you realize the garbage is always
> something specific like a vnode structure.

There are trickier things too. I once had a machine with bad cache
memory where once in a while you would get a cache line that had come
from somewhere else in memory. This was particularly vexing when it
happened to an I/O buffer, and I wound up with a large zip file that
had 32 bytes of libc.so somewhere in the middle... :-(

And of course, swapping out the RAM wouldn't have fixed it.

--
Nate Eldredge
n...@thatsmathematics.com
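[Editor's sketch] One way to look for the kind of damage Nate describes, assuming a known-good copy of the file is available for comparison. The helper name damaged_blocks, the 32-byte line size (taken from the anecdote), and the sample data are all assumptions for illustration. It also assumes the I/O buffer was aligned to the line size, so that file offsets line up with cache lines; that will not hold in every case.

```python
def damaged_blocks(good, bad, line_size=32):
    """Return the indices of aligned line_size blocks that contain differences.

    good and bad are two equal-length byte strings: a known-good copy of a
    file and the suspect copy. line_size=32 is an assumption, not a constant
    of any particular CPU.
    """
    if len(good) != len(bad):
        raise ValueError("copies must be the same length")
    return sorted({i // line_size
                   for i, (g, b) in enumerate(zip(good, bad))
                   if g != b})


# Hypothetical data: 512 bytes, with one aligned 32-byte block replaced by
# bytes that look like the start of an ELF object.
good = bytes(range(256)) * 2
bad = bytearray(good)
bad[96:128] = b"\x7fELF" + b"\x00" * 28

print(damaged_blocks(good, bytes(bad)))
```

If every difference falls inside one aligned block, that is at least consistent with a misplaced cache line rather than a flipped bit or a stray software write. It does not prove it.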
RE: bad RAM? prove it with a crash dump?
owner-freebsd-hack...@freebsd.org wrote:
> On Thu, 6 May 2010, Boris Kochergin wrote:
>
>> My experience with bad memory is that if it causes the machine to
>> crash, it won't always happen while the machine is running the same
>> process (or kernel thread)--so look for it crashing in a wide
>> variety of places--and upon inspection of the core dump, a pointer
>> somewhere will be pointing to garbage.
>
> so really i'd need to collect two or more crash dumps, and if they
> point to different addresses then i can reasonably say the RAM is bad?
>
> thanks...

It's not just that they point to different addresses, it is garbage in
many completely independent places. For example, pulling bad
registers/return addresses off the stack, or garbage in random
unrelated buffers/structures/pointers. On the other hand, if you often
have garbage in some structure's "foo" pointer, that indicates a
problem (maybe locking) in how your code manages setting that foo
pointer. It's a subtle difference.

It is also useful to make sure that the garbage itself is different.
As mentioned before, a single bit error in an otherwise valid value,
or maybe a missing/scrambled byte, these are good indications of
memory problems. If random places are often overwritten with something
else, that could just be another piece of misbehaving code that is
writing someplace it shouldn't. I've often found code that writes some
buffer into e.g. a piece of memory it no longer owns that looks like
memory corruption until you realize the garbage is always something
specific like a vnode structure.

/Andrew
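[Editor's sketch] Andrew's distinction can be written down as a rough triage over notes taken from several dumps. Everything here is hypothetical: the function name summarize_findings, the (location, garbage) record format, the sample labels, and the "seen more than once" threshold are assumptions chosen for illustration, not an established method.

```python
from collections import Counter


def summarize_findings(findings):
    """Triage a list of (location, garbage) pairs gathered from crash dumps.

    location is a free-form label such as "some_struct.foo"; garbage is the
    bad value found there. The "more than once" threshold is arbitrary.
    """
    if not findings:
        return "no findings"
    locations = Counter(loc for loc, _ in findings)
    values = Counter(val for _, val in findings)
    top_location, location_hits = locations.most_common(1)[0]
    top_value, value_hits = values.most_common(1)[0]
    if location_hits > 1:
        # Same field corrupted repeatedly: more likely a software bug.
        return "repeated location %s: suspect the code managing it" % top_location
    if value_hits > 1:
        # Same garbage showing up in different places: maybe a stray writer.
        return "repeated garbage %#x: suspect a stray writer" % top_value
    return "scattered locations, varied garbage: consistent with bad hardware"


# Hypothetical notes from three dumps.
print(summarize_findings([
    ("some_struct.foo", 0xFFFFFF0012345678),
    ("stack return address", 0x0000000000400010),
    ("io buffer", 0x41414141),
]))
```

With only a handful of dumps this is weak evidence either way; it just makes the "same place" versus "many independent places" comparison explicit.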
Re: bad RAM? prove it with a crash dump?
On Thu, 6 May 2010, Boris Kochergin wrote:
> My experience with bad memory is that if it causes the machine to
> crash, it won't always happen while the machine is running the same
> process (or kernel thread)--so look for it crashing in a wide variety
> of places--and upon inspection of the core dump, a pointer somewhere
> will be pointing to garbage.

so really i'd need to collect two or more crash dumps, and if they
point to different addresses then i can reasonably say the RAM is bad?

thanks...

--
...atom

 http://atom.smasher.org/
 762A 3B98 A3C3 96C9 C6B7 582A B88D 52E4 D9F5 7808
 -

 "You don't get everything you want. A dictatorship would be
 a lot easier."
 -- George "dubya" Bush, describing what it's like to be
 governor of Texas. (Governing Magazine 7/98)

 "If this were a dictatorship, it would be a heck of a lot
 easier, just so long as I'm the dictator."
 -- George "dubya" Bush
 http://www.cnn.com/TRANSCRIPTS/0012/18/nd.01.html
 18 Dec 2000 CNN.com

 "A dictatorship would be a heck of a lot easier, there's no
 question about it."
 George "dubya" Bush, 27 Jul 2001 Associated Press
Re: bad RAM? prove it with a crash dump?
On Thursday 06 May 2010 4:57:05 am Atom Smasher wrote:
> i suspect i've got bad RAM but memtest has run through several dozen
> iterations without a problem. my (3 year old) laptop will run for a few
> days or weeks and then crash/freeze/hang. i've enabled crash dumps and i'm
> wondering if/how the dump might be able to (dis)prove that the RAM is bad.
> any ideas?

If you can find a bad pointer that has a single-bit error that can
certainly point to bad memory.

--
John Baldwin
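[Editor's sketch] The single-bit check John mentions is easy to do by hand, but as a minimal illustration: XOR the bad pointer against the value it should have held and count the differing bits. The function names and the sample addresses are made up for the example. Knowing the expected value (say, from a neighbouring valid pointer or another copy of the same structure) is an assumption that will not always hold.

```python
def flipped_bits(observed, expected):
    """Return the bit positions at which two pointer-sized values differ."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def looks_like_single_bit_error(observed, expected):
    """True when exactly one bit separates the bad value from the good one."""
    return len(flipped_bits(observed, expected)) == 1


# Hypothetical values: a plausible-looking kernel address, and the same
# address with one bit flipped.
good = 0xFFFFFF0012345678
bad = good ^ (1 << 21)

print(flipped_bits(bad, good))
print(looks_like_single_bit_error(bad, good))
```

A pointer that is one bit away from a valid value fits bad memory; a pointer that differs in many bits says little by itself.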
Re: bad RAM? prove it with a crash dump?
Atom Smasher wrote:
> i suspect i've got bad RAM but memtest has run through several dozen
> iterations without a problem. my (3 year old) laptop will run for a
> few days or weeks and then crash/freeze/hang. i've enabled crash
> dumps and i'm wondering if/how the dump might be able to (dis)prove
> that the RAM is bad. any ideas?
>
> thanks...

My experience with bad memory is that if it causes the machine to
crash, it won't always happen while the machine is running the same
process (or kernel thread)--so look for it crashing in a wide variety
of places--and upon inspection of the core dump, a pointer somewhere
will be pointing to garbage.

-Boris