Re: Is this kernel related (signal 11)?

Russell King Mon, 22 Jan 2001 14:10:32 -0800
Rogier Wolff writes:
> Hardware problems are normally not reproducible. Can you attach a
> debugger to your X server, and catch it when things go bad? (And
> give the XFree86 people a backtrace)

Bad RAM can be extremely reproducible, though, and can certainly produce
SEGVs.

Evidence: I recently had a bad 128MB SDRAM which *always* failed at byte
address 0x220068, which was the middle of the mem_map array.  All I
needed to do was 'dd if=/dev/hda of=/dev/null' and the machine would
die within 5 minutes due to an invalid buffer_head pointer.

The SDRAM naturally passed each and every single memory test I could
throw at it.  However, a new SDRAM fixed the problem.

It is quite common for SDRAMs to fail in this way; think about the
failure mode.  Some of the silicon in the SDRAM is damaged.  That damage
isn't going to move about, so it's going to be in a fixed position.  A
fixed position means a specific set of transistors and gates, and
therefore a specific memory location.
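To make the "fixed position" point concrete, here is a minimal C sketch of two simple checks of the kind memory testers run: a fixed-pattern write/read-back and an address-in-address test. This is not memtest86's code, and the function names and the NO_FAULT sentinel are made up for illustration. A damaged cell would show up at the same index on every pass. Note that run from user space this only exercises whatever physical pages the OS happens to hand out, so it cannot target a specific physical address like 0x220068. That is one reason a standalone tester booted outside the OS is more useful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: returned when every word read back correctly. */
#define NO_FAULT ((size_t)-1)

/*
 * Fixed-pattern test: write the same pattern to every word, then read
 * it back.  Returns the index of the first word that did not hold the
 * pattern, or NO_FAULT.  'volatile' keeps the compiler from optimising
 * the read-back away.
 */
size_t pattern_test(volatile uint32_t *buf, size_t nwords, uint32_t pattern)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != pattern)
            return i;
    return NO_FAULT;
}

/*
 * Address-in-address test: store each word's own index in it, so a
 * fault that makes two addresses alias shows up as a wrong value.
 */
size_t address_test(volatile uint32_t *buf, size_t nwords)
{
    assert(buf != NULL);
    for (size_t i = 0; i < nwords; i++)
        buf[i] = (uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (buf[i] != (uint32_t)i)
            return i;
    return NO_FAULT;
}
```

Running pattern_test() with complementary patterns such as 0xAAAAAAAA and 0x55555555 flips every bit both ways. As the SDRAM story above shows, though, a module can pass simple tests like these and still fail under real load such as disk DMA.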

In answer to the original poster's question, the first step would be
to grab a copy of memtest86 (IIRC it's a program that is run from a
floppy disk) and run it on your system.  That /should/ (and I stress
should there) detect any RAM problems you have.

--
Russell King ([EMAIL PROTECTED])                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html
