On Fri, 25 Sep 2009 23:33:45 -0500
Paul Logasa Bogen II <p...@tamu.edu> wrote:
> 
> After a semi-random period of normal operation (anywhere from a few
> hours to a week) the machine will suddenly get a series of page
> allocation errors followed by a series of "soft lockup - CPU#[X]
> stuck" messages after which the machine is completely non responsive
> and has to be hard restarted.

Perhaps there's a resource leak of some type?  Can you try rebuilding
the kernels on these systems with CONFIG_DEBUG_KMEMLEAK enabled?

This describes the process of inspecting for memory leaks:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/kmemleak.txt;h=34f6638aa5aceec30d290812fdc7fcebf3b86621;hb=HEAD

It would be useful to know the state of the system(s) prior to the crash
(perhaps with a "while sleep 600; do cat /sys/kernel/debug/kmemleak >>
log; done" or something?)  Note the ">>", so that each snapshot is
appended rather than overwriting the previous one.
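
A slightly fuller sketch of that loop, in case it helps. It assumes the
kernel was built with CONFIG_DEBUG_KMEMLEAK=y, that debugfs is mounted
on /sys/kernel/debug, and that it runs as root. The function names and
the log path are made up for illustration; only the kmemleak file and
the "scan" command come from the kernel documentation linked above.

```shell
#!/bin/sh
# Illustrative defaults (assumptions, adjust to taste).
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"
LOG="${LOG:-/var/log/kmemleak.log}"

# Append one timestamped copy of the report in $1 to the log file $2.
snapshot_kmemleak() {
    src="$1"
    out="$2"
    {
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        cat "$src"
    } >> "$out"
}

# Every $1 seconds (default 600): ask kmemleak for a scan, record the
# report, then flush to disk so the log survives a hard reset.
watch_kmemleak() {
    interval="${1:-600}"
    while sleep "$interval"; do
        echo scan > "$KMEMLEAK"
        snapshot_kmemleak "$KMEMLEAK" "$LOG"
        sync
    done
}
```

If debugfs is not mounted yet, "mount -t debugfs nodev /sys/kernel/debug"
should take care of that. After sourcing the file, calling
"watch_kmemleak 600" starts the loop; the sync is there because the
machine ends up needing a hard restart, so anything still buffered
would otherwise be lost.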


