On Fri, 25 Sep 2009 23:33:45 -0500 Paul Logasa Bogen II <p...@tamu.edu> wrote:

> After a semi-random period of normal operation (anywhere from a few
> hours to a week) the machine will suddenly get a series of page
> allocation errors followed by a series of "soft lockup - CPU#[X]
> stuck" messages after which the machine is completely non responsive
> and has to be hard restarted.

Perhaps there's a resource leak of some type? Can you try rebuilding the kernels on these systems with CONFIG_DEBUG_KMEMLEAK enabled? This describes the process of inspecting for memory leaks:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/kmemleak.txt;h=34f6638aa5aceec30d290812fdc7fcebf3b86621;hb=HEAD

It would also be useful to know the state of the system(s) prior to the crash, perhaps with a "while sleep 600; do cat /sys/kernel/debug/kmemleak >> log; done" or something similar. Note the ">>": appending keeps every ten-minute snapshot, whereas a plain ">" would overwrite the log on each pass and could leave it empty if the machine locks up mid-write.
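
For what it's worth, a slightly more robust version of that logging loop might look like the sketch below. It is only an illustration: the function names (snapshot, watch_kmemleak), the default log path /var/log/kmemleak.log and the timestamp format are my own choices, not anything from the kmemleak documentation. It assumes a kernel built with CONFIG_DEBUG_KMEMLEAK, debugfs mounted on /sys/kernel/debug, and a root shell. Each snapshot is flushed with sync so that as much as possible survives a hard restart.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped copy of a kmemleak report
# (or any readable file) to a log, then flush it to disk.
snapshot() {
    snap_src=$1
    snap_log=$2
    {
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
        cat "$snap_src"
    } >> "$snap_log"
    sync    # the machine may lock up at any time, so flush right away
}

# Hypothetical wrapper: every INTERVAL seconds, trigger a scan and log it.
# Defaults are assumptions; adjust the paths for the system at hand.
watch_kmemleak() {
    km_file=${1:-/sys/kernel/debug/kmemleak}
    km_log=${2:-/var/log/kmemleak.log}
    km_interval=${3:-600}
    while sleep "$km_interval"; do
        echo scan > "$km_file"    # request an immediate memory scan
        snapshot "$km_file" "$km_log"
    done
}
```

After sourcing the file as root, running "watch_kmemleak" with no arguments uses the defaults above. If /sys/kernel/debug is empty, debugfs probably needs mounting first (mount -t debugfs nodev /sys/kernel/debug). The last few entries in the log before a lockup are the interesting ones.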