Bug#637085: linux-image-2.6.32-5-amd64: Hard hang following BUG: scheduling while atomic: swapper/0/0x10000100

2011-08-18 Paul Elliott
A further update: we've successfully run 2.6.34-1~experimental.2 from snapshot.debian.org for 48 hours without any crashes (although we still see the UNDERRUN errors). I'll be testing 2.6.33 shortly. -- Paul Elliott, UNIX Systems Administrator, York Neuroimaging Centre, University of York
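The message in the thread subject, `BUG: scheduling while atomic: swapper/0/0x10000100`, packs the task name, the PID and the hex `preempt_count` at the moment of the bad schedule into one token. The sketch below splits and decodes it. It assumes the message follows the 2.6.32 `__schedule_bug()` format (`%s/%d/0x%08x`) and that kernel's x86_64 `preempt_count` layout (preempt depth in bits 0-7, softirq count in bits 8-15, hardirq count in bits 16-25, NMI at bit 26, `PREEMPT_ACTIVE` at `0x10000000`). `parse_sched_bug` and `SchedBug` are hypothetical names, not part of any kernel or Debian tool.

```python
import re
from typing import NamedTuple, Optional

# Message format ASSUMED to follow the kernel's __schedule_bug() printk:
#   "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt_count>"
# <comm> may itself contain "/" (e.g. "ksoftirqd/0"), so it is matched greedily.
SCHED_BUG_RE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>\d+)/0x(?P<count>[0-9a-fA-F]+)"
)

# preempt_count bit layout ASSUMED for 2.6.32 on x86_64; other versions differ.
PREEMPT_MASK = 0xFF                    # bits 0-7: preempt_disable() depth
SOFTIRQ_SHIFT, SOFTIRQ_MASK = 8, 0xFF  # bits 8-15: softirq count
HARDIRQ_SHIFT, HARDIRQ_MASK = 16, 0x3FF  # bits 16-25: hardirq count
NMI_SHIFT = 26                         # bit 26: in NMI
PREEMPT_ACTIVE = 0x10000000            # x86-specific flag bit


class SchedBug(NamedTuple):
    comm: str
    pid: int
    preempt_count: int
    preempt: int
    softirq: int
    hardirq: int
    in_nmi: bool
    preempt_active: bool


def parse_sched_bug(line: str) -> Optional[SchedBug]:
    """Hypothetical helper: split and decode one 'scheduling while atomic' line."""
    m = SCHED_BUG_RE.search(line)
    if m is None:
        return None
    count = int(m.group("count"), 16)
    return SchedBug(
        comm=m.group("comm"),
        pid=int(m.group("pid")),
        preempt_count=count,
        preempt=count & PREEMPT_MASK,
        softirq=(count >> SOFTIRQ_SHIFT) & SOFTIRQ_MASK,
        hardirq=(count >> HARDIRQ_SHIFT) & HARDIRQ_MASK,
        in_nmi=bool((count >> NMI_SHIFT) & 1),
        preempt_active=bool(count & PREEMPT_ACTIVE),
    )
```

Under those assumptions, `parse_sched_bug("BUG: scheduling while atomic: swapper/0/0x10000100")` yields comm `swapper`, PID 0, a softirq count of 1, a hardirq count of 0 and `preempt_active=True`, which would point at the idle task trying to schedule while in softirq context or with bottom halves disabled. The masks should be checked against the running kernel's `include/linux/hardirq.h` before relying on the decode.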

2011-08-15 Paul Elliott
Thanks for looking into this, Jonathan. We've spent the past week performing extensive tests on both the software and the hardware side. Here are the steps we've taken and the results obtained. 1) We've re-run our test script[1] on an ext4 file system provided by local 10k RPM SAS disks. We used

2011-08-08 Paul Elliott
Package: linux-2.6
Version: 2.6.32-35
Severity: important

We are experiencing hard lock ups when under heavy load. See below for the log entries we have managed to capture via remote syslog before the machine locks completely. The machine is a BL460c G7 and is performing multiple I/O stress

2011-08-08 Jonathan Nieder
Hi,

Paul Elliott wrote:
> We are experiencing hard lock ups when under heavy load. See below for the log entries we have managed to capture via remote syslog before the machine locks completely.

Thanks; this looks very useful. Let's see.

> The machine is a BL460c G7 and is performing

2011-08-08 Paul Elliott
Hi Jonathan,

On 08/08/11 16:16, Jonathan Nieder wrote:
> I assume this is fairly reproducible even after a reboot?

Correct, we can reproduce the lock ups after a reboot following 5-60 minutes of high I/O load (900MB/s plus).

> Is the stacktrace from the first sign of trouble in dmesg always

2011-08-08 Jonathan Nieder
Paul Elliott wrote:
> I'm no expert at reading these but I believe it is the same. Here's the trace after the next reboot/lock up cycle:
>
> kernel BUG at [...]/mm/slub.c:2969!
> invalid opcode: [#1] SMP
> last sysfs file:
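Whether the trace at the first sign of trouble is the same after every reboot can be checked mechanically when each lock-up is captured to its own remote-syslog file. The following is a minimal sketch, assuming each capture is an iterable of text lines and that the first line containing `BUG: ` or `kernel BUG at` marks the start of trouble. `first_bug_signature` and `same_first_bug` are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import Iterable, Optional

# Assumed markers for the first sign of trouble: generic "BUG: ..." messages
# (e.g. "scheduling while atomic") and BUG_ON() hits ("kernel BUG at file:line!").
BUG_RE = re.compile(r"(BUG: .*|kernel BUG at .*)")


def first_bug_signature(lines: Iterable[str]) -> Optional[str]:
    """Return the first BUG message in a capture, minus syslog/timestamp prefix."""
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            # The match starts at the marker, so host and timestamp are dropped.
            return match.group(1).strip()
    return None


def same_first_bug(log_a: Iterable[str], log_b: Iterable[str]) -> bool:
    """True only when both captures contain a BUG line and the first ones match."""
    sig_a = first_bug_signature(log_a)
    return sig_a is not None and sig_a == first_bug_signature(log_b)
```

An exact string match is a conservative test: messages that embed a PID or an address may differ between boots even when the underlying fault is the same, so a `False` result still deserves a manual look at both traces.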