Re: stripes of raid5s - crash
On Tue, Oct 19, 1999 at 11:16:47AM -0800, Christopher E. Brown wrote:
> > Can you duplicate this using only one of the raid5 sets? I tried to
> > cause the same behavior with a single raid5 set and it worked fine...
> > but I did not layer raid on raid, perhaps this is where the issue is?
>
> When working with a 5 x 18G RAID5 (0 spare) using 2.2.12SMP + raid
> 2.2.11 (compiled in, not modules) I would get an endless stream of
> messages about buffers when trying to mount the device; mke2fs and
> e2fsck worked fine. It seemed to happen when the array was at the
> beginning of the reconstruct.
>
> With 2.2.13pre15SMP + raid 2.2.11 I managed to get this a couple of
> times, but only if I mount it right after the reconstruct starts on a
> just-mkraided array. If I wait till the reconstruct hits 2-3% it
> mounts just fine. I have not seen this on arrays smaller than 50G
> (but this is not hard data; it could just be the faster reconstruct).

I am still having these problems: 2 RAID5s with 3 drives each (10 MB
per drive for a fast rebuild). I get an instant "Got MD request, not
good" kernel hang when doing an mke2fs on the stripe containing both
raid5s ... Very bad - according to the manpages this should work, but
it doesn't (reproducible with large and small stripe/raid sets).

BTW: I am on UP 2.2.12 plain + raid patches ...
BTW: The hang occurs both during rebuild and afterwards - it doesn't
matter ..

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
RE: stripes of raid5s - crash
On Thu, 14 Oct 1999, Tom Livingston wrote:
> Florian Lohoff wrote:
> > I dug a bit further. Hung the machine; couldn't log in (all terms
> > hang immediately). Tried to reboot, and when it hung at "Unmounting
> > file..." I got a term, hit SysRq-T and saw many processes stuck in
> > the D state. It seems something produces a deadlock (ll_rw_blk?)
> > and all processes trying to access the disk get stuck.
>
> Can you duplicate this using only one of the raid5 sets? I tried to
> cause the same behavior with a single raid5 set and it worked fine...
> but I did not layer raid on raid, perhaps this is where the issue is?

When working with a 5 x 18G RAID5 (0 spare) using 2.2.12SMP + raid
2.2.11 (compiled in, not modules) I would get an endless stream of
messages about buffers when trying to mount the device; mke2fs and
e2fsck worked fine. It seemed to happen when the array was at the
beginning of the reconstruct.

With 2.2.13pre15SMP + raid 2.2.11 I managed to get this a couple of
times, but only if I mount it right after the reconstruct starts on a
just-mkraided array. If I wait till the reconstruct hits 2-3% it
mounts just fine. I have not seen this on arrays smaller than 50G
(but this is not hard data; it could just be the faster reconstruct).

---
As folks might have suspected, not much survives except roaches, and
they don't carry large enough packets fast enough...
                              --About the Internet and nuclear war.
Re: stripes of raid5s - crash
On Thu, Oct 14, 1999 at 04:11:31PM -0700, Tom Livingston wrote:
> Florian Lohoff wrote:
> > I dug a bit further. Hung the machine; couldn't log in (all terms
> > hang immediately). Tried to reboot, and when it hung at "Unmounting
> > file..." I got a term, hit SysRq-T and saw many processes stuck in
> > the D state. It seems something produces a deadlock (ll_rw_blk?)
> > and all processes trying to access the disk get stuck.
>
> Can you duplicate this using only one of the raid5 sets? I tried to
> cause

A stripe of ONE raid5 doesn't make sense ...

> the same behavior with a single raid5 set and it worked fine... but I
> did not layer raid on raid, perhaps this is where the issue is?

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
RE: stripes of raid5s - crash
Florian Lohoff:
> > Can you duplicate this using only one of the raid5 sets? I tried
> > to cause
>
> A stripe of ONE raid5 doesn't make sense ...

If you say so. What I meant, of course, is: can you duplicate the same
behavior using ONLY ONE /dev/mdX "disk"? That is, only initialize
/dev/md0, mke2fs it, and cause the same problem? I could not.

tom
Re: stripes of raid5s - crash
On Sat, Oct 16, 1999 at 04:12:54PM -0700, Tom Livingston wrote:
> Florian Lohoff:
> > > Can you duplicate this using only one of the raid5 sets? I tried
> > > to cause
> >
> > A stripe of ONE raid5 doesn't make sense ...
>
> If you say so. What I meant, of course, is: can you duplicate the
> same behavior using ONLY ONE /dev/mdX "disk"? That is, only
> initialize /dev/md0, mke2fs it, and cause the same problem? I could
> not.

Me neither - one raid5 works without problems. It also works when
writing to both raid5s simultaneously, but when writing to a combined
stripe (2x raid5) the machine locks up ...

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
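[Note for the archive: the distinction above matters because a RAID0
stripe interleaves fixed-size chunks across its members, so one
sequential writer such as mke2fs hits both RAID5 sets at once. A
minimal sketch of that mapping, assuming a hypothetical 32 KB chunk
size and the member names md0/md1 (neither is stated in the thread):]

```python
CHUNK_KB = 32  # assumed RAID0 chunk size, not taken from the thread

def stripe_target(logical_kb, members=("md0", "md1"), chunk_kb=CHUNK_KB):
    """Map a logical offset (KB) on the stripe to (member, offset KB)."""
    chunk = logical_kb // chunk_kb           # which chunk of the stripe
    member = members[chunk % len(members)]   # chunks alternate md0, md1, ...
    offset = (chunk // len(members)) * chunk_kb + logical_kb % chunk_kb
    return member, offset

# Consecutive chunks land on alternating members, so a single stream
# of writes keeps both underlying RAID5 arrays busy simultaneously -
# unlike writing to each /dev/mdX separately.
```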
Re: stripes of raid5s - crash
On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
> Hi,
> I am discovering reproducible crashes with stripes of raid5s.
>
> Kernel 2.2.12 + raid0145-19990824-2.2.11.gz, raidtools 19990924
>
> The message is "Got md request" and the machine freezes hard ...
> console switching works, but no other action.

BTW: This happens when doing an mke2fs on the stripe while both raid5s
are still in "resync". It happens 2-5 seconds after starting mke2fs.

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
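[Note for the archive: a setup like the one reported - two RAID5 sets
combined into one RAID0 stripe - would look roughly like this in the
old raidtools /etc/raidtab format. Device paths, partition layout and
chunk sizes are illustrative assumptions, not taken from the report;
md1 would mirror the md0 stanza with its own three partitions:]

```
raiddev /dev/md0
    raid-level            5
    nr-raid-disks         3
    nr-spare-disks        0
    chunk-size            32
    persistent-superblock 1
    device                /dev/sda1
    raid-disk             0
    device                /dev/sdb1
    raid-disk             1
    device                /dev/sdc1
    raid-disk             2

# /dev/md1: a second raid5 stanza like the one above, elided here

raiddev /dev/md2
    raid-level            0
    nr-raid-disks         2
    chunk-size            32
    persistent-superblock 1
    device                /dev/md0
    raid-disk             0
    device                /dev/md1
    raid-disk             1
```

The crash would then be triggered by mkraid'ing all three and running
mke2fs on /dev/md2 while md0 and md1 are still resyncing.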
RE: stripes of raid5s - crash
Florian Lohoff wrote:
> On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
> > Hi,
> > I am discovering reproducible crashes with stripes of raid5s.
> >
> > Kernel 2.2.12 + raid0145-19990824-2.2.11.gz, raidtools 19990924
> >
> > The message is "Got md request" and the machine freezes hard ...
> > console switching works, but no other action.
>
> BTW: This happens when doing an mke2fs on the stripe while both
> raid5s are still in "resync". It happens 2-5 seconds after starting
> mke2fs.

The machine crashes? With no OOPS? Is the machine SMP? If so, does
the problem still happen if you run in UP mode?

Either way, try compiling with the Magic SysRq feature (in kernel
hacking), and when you get the lockup do SysRq + O to cause an OOPS..
and then decode it... this will (hopefully?) show us where it's at.

Tom
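[Note for the archive: the Magic SysRq workflow suggested above can be
sketched as follows. This is a diagnostic fragment under assumptions -
it requires a kernel built with CONFIG_MAGIC_SYSRQ, a physical
console, and root; the /proc toggle and key bindings are the standard
ones for 2.2-era kernels:]

```
# Enable the magic SysRq key at runtime
echo 1 > /proc/sys/kernel/sysrq

# On the hung console:
#   Alt+SysRq+T  - dump the state of every task to the kernel log
#   Alt+SysRq+P  - dump the current CPU registers
# Tasks reported in state "D" (uninterruptible sleep) are blocked
# inside the kernel, e.g. waiting on block I/O.
```

The register/stack dump can then be run through ksymoops (or checked
against System.map by hand) to find where the kernel is stuck.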
Re: stripes of raid5s - crash
On Thu, Oct 14, 1999 at 01:39:58PM -0700, Tom Livingston wrote:
> Florian Lohoff wrote:
> > On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
> > > Hi,
> > > I am discovering reproducible crashes with stripes of raid5s.
> > > Kernel 2.2.12 + raid0145-19990824-2.2.11.gz, raidtools 19990924
> > > The message is "Got md request" and the machine freezes hard ...
> > > console switching works, but no other action.
> >
> > BTW: This happens when doing an mke2fs on the stripe while both
> > raid5s are still in "resync". It happens 2-5 seconds after
> > starting mke2fs.
>
> The machine crashes? With no OOPS? Is the machine SMP? If so, does
> the

Partly, yes, yes, no.

Currently (I dug a bit further and compiled raid/md into the kernel
instead of using modules) the machine does not crash but gets stuck.
I am just able to press CTRL-ALT-DEL, but it doesn't reboot - it gets
stuck at "Unmounting filesystems", although I haven't got anything
mounted on the raid. With modules the machine didn't even accept
CTRL-ALT-DEL.

> problem still happen if you run in UP mode? Either way, try
> compiling with the Magic SysRq feature (in kernel hacking), and when
> you get the lockup do SysRq + O to cause an OOPS.. and then decode
> it... this will (hopefully?) show us where it's at.

I'll try that.

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
Re: stripes of raid5s - crash
On Thu, Oct 14, 1999 at 01:39:58PM -0700, Tom Livingston wrote:
> Florian Lohoff wrote:
> > On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
> > > Hi,
> > > I am discovering reproducible crashes with stripes of raid5s.
> > > Kernel 2.2.12 + raid0145-19990824-2.2.11.gz, raidtools 19990924
> > > The message is "Got md request" and the machine freezes hard ...
> > > console switching works, but no other action.
> >
> > BTW: This happens when doing an mke2fs on the stripe while both
> > raid5s are still in "resync". It happens 2-5 seconds after
> > starting mke2fs.
>
> The machine crashes? With no OOPS? Is the machine SMP? If so, does
> the problem still happen if you run in UP mode? Either way, try
> compiling with the Magic SysRq feature (in kernel hacking), and when
> you get the lockup do SysRq + O to cause an OOPS.. and then decode
> it... this will (hopefully?) show us where it's at.

SysRq + O does not seem to exist in 2.2.12.

I dug a bit further. Hung the machine; couldn't log in (all terms
hang immediately). Tried to reboot, and when it hung at "Unmounting
file..." I got a term, hit SysRq-T and saw many processes stuck in
the D state. It seems something produces a deadlock (ll_rw_blk?) and
all processes trying to access the disk get stuck.

Flo
--
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ... The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable ...            Cisco Field Notice
RE: stripes of raid5s - crash
Florian Lohoff wrote:
> I dug a bit further. Hung the machine; couldn't log in (all terms
> hang immediately). Tried to reboot, and when it hung at "Unmounting
> file..." I got a term, hit SysRq-T and saw many processes stuck in
> the D state. It seems something produces a deadlock (ll_rw_blk?)
> and all processes trying to access the disk get stuck.

Can you duplicate this using only one of the raid5 sets? I tried to
cause the same behavior with a single raid5 set and it worked fine...
but I did not layer raid on raid, perhaps this is where the issue is?

Tom