Re: stripes of raid5s - crash

1999-10-20 Thread Florian Lohoff

On Tue, Oct 19, 1999 at 11:16:47AM -0800, Christopher E. Brown wrote:

  Can you duplicate this using only one of the raid5 sets? I tried to cause
  the same behavior with a single raid5 set and it worked fine... but I did not
  layer raid on raid, perhaps this is where the issue is?
 
    When working with a 5 x 18G RAID5 (0 spare) using 2.2.12SMP +
  raid 2.2.11 (compiled in, not modules) I would get an endless stream
  of messages about buffers when trying to mount the device; mke2fs and
  e2fsck worked fine.  It seemed to happen when the array was at the
  beginning of the reconstruct.
 
    With 2.2.13pre15SMP + raid 2.2.11 I managed to get this a
  couple of times, but only if I mount it right after the reconstruct starts
  on a just-mkraided array.  If I wait till the reconstruct hits 2 - 3 % it
  mounts just fine.  I have not seen this on arrays smaller than 50G
  (but this is not hard data, it could just be the faster reconstruct).

I am still having these problems - 

With 2 RAID5s of 3 drives each (10 MB per drive for fast rebuilds) I get
an instant "Got MD request, not good" kernel hang when doing an mke2fs on
the stripe containing both RAID5s ... Very bad - according to the manpages
this should work, but it doesn't (reproducible with both large and small
stripe/raid sets).
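
For reference, the layout is roughly the /etc/raidtab below (raidtools 0.90
syntax; the sd* partition names are only placeholders, not my real devices):

  # two small RAID5 sets, three drives each, no spares
  raiddev /dev/md0
      raid-level              5
      nr-raid-disks           3
      nr-spare-disks          0
      persistent-superblock   1
      parity-algorithm        left-symmetric
      chunk-size              32
      device                  /dev/sda1
      raid-disk               0
      device                  /dev/sdb1
      raid-disk               1
      device                  /dev/sdc1
      raid-disk               2

  raiddev /dev/md1
      raid-level              5
      nr-raid-disks           3
      nr-spare-disks          0
      persistent-superblock   1
      parity-algorithm        left-symmetric
      chunk-size              32
      device                  /dev/sdd1
      raid-disk               0
      device                  /dev/sde1
      raid-disk               1
      device                  /dev/sdf1
      raid-disk               2

  # RAID0 stripe layered on top of the two RAID5 devices
  raiddev /dev/md2
      raid-level              0
      nr-raid-disks           2
      persistent-superblock   1
      chunk-size              32
      device                  /dev/md0
      raid-disk               0
      device                  /dev/md1
      raid-disk               1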

BTW: I am running plain UP 2.2.12 + raid patches ...

BTW: The hang occurs both during the rebuild and afterwards - it doesn't matter ...

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



RE: stripes of raid5s - crash

1999-10-19 Thread Christopher E. Brown

On Thu, 14 Oct 1999, Tom Livingston wrote:

 Florian Lohoff wrote:
  I went a bit further - hung the machine - couldn't log in (all terminals
  hang immediately) - tried to reboot, and when it hung at
  "Unmounting file..."
  I got a terminal, hit SysRq-T, and saw many processes stuck in the D state.
 
  Seems something produces a deadlock (ll_rw_blk ?) and all processes
  trying to access the disk get stuck.
 
 Can you duplicate this using only one of the raid5 sets? I tried to cause
 the same behavior with a single raid5 set and it worked fine... but I did not
 layer raid on raid, perhaps this is where the issue is?


When working with a 5 x 18G RAID5 (0 spare) using 2.2.12SMP +
raid 2.2.11 (compiled in, not modules) I would get an endless stream
of messages about buffers when trying to mount the device; mke2fs and
e2fsck worked fine.  It seemed to happen when the array was at the
beginning of the reconstruct.


With 2.2.13pre15SMP + raid 2.2.11 I managed to get this a
couple of times, but only if I mount it right after the reconstruct starts
on a just-mkraided array.  If I wait till the reconstruct hits 2 - 3 % it
mounts just fine.  I have not seen this on arrays smaller than 50G
(but this is not hard data, it could just be the faster reconstruct).
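
For what it's worth, I just keep an eye on /proc/mdstat and hold off on the
mount until the reconstruct has made some progress; with the 0.90 raid
patches the progress is reported there.  Roughly (the mount point is just
an example):

  cat /proc/mdstat              # shows the reconstruct/resync progress
  mount /dev/md0 /mnt/test      # only once it is a few percent in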

---
As folks might have suspected, not much survives except roaches, 
and they don't carry large enough packets fast enough...
--About the Internet and nuclear war.




Re: stripes of raid5s - crash

1999-10-16 Thread Florian Lohoff

On Thu, Oct 14, 1999 at 04:11:31PM -0700, Tom Livingston wrote:
 Florian Lohoff wrote:
  I went a bit further - hung the machine - couldn't log in (all terminals
  hang immediately) - tried to reboot, and when it hung at
  "Unmounting file..."
  I got a terminal, hit SysRq-T, and saw many processes stuck in the D state.
 
  Seems something produces a deadlock (ll_rw_blk ?) and all processes
  trying to access the disk get stuck.
 
 Can you duplicate this using only one of the raid5 sets? I tried to cause

A stripe of ONE raid5 doesn't make sense ...

 the same behavior with a single raid5 set and it worked fine... but I did not
 layer raid on raid, perhaps this is where the issue is?

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



RE: stripes of raid5s - crash

1999-10-16 Thread Tom Livingston

Florian Lohoff:
  Can you duplicate this using only one of the raid5 sets? I
 tried to cause

 A stripe of ONE raid5 doesn't make sense ...

If you say so.

What I meant, of course, is: can you duplicate the same behavior using ONLY
ONE /dev/mdX "disk"?  That is, only initialize /dev/md0, mke2fs it, and cause
the same problem?  I could not.
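
In other words, the control experiment is roughly this (assuming /etc/raidtab
already describes /dev/md0, and skipping whatever mke2fs options you normally
use):

  mkraid /dev/md0       # build ONE raid5, nothing layered on top
  mke2fs /dev/md0       # format it directly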

tom



Re: stripes of raid5s - crash

1999-10-16 Thread Florian Lohoff

On Sat, Oct 16, 1999 at 04:12:54PM -0700, Tom Livingston wrote:
 Florian Lohoff:
   Can you duplicate this using only one of the raid5 sets? I
  tried to cause
 
  A stripe of ONE raid5 doesn't make sense ...
 
 If you say so.
 
 What I meant, of course, is: can you duplicate the same behavior using ONLY
 ONE /dev/mdX "disk"?  That is, only initialize /dev/md0, mke2fs it, and cause
 the same problem?  I could not.

Me neither - one RAID5 works without problems.  Writing to both RAID5s
simultaneously also works, but when writing to a combined stripe
(2x RAID5) the machine locks up ...
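
To be concrete, the two cases look roughly like this (block counts and the
use of dd are just an example of how I generate the writes):

  # works: write to both RAID5 devices at the same time
  dd if=/dev/zero of=/dev/md0 bs=1024k count=512 &
  dd if=/dev/zero of=/dev/md1 bs=1024k count=512 &
  wait

  # locks the machine: write to the RAID0 stripe built on md0+md1
  mke2fs /dev/md2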

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



Re: stripes of raid5s - crash

1999-10-14 Thread Florian Lohoff

On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
 Hi,
 I am seeing reproducible crashes with stripes of raid5s.
 
 Kernel 2.2.12  + raid0145-19990824-2.2.11.gz
 raidtools 19990924 
 
 Message is "Got md request" and machine freezes hard ... Console
 switching works but no other action.

BTW: This happens when doing an mke2fs on the stripe while both raid5s
are still in "resync".
It hits 2-5 seconds after starting mke2fs.
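
The sequence is basically the following (the md numbering is just for
illustration; md0/md1 are the two RAID5s, md2 the RAID0 stripe on top):

  mkraid /dev/md0
  mkraid /dev/md1
  mkraid /dev/md2
  cat /proc/mdstat      # md0 and md1 are still resyncing at this point
  mke2fs /dev/md2       # machine freezes 2-5 seconds in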

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



RE: stripes of raid5s - crash

1999-10-14 Thread Tom Livingston

Florian Lohoff wrote:
 On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
  Hi,
  I am seeing reproducible crashes with stripes of raid5s.
 
  Kernel 2.2.12  + raid0145-19990824-2.2.11.gz
  raidtools 19990924
 
  Message is "Got md request" and machine freezes hard ... Console
  switching works but no other action.

 BTW: This happens when doing an mke2fs on the stripe while both raid5s
 are still in "resync".
 It hits 2-5 seconds after starting mke2fs.

The machine crashes?  With no OOPS?  Is the machine SMP?  If so, does the
problem still happen if you run in UP mode?  Either way, try compiling with
the Magic SysRq feature (in kernel hacking) and when you get the lockup do
the SysRq + O to cause an OOPS.. and then decode it... this will
(hopefully?) show us where it's at.
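
In case you haven't used it before: Magic SysRq is the CONFIG_MAGIC_SYSRQ
option under "Kernel hacking", and the task dump is Alt + SysRq + T at the
console.  Something like:

  # in the kernel .config
  CONFIG_MAGIC_SYSRQ=y

  # at the console once it locks up:
  #   Alt + SysRq + T   - dump the task list (look for tasks in state D)
  #   Alt + SysRq + P   - dump registers/EIP of the current CPU

(If your kernel also has the sysrq sysctl, you may need to enable it first
with "echo 1 > /proc/sys/kernel/sysrq".)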

Tom



Re: stripes of raid5s - crash

1999-10-14 Thread Florian Lohoff

On Thu, Oct 14, 1999 at 01:39:58PM -0700, Tom Livingston wrote:
 Florian Lohoff wrote:
  On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
   Hi,
   I am seeing reproducible crashes with stripes of raid5s.
  
   Kernel 2.2.12  + raid0145-19990824-2.2.11.gz
   raidtools 19990924
  
   Message is "Got md request" and machine freezes hard ... Console
   switching works but no other action.
 
  BTW: This happens when doing an mke2fs on the stripe while both raid5s
  are still in "resync".
  It hits 2-5 seconds after starting mke2fs.
 
 The machine crashes?  With no OOPS?  Is the machine SMP?  If so, does the

Partly Yes, Yes, No

Currently (I went a bit further - compiled raid and md into the kernel
instead of as modules) the machine does not crash but gets stuck.
I am still able to press CTRL-ALT-DEL but it doesn't reboot - it gets stuck
at "Unmounting filesystems" although I haven't got anything mounted on raid.

With modules the machine didn't even accept CTRL-ALT-DEL.

 problem still happen if you run in UP mode?  Either way, try compiling with
 the Magic SysRq feature (in kernel hacking) and when you get the lockup do
 the SysRq + O to cause an OOPS.. and then decode it... this will
 (hopefully?) show us where it's at.

I'll try that.

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



Re: stripes of raid5s - crash

1999-10-14 Thread Florian Lohoff

On Thu, Oct 14, 1999 at 01:39:58PM -0700, Tom Livingston wrote:
 Florian Lohoff wrote:
  On Thu, Oct 14, 1999 at 05:46:36PM +0200, Florian Lohoff wrote:
   Hi,
   I am seeing reproducible crashes with stripes of raid5s.
  
   Kernel 2.2.12  + raid0145-19990824-2.2.11.gz
   raidtools 19990924
  
   Message is "Got md request" and machine freezes hard ... Console
   switching works but no other action.
 
  BTW: This happens when doing an mke2fs on the stripe while both raid5s
  are still in "resync".
  It hits 2-5 seconds after starting mke2fs.
 
 The machine crashes?  With no OOPS?  Is the machine SMP?  If so, does the
 problem still happen if you run in UP mode?  Either way, try compiling with
 the Magic SysRq feature (in kernel hacking) and when you get the lockup do
 the SysRq + O to cause an OOPS.. and then decode it... this will
 (hopefully?) show us where it's at.

SysRq + O does not exist in 2.2.12 it seems.

I went a bit further - hung the machine - couldn't log in (all terminals
hang immediately) - tried to reboot, and when it hung at "Unmounting file..."
I got a terminal, hit SysRq-T, and saw many processes stuck in the D state.

Seems something produces a deadlock (ll_rw_blk ?) and all processes
trying to access the disk get stuck.
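
If a shell is still responsive, the same stuck tasks also show up as state
"D" in ps; roughly (assuming the usual "ps ax" column layout with STAT in
the third column):

  ps ax | awk '$3 ~ /^D/'     # list processes in uninterruptible sleep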

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED]  +49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ... Cisco Field Notice



RE: stripes of raid5s - crash

1999-10-14 Thread Tom Livingston

Florian Lohoff wrote:
 I went a bit further - hung the machine - couldn't log in (all terminals
 hang immediately) - tried to reboot, and when it hung at
 "Unmounting file..."
 I got a terminal, hit SysRq-T, and saw many processes stuck in the D state.
 
 Seems something produces a deadlock (ll_rw_blk ?) and all processes
 trying to access the disk get stuck.

Can you duplicate this using only one of the raid5 sets? I tried to cause
the same behavior with a single raid5 set and it worked fine... but I did not
layer raid on raid, perhaps this is where the issue is?

Tom