PROBLEM: raid5 hangs

2007-11-13 Thread Peter Magnusson

Hey.

[1.] One line summary of the problem:

raid5 hangs and uses 100% CPU

[2.] Full description of the problem/report:

I ran 2.6.18 for roughly 284 days, until my power supply died, with no 
problems whatsoever during that time. After that forced reboot I made these 
changes: I put in 2 GB more memory, so I now have 3 GB instead of 1 GB, and 
two disks in the raid5 had developed bad blocks, so I no longer trusted them 
and bought new disks (I managed to save the raid5). I have 6x300 GB in a 
raid5; two of those disks are now 320 GB, so I also created a small raid1. 
The raid5 is encrypted with aes-cbc-plain, the raid1 with aes-cbc-essiv:sha256.


I compiled linux-2.6.22.3 and started using that. I used the same .config 
as the default FC5 kernel; I think I just selected the P4 CPU type and a 
preemptible kernel.


After 11 or 12 days the computer froze. I wasn't home when it happened and 
couldn't fix it for about 3 days. All I could do was reboot it, as it wasn't 
possible to log in remotely or on the console. It did respond to ping, however.

After the reboot it rebuilt the raid5.

Then it happened again after approximately the same amount of time, 11 or 
12 days. I noticed that the md1_raid5 process was using 100% CPU the whole 
time. After the reboot it rebuilt the raid5.

I compiled linux-2.6.23.

And then... it happened again, after about the same amount of time as 
before. md1_raid5 used 100% CPU. I also noticed that I wasn't able to save 
anything in my home directory; writes froze, although I could still read 
from it. My home directory isn't on the raid5, but it is encrypted, and it 
isn't on any disk involved in the raid. This problem never happened when I 
used 2.6.18. I am currently back on 2.6.18, as I need the computer to be stable.

After the reboot it rebuilt the raid5.

top looked like this:

top - 02:37:32 up 11 days,  2:00, 29 users,  load average: 21.06, 17.45, 9.38
Tasks: 284 total,   2 running, 282 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.1%us, 51.2%sy,  0.0%ni,  0.0%id, 46.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3114928k total,  2981720k used,   133208k free,     8244k buffers
Swap:  2096472k total,      252k used,  2096220k free,  1690196k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2147 root      15  -5     0    0    0 R  100  0.0  80:25.80 md1_raid5
11328 iocc      20   0  536m 374m  28m S    3 12.3 249:32.38 firefox-bin

After some time, just before I rebooted, I had this load:

 02:48:36 up 11 days,  2:11, 29 users,  load average: 86.10, 70.80, 40.07

[3.] Keywords (i.e., modules, networking, kernel):

raid5, possibly dm_mod

[4.] Kernel version (from /proc/version):

I am not running 2.6.23 right now, but anyway...
Linux version 2.6.18 ([EMAIL PROTECTED]) (gcc version 4.1.1 
20060525 (Red Hat 4.1.1-1)) #1 SMP Sun Sep 24 12:58:16 CEST 2006


[5.] Output of Oops.. message (if applicable) with symbolic information
 resolved (see Documentation/oops-tracing.txt)

No oopses; it doesn't log anything.

[6.] A small shell script or example program which triggers the
 problem (if possible)

-

[7.] Environment

Hmm..

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.8G  7.0G  761M  91% /          <- unencrypted fs
tmpfs                 1.5G     0  1.5G   0% /dev/shm
/dev/mapper/home       24G   23G  1.6G  94% /home      <- encrypted fs
/dev/mapper/temp      1.4T  822G  555G  60% /temp      <- encrypted fs, raid5
/dev/mapper/jb         18G   17G  1.2G  94% /mnt/jb    <- encrypted fs, raid1

[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status home
/dev/mapper/home is active:
  cipher:  aes-cbc-plain
  keysize: 256 bits
  device:  /dev/sda3
  offset:  0 sectors
  size:    50861790 sectors
  mode:    read/write
[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status temp
/dev/mapper/temp is active:
  cipher:  aes-cbc-plain
  keysize: 256 bits
  device:  /dev/md1
  offset:  0 sectors
  size:    2930496000 sectors
  mode:    read/write
[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status jb
/dev/mapper/jb is active:
  cipher:  aes-cbc-essiv:sha256
  keysize: 256 bits
  device:  /dev/md0
  offset:  0 sectors
  size:    37238528 sectors
  mode:    read/write

[7.1.] Software (add the output of the ver_linux script here)

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux flashdance.cx 2.6.18 #1 SMP Sun Sep 24 12:58:16 CEST 2006 i686 i686 i386 GNU/Linux


Gnu C                  4.1.1
Gnu make               3.80
binutils               2.16.91.0.6
util-linux             2.13-pre7
mount                  2.13-pre7
module-init-tools      3.2.2
e2fsprogs              1.38
reiserfsprogs          3.6.19
quota-tools            3.13.
PPP                    2.4.3
Linux C Library        2.4
Dynamic linker (ldd)   2.4
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
oprofile               0.9.1
Sh-utils               5.97
udev                   084
wireless-tools         28
Modules Loaded         vfat fat usb_storage cdc_ether usbnet cdc_acm nfs sha256 aes d

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Dan Williams
On Nov 13, 2007 8:43 PM, Greg KH <[EMAIL PROTECTED]> wrote:
> >
> > Careful, it looks like you cherry picked commit 4ae3f847 "md: raid5:
> > fix clearing of biofill operations" which ended up misapplied in
> > Linus' tree,  You should either also pick up def6ae26 "md: fix
> > misapplied patch in raid5.c" or I can resend the original "raid5: fix
> > clearing of biofill operations."
> >
> > The other patch for -stable "raid5: fix unending write sequence" is
> > currently in -mm.
>
> Hm, I've attached the two patches that I have right now in the -stable
> tree so far (still have over 100 patches to go, so I might not have
> gotten to them yet if you have sent them).  These were sent to me by
> Andrew on their way to Linus.  if I should drop either one, or add
> another one, please let me know.
>

Drop md-raid5-fix-clearing-of-biofill-operations.patch and replace it
with the attached
md-raid5-not-raid6-fix-clearing-of-biofill-operations.patch (the
original sent to Neil).

The critical difference is that the replacement patch touches
handle_stripe5, not handle_stripe6.  Diffing the patches shows the
changes for hunk #3:

-@@ -2903,6 +2907,13 @@ static void handle_stripe6(struct stripe
+@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh)

raid5-fix-unending-write-sequence.patch is in -mm and I believe it is
waiting on an Acked-by from Neil?

> thanks,
>
> greg k-h

Thanks,
Dan
raid5: fix clearing of biofill operations

From: Dan Williams <[EMAIL PROTECTED]>

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits.  Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Tested-by: Joel Bertrand <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   17 ++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f96dea9..3808f52 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -377,7 +377,12 @@ static unsigned long get_stripe_work(struct stripe_head *sh)
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -551,8 +556,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 			}
 		}
 	}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
 	return_io(return_bi);
 
@@ -2630,6 +2634,13 @@ static void handle_stripe5(struct stripe_head *sh)
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_rdev_t *rdev;
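
The locking pattern the patch above establishes can be boiled down to a small
userspace analogue (illustrative only; the names below are simplified
stand-ins, not the actual raid5.c structures): the asynchronous completion
path only sets a completion bit, and the handler clears pending/ack/complete
in one place while holding the lock, so a snapshot taken under that lock can
never observe a half-cleared state.

/* Simplified userspace analogue of the fix above; not raid5.c. */
#include <pthread.h>
#include <stdio.h>

#define OP_BIOFILL 0x1UL

static pthread_mutex_t stripe_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long op_pending, op_ack, op_complete;

/* Runs asynchronously, outside the lock (like ops_complete_biofill).
 * The old code cleared 'pending' and 'ack' here, racing with the handler's
 * snapshot; the fixed code only marks completion. */
static void completion_path(void)
{
	__sync_or_and_fetch(&op_complete, OP_BIOFILL);
}

/* Runs with the lock held (like handle_stripe5): all three bits are cleared
 * in one place, so nothing can observe a half-cleared combination. */
static void handler_path(void)
{
	pthread_mutex_lock(&stripe_lock);
	if (op_complete & OP_BIOFILL) {
		op_pending  &= ~OP_BIOFILL;
		op_ack      &= ~OP_BIOFILL;
		op_complete &= ~OP_BIOFILL;
	}
	pthread_mutex_unlock(&stripe_lock);
}

int main(void)
{
	op_pending = op_ack = OP_BIOFILL;
	completion_path();
	handler_path();
	printf("pending=%#lx ack=%#lx complete=%#lx\n",
	       op_pending, op_ack, op_complete);
	return 0;
}
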
raid5: fix unending write sequence

From: Dan Williams <[EMAIL PROTECTED]>


handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
check 5: state 0x6 toread  read  write f800ffcffcc0 written 
check 4: state 0x6 toread  read  write f800fdd4e360 written 
check 3: state 0x1 toread  read  write  written 
check 2: state 0x1 toread  read  write  written 
check 1: state 0x6 toread  read  write f800ff517e40 written 
check 0: state 0x6 toread  read  write f800fd4cae60 written 
locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
for sector 7629696, rmw=0 rcw=0


These blocks were prepared to be written out, but were never handled in
ops_run_biodrain(), so they remain locked forever.  The operations flags
are all clear which means handle_stripe() thinks nothing else needs to be
done.

This state suggests that the STRIPE_OP_PREXOR bit was sampled 'set' when it
should not have been.  This patch cleans up cases where the code looks at
sh->ops.pending when it should be looking at the consistent stack-based
snapshot of the operations flags.

Report from Joel:
	Resync done. Patch fix this bug.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Tested-by: Joel Bertrand <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
---

 drivers/md/raid5.c |   16 +---
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/
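
The "consistent stack-based snapshot" mentioned above is a general pattern
that a minimal standalone example can make concrete (the flag names are made
up for illustration; this is not raid5.c code): the handler samples the
shared flags word once and drives every decision from that local copy,
rather than re-reading shared state that another context may change in
between.

/* Standalone illustration of acting on one snapshot of shared flags. */
#include <stdio.h>

#define OP_PREXOR   (1UL << 0)
#define OP_BIODRAIN (1UL << 1)

static volatile unsigned long shared_ops = OP_PREXOR | OP_BIODRAIN;

static void handle(void)
{
	/* Take the snapshot once; every test below uses the same value. */
	unsigned long ops = shared_ops;

	if (ops & OP_PREXOR)
		printf("schedule prexor\n");

	/* Re-reading shared_ops here instead of testing 'ops' is the bug
	 * class the patch removes: another context could change the word
	 * between the two reads, so the handler would execute half of one
	 * plan and half of another, leaving blocks locked forever. */
	if (ops & OP_BIODRAIN)
		printf("schedule biodrain\n");
}

int main(void)
{
	handle();
	return 0;
}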

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Greg KH
On Tue, Nov 13, 2007 at 08:36:05PM -0700, Dan Williams wrote:
> On Nov 13, 2007 5:23 PM, Greg KH <[EMAIL PROTECTED]> wrote:
> > On Tue, Nov 13, 2007 at 04:22:14PM -0800, Greg KH wrote:
> > > On Mon, Oct 22, 2007 at 05:15:27PM +1000, NeilBrown wrote:
> > > >
> > > > It appears that a couple of bugs slipped in to md for 2.6.23.
> > > > These two patches fix them and are appropriate for 2.6.23.y as well
> > > > as 2.6.24-rcX
> > > >
> > > > Thanks,
> > > > NeilBrown
> > > >
> > > >  [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of 
> > > > bitmaps with v1.0 metadata.
> > > >  [PATCH 002 of 2] md: raid5: fix clearing of biofill operations
> > >
> > > I don't see these patches in 2.6.24-rcX, are they there under some other
> > > subject?
> >
> > Oh nevermind, I found them, sorry for the noise...
> >
> 
> Careful, it looks like you cherry picked commit 4ae3f847 "md: raid5:
> fix clearing of biofill operations" which ended up misapplied in
> Linus' tree,  You should either also pick up def6ae26 "md: fix
> misapplied patch in raid5.c" or I can resend the original "raid5: fix
> clearing of biofill operations."
> 
> The other patch for -stable "raid5: fix unending write sequence" is
> currently in -mm.

Hm, I've attached the two patches that I have right now in the -stable
tree so far (still have over 100 patches to go, so I might not have
gotten to them yet if you have sent them).  These were sent to me by
Andrew on their way to Linus.  If I should drop either one, or add
another one, please let me know.

thanks,

greg k-h
From [EMAIL PROTECTED] Mon Oct 22 20:45:35 2007
From: [EMAIL PROTECTED]
Date: Mon, 22 Oct 2007 20:45:11 -0700
Subject: md: fix an unsigned compare to allow creation of bitmaps with v1.0 metadata
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>


From: NeilBrown <[EMAIL PROTECTED]>

patch 85bfb4da8cad483a4e550ec89060d05a4daf895b in mainline.

As page->index is unsigned, this all becomes an unsigned comparison, which
 almost always returns an error.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/md/bitmap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -274,7 +274,7 @@ static int write_sb_page(struct bitmap *
 			if (bitmap->offset < 0) {
 				/* DATA  BITMAP METADATA  */
 				if (bitmap->offset
-				    + page->index * (PAGE_SIZE/512)
+				    + (long)(page->index * (PAGE_SIZE/512))
 				    + size/512 > 0)
 					/* bitmap runs in to metadata */
 					return -EINVAL;
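
To see why the cast in the hunk above matters, here is a minimal standalone
C example (not kernel code; the variable names merely mirror the hunk).
Adding an unsigned index to a negative signed offset promotes the whole
expression to unsigned, the sum wraps to a huge positive value, and the
"> 0" overlap check fires, which is the "almost always returns an error"
behaviour described in the changelog:

#include <stdio.h>

int main(void)
{
	long offset = -1024;            /* like a negative bitmap->offset */
	unsigned long index = 1;        /* like page->index (unsigned)    */
	unsigned long sectors = 8;      /* like PAGE_SIZE/512             */

	/* Broken form: 'offset' is converted to unsigned long and the sum
	 * wraps, so the overlap check triggers spuriously. */
	if (offset + index * sectors > 0)
		printf("unsigned compare: spurious overlap -> -EINVAL\n");

	/* Fixed form, as in the patch: keep the arithmetic signed. */
	if (offset + (long)(index * sectors) > 0)
		printf("signed compare: overlap\n");
	else
		printf("signed compare: no overlap, bitmap creation allowed\n");

	return 0;
}
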
From [EMAIL PROTECTED] Mon Oct 22 20:45:44 2007
From: Dan Williams <[EMAIL PROTECTED]>
Date: Mon, 22 Oct 2007 20:45:11 -0700
Subject: md: raid5: fix clearing of biofill operations
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>

From: Dan Williams <[EMAIL PROTECTED]>

patch 4ae3f847e49e3787eca91bced31f8fd328d50496 in mainline.

ops_complete_biofill() runs outside of spin_lock(&sh->lock) and clears the
'pending' and 'ack' bits.  Since the test_and_ack_op() macro only checks
against 'complete' it can get an inconsistent snapshot of pending work.

Move the clearing of these bits to handle_stripe5(), under the lock.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Tested-by: Joel Bertrand <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/md/raid5.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -377,7 +377,12 @@ static unsigned long get_stripe_work(str
 		ack++;
 
 	sh->ops.count -= ack;
-	BUG_ON(sh->ops.count < 0);
+	if (unlikely(sh->ops.count < 0)) {
+		printk(KERN_ERR "pending: %#lx ops.pending: %#lx ops.ack: %#lx "
+			"ops.complete: %#lx\n", pending, sh->ops.pending,
+			sh->ops.ack, sh->ops.complete);
+		BUG();
+	}
 
 	return pending;
 }
@@ -551,8 +556,7 @@ static void ops_complete_biofill(void *s
 			}
 		}
 	}
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
-	clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+	set_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
 
 	return_io(return_bi);
 
@@ -2903,6 +2907,13 @@ static void handle_stripe6(struct stripe
 	s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
+	/* clean-up completed biofill operations */
+	if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
+	}
+
 	rcu_read_lock();
 	for (i=disks; i--; ) {
 		mdk_r

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Dan Williams
On Nov 13, 2007 5:23 PM, Greg KH <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 13, 2007 at 04:22:14PM -0800, Greg KH wrote:
> > On Mon, Oct 22, 2007 at 05:15:27PM +1000, NeilBrown wrote:
> > >
> > > It appears that a couple of bugs slipped in to md for 2.6.23.
> > > These two patches fix them and are appropriate for 2.6.23.y as well
> > > as 2.6.24-rcX
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >  [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of 
> > > bitmaps with v1.0 metadata.
> > >  [PATCH 002 of 2] md: raid5: fix clearing of biofill operations
> >
> > I don't see these patches in 2.6.24-rcX, are they there under some other
> > subject?
>
> Oh nevermind, I found them, sorry for the noise...
>

Careful, it looks like you cherry-picked commit 4ae3f847 "md: raid5:
fix clearing of biofill operations", which ended up misapplied in
Linus' tree.  You should either also pick up def6ae26 "md: fix
misapplied patch in raid5.c", or I can resend the original "raid5: fix
clearing of biofill operations".

The other patch for -stable "raid5: fix unending write sequence" is
currently in -mm.

> greg k-h

Regards,
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposal: non-striping RAID4

2007-11-13 Thread James Lee
From a quick search through this mailing list, it looks like I can
answer my own question regarding RAID1 --> RAID5 conversion.  Instead
of creating a RAID1 array for the partitions on the two biggest
drives, it should just create a 2-drive RAID5 (which is identical, but
can be expanded as with any other RAID5 array).

So it looks like this should work I guess.

On 13/11/2007, James Lee <[EMAIL PROTECTED]> wrote:
> Thanks for the reply Bill, and on reflection I agree with a lot of it.
>
> I do feel that the use case is a sensible valid one - though maybe I
> didn't illustrate it well.  As an example suppose I want to, starting
> from scratch, build up a cost-effective large redundant array to hold
> data.
>
> With standard RAID5 I might do as follows:
> - Buy 3x 500GB SATA drives, setup as a single RAID5 array.
> - Once I have run out of space on this array (say in 12 months, for
> example), add another 1x500GB drive and expand the array.
> - Another 6 months or so later, buy another 1x 500GB drive and expand
> the array, etc.
> This isn't that cost-efficient, as by the second or third iteration,
> 500GB drives are not good value per-GB (as the sweet-spot has moved
> onto 1TB drives say).
>
> With a scheme which gives a redundant array with the capacity being
> the sum of the size of all drives minus the size of the largest drive,
> the sequence can be something like:
> - Buy 3x500GB drives.
> - Once out of space, add a drive with size determined by current best
> price/GB (eg. 1x750GB drive).
> - Repeat as above (giving, say, 1TB, 1.5TB,  drives).
> (- When adding larger drives, potentially also start removing the
> smallest drives from the array and selling them - to avoid having too
> many drives.)
>
> However what I do agree with is that this is entirely achievable using
> current RAID5 and RAID1, and as you described (and ideally then having
> creating a linear array out of the resulting arrays).  All it would
> require, as you say, is either a simple wrapper script issuing mdadm
> commands, or ideally for this ability to be added to mdadm itself.  So
> that the create command for this new "raid type" would just create all
> the RAID5 and RAID1 arrays, and use them to make a linear array.  The
> grow command (when adding a new drive to the array) would partition it
> up, expand each of the RAID5 arrays onto it, convert the existing
> RAID1 array to a RAID5 array using the new drive, create a new RAID1
> array, and expand the linear array containing them all.  The only
> thing I'm not entirely sure about is whether mdadm currently supports
> online conversion of 2-drive RAID1 array --> 3-drive RAID5 array?
>
> So thanks for the input, and I'll now ask a slightly different
> question to my original one - would there be any interest in enhancing
> mdadm to do the above?  By which I mean would patches which did this
> be considered, or would this be deemed not to be useful / desirable?
>
> Thanks,
> James
>
> On 12/11/2007, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> > James Lee wrote:
> > > I have come across an unusual RAID configuration type which differs
> > > from any of the standard RAID 0/1/4/5 levels currently available in
> > > the md driver, and has a couple of very useful properties (see below).
> > >  I think it would be useful to have this code included in the main
> > > kernel, as it allows for some use cases that aren't well catered for
> > > with the standard RAID levels.  I was wondering what people's thoughts
> > > on this might be?
> > >
> > > The RAID type has been named "unRAID" by it's author, and is basically
> > > similar to RAID 4 but without data being striped across the drives in
> > > the array.  In an n-drive array (where the drives need not have the
> > > same capacity), n-1 of the drives appear as independent drives with
> > > data written to them as with a single standalone drive, and the 1
> > > remaining drive is a parity drive (this must be the largest capacity
> > > drive), which stores the bitwise XOR of the data on the other n-1
> > > drives (where the data being XORed is taken to be 0 if we're past the
> > > end of that particular drive).  Data recovery then works as per normal
> > > RAID 4/5 in the case of the failure of any one of the drives in the
> > > array.
> > >
> > > The advantages of this are:
> > > - Drives need not be of the same size as each other; the only
> > > requirement is that the parity drive must be the largest drive in the
> > > array.  The available space of the array is the sum of the space of
> > > all drives in the array, minus the size of the largest drive.
> > > - Data protection is slightly better than with RAID 4/5 in that in the
> > > event of multiple drive failures, only some data is lost (since the
> > > data on any non-failed, non-parity drives is usable).
> > >
> > > The disadvantages are:
> > > - Performance:
> > > - As there is no striping, on a non-degraded array the read
> > > performance will be identical to that of a singl

RAID5 Recovery

2007-11-13 Thread Neil Cavan
Hello,

I have a 5-disk RAID5 array that has gone belly-up. It consists of 2x
2 disks on Promise PCI controllers, and one on the mobo controller.

This array has been running for a couple years, and every so often
(randomly, sometimes every couple weeks sometimes no problem for
months) it will drop a drive. It's not a drive failure per se, it's
something controller-related since the failures tend to happen in
pairs and SMART gives the drives a clean bill of health. If it's only
one drive, I can hot-add with no problem. If it's 2 drives my heart
leaps into my mouth but I reboot, only one of the drives comes up as
failed, and I can hot-add with no problem. The 2-drive case has
happened a dozen times and my array is never any worse for the wear.

This morning, I woke up to find the array had kicked two disks. This
time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
of the "_"s) had been marked as a spare - weird, since there are no
spare drives in this array. I rebooted, and the array came back in the
same state: one failed, one spare. I hot-removed and hot-added the
spare drive, which put the array back to where I thought it should be
(still U_U_U, but with both "_"s marked as failed). Then I rebooted,
and the array began rebuilding on its own. Usually I have to hot-add
manually, so that struck me as a little odd, but I gave it no mind and
went to work. Without checking the contents of the filesystem. Which
turned out not to have been mounted on reboot. Because apparently
things went horribly wrong.

The rebuild process ran its course. I now have an array that mdadm
insists is peachy:
---
md0 : active raid5 hda1[0] hdc1[1] hdi1[4] hdg1[3] hde1[2]
  468872704 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
---

But there is no filesystem on /dev/md0:

---
sudo mount -t reiserfs /dev/md0 /storage/
mount: wrong fs type, bad option, bad superblock on /dev/md0,
   missing codepage or other error
---

Do I have any hope of recovering this data? Could rebuilding the
reiserfs superblock help if the rebuild managed to corrupt the
superblock but not the data?

Any help is appreciated, below is the failure event in
/var/log/messages, followed by the output of cat /var/log/messages |
grep md.

Thanks,
Neil Cavan

Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] ide: failed opcode
was: unknown
Nov 13 02:01:03 localhost kernel: [17805772.424000] end_request: I/O
error, dev hdc, sector 11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] R5: read error not
correctable.
Nov 13 02:01:03 localhost kernel: [17805772.464000] lost page write
due to I/O error on md0
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] ide: failed opcode
was: unknown
Nov 13 02:01:05 localhost kernel: [17805773.776000] end_request: I/O
error, dev hdc, sector 11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] R5: read error not
correctable.
Nov 13 02:01:05 localhost kernel: [17805773.776000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] ide: failed opcode
was: unknown
Nov 13 02:01:06 localhost kernel: [17805775.156000] end_request: I/O
error, dev hdc, sector 11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] R5: read error not
correctable.
Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1
Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 c

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Greg KH
On Tue, Nov 13, 2007 at 04:22:14PM -0800, Greg KH wrote:
> On Mon, Oct 22, 2007 at 05:15:27PM +1000, NeilBrown wrote:
> > 
> > It appears that a couple of bugs slipped in to md for 2.6.23.
> > These two patches fix them and are appropriate for 2.6.23.y as well
> > as 2.6.24-rcX
> > 
> > Thanks,
> > NeilBrown
> > 
> >  [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps 
> > with v1.0 metadata.
> >  [PATCH 002 of 2] md: raid5: fix clearing of biofill operations
> 
> I don't see these patches in 2.6.24-rcX, are they there under some other
> subject?

Oh nevermind, I found them, sorry for the noise...

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-13 Thread Greg KH
On Mon, Oct 22, 2007 at 05:15:27PM +1000, NeilBrown wrote:
> 
> It appears that a couple of bugs slipped in to md for 2.6.23.
> These two patches fix them and are appropriate for 2.6.23.y as well
> as 2.6.24-rcX
> 
> Thanks,
> NeilBrown
> 
>  [PATCH 001 of 2] md: Fix an unsigned compare to allow creation of bitmaps 
> with v1.0 metadata.
>  [PATCH 002 of 2] md: raid5: fix clearing of biofill operations

I don't see these patches in 2.6.24-rcX, are they there under some other
subject?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposal: non-striping RAID4

2007-11-13 Thread James Lee
Thanks for the reply Bill, and on reflection I agree with a lot of it.

I do feel that the use case is a sensible, valid one - though maybe I
didn't illustrate it well.  As an example suppose I want to, starting
from scratch, build up a cost-effective large redundant array to hold
data.

With standard RAID5 I might do as follows:
- Buy 3x 500GB SATA drives, setup as a single RAID5 array.
- Once I have run out of space on this array (say in 12 months, for
example), add another 1x500GB drive and expand the array.
- Another 6 months or so later, buy another 1x 500GB drive and expand
the array, etc.
This isn't that cost-efficient, as by the second or third iteration,
500GB drives are not good value per-GB (as the sweet-spot has moved
onto 1TB drives say).

With a scheme which gives a redundant array with the capacity being
the sum of the size of all drives minus the size of the largest drive,
the sequence can be something like:
- Buy 3x500GB drives.
- Once out of space, add a drive with size determined by current best
price/GB (eg. 1x750GB drive).
- Repeat as above (giving, say, 1TB, 1.5TB,  drives).
(- When adding larger drives, potentially also start removing the
smallest drives from the array and selling them - to avoid having too
many drives.)

However, what I do agree with is that this is entirely achievable using
current RAID5 and RAID1, as you described (ideally then creating a linear
array out of the resulting arrays).  All it would require, as you say, is
either a simple wrapper script issuing mdadm commands, or ideally for this
ability to be added to mdadm itself, so that the create command for this
new "raid type" would just create all the RAID5 and RAID1 arrays and use
them to make a linear array.  The grow command (when adding a new drive to
the array) would partition it up, expand each of the RAID5 arrays onto it,
convert the existing RAID1 array to a RAID5 array using the new drive,
create a new RAID1 array, and expand the linear array containing them all.
The only thing I'm not entirely sure about is whether mdadm currently
supports online conversion of a 2-drive RAID1 array --> 3-drive RAID5 array?

So thanks for the input, and I'll now ask a slightly different
question to my original one - would there be any interest in enhancing
mdadm to do the above?  By which I mean would patches which did this
be considered, or would this be deemed not to be useful / desirable?

Thanks,
James

On 12/11/2007, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> James Lee wrote:
> > I have come across an unusual RAID configuration type which differs
> > from any of the standard RAID 0/1/4/5 levels currently available in
> > the md driver, and has a couple of very useful properties (see below).
> >  I think it would be useful to have this code included in the main
> > kernel, as it allows for some use cases that aren't well catered for
> > with the standard RAID levels.  I was wondering what people's thoughts
> > on this might be?
> >
> > The RAID type has been named "unRAID" by it's author, and is basically
> > similar to RAID 4 but without data being striped across the drives in
> > the array.  In an n-drive array (where the drives need not have the
> > same capacity), n-1 of the drives appear as independent drives with
> > data written to them as with a single standalone drive, and the 1
> > remaining drive is a parity drive (this must be the largest capacity
> > drive), which stores the bitwise XOR of the data on the other n-1
> > drives (where the data being XORed is taken to be 0 if we're past the
> > end of that particular drive).  Data recovery then works as per normal
> > RAID 4/5 in the case of the failure of any one of the drives in the
> > array.
> >
> > The advantages of this are:
> > - Drives need not be of the same size as each other; the only
> > requirement is that the parity drive must be the largest drive in the
> > array.  The available space of the array is the sum of the space of
> > all drives in the array, minus the size of the largest drive.
> > - Data protection is slightly better than with RAID 4/5 in that in the
> > event of multiple drive failures, only some data is lost (since the
> > data on any non-failed, non-parity drives is usable).
> >
> > The disadvantages are:
> > - Performance:
> > - As there is no striping, on a non-degraded array the read
> > performance will be identical to that of a single drive setup, and the
> > write performance will be comparable or somewhat worse than that of a
> > single-drive setup.
> > - On a degraded arrays with many drives the read and write
> > performance could take further hits due to the PCI / PCI-E bus getting
> > saturated.
> >
>
> I personally feel that "this still looks like a bunch of little drives"
> should be listed first...
> > The company which has implemented this is "Lime technology" (website
> > here: http://www.lime-technology.com/); an overview of the technical
> > detail is given on their website here:
> > ht
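
To make the quoted parity scheme concrete, here is an illustrative sketch
(not code from md or from the unRAID product) of how the parity block at a
given offset would be computed: it is the XOR of the corresponding block on
every data drive, with drives shorter than the offset contributing zeros.

#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096

struct data_drive {
	const unsigned char *data;   /* drive contents          */
	size_t size;                 /* drive capacity in bytes */
};

/* XOR the block starting at 'offset' on each data drive into 'parity'.
 * A drive shorter than offset+BLOCK_SIZE contributes zeros for the bytes
 * it does not have, which is why only the parity drive needs to be at
 * least as large as the largest data drive. */
static void compute_parity(const struct data_drive *drives, int ndrives,
			   size_t offset, unsigned char *parity)
{
	memset(parity, 0, BLOCK_SIZE);
	for (int d = 0; d < ndrives; d++) {
		size_t avail = drives[d].size > offset ?
			       drives[d].size - offset : 0;
		size_t len = avail < BLOCK_SIZE ? avail : BLOCK_SIZE;
		for (size_t i = 0; i < len; i++)
			parity[i] ^= drives[d].data[offset + i];
	}
}

Reconstruction after a single drive failure is the same operation in reverse:
XOR the surviving data drives with the parity drive to regenerate the missing
blocks.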

Re: kernel panic (2.6.23.1-fc7) in drivers/md/raid5.c:144

2007-11-13 Thread Dan Williams
[ Adding Neil, stable@, DaveJ, and GregKH to the cc ]

On Nov 13, 2007 11:20 AM, Peter <[EMAIL PROTECTED]> wrote:
> Hi
>
> I had a 3 disc raid5 array running fine with Fedora 7 (32bit) kernel 
> 2.6.23.1-fc7 on an old Athlon XP using a two sata_sil cards.
>
> I replaced the hardware with an Athlon64 X2 and using the onboard sata_nv, 
> after I modified the initrd I was able to boot up from my old system drive. 
> However when it brought up the raid array it died with a kernel panic. I used 
> a rescue CD, commented out the array in mdadm.conf and booted up. I could 
> assemble the array manually (it kicked out one of the three drives for some 
> reason?) but when I used mdadm --examine /dev/md0 I got the kernel panic 
> again. I don't have remote debugging but I managed to take some pictures:
>
> http://img132.imageshack.us/img132/1697/kernel1sh3.jpg
> http://img132.imageshack.us/img132/3538/kernel2eu2.jpg
>
> From what I understand it should be possible to do this hardware upgrade with 
> using software raid? Any ideas?
>
> Thanks
> Peter
>

There are two bug fix patches pending for 2.6.23.2:
"raid5: fix clearing of biofill operations"
http://marc.info/?l=linux-raid&m=119303750132068&w=2
"raid5: fix unending write sequence"
http://marc.info/?l=linux-raid&m=119453934805607&w=2

You are hitting the bug that was fixed by: "raid5: fix clearing of
biofill operations"

Heads up for the stable@ team: "raid5: fix clearing of biofill
operations" was originally misapplied for 2.6.24-rc:
"md: Fix misapplied patch in raid5.c"
http://marc.info/?l=linux-raid&m=119396783332081&w=2
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel panic (2.6.23.1-fc7) in drivers/md/raid5.c:144

2007-11-13 Thread Peter

Oops, I meant mdadm --detail /dev/md0 of course, not mdadm --examine
 /dev/md0

md: kicking non-fresh sdd1 from the array

I downgraded to 2.6.22.9-91.fc7 and that seems to be working so far; no
 kernel panics yet. I am now trying to bring sdd back into the array.


- Original Message 



Hi

I had a 3 disc raid5 array running fine with Fedora 7 (32bit) kernel
 2.6.23.1-fc7 on an old Athlon XP using two sata_sil cards.

I replaced the hardware with an Athlon64 X2 using the onboard
 sata_nv; after I modified the initrd I was able to boot up from my old
 system drive. However when it brought up the raid array it died with a
 kernel panic. I used a rescue CD, commented out the array in mdadm.conf and
 booted up. I could assemble the array manually (it kicked out one of
 the three drives for some reason?) but when I used mdadm --examine
 /dev/md0 I got the kernel panic again. I don't have remote debugging but I
 managed to take some pictures:

http://img132.imageshack.us/img132/1697/kernel1sh3.jpg
http://img132.imageshack.us/img132/3538/kernel2eu2.jpg

From what I understand it should be possible to do this hardware
 upgrade while still using software raid? Any ideas?

Thanks
Peter


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel panic (2.6.23.1-fc7) in drivers/md/raid5.c:144

2007-11-13 Thread Peter
Hi

I had a 3 disc raid5 array running fine with Fedora 7 (32bit) kernel 
2.6.23.1-fc7 on an old Athlon XP using two sata_sil cards. 

I replaced the hardware with an Athlon64 X2 using the onboard sata_nv; 
after I modified the initrd I was able to boot up from my old system drive. 
However when it brought up the raid array it died with a kernel panic. I used a 
rescue CD, commented out the array in mdadm.conf and booted up. I could 
assemble the array manually (it kicked out one of the three drives for some 
reason?) but when I used mdadm --examine /dev/md0 I got the kernel panic again. 
I don't have remote debugging but I managed to take some pictures:

http://img132.imageshack.us/img132/1697/kernel1sh3.jpg
http://img132.imageshack.us/img132/3538/kernel2eu2.jpg

From what I understand it should be possible to do this hardware upgrade 
while still using software raid? Any ideas?

Thanks
Peter


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html