How to recover a multiple raid5 disk failure with mdadm?

2005-08-08 Thread Claas Hilbrecht
I'm running a 4-disk software raid5 array with linux 2.6.12.1. Each disk is 
an 80 GB IDE master on its own IDE bus (no slave drives). So far the array 
has run great, but a few weeks ago one disk (hdk) in the array failed. 
After looking at the connectors I refitted the connector on that drive (it 
seems to have been a weak connection). The resync began when the system was 
rebooted, but in the middle of the resync a second drive (hdg) had a 
problem: a couple of blocks were unreadable *sick*. The array went down 
and it seems that all data is lost. This is not a real problem since the 
array is only used for a personal VDR.


But I thought this would be a good time to start fiddling with the raid to 
see if there is a chance to rescue some data. I first made a backup of each 
drive with dd if=/dev/hde | gzip -1 > hde.gz. After googling around for a 
while I found 
http://www.tldp.org/HOWTO/Software-RAID-HOWTO-8.html#ss8.1, but the 
instructions there didn't work. I even tried to recreate the array as 
suggested on different mailing lists. My last attempt used 
mdadm-2.0-devel-2 with the patch from 14.07.2005 
(http://www.opensubscriber.com/message/linux-raid@vger.kernel.org/1737664.html) 
from this mailing list. Sometimes I was able to recreate the array, but 
when I try to mount it there seems to be no valid ext3 filesystem inside.
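
(For reference, a non-destructive attempt would look something like the 
sketch below - run against the dd images or with the filesystem mounted 
read-only, and with the device list treated as a placeholder that has to 
match what --examine actually reports:

  # compare the per-disk superblocks first (UUID, Events, Update Time)
  mdadm --examine /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1

  # force-assemble from the members whose superblocks still agree;
  # --force lets mdadm bring a slightly stale member back in by
  # updating its event count
  mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdi1 /dev/hdk1

  # mount read-only before trusting anything on it
  mount -o ro /dev/md0 /mnt
)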


So here is the list of events that caused the raid failure:

1) hdk went down due to a connector problem.
2) Powered off the machine and refitted the connector.
3) Powered on; the resync started.
4) hdg failed with some unreadable sectors (according to kern.log).
5) md0 went down.

Is there anything else I can do to rescue the data? I assume you need more 
input, but I don't think it's a good idea to post even more logs to the 
list, so please ask if something is missing.


The output below is from mdadm-2.0-devel-2 --examine. What I don't 
understand is why there is a difference in the Spare Devices count between 
the superblocks.
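
(To compare the interesting fields side by side - assuming all four 
members are still readable - something like this helps:

  mdadm --examine /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1 | \
      grep -E '^/dev|Creation Time|Update Time|Events|Devices|Checksum'

Note that hdg1 below also shows a different UUID and a Creation Time of 
Aug 5, so its superblock was presumably rewritten by one of the later 
re-create attempts rather than by the original array.)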


---***---
/dev/hde1:
 Magic : a92b4efc
   Version : 00.90.01
  UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
 Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
   Device Size : 80043136 (76.34 GiB 81.96 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sat Jul 23 20:23:19 2005
 State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
 Spare Devices : 1
  Checksum : c5646fe8 - expected c6586ef4
Events : 0.4340017

Layout : left-symmetric
Chunk Size : 32K

 Number   Major   Minor   RaidDevice State
this     3      33        1        3      active sync   /dev/hde1

  0 0   00524288  spare
  1 3670016   6553665536393216  spare
  2 0   0131072589824  spare
  3 2162688   65536196608393216  spare
  4 3735552   655362621440  spare
---***---

---***---
/dev/hdg1:
 Magic : a92b4efc
   Version : 00.90.00
  UUID : 7b631138:ca5ac82b:95f1b9df:25e26bff
 Creation Time : Fri Aug  5 11:55:02 2005
Raid Level : raid5
   Device Size : 80043136 (76.34 GiB 81.96 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 0

   Update Time : Fri Aug  5 11:55:02 2005
 State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
 Spare Devices : 0
  Checksum : 35699ae6 - correct
Events : 0.1

Layout : left-symmetric
Chunk Size : 32K

 Number   Major   Minor   RaidDevice State
this     1      34        1        1      active sync   /dev/hdg1

   0     0      33        1        0      active sync   /dev/hde1
   1     1      34        1        1      active sync   /dev/hdg1
   2     2      56        1        2      active sync   /dev/hdi1
   3     3       0        0        3      faulty
---***---
/dev/hdi1:
 Magic : a92b4efc
   Version : 00.90.01
  UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
 Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
   Device Size : 80043136 (76.34 GiB 81.96 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sat Jul 23 20:23:19 2005
 State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
 Spare Devices : 1
  Checksum : c564701b - correct
Events : 0.4340017

Layout : left-symmetric
Chunk Size : 32K

 Number   Major   Minor   RaidDevice State
this     1      56        1        1      active sync   /dev/hdi1

   0     0       0        0        0      removed
   1     1      56        1        1      active sync   /dev/hdi1
   2     2      34        1        2      active sync   /dev/hdg1
   3     3      33        1        3      active sync   /dev/hde1
   4     4      57        1        4      spare   /dev/hdk1
---***---

---***---
/dev/hdk1:
 Magic : a92b4efc
   Version : 00.90.01
  UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
 Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
   

Re: How to force a spare drive to take over in a RAID5?

2005-08-08 Thread Mark Cuss

Kernel 2.4.27

I've been using the old raidtools stuff and am new to mdadm - sorry if this 
is an obviously simple question...


After perusing the man page, it looks to me like I should use mdadm to mark 
the drive I want to remove as failed, which should force a rebuild onto the 
spare drive. I want to double-check that this is correct first though, as 
this md device contains 300 GB of production data.
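
Concretely, I'm thinking of something like the following (assuming the 
array is /dev/md0 and the members are whole disks as in the old raidtools 
setup - adjust to sdr1/sds1 if it was built on partitions):

  mdadm /dev/md0 --fail /dev/sdr     # mark the suspect drive faulty
  cat /proc/mdstat                   # the spare (sds) should start rebuilding
  mdadm /dev/md0 --remove /dev/sdr   # remove it once it is marked faulty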


Thanks for the help!

Mark

- Original Message - 
From: David Greaves [EMAIL PROTECTED]

To: Mark Cuss [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Sunday, August 07, 2005 5:29 AM
Subject: Re: How to force a spare drive to take over in a RAID5?



Mark Cuss wrote:


Hi!

I have a 4 drive SW RAID5 running on my machine.  One of the drives is
upset for some reason - I'm not sure if the drive itself is bad, but
that's not too important right now.  The important thing is to get the
RAID5 to stop using this drive and start using a spare drive that I
just added.




I did a raidhotadd to add in a new drive, sds.  Now, I would like the
array to stop using sdr and reconstruct all of the parity tables on
sds so I can pull sdr and get it replaced or whatever...

Any ideas?


Install and read the manpage for mdadm

What kernel version?

David






Re: endianness of Linux kernel RAID

2005-08-08 Thread Brent Walsh
Neil Brown wrote:

I decided to try it anyway...

The following patch, when applied to
mdadm-2.0-devel-3 (recently released), should allow:

  mdadm --examine --metadata=0.swap /dev/sda1

which will show the superblock with bytes swapped.
If that looks right for all devices, then

  mdadm --assemble /dev/mdX \
      --update=byteorder /dev/sda1 /dev/sdb1 ...


will assemble the array after swapping the byte
order on all devices. Once it has been assembled
this way, the superblocks will have the correct
byte order, and in future the array can be
assembled in the normal way.

I have a PowerPC based NAS device that was damaged by
a recent brownout.  The drives in the array are
fine, but the device's firmware was corrupted.  My
first attempt to assemble the array on my x86 system
failed due to the endian differences.  After applying
this patch, the mdadm examine command worked
perfectly.  As of yet, I have not attempted to update
the superblocks.  I would like to leave the original
superblocks intact so that I can reinstall the drives
in my NAS device when it is repaired.  Is it possible
to assemble the array without overwriting the
superblocks?

-Brent







Re: endianness of Linux kernel RAID

2005-08-08 Thread Neil Brown
On Monday August 8, [EMAIL PROTECTED] wrote:
 
 I have a PowerPC based NAS device that was damaged by
 a recent brownout.  The drives in the array are
 fine, but the device's firmware was corrupted.  My
 first attempt to assemble the array on my x86 system
 failed due to the endian differences.  After applying
 this patch, the mdadm examine command worked
 perfectly.  As of yet, I have not attempted to update
 the superblocks.  I would like to leave the original
 superblocks intact so that I can reinstall the drives
 in my NAS device when it is repaired.  Is it possible
 to assemble the array without overwriting the
 superblocks?

Not really.  The kernel reads the superblocks to find out the
details of the array.  If it cannot read them, it won't assemble the
array.

Depending on the type of array, you could --build the array instead.
This allows for arrays that deliberately don't have superblocks.
It should work for raid0, might work for linear, could work for raid1,
but not for anything else.

If you want to try it, send me the --examine output and I'll suggest
a command.
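
If it comes to that, the command ends up looking roughly like the sketch 
below, where the level, chunk size, device order and names are all 
placeholders that have to match whatever the NAS originally used:

  mdadm --build /dev/md0 --level=raid0 --raid-devices=2 --chunk=64 \
        /dev/sda1 /dev/sdb1
  mount -o ro /dev/md0 /mnt    # check the data read-only first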

NeilBrown