How to recover a multiple raid5 disk failure with mdadm?
I'm running a 4-disk software RAID5 array with Linux 2.6.12.1. Each disk is an 80 GB IDE drive configured as master on its own otherwise-unused IDE bus (no slave drives). The array ran fine until a few weeks ago, when one disk (hdk) failed. After inspecting the cabling I reseated the connector on that drive (it seems to have been a loose connection). The resync began when the system was rebooted, but in the middle of the resync a second drive (hdg) had a problem: a couple of blocks on it were unreadable. The array went down, and it looks like all the data is lost.

This is not a real problem, since the array is only used for a personal VDR, but I thought this would be a good opportunity to fiddle with the RAID and see whether any data can be rescued. I first made a backup of each drive with dd if=/dev/hde | gzip -1 > hde.gz. After googling around for a while I found http://www.tldp.org/HOWTO/Software-RAID-HOWTO-8.html#ss8.1, but the instructions there don't work. I also tried to recreate the array as suggested on various mailing lists. My last attempt used mdadm-2.0-devel-2 with the patch from 14.07.2005 (http://www.opensubscriber.com/message/linux-raid@vger.kernel.org/1737664.html) from this mailing list. Sometimes I was able to recreate the array, but when I try to mount it there is no valid ext3 filesystem inside.

Here is the sequence of events that caused the RAID failure:
1) hdk went down due to a connector problem.
2) Powered off the machine and reseated the connector.
3) Powered on; the resync started.
4) hdg failed with some unreadable sectors (according to kern.log).
5) md0 went down.

Is there anything else I can do to rescue the data? I assume you need more input, but I don't think it's a good idea to post even more logs to the list, so please ask if something is missing. The output below is from mdadm-2.0-devel-2 --examine. What I don't understand is the difference in the Spare Devices counts.
---***---
/dev/hde1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
  Creation Time : Tue Dec 28 12:24:48 2004
     Raid Level : raid5
    Device Size : 80043136 (76.34 GiB 81.96 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sat Jul 23 20:23:19 2005
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : c5646fe8 - expected c6586ef4
         Events : 0.4340017

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3      33       1        3        active sync   /dev/hde1

   0     0        0      0        524288   spare
   1     3670016  65536  65536    393216   spare
   2     0        0      131072   589824   spare
   3     2162688  65536  196608   393216   spare
   4     3735552  65536  262144   0        spare
---***---

---***---
/dev/hdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 7b631138:ca5ac82b:95f1b9df:25e26bff
  Creation Time : Fri Aug 5 11:55:02 2005
     Raid Level : raid5
    Device Size : 80043136 (76.34 GiB 81.96 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Fri Aug 5 11:55:02 2005
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 35699ae6 - correct
         Events : 0.1

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     1      34       1        1        active sync   /dev/hdg1

   0     0      33       1        0        active sync   /dev/hde1
   1     1      34       1        1        active sync   /dev/hdg1
   2     2      56       1        2        active sync   /dev/hdi1
   3     3       0       0        3        faulty
---***---

---***---
/dev/hdi1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
  Creation Time : Tue Dec 28 12:24:48 2004
     Raid Level : raid5
    Device Size : 80043136 (76.34 GiB 81.96 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sat Jul 23 20:23:19 2005
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1
       Checksum : c564701b - correct
         Events : 0.4340017

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     1      56       1        1        active sync   /dev/hdi1

   0     0       0       0        0        removed
   1     1      56       1        1        active sync   /dev/hdi1
   2     2      34       1        2        active sync   /dev/hdg1
   3     3      33       1        3        active sync   /dev/hde1
   4     4      57       1        4        spare   /dev/hdk1
---***---

---***---
/dev/hdk1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
  Creation Time : Tue Dec 28 12:24:48 2004
     Raid Level : raid5
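[For reference, the usual first move in a two-drive RAID5 failure like this is a forced assembly rather than a re-create; --force tells md to accept a member with a stale event count. A minimal sketch, assuming the slot assignments shown in the hdi1 superblock above (hdk1 is deliberately left out, since its superblock shows it had rejoined only as a spare; /mnt/recover is a placeholder mount point):

    # image every member before experimenting (note the redirect into the file)
    dd if=/dev/hde bs=64k conv=noerror,sync | gzip -1 > hde.gz

    # attempt a forced, degraded assembly from the three data members,
    # then mount read-only to inspect whether the filesystem survived
    mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdg1 /dev/hdi1
    mount -o ro /dev/md0 /mnt/recover

Mounting read-only avoids further writes while it is still unclear which member holds stale data.]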
Re: How to force a spare drive to take over in a RAID5?
Kernel 2.4.27. I've been using the old raidtools stuff and am new to mdadm - sorry if this is an obviously simple question. After perusing the man page, it looks to me like I should use mdadm to mark the drive I want to remove as failed, which would force a rebuild onto the spare drive. I want to double-check that this is correct first, though, as this md device contains 300 GB of production data.

Thanks for the help!
Mark

----- Original Message -----
From: David Greaves [EMAIL PROTECTED]
To: Mark Cuss [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Sunday, August 07, 2005 5:29 AM
Subject: Re: How to force a spare drive to take over in a RAID5?

Mark Cuss wrote:
> Hi! I have a 4-drive SW RAID5 running on my machine. One of the drives is
> upset for some reason - I'm not sure if the drive itself is bad, but that's
> not too important right now. The important thing is to get the RAID5 to
> stop using this drive and start using a spare drive that I just added. I
> did a raidhotadd to add in a new drive, sds. Now I would like the array to
> stop using sdr and reconstruct all of the parity on sds, so I can pull sdr
> and get it replaced or whatever. Any ideas?

Install and read the manpage for mdadm.

What kernel version?

David
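[For the archive, the mdadm steps being discussed look like this - a sketch using Mark's device names, assuming the spare sds has already been added to the array:

    # mark the suspect member as failed; md starts rebuilding onto the spare at once
    mdadm /dev/md0 --fail /dev/sdr

    # after the rebuild onto sds completes, detach the failed drive
    mdadm /dev/md0 --remove /dev/sdr

The --remove step only succeeds once the device is marked failed and is no longer in active use, so the two commands are safe to run in this order.]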
Re: endianness of Linux kernel RAID
Neil Brown wrote:
> I decided to try it anyway... The following patch, when applied to
> mdadm-2.0-devel-3 (recently released), should allow:
>
>     mdadm --examine --metadata=0.swap /dev/sda1
>
> which will show the superblock with bytes swapped. If that looks right
> for all devices, then
>
>     mdadm --assemble /dev/mdX --update=byteorder /dev/sda1 /dev/sdb1 ...
>
> will assemble the array after swapping the byte order on all devices.
> Once it has been assembled this way, the superblocks will have the
> correct byte order, and in future the array can be assembled in the
> normal way.

I have a PowerPC-based NAS device that was damaged by a recent brownout. The drives in the array are fine, but the device's firmware was corrupted. My first attempt to assemble the array on my x86 system failed due to the endianness difference. After applying this patch, the mdadm examine command worked perfectly.

As of yet, I have not attempted to update the superblocks. I would like to leave the original superblocks intact so that I can reinstall the drives in my NAS device when it is repaired. Is it possible to assemble the array without overwriting the superblocks?

-Brent
Re: endianness of Linux kernel RAID
On Monday August 8, [EMAIL PROTECTED] wrote:
> I have a PowerPC-based NAS device that was damaged by a recent brownout.
> The drives in the array are fine, but the device's firmware was corrupted.
> My first attempt to assemble the array on my x86 system failed due to the
> endianness difference. After applying this patch, the mdadm examine
> command worked perfectly. As of yet, I have not attempted to update the
> superblocks. I would like to leave the original superblocks intact so that
> I can reinstall the drives in my NAS device when it is repaired. Is it
> possible to assemble the array without overwriting the superblocks?

Not really. The kernel reads the superblocks to find out the details of the array; if it cannot read them, it won't assemble the array.

Depending on the type of array, you could --build the array instead. This allows for arrays that deliberately don't have superblocks. It should work for raid0, might work for linear, could work for raid1, but not for anything else. If you want to try it, send me the --examine output and I'll suggest a command.

NeilBrown
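[A --build invocation of the kind Neil describes takes roughly this shape. A sketch only: the level, chunk size, device order, and mount point here are placeholders and must exactly match the original array, which is why he asks for the --examine output first:

    # assemble without superblocks; for raid0/linear nothing is written to the disks
    mdadm --build /dev/md0 --level=raid0 --raid-devices=2 --chunk=64 \
          /dev/sda1 /dev/sdb1

    # mount read-only until the parameters are confirmed correct
    mount -o ro /dev/md0 /mnt/nas

Because --build trusts the parameters it is given, wrong values produce a scrambled block device rather than an error, so a read-only mount is the prudent first test.]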