RAID5 rebuild with a bad source drive fails
Hello linux-raid,

I have a home fileserver with a 6-disk RAID5 array built from old disks and cheap IDE controllers (all disks are IDE masters). As expected, sooner or later the old hardware (and/or cabling) began to fail. The array has fallen apart: it currently has 5 working disks and one marked as a spare (which was a working member before). The rebuild does not complete, because half-way through one of the working disks hits a set of bad blocks (about 30 of them). When the rebuild process (or the mount process) hits these blocks, I end up with a non-running array: 4 working drives, one failed and one spare. I can force-assemble the failing drive back into the array, but that doesn't help - the rebuild fails again and again.

Question 1: is there a superblock-edit function, or an equivalent manual procedure, that would let me mark the spare drive as a working member of the array? It [mostly] has all the data in the correct stripes; at least the event counters are all the same, and it may well be in better shape than the drive with bad blocks. Even if I succeeded in editing all the superblocks so that the spare disk is considered okay, would it actually help my data recovery? :)

Question 2: the disk's hardware apparently fails to relocate the bad blocks. Is it possible for the metadevice layer to do the same - remap and/or ignore the bad blocks? In particular, can linux md treat a single block of data as the failed unit, rather than the whole partition or disk, and try to use all 6 drives I have to deliver the usable data (at least in some sort of recovery mode)?

--
Best regards, Jim Klimov
mailto:[EMAIL PROTECTED]
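For readers in a similar spot, here is a rough sketch of the commands usually suggested for this kind of recovery. The device names below are made up for illustration (substitute the real members from your own --examine output), ddrescue is a separate tool rather than an md feature, and the --create/--assume-clean step is a last resort that must only be run with exactly the original parameters and device order:

    # Compare event counters and states across all six members,
    # including the one currently marked as a spare:
    for d in /dev/hd[a-f]1; do
        echo "== $d"
        mdadm --examine "$d" | egrep 'UUID|Events|State'
    done

    # Try a forced assembly first and let mdadm pick the freshest superblocks:
    mdadm --assemble --force /dev/md0 /dev/hd[a-f]1

    # Before retrying a rebuild, clone the disk with bad blocks onto a good
    # spare disk with GNU ddrescue, so md never has to read the failing sectors:
    # ddrescue -f /dev/hdf /dev/hdg /root/hdf-rescue.map

    # Last resort, only if the superblocks cannot be salvaged: re-create the
    # array in place with --assume-clean, using the exact original chunk size,
    # layout, metadata version and device order:
    # mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=6 \
    #       /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1 /dev/hde1 /dev/hdf1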
raid5 reshape/resync
Hi,

I'm running 2.6.23.8 x86_64 with mdadm v2.6.4. I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] - md0). During that reshape (at around 4%) /dev/sdd reported read errors and went offline. I replaced /dev/sdd with a new drive and tried to reassemble the array (/dev/sdd was shown as removed and is now a spare). Assembly worked, but the array would not run unless I used --force. Since I'm always reluctant to use force, I put the bad disk back in, this time as /dev/sdg. I re-added that drive and could run the array. The array started to resync (since the bad disk can be read up to the 4% mark), and then I marked it as failed. Now the array is active, degraded, recovering:

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 00.91.03
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
     Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Nov 24 10:10:46 2007
          State : active, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 16K

 Reshape Status : 19% complete
  Delta Devices : 1, (5->6)

           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
         Events : 0.726347

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       6       8       96        3      faulty spare rebuilding   /dev/sdg
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

       7       8       48        -      spare   /dev/sdd

iostat:

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    sda             129.48      1498.01      1201.59       7520       6032
    sdb             134.86      1498.01      1201.59       7520       6032
    sdc             127.69      1498.01      1201.59       7520       6032
    sdd               0.40         0.00         3.19          0         16
    sde             111.55      1498.01      1201.59       7520       6032
    sdf             117.73         0.00      1201.59          0       6032
    sdg               0.00         0.00         0.00          0          0

What I find somewhat confusing/disturbing is that md does not appear to utilize /dev/sdd. What I see here could be explained by md doing a RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f], but I would have expected it to use the new spare sdd for that. Also the speed is unusually low, which seems to indicate a lot of seeking, as if two operations are happening at the same time. And when I look at the data rates it looks more like the reshape is continuing even though one drive is missing (possible, but risky). Can someone relieve my doubts as to whether md does the right thing here?

Thanks,
http://www.nagilum.org/
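One way to see what md is actually doing at this point (a sketch; the paths are standard md proc/sysfs entries, and the speed values are arbitrary examples). A plausible explanation, though not guaranteed for every kernel version, is that md must finish the interrupted reshape in degraded mode first and only then recovers onto the new spare, which would leave sdd idle until the reshape completes:

    # Current per-array activity and progress:
    cat /proc/mdstat
    mdadm --detail /dev/md0 | egrep 'State|Reshape|Rebuild'

    # What the kernel thinks it is doing right now (reshape/recover/resync/idle):
    cat /sys/block/md0/md/sync_action

    # The default rebuild speed limits are conservative; raising them can rule
    # out throttling as the cause of the low throughput (values in KB/s):
    echo  50000 > /proc/sys/dev/raid/speed_limit_min
    echo 200000 > /proc/sys/dev/raid/speed_limit_max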
Re: md RAID 10 on Linux 2.6.20?
On Thu, 22 Nov 2007 22:09:27 -0500, [EMAIL PROTECTED] said:

[ ... ] a RAID 10 personality defined in md that can be implemented using mdadm. If so, is it available in 2.6.20.11, [ ... ]

'raid10' is a very good choice in general. For a single layer just use '-l raid10'. Run 'man mdadm' and read the '-l' option and also the '-p' option for the more exotic layout variants, plus the RAID10 section of 'man 4 md'. The pairs are formed naturally out of the block device list (the first device listed is paired with the second, and so on). Built the nested way, an 8-drive RAID 10 would actually consist of 5 md devices (four RAID 1s plus one RAID 0 on top).

[ ... ] one RAID 10, that would of course be better both in terms of management and probably performance I would guess. [ ... ]

Indeed it is easier in terms of management, and there are some interesting options for layout. I'm not sure about performance, as I have sometimes seen strange interactions with the page cache either way, but usually '-l raid10' is the way to go, as you say.
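A minimal sketch of the two approaches being compared, assuming eight hypothetical whole disks /dev/sd[b-i] and the default near-2 layout (adjust device names and layout as needed):

    # Single-layer md raid10 across all eight drives:
    mdadm --create /dev/md0 --level=raid10 --raid-devices=8 --layout=n2 /dev/sd[b-i]

    # The nested alternative: four RAID1 pairs plus one RAID0 on top,
    # i.e. the five md devices mentioned above:
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd /dev/sde
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdf /dev/sdg
    mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sdh /dev/sdi
    mdadm --create /dev/md5 --level=0 --raid-devices=4 /dev/md1 /dev/md2 /dev/md3 /dev/md4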
Re: HELP! New disks being dropped from RAID 6 array on every reboot
Joshua Johnson wrote:

Greetings, long time listener, first time caller. I recently replaced a disk in my existing 8-disk RAID 6 array. Previously, all disks were PATA drives connected to the motherboard IDE and three Promise Ultra 100/133 controllers. I replaced one of the Promise controllers with a VIA 64xx based controller, which has 2 SATA ports and one PATA port. I connected a new SATA drive to the new card, partitioned the drive and added it to the array. After 5 or 6 hours the resyncing process finished and the array showed up complete. Upon rebooting I discovered that the new drive had not been added to the array when it was assembled on boot. I resynced it and tried again -- it still would not persist after a reboot. I moved one of the existing PATA drives to the new controller (so I could have the slot for a network card), rebooted and rebuilt the array. Now when I reboot BOTH disks are missing from the array (sda and sdb). Upon examining the disks it appears they think they are part of the array, but for some reason they are not being added when the array is assembled. For example, this is a disk on the new controller which was not added to the array after rebooting:

# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 63ee7d14:a0ac6a6e:aef6fe14:50e047a5
  Creation Time : Thu Sep 21 23:52:19 2006
     Raid Level : raid6
    Device Size : 191157248 (182.30 GiB 195.75 GB)
     Array Size : 1146943488 (1093.81 GiB 1174.47 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

    Update Time : Fri Nov 23 10:22:57 2007
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 50df590e - correct
         Events : 0.96419878

     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     6       8        1        6      active sync   /dev/sda1

   0     0       3        2        0      active sync   /dev/hda2
   1     1      57        2        1      active sync   /dev/hdk2
   2     2      33        2        2      active sync   /dev/hde2
   3     3      34        2        3      active sync   /dev/hdg2
   4     4      22        2        4      active sync   /dev/hdc2
   5     5      56        2        5      active sync   /dev/hdi2
   6     6       8        1        6      active sync   /dev/sda1
   7     7       8       17        7      active sync   /dev/sdb1

Everything there seems to be correct and current up to the last shutdown, but the disk is not being added on boot. Examining a disk that is currently running in the array shows:

# mdadm --examine /dev/hdc2
/dev/hdc2:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 63ee7d14:a0ac6a6e:aef6fe14:50e047a5
  Creation Time : Thu Sep 21 23:52:19 2006
     Raid Level : raid6
    Device Size : 191157248 (182.30 GiB 195.75 GB)
     Array Size : 1146943488 (1093.81 GiB 1174.47 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Nov 23 10:23:52 2007
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 50df5934 - correct
         Events : 0.96419880

     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     4      22        2        4      active sync   /dev/hdc2

   0     0       3        2        0      active sync   /dev/hda2
   1     1      57        2        1      active sync   /dev/hdk2
   2     2      33        2        2      active sync   /dev/hde2
   3     3      34        2        3      active sync   /dev/hdg2
   4     4      22        2        4      active sync   /dev/hdc2
   5     5      56        2        5      active sync   /dev/hdi2
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed

Here is my /etc/mdadm/mdadm.conf:

DEVICE partitions
PROGRAM /bin/echo
MAILADDR redacted
ARRAY /dev/md0 level=raid6 num-devices=8 UUID=63ee7d14:a0ac6a6e:aef6fe14:50e047a5

Can anyone see anything that is glaringly wrong here? Has anybody experienced similar behavior? I am running Debian with kernel 2.6.23.8. All partitions are set to type 0xFD, and it appears the superblocks on the sd* disks were written, so why wouldn't they be added to the array on boot? Any help is greatly appreciated!
Does that match what's in the init files used at boot? By any chance, does the information there explicitly list partitions by name? If you change the DEVICE line to 'partitions' in /etc/mdadm.conf, it won't bite you until the detected partitions change so that they no longer match what was correct at install time.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
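On Debian the usual trap here is that boot-time assembly runs from the initramfs, which carries its own copy of mdadm.conf. A small sketch of how to check and refresh it, assuming the default gzip-compressed cpio initramfs (verify the image name for your kernel):

    # List the mdadm-related files bundled into the current initramfs:
    zcat /boot/initrd.img-$(uname -r) | cpio -it 2>/dev/null | grep mdadm

    # After editing /etc/mdadm/mdadm.conf, regenerate the initramfs so the
    # copy used at boot matches the one on the root filesystem:
    update-initramfs -u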
Re: HELP! New disks being dropped from RAID 6 array on every reboot
On Nov 24, 2007 12:20 PM, Bill Davidsen [EMAIL PROTECTED] wrote:

Does that match what's in the init files used at boot? By any chance, does the information there explicitly list partitions by name? If you change the DEVICE line to 'partitions' in /etc/mdadm.conf, it won't bite you until the detected partitions change so that they no longer match what was correct at install time.

According to the man page, using 'partitions' as the DEVICE should cause mdadm to read /proc/partitions and scan all partitions listed there. The sda*/sdb* partitions were in /proc/partitions (at least after the machine had fully booted), but for some reason mdadm was not adding those partitions when it assembled the array. Changing the DEVICE line to '/dev/hd* /dev/sd*' rather than 'partitions' resolved the issue.
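For anyone hitting the same symptom, the working configuration amounts to this /etc/mdadm/mdadm.conf fragment (the UUID is the one from the ARRAY line earlier in the thread; the old setting is kept as a comment for contrast):

    # DEVICE partitions    <- previous setting; did not pick up sda1/sdb1 at boot
    DEVICE /dev/hd* /dev/sd*
    PROGRAM /bin/echo
    MAILADDR redacted
    ARRAY /dev/md0 level=raid6 num-devices=8 UUID=63ee7d14:a0ac6a6e:aef6fe14:50e047a5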