Re: Fwd: RAID5 Recovery
Neil Cavan wrote: > Thanks for taking a look, David. No problem. > Kernel: > 2.6.15-27-k7, stock for Ubuntu 6.06 LTS > > mdadm: > mdadm - v1.12.0 - 14 June 2005 OK - fairly old then. Not really worth trying to figure out why hdc got re-added when things had gone wrong. > You're right, earlier in /var/log/messages there's a notice that hdg > dropped, I missed it before. I use mdadm --monitor, but I recently > changed the target email address - I guess it didn't take properly. > > As for replacing hdc, thanks for the diagnosis but it won't help: the > drive is actually fine, as is hdg. I've replaced hdc before, only to > have the brand new hdc show the same behaviour, and SMART says the > drive is A-OK. There's something flaky about these PCI IDE > controllers. I think it's new system time. Any excuse eh? :) > Reiserfs recovery-wise: any suggestions? A simple fsck doesn't find a > file system superblock. Is --rebuild-sb the way to go here? No idea, sorry. I only ever tried Reiser once and it failed. It was very hard to get recovered so I swapped back to XFS. Good luck on the fscking David - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
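For anyone landing on this thread with the same question: the reiserfsck sequence being asked about would, in rough outline, look like the lines below. This is only a sketch, not something tested against this array; the /backup image path is just a placeholder, and --rebuild-tree in particular rewrites metadata, so imaging the device first is the cautious order of operations.

  # image the raw array first if space allows (target path is hypothetical)
  dd if=/dev/md0 of=/backup/md0.img bs=1M
  # read-only pass to see what reiserfsck makes of the filesystem
  reiserfsck --check /dev/md0
  # rebuild the superblock only if --check reports it missing or damaged
  reiserfsck --rebuild-sb /dev/md0
  # last resort, and only with the image saved: rebuild the whole tree
  reiserfsck --rebuild-tree /dev/md0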
Fwd: RAID5 Recovery
Thanks for taking a look, David. Kernel: 2.6.15-27-k7, stock for Ubuntu 6.06 LTS mdadm: mdadm - v1.12.0 - 14 June 2005 You're right, earlier in /var/log/messages there's a notice that hdg dropped, I missed it before. I use mdadm --monitor, but I recently changed the target email address - I guess it didn't take properly. As for replacing hdc, thanks for the diagnosis but it won't help: the drive is actually fine, as is hdg. I've replaced hdc before, only to have the brand new hdc show the same behaviour, and SMART says the drive is A-OK. There's something flaky about these PCI IDE controllers. I think it's new system time. Reiserfs recovery-wise: any suggestions? A simple fsck doesn't find a file system superblock. Is --rebuild-sb the way to go here? Thanks, Neil On Nov 14, 2007 5:58 AM, David Greaves <[EMAIL PROTECTED]> wrote: > Neil Cavan wrote: > > Hello, > Hi Neil > > What kernel version? > What mdadm version? > > > This morning, I woke up to find the array had kicked two disks. This > > time, though, /proc/mdstat showed one of the failed disks (U_U_U, one > > of the "_"s) had been marked as a spare - weird, since there are no > > spare drives in this array. I rebooted, and the array came back in the > > same state: one failed, one spare. I hot-removed and hot-added the > > spare drive, which put the array back to where I thought it should be > > ( still U_U_U, but with both "_"s marked as failed). Then I rebooted, > > and the array began rebuilding on its own. Usually I have to hot-add > > manually, so that struck me as a little odd, but I gave it no mind and > > went to work. Without checking the contents of the filesystem. Which > > turned out not to have been mounted on reboot. > OK > > > Because apparently things went horribly wrong. > Yep :( > > > Do I have any hope of recovering this data? Could rebuilding the > > reiserfs superblock help if the rebuild managed to corrupt the > > superblock but not the data? > See below > > > > > Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr: > > status=0x51 { DriveReady SeekComplete Error } > > > Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write > > due to I/O error on md0 > hdc1 fails > > > > Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout: > > Nov 13 02:01:06 localhost kernel: [17805775.196000] --- rd:5 wd:3 fd:2 > > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 0, o:1, dev:hda1 > > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 1, o:0, dev:hdc1 > > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 2, o:1, dev:hde1 > > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 4, o:1, dev:hdi1 > > hdg1 is already missing? > > > Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout: > > Nov 13 02:01:06 localhost kernel: [17805775.212000] --- rd:5 wd:3 fd:2 > > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 0, o:1, dev:hda1 > > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 2, o:1, dev:hde1 > > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 4, o:1, dev:hdi1 > > so now the array is bad. > > a reboot happens and: > > Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped. 
> > Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind > > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind > > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking > > non-fresh hdg1 from array! > > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind > > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1) > > Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated > > 5245kB for md0 > ... apparently hdc1 is OK? Hmmm. > > > Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0: > > found reiserfs format "3.6" with standard journal > > Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0: > > using ordered data mode > > Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: > > journal params: device md0, size 8192, journal first block 18, max > > trans len 1024, max batch 900, max commit age 30, max trans age 30 > > Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: > > checking transaction log (md0) > > Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0: > > replayed 7 transactions in 1 seconds > > Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0: > > Using r5 hash to sort names > > Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write > > due to I/O error on md0 > Reiser tries to mount/replay itself relying on hdc1 (which is partly bad) > > > Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5 > > personality registered as nr 4 > > Nov 13 07:25:39 localh
Re: RAID5 Recovery
Neil Cavan wrote: > Hello, Hi Neil What kernel version? What mdadm version? > This morning, I woke up to find the array had kicked two disks. This > time, though, /proc/mdstat showed one of the failed disks (U_U_U, one > of the "_"s) had been marked as a spare - weird, since there are no > spare drives in this array. I rebooted, and the array came back in the > same state: one failed, one spare. I hot-removed and hot-added the > spare drive, which put the array back to where I thought it should be > ( still U_U_U, but with both "_"s marked as failed). Then I rebooted, > and the array began rebuilding on its own. Usually I have to hot-add > manually, so that struck me as a little odd, but I gave it no mind and > went to work. Without checking the contents of the filesystem. Which > turned out not to have been mounted on reboot. OK > Because apparently things went horribly wrong. Yep :( > Do I have any hope of recovering this data? Could rebuilding the > reiserfs superblock help if the rebuild managed to corrupt the > superblock but not the data? See below > Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr: > status=0x51 { DriveReady SeekComplete Error } > Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write > due to I/O error on md0 hdc1 fails > Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout: > Nov 13 02:01:06 localhost kernel: [17805775.196000] --- rd:5 wd:3 fd:2 > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 0, o:1, dev:hda1 > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 1, o:0, dev:hdc1 > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 2, o:1, dev:hde1 > Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 4, o:1, dev:hdi1 hdg1 is already missing? > Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout: > Nov 13 02:01:06 localhost kernel: [17805775.212000] --- rd:5 wd:3 fd:2 > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 0, o:1, dev:hda1 > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 2, o:1, dev:hde1 > Nov 13 02:01:06 localhost kernel: [17805775.212000] disk 4, o:1, dev:hdi1 so now the array is bad. a reboot happens and: > Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped. > Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking > non-fresh hdg1 from array! > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind > Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1) > Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated > 5245kB for md0 ... apparently hdc1 is OK? Hmmm. 
> Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0: > found reiserfs format "3.6" with standard journal > Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0: > using ordered data mode > Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: > journal params: device md0, size 8192, journal first block 18, max > trans len 1024, max batch 900, max commit age 30, max trans age 30 > Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: > checking transaction log (md0) > Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0: > replayed 7 transactions in 1 seconds > Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0: > Using r5 hash to sort names > Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write > due to I/O error on md0 Reiser tries to mount/replay itself relying on hdc1 (which is partly bad) > Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5 > personality registered as nr 4 > Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking > non-fresh hdg1 from array! Another reboot... > Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0: > found reiserfs format "3.6" with standard journal > Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0: > using ordered data mode > Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0: > journal params: device md0, size 8192, journal first block 18, max > trans len 1024, max batch 900, max commit age 30, max trans age 30 > Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0: > checking transaction log (md0) > Nov 13 07:25:40 localhost kernel: [17179677.08] ReiserFS: md0: > Using r5 hash to sort names > Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write > due to I/O error on md0 Reiser tries again... > Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind > Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1) > Nov 13 07:27:03 localhost kernel: [17
RAID5 Recovery
Hello, I have a 5-disk RAID5 array that has gone belly-up. It consists of 2x 2 disks on Promise PCI controllers, and one on the mobo controller. This array has been running for a couple years, and every so often (randomly, sometimes every couple weeks sometimes no problem for months) it will drop a drive. It's not a drive failure per se, it's something controller-related since the failures tend to happen in pairs and SMART gives the drives a clean bill of health. If it's only one drive, I can hot-add with no problem. If it's 2 drives my heart leaps into my mouth but I reboot, only one of the drives comes up as failed, and I can hot-add with no problem. The 2-drive case has happened a dozen times and my array is never any worse for the wear. This morning, I woke up to find the array had kicked two disks. This time, though, /proc/mdstat showed one of the failed disks (U_U_U, one of the "_"s) had been marked as a spare - weird, since there are no spare drives in this array. I rebooted, and the array came back in the same state: one failed, one spare. I hot-removed and hot-added the spare drive, which put the array back to where I thought it should be ( still U_U_U, but with both "_"s marked as failed). Then I rebooted, and the array began rebuilding on its own. Usually I have to hot-add manually, so that struck me as a little odd, but I gave it no mind and went to work. Without checking the contents of the filesystem. Which turned out not to have been mounted on reboot. Because apparently things went horribly wrong. The rebuild process ran its course. I now have an array that mdadm insists is peachy: --- md0 : active raid5 hda1[0] hdc1[1] hdi1[4] hdg1[3] hde1[2] 468872704 blocks level 5, 64k chunk, algorithm 2 [5/5] [U] unused devices: --- But there is no filesystem on /dev/md0: --- sudo mount -t reiserfs /dev/md0 /storage/ mount: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or other error --- Do I have any hope of recovering this data? Could rebuilding the reiserfs superblock help if the rebuild managed to corrupt the superblock but not the data? Any help is appreciated, below is the failure event in /var/log/messages, followed by the output of cat /var/log/messages | grep md. Thanks, Neil Cavan Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=11736, sector=1 1719 Nov 13 02:01:03 localhost kernel: [17805772.424000] ide: failed opcode was: unknown Nov 13 02:01:03 localhost kernel: [17805772.424000] end_request: I/O error, dev hdc, sector 11719 Nov 13 02:01:03 localhost kernel: [17805772.424000] R5: read error not correctable. Nov 13 02:01:03 localhost kernel: [17805772.464000] lost page write due to I/O error on md0 Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=11736, sector=1 1727 Nov 13 02:01:05 localhost kernel: [17805773.776000] ide: failed opcode was: unknown Nov 13 02:01:05 localhost kernel: [17805773.776000] end_request: I/O error, dev hdc, sector 11727 Nov 13 02:01:05 localhost kernel: [17805773.776000] R5: read error not correctable. 
Nov 13 02:01:05 localhost kernel: [17805773.776000] lost page write due to I/O error on md0 Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=11736, sector=1 1735 Nov 13 02:01:06 localhost kernel: [17805775.156000] ide: failed opcode was: unknown Nov 13 02:01:06 localhost kernel: [17805775.156000] end_request: I/O error, dev hdc, sector 11735 Nov 13 02:01:06 localhost kernel: [17805775.156000] R5: read error not correctable. Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write due to I/O error on md0 Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout: Nov 13 02:01:06 localhost kernel: [17805775.196000] --- rd:5 wd:3 fd:2 Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 0, o:1, dev:hda1 Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 1, o:0, dev:hdc1 Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 2, o:1, dev:hde1 Nov 13 02:01:06 localhost kernel: [17805775.196000] disk 4, o:1, dev:hdi1 Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 c
Re: RAID5 Recovery
On Sunday October 22, [EMAIL PROTECTED] wrote: > The drives have not been repartitioned. > > I think what happened is that I created a new raid5 array over the old > one, but never synced or initialized it. If you created an array - whether it synced or not - the superblocks would be written and --examine would have found them. So there must be something else that happened. Hard to know what. > > I'm leery of re-creating the array as you suggest, because I think > re-creating an array "over top" of my existing array is what got me into > trouble in the first place. > > Also, from mdadm man page (using v 1.12.0): > > --assume-clean > Tell mdadm that the array pre-existed and is known to be clean. > This is only really useful for Building RAID1 array. Only > use this if you really know what you are doing. This is currently > only supported for --build. > > This suggests to me that I can only use this to build a legacy array > without superblocks - which I don't want - and that since my array was > RAID5, that it's not "really useful", whatever that means. Oh, and also, > I don't really know what I'm doing. ;) --assume-clean was extended to --create in mdadm-2.2. > > If I do re-create the array to regenerate the superblocks, isn't it > important that I know the exact parameters of the pre-existing array, to > get the data to match up? chunk size, parity method, etc? Yes, but I would assume you just used the defaults. If not, you presumably know why you changed the defaults and can do it again??? In any case, creating the array with --assume-clean does not modify any data. It only overwrites the superblocks. As you currently don't have any superblock, you have nothing to lose. After you create the array you can try 'fsck' or other tools to see if the data is intact. If it is - good. If not, stop the array and try creating it with different parameters. > > I just don't want to rush in and mess things up. Did that once > already. ;) Very sensible. Assuming the partitions really are the same as they were before (can't hurt to triple-check) then I really think '--create --assume-clean' is your best bet. Maybe download and compile the latest mdadm http://www.kernel.org/pub/linux/utils/raid/mdadm/ to make sure you have a working --assume-clean. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
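Pulling Neil's suggestions together, the recreate-and-check sequence might look roughly like this. Treat it as a sketch: the device order (hda1, hdc1, hde1, hdg1, hdi1, as implied by the /proc/mdstat output elsewhere in this archive) and the default chunk size and layout are assumptions that have to match the original array, and --assume-clean with --create needs mdadm 2.2 or newer.

  # rewrite only the superblocks; no data blocks are touched and no resync starts
  mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean \
      /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1
  # check the contents read-only before trusting the result
  fsck -n /dev/md0        # or: mount -o ro /dev/md0 /mnt
  # if the data looks wrong, stop and retry with different parameters
  mdadm --stop /dev/md0
  # once the data checks out, let parity be rebuilt (Neil's suggestion)
  echo resync > /sys/block/md0/md/sync_action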
Re: RAID5 Recovery
The drives have not been repartitioned. I think what happened is that I created a new raid5 array over the old one, but never synced or initialized it. I'm leery of re-creating the array as you suggest, because I think re-creating an array "over top" of my existing array is what got me into trouble in the first place. Also, from mdadm man page (using v 1.12.0): --assume-clean Tell mdadm that the array pre-existed and is known to be clean. This is only really useful for Building RAID1 array. Only use this if you really know what you are doing. This is currently only supported for --build. This suggests to me that I can only use this to build a legacy array without superblocks - which I don't want - and that since my array was RAID5, that it's not "really useful", whatever that means. Oh, and also, I don't really know what I'm doing. ;) If I do re-create the array to regenerate the superblocks, isn't it important that I know the exact parameters of the pre-existing array, to get the data to match up? chunk size, parity method, etc? I just don't want to rush in and mess things up. Did that once already. ;) Thanks, Neil On Mon, 2006-23-10 at 11:29 +1000, Neil Brown wrote: > On Saturday October 21, [EMAIL PROTECTED] wrote: > > Hi, > > > > I had a run-in with the Ubuntu Server installer, and in trying to get > > the new system to recognize the clean 5-disk raid5 array left behind by > > the previous Ubuntu system, I think I inadvertently instructed it to > > create a new raid array using those same partitions. > > > > What I know for sure is that now, I get this: > > > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1 > > mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got > > ) > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1 > > mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got > > ) > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1 > > mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got > > ) > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1 > > mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got > > ) > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1 > > mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got > > ) > > > > I didn't format the partitions or write any data to the disk, so I think > > the array's data should be intact. Is there a way to recreate the > > superblocks, or am I hosed? > > Weirds Could the drives have been repartitioned in the process, > with the partitions being slightly different sizes or at slightly > different offsets? That might explain the disappearing superblocks, > and remaking the partitions might fix it. > > Or you can just re-create the array. Doing so won't destroy any data > that happens to be there. > To be on the safe side, create it with --assume-clean. This will avoid > a resync so you can be sure that no data blocks will be written at > all. > Then 'fsck -n' or mount readonly and see if you data is safe. > Once you are happy that you have the data safe you can trigger the > resync with >mdadm --assemble --update=resync . > or >echo resync > /sys/block/md0/md/sync_action > > (assuming it is 'md0'). > > Good luck. 
> > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 Recovery
On Saturday October 21, [EMAIL PROTECTED] wrote: > Hi, > > I had a run-in with the Ubuntu Server installer, and in trying to get > the new system to recognize the clean 5-disk raid5 array left behind by > the previous Ubuntu system, I think I inadvertently instructed it to > create a new raid array using those same partitions. > > What I know for sure is that now, I get this: > > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1 > mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got > ) > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1 > mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got > ) > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1 > mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got > ) > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1 > mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got > ) > [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1 > mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got > ) > > I didn't format the partitions or write any data to the disk, so I think > the array's data should be intact. Is there a way to recreate the > superblocks, or am I hosed? Weird. Could the drives have been repartitioned in the process, with the partitions being slightly different sizes or at slightly different offsets? That might explain the disappearing superblocks, and remaking the partitions might fix it. Or you can just re-create the array. Doing so won't destroy any data that happens to be there. To be on the safe side, create it with --assume-clean. This will avoid a resync so you can be sure that no data blocks will be written at all. Then 'fsck -n' or mount readonly and see if your data is safe. Once you are happy that you have the data safe you can trigger the resync with mdadm --assemble --update=resync . or echo resync > /sys/block/md0/md/sync_action (assuming it is 'md0'). Good luck. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RAID5 Recovery
Hi, I had a run-in with the Ubuntu Server installer, and in trying to get the new system to recognize the clean 5-disk raid5 array left behind by the previous Ubuntu system, I think I inadvertently instructed it to create a new raid array using those same partitions. What I know for sure is that now, I get this: [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1 mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got ) [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1 mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got ) [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1 mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got ) [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1 mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got ) [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1 mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got ) I didn't format the partitions or write any data to the disk, so I think the array's data should be intact. Is there a way to recreate the superblocks, or am I hosed? Thanks, Neil - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Hello, I replaced the failed disk. The configuration is /dev/hde, /dev/hdf (replaced), on IDE channel 0, /dev/hdg, /dev/hdh on IDE channel 1, on a single PCI controller card. The issue here is that hde is now also not accessible after the failure of hdf. I cannot see the jumper configs as the server is at home, and I am at work. The general thinking was that the hde superblock got hosed with the loss of hdf. My initial post discussed only the disk ordering and device names. As I had replaced the disk which had failed (in a previously fully functioning array), with a new disk with exactly the same configuration (jumpers, cable locations, etc), and each of the disks could be accessed, my thinking was that there would not be a hardware problem to sort through. Is this logic flawed? Thanks again, Nate Maurice Hilarius wrote: Nate Byrnes wrote: Hi All, I'm not sure that is entirely the case. From a hardware perspective, I can access all the disks from the OS, via fdisk and dd. It is really just mdadm that is failing. Would I still need to work the jumper issue? Thanks, Nate IF the disks are as we suspect (master and slave relationships) and IF you now have either a failed or a removed drive, then you MUST correct the jumpering. Sure, you can often see a disk that is misconfigured. It is almost certain, however, that when you write to it you will simply cause corruption on it. Of course, so far this is all speculation, as you have not actually said what the disks, controller interfaces, and jumpering and so forth are at. I was merely speculating, based on what you have said. No amount of software magic will "cure" a hardware problem.. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Nate Byrnes wrote: > Hi All, >I'm not sure that is entirely the case. From a hardware > perspective, I can access all the disks from the OS, via fdisk and dd. > It is really just mdadm that is failing. Would I still need to work > the jumper issue? >Thanks, >Nate > IF the disks are as we suspect (master and slave relationships) and IF you now have either a failed or a removed drive, then you MUST correct the jumpering. Sure, you can often see a disk that is misconfigured. It is almost certain, however, that when you write to it you will simply cause corruption on it. Of course, so far this is all speculation, as you have not actually said what the disks, controller interfaces, and jumpering and so forth are at. I was merely speculating, based on what you have said. No amount of software magic will "cure" a hardware problem.. -- With our best regards, Maurice W. HilariusTelephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:[EMAIL PROTECTED] Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Hi All, I'm not sure that is entirely the case. From a hardware perspective, I can access all the disks from the OS, via fdisk and dd. It is really just mdadm that is failing. Would I still need to work the jumper issue? Thanks, Nate Maurice Hilarius wrote: Nathanial Byrnes wrote: Yes, I did not have the funding nor approval to purchase more hardware when I set it up (read wife). Once it was working... the rest is history. OK, so if you have a pair of IDE disks, jumpered as Master and slave, and if one fails: If Master failed, re-jumper remaining disk on pair on same cable as Master, no slave present If Slave failed, re-jumper remaining disk on pair on same cable as Master, no slave present. Then you will have the remaining disk working normally, at least. When you can afford it I suggest buying a controller with enough ports to support the number of drives you have, with no Master/Slave pairing. Good luck ! And to the software guys trying to help: We need to start with the (obvious) hardware problem, before we advise on how to recover data from a borked system.. Once he has the jumpering on the drives sorted out, the drive that went missing will be back again.. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Nathanial Byrnes wrote: > Yes, I did not have the funding nor approval to purchase more hardware > when I set it up (read wife). Once it was working... the rest is > history. > > OK, so if you have a pair of IDE disks, jumpered as Master and slave, and if one fails: If Master failed, re-jumper remaining disk on pair on same cable as Master, no slave present If Slave failed, re-jumper remaining disk on pair on same cable as Master, no slave present. Then you will have the remaining disk working normally, at least. When you can afford it I suggest buying a controller with enough ports to support the number of drives you have, with no Master/Slave pairing. Good luck ! And to the software guys trying to help: We need to start with the (obvious) hardware problem, before we advise on how to recover data from a borked system.. Once he has the jumpering on the drives sorted out, the drive that went missing will be back again.. -- Regards, Maurice - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Yes, I did not have the funding nor approval to purchase more hardware when I set it up (read wife). Once it was working... the rest is history. On Tue, 2006-04-18 at 16:13 -0600, Maurice Hilarius wrote: > Nathanial Byrnes wrote: > > Hi All, > > Recently I lost a disk in my raid5 SW array. It seems that it took a > > second disk with it. The other disk appears to still be funtional (from > > an fdisk perspective...). I am trying to get the array to work in > > degraded mode via failed-disk in raidtab, but am always getting the > > following error: > > > > > Let me guess: > IDE disks, in pairs. > Jumpered as Master and Salve. > > Right? > > > > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Nathanial Byrnes wrote: > Hi All, > Recently I lost a disk in my raid5 SW array. It seems that it took a > second disk with it. The other disk appears to still be funtional (from > an fdisk perspective...). I am trying to get the array to work in > degraded mode via failed-disk in raidtab, but am always getting the > following error: > > Let me guess: IDE disks, in pairs. Jumpered as Master and Slave. Right? -- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:[EMAIL PROTECTED] Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
2.4.1 behaves just like 2.1. so far nothing in the syslog or messages. On Tue, 2006-04-18 at 10:24 +1000, Neil Brown wrote: > On Monday April 17, [EMAIL PROTECTED] wrote: > > Unfortunately nothing changed. > > Weird... so hdf still reports as 'busy'? > Is it mentioned anywhere in /var/log/messages since reboot? > > What version of mdadm are you using? Try 2.4.1 and see if that works > differently. > > NeilBrown > > > > > > > On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote: > > > On Monday April 17, [EMAIL PROTECTED] wrote: > > > > Hi Neil, List, > > > > Am I just out of luck? Perhaps a full reboot? Something else? > > > > Thanks, > > > > Nate > > > > > > Reboot and try again seems like the best bet at this stage. > > > > > > NeilBrown > > > - > > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > > > the body of a message to [EMAIL PROTECTED] > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > !DSPAM:31e693751804284693! > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote: > Unfortunately nothing changed. Weird... so hdf still reports as 'busy'? Is it mentioned anywhere in /var/log/messages since reboot? What version of mdadm are you using? Try 2.4.1 and see if that works differently. NeilBrown > > > On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote: > > On Monday April 17, [EMAIL PROTECTED] wrote: > > > Hi Neil, List, > > > Am I just out of luck? Perhaps a full reboot? Something else? > > > Thanks, > > > Nate > > > > Reboot and try again seems like the best bet at this stage. > > > > NeilBrown > > - > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > !DSPAM:0c1a90901937570534! > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Unfortunately nothing changed. On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote: > On Monday April 17, [EMAIL PROTECTED] wrote: > > Hi Neil, List, > > Am I just out of luck? Perhaps a full reboot? Something else? > > Thanks, > > Nate > > Reboot and try again seems like the best bet at this stage. > > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote: > Hi Neil, List, > Am I just out of luck? Perhaps a full reboot? Something else? > Thanks, > Nate Reboot and try again seems like the best bet at this stage. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil, List, Am I just out of luck? Perhaps a full reboot? Something else? Thanks, Nate Nate Byrnes wrote: Hi Neil, Nothing references hdf as you can see below. I have also rmmod'ed md and raid5 modules and modprobed them back in. Thoughts? Thanks again, Nate [EMAIL PROTECTED]:~# cat /proc/swaps FilenameTypeSize UsedPriority /dev/sdb2 partition 1050616 1028-1 [EMAIL PROTECTED]:~# cat /proc/mounts rootfs / rootfs rw 0 0 /dev/root / ext3 rw 0 0 proc /proc proc rw,nodiratime 0 0 sysfs /sys sysfs rw 0 0 none /dev ramfs rw 0 0 /dev/sdb1 /usr ext3 rw 0 0 devpts /dev/pts devpts rw 0 0 nfsd /proc/fs/nfsd nfsd rw 0 0 usbfs /proc/bus/usb usbfs rw 0 0 [EMAIL PROTECTED]:~# cat /proc/mdstat Personalities : [raid5] md0 : inactive hdh[2] hdg[3] hde[1] 234451968 blocks unused devices: Neil Brown wrote: On Monday April 17, [EMAIL PROTECTED] wrote: What is /dev/hdf busy? Is it in use? mounted? something? Not that I am aware of. Here is the mount output: [EMAIL PROTECTED]:/etc# mount /dev/sda1 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) /dev/sdb1 on /usr type ext3 (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) nfsd on /proc/fs/nfsd type nfsd (rw) usbfs on /proc/bus/usb type usbfs (rw) lsof | grep hdf does not return any results. is there some other way to find out? cat /proc/swaps cat /proc/mounts cat /proc/mdstat as well as 'lsof' should find it. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html !DSPAM:444386c978211215816793! - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil, Nothing references hdf as you can see below. I have also rmmod'ed md and raid5 modules and modprobed them back in. Thoughts? Thanks again, Nate [EMAIL PROTECTED]:~# cat /proc/swaps FilenameTypeSizeUsed Priority /dev/sdb2 partition 1050616 1028-1 [EMAIL PROTECTED]:~# cat /proc/mounts rootfs / rootfs rw 0 0 /dev/root / ext3 rw 0 0 proc /proc proc rw,nodiratime 0 0 sysfs /sys sysfs rw 0 0 none /dev ramfs rw 0 0 /dev/sdb1 /usr ext3 rw 0 0 devpts /dev/pts devpts rw 0 0 nfsd /proc/fs/nfsd nfsd rw 0 0 usbfs /proc/bus/usb usbfs rw 0 0 [EMAIL PROTECTED]:~# cat /proc/mdstat Personalities : [raid5] md0 : inactive hdh[2] hdg[3] hde[1] 234451968 blocks unused devices: Neil Brown wrote: On Monday April 17, [EMAIL PROTECTED] wrote: What is /dev/hdf busy? Is it in use? mounted? something? Not that I am aware of. Here is the mount output: [EMAIL PROTECTED]:/etc# mount /dev/sda1 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) /dev/sdb1 on /usr type ext3 (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) nfsd on /proc/fs/nfsd type nfsd (rw) usbfs on /proc/bus/usb type usbfs (rw) lsof | grep hdf does not return any results. is there some other way to find out? cat /proc/swaps cat /proc/mounts cat /proc/mdstat as well as 'lsof' should find it. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html !DSPAM:44436e3576593808182809! - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote: > > > > What is /dev/hdf busy? Is it in use? mounted? something? > > > Not that I am aware of. Here is the mount output: > > [EMAIL PROTECTED]:/etc# mount > /dev/sda1 on / type ext3 (rw) > proc on /proc type proc (rw) > sysfs on /sys type sysfs (rw) > /dev/sdb1 on /usr type ext3 (rw) > devpts on /dev/pts type devpts (rw,gid=5,mode=620) > nfsd on /proc/fs/nfsd type nfsd (rw) > usbfs on /proc/bus/usb type usbfs (rw) > > lsof | grep hdf does not return any results. > > is there some other way to find out? cat /proc/swaps cat /proc/mounts cat /proc/mdstat as well as 'lsof' should find it. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Please see below. On Mon, 2006-04-17 at 13:04 +1000, Neil Brown wrote: > On Sunday April 16, [EMAIL PROTECTED] wrote: > > Hi Neil, > > Thanks for your reply. I tried that, but here is there error I > > received: > > > > [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 > > --uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh] > > mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy > > mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to > > start the array. > > What is /dev/hdf busy? Is it in use? mounted? something? > Not that I am aware of. Here is the mount output: [EMAIL PROTECTED]:/etc# mount /dev/sda1 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) /dev/sdb1 on /usr type ext3 (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) nfsd on /proc/fs/nfsd type nfsd (rw) usbfs on /proc/bus/usb type usbfs (rw) lsof | grep hdf does not return any results. is there some other way to find out? > > > > The output from lsraid against each device is as follows (I think that I > > messed up my superblocks pretty well...): > > Sorry, but I don't use lsraid and cannot tell anything useful from it's > output. ok > > NeilBrown > > !DSPAM:444305b971501811819476! > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
On Sunday April 16, [EMAIL PROTECTED] wrote: > Hi Neil, > Thanks for your reply. I tried that, but here is there error I > received: > > [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 > --uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh] > mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy > mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to > start the array. Why is /dev/hdf busy? Is it in use? mounted? something? > > The output from lsraid against each device is as follows (I think that I > messed up my superblocks pretty well...): Sorry, but I don't use lsraid and cannot tell anything useful from its output. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil, Thanks for your reply. I tried that, but here is there error I received: [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec53 4ef2 /dev/hd[efgh] mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to start the array. The output from lsraid against each device is as follows (I think that I messed up my superblocks pretty well...): [EMAIL PROTECTED]:/etc# lsraid -d /dev/hde [dev 9, 0] /dev/md/038081921.59A998F9.64C1A001.EC534EF2 offline [dev ?, ?] (unknown)... missing [dev ?, ?] (unknown)... missing [dev 34, 64] /dev/hdh 38081921.59A998F9.64C1A001.EC534EF2 good [dev 34, 0] /dev/hdg 38081921.59A998F9.64C1A001.EC534EF2 good [dev 33, 64] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 0] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 0] /dev/hde 38081921.59A998F9.64C1A001.EC534EF2 unbound [EMAIL PROTECTED]:/etc# lsraid -d /dev/hdf [dev 9, 0] /dev/md/038081921.59A998F9.64C1A001.EC534EF2 offline [dev ?, ?] (unknown)... missing [dev ?, ?] (unknown)... missing [dev 34, 64] /dev/hdh 38081921.59A998F9.64C1A001.EC534EF2 good [dev 34, 0] /dev/hdg 38081921.59A998F9.64C1A001.EC534EF2 good [dev 33, 64] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 0] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 64] /dev/hdf 38081921.59A998F9.64C1A001.EC534EF2 unbound [EMAIL PROTECTED]:/etc# lsraid -d /dev/hdg [dev 9, 0] /dev/md/038081921.59A998F9.64C1A001.EC534EF2 offline [dev ?, ?] (unknown)... missing [dev ?, ?] (unknown)... missing [dev 34, 64] /dev/hdh 38081921.59A998F9.64C1A001.EC534EF2 good [dev 34, 0] /dev/hdg 38081921.59A998F9.64C1A001.EC534EF2 good [dev 33, 64] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 0] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [EMAIL PROTECTED]:/etc# lsraid -d /dev/hdh [dev 9, 0] /dev/md/038081921.59A998F9.64C1A001.EC534EF2 offline [dev ?, ?] (unknown)... missing [dev ?, ?] (unknown)... missing [dev 34, 64] /dev/hdh 38081921.59A998F9.64C1A001.EC534EF2 good [dev 34, 0] /dev/hdg 38081921.59A998F9.64C1A001.EC534EF2 good [dev 33, 64] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown [dev 33, 0] (unknown)38081921.59A998F9.64C1A001.EC534EF2 unknown Thanks again, Nate On Mon, 2006-04-17 at 08:46 +1000, Neil Brown wrote: > On Saturday April 15, [EMAIL PROTECTED] wrote: > > Hi All, > > Recently I lost a disk in my raid5 SW array. It seems that it took a > > second disk with it. The other disk appears to still be funtional (from > > an fdisk perspective...). I am trying to get the array to work in > > degraded mode via failed-disk in raidtab, but am always getting the > > following error: > > > > md: could not bd_claim hde. > > md: autostart failed! > > > > When I try to raidstart the array. Is it the case tha I had been running > > in degraded mode before the disk failure, and then lost the other disk? > > if so, how can I tell. > > raidstart is deprecated. It doesn't work reliably. Don't use it. > > > > > I have been messing about with mkraid -R and I have tried to > > add /dev/hdf (a new disk) back to the array. However, I am fairly > > confident that I have not kicked off the recovery process, so I am > > imagining that once I get the superblocks in order, I should be able to > > recover to the new disk? 
> > > > My system and raid config are: > > Kernel 2.6.13.1 > > Slack 10.2 > > RAID 5 which originally looked like: > > /dev/hde > > /dev/hdg > > /dev/hdi > > /dev/hdk > > > > but when I moved the disks to another box with fewer IDE controllers > > /dev/hde > > /dev/hdf > > /dev/hdg > > /dev/hdh > > > > How should I approach this? > > mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd* > > If that doesn't work, add "--force" but be cautious of the data - do > an fsck atleast. > > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > > !DSPAM:4442c93863991804284693! > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID5 recovery trouble, bd_claim failed?
On Saturday April 15, [EMAIL PROTECTED] wrote: > Hi All, > Recently I lost a disk in my raid5 SW array. It seems that it took a > second disk with it. The other disk appears to still be funtional (from > an fdisk perspective...). I am trying to get the array to work in > degraded mode via failed-disk in raidtab, but am always getting the > following error: > > md: could not bd_claim hde. > md: autostart failed! > > When I try to raidstart the array. Is it the case tha I had been running > in degraded mode before the disk failure, and then lost the other disk? > if so, how can I tell. raidstart is deprecated. It doesn't work reliably. Don't use it. > > I have been messing about with mkraid -R and I have tried to > add /dev/hdf (a new disk) back to the array. However, I am fairly > confident that I have not kicked off the recovery process, so I am > imagining that once I get the superblocks in order, I should be able to > recover to the new disk? > > My system and raid config are: > Kernel 2.6.13.1 > Slack 10.2 > RAID 5 which originally looked like: > /dev/hde > /dev/hdg > /dev/hdi > /dev/hdk > > but when I moved the disks to another box with fewer IDE controllers > /dev/hde > /dev/hdf > /dev/hdg > /dev/hdh > > How should I approach this? mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd* If that doesn't work, add "--force" but be cautious of the data - do an fsck at least. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RAID5 recovery trouble, bd_claim failed?
Hi All, Recently I lost a disk in my raid5 SW array. It seems that it took a second disk with it. The other disk appears to still be funtional (from an fdisk perspective...). I am trying to get the array to work in degraded mode via failed-disk in raidtab, but am always getting the following error: md: could not bd_claim hde. md: autostart failed! When I try to raidstart the array. Is it the case tha I had been running in degraded mode before the disk failure, and then lost the other disk? if so, how can I tell. I have been messing about with mkraid -R and I have tried to add /dev/hdf (a new disk) back to the array. However, I am fairly confident that I have not kicked off the recovery process, so I am imagining that once I get the superblocks in order, I should be able to recover to the new disk? My system and raid config are: Kernel 2.6.13.1 Slack 10.2 RAID 5 which originally looked like: /dev/hde /dev/hdg /dev/hdi /dev/hdk but when I moved the disks to another box with fewer IDE controllers /dev/hde /dev/hdf /dev/hdg /dev/hdh How should I approach this? Below is the output of mdadm --examine /dev/hd* Thanks in advance, Nate /dev/hde: Magic : a92b4efc Version : 00.90.00 UUID : 38081921:59a998f9:64c1a001:ec534ef2 Creation Time : Fri Aug 22 16:34:37 2003 Raid Level : raid5 Device Size : 78150656 (74.53 GiB 80.03 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Update Time : Wed Apr 12 02:26:37 2006 State : active Active Devices : 3 Working Devices : 3 Failed Devices : 1 Spare Devices : 0 Checksum : 165c1b4c - correct Events : 0.37523832 Layout : left-symmetric Chunk Size : 128K Number Major Minor RaidDevice State this 1 3301 active sync /dev/hde 0 0 000 removed 1 1 3301 active sync /dev/hde 2 2 34 642 active sync /dev/hdh 3 3 3403 active sync /dev/hdg /dev/hdf: Magic : a92b4efc Version : 00.90.00 UUID : 38081921:59a998f9:64c1a001:ec534ef2 Creation Time : Fri Aug 22 16:34:37 2003 Raid Level : raid5 Device Size : 78150656 (74.53 GiB 80.03 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Update Time : Wed Apr 12 02:26:37 2006 State : active Active Devices : 3 Working Devices : 3 Failed Devices : 1 Spare Devices : 0 Checksum : 165c1bc5 - correct Events : 0.37523832 Layout : left-symmetric Chunk Size : 128K Number Major Minor RaidDevice State this 3 33 64 -1 sync /dev/hdf 0 0 000 removed 1 1 3301 active sync /dev/hde 2 2 34 642 active sync /dev/hdh 3 3 33 64 -1 sync /dev/hdf /dev/hdg: Magic : a92b4efc Version : 00.90.00 UUID : 38081921:59a998f9:64c1a001:ec534ef2 Creation Time : Fri Aug 22 16:34:37 2003 Raid Level : raid5 Device Size : 78150656 (74.53 GiB 80.03 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Update Time : Wed Apr 12 06:12:58 2006 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 3 Spare Devices : 0 Checksum : 1898e1fd - correct Events : 0.37523844 Layout : left-symmetric Chunk Size : 128K Number Major Minor RaidDevice State this 3 3403 active sync /dev/hdg 0 0 000 removed 1 1 001 faulty removed 2 2 34 642 active sync /dev/hdh 3 3 3403 active sync /dev/hdg /dev/hdh: Magic : a92b4efc Version : 00.90.00 UUID : 38081921:59a998f9:64c1a001:ec534ef2 Creation Time : Fri Aug 22 16:34:37 2003 Raid Level : raid5 Device Size : 78150656 (74.53 GiB 80.03 GB) Raid Devices : 4 Total Devices : 4 Preferred Minor : 0 Update Time : Wed Apr 12 06:12:58 2006 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 3 Spare Devices : 0 Checksum : 1898e23b - correct Events : 0.37523844 Layout : left-symmetric Chunk Size : 128K Number Major 
Minor RaidDevice State this 2 34 642 active sync /dev/hdh 0 0 000 removed 1 1 001 faulty removed 2 2 34 642 active sync /dev/hdh 3 3 3403 active sync /dev/hdg - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/
Re: Help needed - RAID5 recovery from Power-fail - SOLVED
Thanks for all the help. I am now up and running again and have been stable for over a day. I will now install my new drive and add it to give me an array of three drives. I'll also learn more about Raid, mdadm and smartd so that I am better prepared next time. Thanks again Nigel Neil Brown wrote: > On Monday April 3, [EMAIL PROTECTED] wrote: > >> I wonder if you could help a Raid Newbie with a problem >> >> I had a power fail, and now I can't access my RAID array. It has been >> working fine for months until I lost power... Being a fool, I don't have >> a full backup, so I really need to get this data back. >> >> I run FC4 (64bit). >> I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array >> /dev/md0 on top of which I run lvm and mount the whole lot as /home. My >> intention was always to add another disk to this array, and I purchased >> one yesterday. >> > > 2 devices in a raid5?? Doesn't seem a lot of point it being raid5 > rather than raid1. > > >> When I boot, I get: >> >> md0 is not clean >> Cannot start dirty degraded array >> failed to run raid set md0 >> > > This tells use that the array is degraded. A dirty degraded array can > have undetectable data corruption. That is why it won't start it for > you. > However with only two devices, data corruption from this cause isn't > actually possible. > > The kernel parameter >md_mod.start_dirty_degraded=1 > will bypass this message and start the array anyway. > > Alternately: > mdadm -A --force /dev/md0 /dev/sd[ab]1 > > >> # mdadm --examine /dev/sda1 >> /dev/sda1: >> Magic : a92b4efc >> Version : 00.90.02 >>UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1 >> Creation Time : Thu Dec 15 15:29:36 2005 >> Raid Level : raid5 >>Raid Devices : 2 >> Total Devices : 2 >> Preferred Minor : 0 >> >> Update Time : Tue Mar 21 06:25:52 2006 >> State : active >> Active Devices : 1 >> > > So at 06:25:52, there was only one working devices, while... > > > >> #mdadm --examine /dev/sdb1 >> /dev/sdb1: >> Magic : a92b4efc >> Version : 00.90.02 >>UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1 >> Creation Time : Thu Dec 15 15:29:36 2005 >> Raid Level : raid5 >>Raid Devices : 2 >> Total Devices : 2 >> Preferred Minor : 0 >> >> Update Time : Tue Mar 21 06:23:57 2006 >> State : active >> Active Devices : 2 >> > > at 06:23:57 there were two. > > It looks like you lost a drive a while ago. Did you notice? > > Anyway, the 'mdadm' command I gave above should get the array working > again for you. Then you might want to >mdadm /dev/md0 -a /dev/sdb1 > is you trust /dev/sdb > > NeilBrown > > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help needed - RAID5 recovery from Power-fail
Neil Brown wrote: > 2 devices in a raid5?? Doesn't seem a lot of point it being raid5 > rather than raid1. Wouldn't a 2-dev raid5 imply a striped block mirror (i.e. faster) rather than a raid1 duplicate block mirror (i.e. slower)? Thanks! -- Al - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Help needed - RAID5 recovery from Power-fail
Neil Brown wrote: > On Monday April 3, [EMAIL PROTECTED] wrote: > > >> I wonder if you could help a Raid Newbie with a problem >> >> > It looks like you lost a drive a while ago. Did you notice? > This is not unusual - raid just keeps on going if a disk fails. When things are working again, you really should read up on "mdadm -F" - it runs as a daemon and sends you mail if any raid events occur. See if FC4 has a script that automatically runs it - you may need to tweak some config parameters somewhere (I use Debian so I'm not much help). David - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
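For anyone setting this up by hand rather than via a distribution's init script, a minimal monitoring setup looks roughly like the following; the mail address is a placeholder:

    # in /etc/mdadm.conf:
    MAILADDR admin@example.com

    # then run the monitor as a daemon, checking the arrays every 30 minutes:
    mdadm --monitor --scan --daemonise --delay=1800

With that in place, failed/degraded/spare events get mailed to the configured address instead of sitting unnoticed in the kernel log.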
Re: Help needed - RAID5 recovery from Power-fail
On Monday April 3, [EMAIL PROTECTED] wrote:
> I wonder if you could help a Raid Newbie with a problem
>
> I had a power fail, and now I can't access my RAID array. It has been
> working fine for months until I lost power... Being a fool, I don't have
> a full backup, so I really need to get this data back.
>
> I run FC4 (64bit).
> I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array
> /dev/md0 on top of which I run lvm and mount the whole lot as /home. My
> intention was always to add another disk to this array, and I purchased
> one yesterday.

2 devices in a raid5?? Doesn't seem a lot of point it being raid5 rather than raid1.

> When I boot, I get:
>
> md0 is not clean
> Cannot start dirty degraded array
> failed to run raid set md0

This tells us that the array is degraded. A dirty degraded array can have undetectable data corruption. That is why it won't start it for you. However, with only two devices, data corruption from this cause isn't actually possible.

The kernel parameter
    md_mod.start_dirty_degraded=1
will bypass this message and start the array anyway.

Alternately:
    mdadm -A --force /dev/md0 /dev/sd[ab]1

> # mdadm --examine /dev/sda1
> /dev/sda1:
>           Magic : a92b4efc
>         Version : 00.90.02
>            UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>   Creation Time : Thu Dec 15 15:29:36 2005
>      Raid Level : raid5
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
>
>     Update Time : Tue Mar 21 06:25:52 2006
>           State : active
>  Active Devices : 1

So at 06:25:52, there was only one working device, while...

> # mdadm --examine /dev/sdb1
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 00.90.02
>            UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
>   Creation Time : Thu Dec 15 15:29:36 2005
>      Raid Level : raid5
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
>
>     Update Time : Tue Mar 21 06:23:57 2006
>           State : active
>  Active Devices : 2

at 06:23:57 there were two.

It looks like you lost a drive a while ago. Did you notice?

Anyway, the 'mdadm' command I gave above should get the array working again for you. Then you might want to
    mdadm /dev/md0 -a /dev/sdb1
if you trust /dev/sdb

NeilBrown
- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
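Pulling the advice above into one cautious sequence (device names as in this thread; the LVM step depends on the volume group name, so treat it as a sketch rather than a recipe):

    mdadm --examine /dev/sda1 /dev/sdb1 | grep -E 'Update Time|Events'   # read-only: confirm which disk is freshest
    mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1                # force-assemble the dirty, degraded array
    mdadm /dev/md0 --add /dev/sdb1     # only if sdb1 was left out of the assembly and you still trust the drive
    vgchange -ay                       # reactivate LVM, then fsck and mount /home as usual

The --examine step writes nothing, so it is safe to run before committing to the forced assembly.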
Help needed - RAID5 recovery from Power-fail
I wonder if you could help a Raid Newbie with a problem.

I had a power fail, and now I can't access my RAID array. It has been working fine for months until I lost power... Being a fool, I don't have a full backup, so I really need to get this data back.

I run FC4 (64bit). I have an array of two disks /dev/sda1 and /dev/sdb1 as a raid5 array /dev/md0 on top of which I run lvm and mount the whole lot as /home. My intention was always to add another disk to this array, and I purchased one yesterday.

When I boot, I get:

md0 is not clean
Cannot start dirty degraded array
failed to run raid set md0

I can provide the following extra information:

# cat /proc/mdstat
Personalities : [raid5]
unused devices: <none>

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active

# mdadm --query /dev/md0
/dev/md0: is an md device which is not active
/dev/md0: is too small to be an md component.

# mdadm --query /dev/sda1
/dev/sda1: is not an md array
/dev/sda1: device 0 in 2 device undetected raid5 md0. Use mdadm --examine for more detail.

# mdadm --query /dev/sdb1
/dev/sdb1: is not an md array
/dev/sdb1: device 1 in 2 device undetected raid5 md0. Use mdadm --examine for more detail.

# mdadm --examine /dev/md0
mdadm: /dev/md0 is too small for md

# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:25:52 2006
          State : active
 Active Devices : 1
Working Devices : 1
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 2ba99f09 - correct
         Events : 0.1498318

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
  Creation Time : Thu Dec 15 15:29:36 2005
     Raid Level : raid5
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Tue Mar 21 06:23:57 2006
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 2ba99e95 - correct
         Events : 0.1498307

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

It looks to me like there is no hardware problem, but maybe I am wrong. I cannot find any file /etc/mdadm.conf nor /etc/raidtab. How would you suggest I proceed? I'm wary of doing anything (assemble, build, create) until I am sure it won't reset everything.

Many Thanks,
Nigel
- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 recovery fails
On Tuesday November 15, [EMAIL PROTECTED] wrote: > > mdadm --add /dev/md0 /dev/sda2.. > Yes, I am using raidstart for this. It should be the same. No, it shouldn't. raidstart is broken by design and cannot work reliably. It is one of the main reasons that I wrote mdadm. raidstart trusts the device numbers (major and minor) that are stored in the superblock. If you pull a drive out, these numbers change and raidstart fails miserably. # rm -f /usr/sbin/raidstart is probably a good idea. > I am handling a big cluster with Supermicro machines, each machine has its > own 4 SATA disks. I am using a 2.6.6 kernel. > > Did you ever pull out a disk from raid5 while the machine was > running? Yes, several times. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
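For comparison, the mdadm replacement for raidstart keys assembly off the array UUID recorded in the superblock rather than device numbers, so it survives disks changing names; a minimal sketch using the usual default paths:

    mdadm --examine --scan >> /etc/mdadm.conf   # record ARRAY lines identified by UUID
    mdadm --assemble --scan                     # assemble everything listed in mdadm.conf

This is why pulling and reinserting a disk does not confuse mdadm the way it confuses raidstart.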
Re: raid5 recovery fails
> mdadm --add /dev/md0 /dev/sda2.. Yes, I am using raidstart for this. It should be the same. I am handling a big cluster with Supermicro machines, each machine has its own 4 SATA disks. I am using a 2.6.6 kernel. Did you ever pull out a disk from raid5 while the machine was running? Just want to know, before I dive into the raid code, whether it is really a bug. On 11/14/05, Ross Vandegrift <[EMAIL PROTECTED]> wrote: > On Mon, Nov 14, 2005 at 09:27:25PM +0200, Raz Ben-Jehuda(caro) wrote: > > I have made the following test with my raid5: > > 1. created raid5 with 4 SATA disks. > > 2. waited until the raid was fully initialized. > > 3. pulled a disk from the panel. > > 4. shut down the system. > > 5. put back the disk. > > 6. turned on the system. > > > > The raid failed to recover. I got a message from the md layer > > saying that it rejects the dirty disk. > > Anyone? > > Did you re-add the disk to the array? > > # mdadm --add /dev/md0 /dev/sda2 > > Of course, substitute your appropriate devices for the ones that I > randomly chose :-) > > > -- > Ross Vandegrift > [EMAIL PROTECTED] > > "The good Christian should beware of mathematicians, and all those who > make empty prophecies. The danger already exists that the mathematicians > have made a covenant with the devil to darken the spirit and to confine > man in the bonds of Hell." > --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 > -- Raz - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 recovery fails
On Mon, Nov 14, 2005 at 09:27:25PM +0200, Raz Ben-Jehuda(caro) wrote: > I have made the following test with my raid5: > 1. created raid5 with 4 SATA disks. > 2. waited until the raid was fully initialized. > 3. pulled a disk from the panel. > 4. shut down the system. > 5. put back the disk. > 6. turned on the system. > > The raid failed to recover. I got a message from the md layer > saying that it rejects the dirty disk. > Anyone? Did you re-add the disk to the array? # mdadm --add /dev/md0 /dev/sda2 Of course, substitute your appropriate devices for the ones that I randomly chose :-) -- Ross Vandegrift [EMAIL PROTECTED] "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
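For the archive, the usual sequence after a member has been pulled and the array comes back degraded is roughly the following; device names here are placeholders for a 4-disk array, not the poster's actual layout:

    mdadm --assemble --force /dev/md0 /dev/sd[abd]2   # start the array without the pulled (now stale) disk
    mdadm /dev/md0 --add /dev/sdc2                    # re-add it; md rebuilds it from the surviving members
    cat /proc/mdstat                                  # watch the recovery progress

The rejected "dirty" disk is expected behaviour: its event count is behind the rest of the array, so md will only take it back via a re-add and resync.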
raid5 recovery fails
I have made the following test with my raid5:
1. created raid5 with 4 SATA disks.
2. waited until the raid was fully initialized.
3. pulled a disk from the panel.
4. shut down the system.
5. put back the disk.
6. turned on the system.

The raid failed to recover. I got a message from the md layer saying that it rejects the dirty disk. Anyone?
--
Raz
- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html