Re: RAID5 Recovery
Neil Cavan wrote:
> Hello,

Hi Neil

What kernel version? What mdadm version?

> This morning, I woke up to find the array had kicked two disks. This time, though, /proc/mdstat showed one of the failed disks (U_U_U, one of the _s) had been marked as a spare - weird, since there are no spare drives in this array. I rebooted, and the array came back in the same state: one failed, one spare. I hot-removed and hot-added the spare drive, which put the array back to where I thought it should be (still U_U_U, but with both _s marked as failed). Then I rebooted, and the array began rebuilding on its own. Usually I have to hot-add manually, so that struck me as a little odd, but I gave it no mind and went to work. Without checking the contents of the filesystem. Which turned out not to have been mounted on reboot.

OK

> Because apparently things went horribly wrong.

Yep :(

> Do I have any hope of recovering this data? Could rebuilding the reiserfs superblock help if the rebuild managed to corrupt the superblock but not the data?

See below.

> Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[snip]
> Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write due to I/O error on md0

hdc1 fails.

> Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1

hdg1 is already missing?

> Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1

So now the array is bad. A reboot happens and:

> Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
> Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking non-fresh hdg1 from array!
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
> Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated 5245kB for md0
> ...

Apparently hdc1 is OK? Hmmm.
> Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0: found reiserfs format 3.6 with standard journal
> Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0: using ordered data mode
> Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
> Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0: checking transaction log (md0)
> Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0: replayed 7 transactions in 1 seconds
> Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0: Using r5 hash to sort names
> Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write due to I/O error on md0

ReiserFS tries to mount/replay itself relying on hdc1 (which is partly bad).

> Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5 personality registered as nr 4
> Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking non-fresh hdg1 from array!

Another reboot...

> Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0: found reiserfs format 3.6 with standard journal
> Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0: using ordered data mode
> Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
> Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0: checking transaction log (md0)
> Nov 13 07:25:40 localhost kernel: [17179677.08] ReiserFS: md0: Using r5 hash to sort names
> Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write due to I/O error on md0

ReiserFS tries again...

> Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind<hdc1>
> Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1)
> Nov 13 07:27:03 localhost kernel: [17179763.70] md: bind<hdc1>
> Nov 13 07:30:24
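[An aside for anyone reading along, not from the original mail: the "kicking non-fresh hdg1" decision above comes from the event counter stored in each member's superblock. A quick way to compare them, assuming the members are hda1/hdc1/hde1/hdg1/hdi1 as in this thread:

# print each member's identity, last update time and event count
mdadm --examine /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1 | grep -E '^/dev|Update Time|Events'

The member whose event count lags furthest behind is the one md refuses at assembly time; mdadm --assemble --force can pull it back in, at the cost of whatever writes it missed.]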
Re: RAID5 Recovery
On Saturday October 21, [EMAIL PROTECTED] wrote:
> Hi,
> I had a run-in with the Ubuntu Server installer, and in trying to get the new system to recognize the clean 5-disk raid5 array left behind by the previous Ubuntu system, I think I inadvertently instructed it to create a new raid array using those same partitions. What I know for sure is that now, I get this:
>
> [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hda1
> mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got )
> [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdc1
> mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got )
> [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hde1
> mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got )
> [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdg1
> mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got )
> [EMAIL PROTECTED]:~$ sudo mdadm --examine /dev/hdi1
> mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got )
>
> I didn't format the partitions or write any data to the disk, so I think the array's data should be intact. Is there a way to recreate the superblocks, or am I hosed?

Weird.

Could the drives have been repartitioned in the process, with the partitions being slightly different sizes or at slightly different offsets? That might explain the disappearing superblocks, and remaking the partitions might fix it.

Or you can just re-create the array. Doing so won't destroy any data that happens to be there. To be on the safe side, create it with --assume-clean. This will avoid a resync, so you can be sure that no data blocks will be written at all. Then 'fsck -n' or mount read-only and see if your data is safe.

Once you are happy that you have the data safe, you can trigger the resync with

  mdadm --assemble --update=resync ...

or

  echo resync > /sys/block/md0/md/sync_action

(assuming it is 'md0').

Good luck.

NeilBrown
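[For concreteness, here is roughly what that suggestion looks like as commands. This is a sketch, not from the original mail: the device list, order and chunk size are assumptions and must match whatever the original array was built with, or the data will not line up.

# re-create the superblocks in place; --assume-clean skips the initial resync,
# so no data blocks are rewritten (device names and order are assumed here)
mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean \
      /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1
# check the filesystem read-only before trusting anything
fsck -n /dev/md0
# only once the data looks intact, let the parity resync run
echo resync > /sys/block/md0/md/sync_action

The read-only fsck comes first because a re-create with the wrong device order or chunk size can look superficially fine until you write to it.]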
Re: RAID5 recovery trouble, bd_claim failed?
Nathanial Byrnes wrote:
> Yes, I did not have the funding nor approval to purchase more hardware when I set it up (read: wife). Once it was working... the rest is history.

OK, so if you have a pair of IDE disks, jumpered as Master and Slave, and one fails:

If the Master failed, re-jumper the remaining disk of the pair on the same cable as Master, no Slave present.
If the Slave failed, re-jumper the remaining disk of the pair on the same cable as Master, no Slave present.

Then you will have the remaining disk working normally, at least. When you can afford it, I suggest buying a controller with enough ports to support the number of drives you have, with no Master/Slave pairing.

Good luck!

And to the software guys trying to help: we need to start with the (obvious) hardware problem before we advise on how to recover data from a borked system. Once he has the jumpering on the drives sorted out, the drive that went missing will be back again.

--
Regards, Maurice
Re: RAID5 recovery trouble, bd_claim failed?
Hi All,
I'm not sure that is entirely the case. From a hardware perspective, I can access all the disks from the OS, via fdisk and dd. It is really just mdadm that is failing. Would I still need to work the jumper issue?
Thanks, Nate

Maurice Hilarius wrote:
> Nathanial Byrnes wrote:
>> Yes, I did not have the funding nor approval to purchase more hardware when I set it up (read: wife). Once it was working... the rest is history.
> OK, so if you have a pair of IDE disks, jumpered as Master and Slave, and one fails:
> If the Master failed, re-jumper the remaining disk of the pair on the same cable as Master, no Slave present.
> If the Slave failed, re-jumper the remaining disk of the pair on the same cable as Master, no Slave present.
> Then you will have the remaining disk working normally, at least. When you can afford it, I suggest buying a controller with enough ports to support the number of drives you have, with no Master/Slave pairing.
> Good luck!
> And to the software guys trying to help: we need to start with the (obvious) hardware problem before we advise on how to recover data from a borked system. Once he has the jumpering on the drives sorted out, the drive that went missing will be back again.
Re: RAID5 recovery trouble, bd_claim failed?
Nate Byrnes wrote:
> Hi All,
> I'm not sure that is entirely the case. From a hardware perspective, I can access all the disks from the OS, via fdisk and dd. It is really just mdadm that is failing. Would I still need to work the jumper issue?
> Thanks, Nate

IF the disks are as we suspect (master and slave relationships) and IF you now have either a failed or a removed drive, then you MUST correct the jumpering.

Sure, you can often see a disk that is misconfigured. It is almost certain, however, that when you write to it you will simply cause corruption on it.

Of course, so far this is all speculation, as you have not actually said what the disks, controller interfaces, jumpering and so forth are. I was merely speculating, based on what you have said. No amount of software magic will cure a hardware problem.

--
With our best regards,
Maurice W. Hilarius      Telephone: 01-780-456-9771
Hard Data Ltd.           FAX: 01-780-456-9772
11060 - 166 Avenue       email: [EMAIL PROTECTED]
Edmonton, AB, Canada     http://www.harddata.com/
T5X 1Y3
Re: RAID5 recovery trouble, bd_claim failed?
Hello,
I replaced the failed disk. The configuration is /dev/hde, /dev/hdf (replaced) on IDE channel 0, and /dev/hdg, /dev/hdh on IDE channel 1, on a single PCI controller card. The issue here is that hde is now also not accessible after the failure of hdf. I cannot check the jumper configs, as the server is at home and I am at work. The general thinking was that the hde superblock got hosed with the loss of hdf.

My initial post only discussed the disk ordering and device names. As I had replaced the disk which had failed (in a previously fully functioning array) with a new disk in exactly the same configuration (jumpers, cable locations, etc.), and each of the disks could be accessed, my thinking was that there would not be a hardware problem to sort through. Is this logic flawed?

Thanks again, Nate

Maurice Hilarius wrote:
> Nate Byrnes wrote:
>> Hi All,
>> I'm not sure that is entirely the case. From a hardware perspective, I can access all the disks from the OS, via fdisk and dd. It is really just mdadm that is failing. Would I still need to work the jumper issue?
>> Thanks, Nate
> IF the disks are as we suspect (master and slave relationships) and IF you now have either a failed or a removed drive, then you MUST correct the jumpering.
> Sure, you can often see a disk that is misconfigured. It is almost certain, however, that when you write to it you will simply cause corruption on it.
> Of course, so far this is all speculation, as you have not actually said what the disks, controller interfaces, jumpering and so forth are. I was merely speculating, based on what you have said. No amount of software magic will cure a hardware problem.
Re: RAID5 recovery trouble, bd_claim failed?
2.4.1 behaves just like 2.1. So far nothing in the syslog or messages.

On Tue, 2006-04-18 at 10:24 +1000, Neil Brown wrote:
> On Monday April 17, [EMAIL PROTECTED] wrote:
>> Unfortunately nothing changed.
> Weird... so hdf still reports as 'busy'?
> Is it mentioned anywhere in /var/log/messages since reboot?
> What version of mdadm are you using? Try 2.4.1 and see if that works differently.
> NeilBrown
>> On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
>>> On Monday April 17, [EMAIL PROTECTED] wrote:
>>>> Hi Neil, List, Am I just out of luck? Perhaps a full reboot? Something else? Thanks, Nate
>>> Reboot and try again seems like the best bet at this stage.
>>> NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
Nathanial Byrnes wrote:
> Hi All,
> Recently I lost a disk in my raid5 SW array. It seems that it took a second disk with it. The other disk appears to still be functional (from an fdisk perspective...). I am trying to get the array to work in degraded mode via failed-disk in raidtab, but am always getting the following error:

Let me guess: IDE disks, in pairs, jumpered as Master and Slave. Right?

--
With our best regards,
Maurice W. Hilarius      Telephone: 01-780-456-9771
Hard Data Ltd.           FAX: 01-780-456-9772
11060 - 166 Avenue       email: [EMAIL PROTECTED]
Edmonton, AB, Canada     http://www.harddata.com/
T5X 1Y3
Re: RAID5 recovery trouble, bd_claim failed?
Yes, I did not have the funding nor approval to purchase more hardware when I set it up (read: wife). Once it was working... the rest is history.

On Tue, 2006-04-18 at 16:13 -0600, Maurice Hilarius wrote:
> Nathanial Byrnes wrote:
>> Hi All,
>> Recently I lost a disk in my raid5 SW array. It seems that it took a second disk with it. The other disk appears to still be functional (from an fdisk perspective...). I am trying to get the array to work in degraded mode via failed-disk in raidtab, but am always getting the following error:
> Let me guess: IDE disks, in pairs, jumpered as Master and Slave. Right?
Re: RAID5 recovery trouble, bd_claim failed?
Please see below.

On Mon, 2006-04-17 at 13:04 +1000, Neil Brown wrote:
> On Sunday April 16, [EMAIL PROTECTED] wrote:
>> Hi Neil,
>> Thanks for your reply. I tried that, but here is the error I received:
>> [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
>> mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
>> mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to start the array.
> Why is /dev/hdf busy? Is it in use? mounted? something?

Not that I am aware of. Here is the mount output:

[EMAIL PROTECTED]:/etc# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
/dev/sdb1 on /usr type ext3 (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
nfsd on /proc/fs/nfsd type nfsd (rw)
usbfs on /proc/bus/usb type usbfs (rw)

lsof | grep hdf does not return any results. Is there some other way to find out?

>> The output from lsraid against each device is as follows (I think that I messed up my superblocks pretty well...):
> Sorry, but I don't use lsraid and cannot tell anything useful from its output.

OK.

> NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote:
>> Why is /dev/hdf busy? Is it in use? mounted? something?
> Not that I am aware of. Here is the mount output:
> [EMAIL PROTECTED]:/etc# mount
> /dev/sda1 on / type ext3 (rw)
> proc on /proc type proc (rw)
> sysfs on /sys type sysfs (rw)
> /dev/sdb1 on /usr type ext3 (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> nfsd on /proc/fs/nfsd type nfsd (rw)
> usbfs on /proc/bus/usb type usbfs (rw)
> lsof | grep hdf does not return any results. Is there some other way to find out?

cat /proc/swaps
cat /proc/mounts
cat /proc/mdstat

as well as 'lsof' should find it.
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil,
Nothing references hdf, as you can see below. I have also rmmod'ed the md and raid5 modules and modprobed them back in. Thoughts?
Thanks again, Nate

[EMAIL PROTECTED]:~# cat /proc/swaps
Filename        Type        Size      Used    Priority
/dev/sdb2       partition   1050616   1028    -1

[EMAIL PROTECTED]:~# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
none /dev ramfs rw 0 0
/dev/sdb1 /usr ext3 rw 0 0
devpts /dev/pts devpts rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
usbfs /proc/bus/usb usbfs rw 0 0

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid5]
md0 : inactive hdh[2] hdg[3] hde[1]
      234451968 blocks
unused devices: <none>

Neil Brown wrote:
> On Monday April 17, [EMAIL PROTECTED] wrote:
> [... earlier exchange about whether anything holds /dev/hdf, quoted in full above ...]
> cat /proc/swaps
> cat /proc/mounts
> cat /proc/mdstat
> as well as 'lsof' should find it.
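[One thing the mdstat output above suggests, as an aside not raised in the original thread: the inactive md0 is still claiming hde, hdg and hdh, so before another assemble attempt it usually has to be stopped to release those devices. Whether that explains hdf reporting as busy is a separate question. A sketch, assuming the uuid and device names used earlier in the thread:

# release the members held by the half-assembled, inactive array
mdadm --stop /dev/md0
# then retry the assemble
mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
]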
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil, List,
Am I just out of luck? Perhaps a full reboot? Something else?
Thanks, Nate

Nate Byrnes wrote:
> Hi Neil,
> Nothing references hdf, as you can see below. I have also rmmod'ed the md and raid5 modules and modprobed them back in. Thoughts?
> Thanks again, Nate
> [... /proc/swaps, /proc/mounts and /proc/mdstat output, and the earlier exchange, quoted in full above ...]
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote:
> Hi Neil, List,
> Am I just out of luck? Perhaps a full reboot? Something else?
> Thanks, Nate

Reboot and try again seems like the best bet at this stage.

NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
Unfortunately nothing changed.

On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
> On Monday April 17, [EMAIL PROTECTED] wrote:
>> Hi Neil, List,
>> Am I just out of luck? Perhaps a full reboot? Something else?
>> Thanks, Nate
> Reboot and try again seems like the best bet at this stage.
> NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
On Monday April 17, [EMAIL PROTECTED] wrote:
> Unfortunately nothing changed.

Weird... so hdf still reports as 'busy'?
Is it mentioned anywhere in /var/log/messages since reboot?

What version of mdadm are you using? Try 2.4.1 and see if that works differently.

NeilBrown

> On Tue, 2006-04-18 at 07:43 +1000, Neil Brown wrote:
>> On Monday April 17, [EMAIL PROTECTED] wrote:
>>> Hi Neil, List,
>>> Am I just out of luck? Perhaps a full reboot? Something else? Thanks, Nate
>> Reboot and try again seems like the best bet at this stage.
>> NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
On Saturday April 15, [EMAIL PROTECTED] wrote:
> Hi All,
> Recently I lost a disk in my raid5 SW array. It seems that it took a second disk with it. The other disk appears to still be functional (from an fdisk perspective...). I am trying to get the array to work in degraded mode via failed-disk in raidtab, but am always getting the following error:
>
> md: could not bd_claim hde.
> md: autostart failed!
>
> when I try to raidstart the array. Is it the case that I had been running in degraded mode before the disk failure, and then lost the other disk? If so, how can I tell?

raidstart is deprecated. It doesn't work reliably. Don't use it.

> I have been messing about with mkraid -R and I have tried to add /dev/hdf (a new disk) back to the array. However, I am fairly confident that I have not kicked off the recovery process, so I am imagining that once I get the superblocks in order, I should be able to recover to the new disk?
>
> My system and raid config are:
> Kernel 2.6.13.1, Slack 10.2
> RAID 5, which originally looked like: /dev/hde /dev/hdg /dev/hdi /dev/hdk
> but when I moved the disks to another box with fewer IDE controllers: /dev/hde /dev/hdf /dev/hdg /dev/hdh
>
> How should I approach this?

mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*

If that doesn't work, add --force, but be cautious of the data - do an fsck at least.

NeilBrown
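[Spelling that fallback out as commands - a sketch only; the uuid and device names follow this thread and should be substituted for your own:

# try a normal assemble by uuid first
mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
# if members are rejected as non-fresh or faulty, --force assembles the best
# available set, accepting that a forced-in member may be slightly stale
mdadm --assemble --force /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
# inspect the filesystem without writing before mounting it read-write
fsck -n /dev/md0
]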
Re: RAID5 recovery trouble, bd_claim failed?
Hi Neil,
Thanks for your reply. I tried that, but here is the error I received:

[EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to start the array.

The output from lsraid against each device is as follows (I think that I messed up my superblocks pretty well...):

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hde
[dev  9,  0] /dev/md/0  38081921.59A998F9.64C1A001.EC534EF2 offline
[dev  ?,  ?] (unknown)  ...                                 missing
[dev  ?,  ?] (unknown)  ...                                 missing
[dev 34, 64] /dev/hdh   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 34,  0] /dev/hdg   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 33, 64] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33,  0] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33,  0] /dev/hde   38081921.59A998F9.64C1A001.EC534EF2 unbound

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdf
[dev  9,  0] /dev/md/0  38081921.59A998F9.64C1A001.EC534EF2 offline
[dev  ?,  ?] (unknown)  ...                                 missing
[dev  ?,  ?] (unknown)  ...                                 missing
[dev 34, 64] /dev/hdh   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 34,  0] /dev/hdg   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 33, 64] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33,  0] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33, 64] /dev/hdf   38081921.59A998F9.64C1A001.EC534EF2 unbound

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdg
[dev  9,  0] /dev/md/0  38081921.59A998F9.64C1A001.EC534EF2 offline
[dev  ?,  ?] (unknown)  ...                                 missing
[dev  ?,  ?] (unknown)  ...                                 missing
[dev 34, 64] /dev/hdh   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 34,  0] /dev/hdg   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 33, 64] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33,  0] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown

[EMAIL PROTECTED]:/etc# lsraid -d /dev/hdh
[dev  9,  0] /dev/md/0  38081921.59A998F9.64C1A001.EC534EF2 offline
[dev  ?,  ?] (unknown)  ...                                 missing
[dev  ?,  ?] (unknown)  ...                                 missing
[dev 34, 64] /dev/hdh   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 34,  0] /dev/hdg   38081921.59A998F9.64C1A001.EC534EF2 good
[dev 33, 64] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown
[dev 33,  0] (unknown)  38081921.59A998F9.64C1A001.EC534EF2 unknown

Thanks again, Nate

On Mon, 2006-04-17 at 08:46 +1000, Neil Brown wrote:
> On Saturday April 15, [EMAIL PROTECTED] wrote:
>> [... original problem description and configuration, quoted in full above ...]
>> How should I approach this?
> mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd*
> If that doesn't work, add --force, but be cautious of the data - do an fsck at least.
> NeilBrown
Re: RAID5 recovery trouble, bd_claim failed?
On Sunday April 16, [EMAIL PROTECTED] wrote:
> Hi Neil,
> Thanks for your reply. I tried that, but here is the error I received:
> [EMAIL PROTECTED]:/etc# mdadm --assemble /dev/md0 --uuid=38081921:59a998f9:64c1a001:ec534ef2 /dev/hd[efgh]
> mdadm: failed to add /dev/hdf to /dev/md0: Device or resource busy
> mdadm: /dev/md0 assembled from 2 drives and -1 spares - not enough to start the array.

Why is /dev/hdf busy? Is it in use? mounted? something?

> The output from lsraid against each device is as follows (I think that I messed up my superblocks pretty well...):

Sorry, but I don't use lsraid and cannot tell anything useful from its output.

NeilBrown
Re: raid5 recovery fails
On Mon, Nov 14, 2005 at 09:27:25PM +0200, Raz Ben-Jehuda(caro) wrote:
> I have made the following test with my raid5:
> 1. created raid5 with 4 sata disks.
> 2. waited until the raid was fully initialized.
> 3. pulled a disk from the panel.
> 4. shut the system down.
> 5. put back the disk.
> 6. turned the system on.
> The raid failed to recover. I got a message from the md layer saying that it rejects the dirty disk. Anyone?

Did you re-add the disk to the array?

# mdadm --add /dev/md0 /dev/sda2

Of course, substitute your appropriate devices for the ones that I randomly chose :-)

--
Ross Vandegrift
[EMAIL PROTECTED]

"The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell."
	--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
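[Not part of Ross's reply, but for completeness: after the --add, the rebuild can be confirmed and watched like this, assuming the array is md0:

# check that the disk was accepted and follow the recovery progress
cat /proc/mdstat
mdadm --detail /dev/md0
]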