Re: Raid5 with two failed disks?
It's a nice, complicated case of semaphores in threaded (multi-process?) systems: one system needs to be aware that the other system isn't ready yet, without causing incompatibilities. With RAID, would it be possible for the MD driver to accept the mount request but halt the process until the driver was ready to actually give data?

Jakob Østergaard wrote:

I think my situation is the same as this "two failed disks" one, but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertently the machine got powered down without a proper shutdown, apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail, saying that it couldn't access /dev/md1 because the two RAID disks were out of sync.

Anyway, given this situation, how can I rebuild my array? Is all it takes doing another mkraid (given the raidtab is identical to the real setup, etc.)? If so, since I'm also booting off of RAID, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the RAID disk (/dev/md1), but if I do that, will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?)?

Finally, is there any way to automate this recovery process? That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up?

As others already pointed out, this doesn't make sense. The boot sequence uses the mount command to mount your fs, and mount doesn't know that your md device is in any way different from other block devices. Only if the md device doesn't start will the mount program be unable to request the kernel to mount the device. We definitely need log output in order to tell what happened and why.

--
Michael T. Babcock
http://www.linuxsupportline.com/~pgp/
ICQ: 4835018
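The idea floated in the message above (accept the mount request, but block the caller until the driver is ready) can be sketched in user space with an event object. This is only a toy illustration of the synchronization pattern, not how the MD driver works; `ToyBlockDevice`, `start`, and `mount` are hypothetical names invented for the example.

```python
import threading
import time


class ToyBlockDevice:
    """Hypothetical stand-in for a device that needs time to start."""

    def __init__(self):
        # Set once the (simulated) driver is able to serve data.
        self._ready = threading.Event()

    def start(self, delay_seconds):
        """Simulate an asynchronous start-up that takes delay_seconds."""
        def bring_up():
            time.sleep(delay_seconds)
            self._ready.set()

        threading.Thread(target=bring_up, daemon=True).start()

    def mount(self, timeout_seconds):
        """Accept the request, then block until ready or until the timeout."""
        if not self._ready.wait(timeout_seconds):
            raise TimeoutError("device did not become ready in time")
        return "mounted"


if __name__ == "__main__":
    device = ToyBlockDevice()
    device.start(delay_seconds=0.2)
    # The caller is held here until the device signals readiness.
    print(device.mount(timeout_seconds=5.0))
```

The timeout matters: a request that waits forever on a device that never starts would hang the boot instead of failing it, which is arguably worse than the panic described in this thread.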
Re: Raid5 with two failed disks?
On Mon, 03 Apr 2000, Rainer Mager wrote:

Hi all,

I think my situation is the same as this "two failed disks" one, but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertently the machine got powered down without a proper shutdown, apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail, saying that it couldn't access /dev/md1 because the two RAID disks were out of sync.

Anyway, given this situation, how can I rebuild my array? Is all it takes doing another mkraid (given the raidtab is identical to the real setup, etc.)? If so, since I'm also booting off of RAID, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the RAID disk (/dev/md1), but if I do that, will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?)?

Finally, is there any way to automate this recovery process? That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up?

As others already pointed out, this doesn't make sense. The boot sequence uses the mount command to mount your fs, and mount doesn't know that your md device is in any way different from other block devices. Only if the md device doesn't start will the mount program be unable to request the kernel to mount the device. We definitely need log output in order to tell what happened and why.

--
Jakob Østergaard, OZ9ABN, [EMAIL PROTECTED]
"And I see the elder races, putrid forms of man: See him rise and claim the earth, his downfall is at hand." {Konkhra}
Re: Raid5 with two failed disks?
On Sun, 02 Apr 2000, Marc Haber wrote:

On Sat, 1 Apr 2000 12:44:49 +0200, you wrote:

It _is_ in the docs.

Which docs do you refer to? I must have missed this.

Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

Didn't you actually mention it yourself? :) (I don't remember; someone mentioned it, at least...)

--
Jakob Østergaard
Re: Raid5 with two failed disks?
On Sun, 2 Apr 2000 15:28:28 +0200, you wrote:

On Sun, 02 Apr 2000, Marc Haber wrote:

On Sat, 1 Apr 2000 12:44:49 +0200, you wrote:

It _is_ in the docs.

Which docs do you refer to? I must have missed this.

Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

Didn't you actually mention it yourself? :)

Yes, I did. However, I'd add a sentence to the HOWTO mentioning that in this case mkraid probably won't be destructive. After the mkraid warning, I aborted the procedure and started asking. I think this should be avoided in the future.

Greetings
Marc

--
!! No courtesy copies, please !!
Marc Haber, Karlsruhe, Germany | Mail address in header
Nordisch by Nature | Fon: *49 721 966 32 15 | Fax: *49 721 966 31 29
"Questions are the Beginning of Wisdom" (Lt. Worf, TNG "Rightful Heir")
Re: Raid5 with two failed disks?
On Sun, 02 Apr 2000, Marc Haber wrote:

[snip]

Yes, I did. However, I'd add a sentence to the HOWTO mentioning that in this case mkraid probably won't be destructive. After the mkraid warning, I aborted the procedure and started asking. I think this should be avoided in the future.

I have added this to my FIX file for the next revision of the HOWTO.

Thanks,

--
Jakob Østergaard
RE: Raid5 with two failed disks?
Hi all,

I think my situation is the same as this "two failed disks" one, but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertently the machine got powered down without a proper shutdown, apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail, saying that it couldn't access /dev/md1 because the two RAID disks were out of sync.

Anyway, given this situation, how can I rebuild my array? Is all it takes doing another mkraid (given the raidtab is identical to the real setup, etc.)? If so, since I'm also booting off of RAID, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the RAID disk (/dev/md1), but if I do that, will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?)?

Finally, is there any way to automate this recovery process? That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up?

Thanks in advance,

--Rainer
RE: Raid5 with two failed disks?
On Mon, 3 Apr 2000, Rainer Mager wrote:

I think my situation is the same as this "two failed disks" one, but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertently the machine got powered down without a proper shutdown, apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail, saying that it couldn't access /dev/md1 because the two RAID disks were out of sync.

Anyway, given this situation, how can I rebuild my array? Is all it takes doing another mkraid (given the raidtab is identical to the real setup, etc.)? If so, since I'm also booting off of RAID, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the RAID disk (/dev/md1), but if I do that, will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?)?

Finally, is there any way to automate this recovery process? That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up?

Whether or not the array is in sync should not make a difference to the boot process. I have both RAID 1 and RAID 5 systems that run root RAID and will boot quite nicely and resync automatically after a "dumb" shutdown that leaves them out of sync.

Do you have your kernel built for RAID autostart, and are the partitions marked "fd"?

You can reconstruct your existing array by booting with a kernel that supports RAID and with the RAID tools on the rescue system. I do it all the time.

Michael
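After an unclean shutdown like the one discussed above, a quick way to see whether an array came up with all of its members is to look at the status brackets that the kernel prints in /proc/mdstat. The sketch below parses a sample of that text; the sample and the device names in it are made up for illustration, and the exact layout of /proc/mdstat is an assumption here since it differs between kernel versions.

```python
import re

# "[2/1] [_U]" means: 2 members expected, 1 active, first member missing.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
ARRAY_RE = re.compile(r"^(md\d+)\s*:")

# Illustrative sample only; real output varies with the kernel version.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md1 : active raid1 sdb1[1] sda1[0]
      4192896 blocks [2/2] [UU]
md2 : active raid1 sdb2[1]
      1048512 blocks [2/1] [_U]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of arrays that are running with missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

Note that this only detects missing members. An array that is complete but still resynchronizing shows all members as up, so it would not be reported by this check.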
RE: Raid5 with two failed disks?
Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have the logs in front of me, but I do remember it saying that it couldn't mount /dev/md1 and therefore had a panic during boot. My solution was to specify the root device as /dev/sda1 instead of the configured /dev/md1 from the lilo prompt.

The disk is marked for RAID autostart and marked as fd. And it booted just fine until the "dumb" shutdown.

As for a rescue disk, I'll put one together. Thanks for the advice.

--Rainer

-----Original Message-----
From: Michael Robinton [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 03, 2000 8:50 AM
To: Rainer Mager
Cc: Jakob Ostergaard; [EMAIL PROTECTED]
Subject: RE: Raid5 with two failed disks?

Whether or not the array is in sync should not make a difference to the boot process. I have both RAID 1 and RAID 5 systems that run root RAID and will boot quite nicely and resync automatically after a "dumb" shutdown that leaves them out of sync.

Do you have your kernel built for RAID autostart, and are the partitions marked "fd"?

You can reconstruct your existing array by booting with a kernel that supports RAID and with the RAID tools on the rescue system. I do it all the time.

Michael
RE: Raid5 with two failed disks?
Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have the logs in front of me, but I do remember it saying that it couldn't mount /dev/md1 and therefore had a panic during boot. My solution was to specify the root device as /dev/sda1 instead of the configured /dev/md1 from the lilo prompt.

Hmm, the only time I've seen this message has been when using initrd with an out-of-sync /dev/md, or when the raidtab in the initrd was bad or missing. This was without autostart.

Michael

The disk is marked for RAID autostart and marked as fd. And it booted just fine until the "dumb" shutdown.

As for a rescue disk, I'll put one together. Thanks for the advice.

--Rainer

-----Original Message-----
From: Michael Robinton [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 03, 2000 8:50 AM
To: Rainer Mager
Cc: Jakob Ostergaard; [EMAIL PROTECTED]
Subject: RE: Raid5 with two failed disks?

Whether or not the array is in sync should not make a difference to the boot process. I have both RAID 1 and RAID 5 systems that run root RAID and will boot quite nicely and resync automatically after a "dumb" shutdown that leaves them out of sync.

Do you have your kernel built for RAID autostart, and are the partitions marked "fd"?

You can reconstruct your existing array by booting with a kernel that supports RAID and with the RAID tools on the rescue system. I do it all the time.

Michael
Re: Raid5 with two failed disks?
On Fri, 31 Mar 2000, Marc Haber wrote:

On Thu, 30 Mar 2000 09:20:57 +0200, you wrote:

At 02:16 30.03.00, you wrote:

Hi... I have a RAID5 array using 4 IDE HDs. A few days ago the system hung: no reaction except ping from the host, nothing to see on the monitor. I rebooted the system and it told me 2 out of 4 disks were out of sync. 2 disks have an event counter of 0062, the two others 0064. I hope that there is a way to fix this. I searched through the mailing list and found one thread, but it did not help me.

Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures". You can recreate the superblocks of the RAID disks using mkraid;

I had that problem a week ago and chickened out after mkraid told me it would destroy my array. If, in this situation, destruction doesn't happen, this should be mentioned in the docs.

It _is_ in the docs. But the message from the mkraid tool is still sane, because it actually _will_ destroy your data *if you do not know what you are doing*. So, for the average Joe-user just playing with his tools as root (*ouch!*), this message is a life saver. People who actually need to rewrite the superblocks for good reasons have read the docs, so they know the message doesn't apply to them, as long as they don't make mistakes.

mkraid'ing an existing array is inherently dangerous if you're not careful or don't know what you're doing. It's perfectly safe otherwise. Having the tool tell the user that ``here be dragons'' is perfectly sensible IMHO.

--
Jakob Østergaard
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000 10:17:06 -0500, you wrote:

On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:

I've been thinking about this for a different project: how bad would it be to set up RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such)?

You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that). You can optionally do RAID 5+1, where you mirror partitions and then stripe across them, à la RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline.

How about a RAID 5 with a single spare disk? You are dead if two disks fail within the time it takes to resync, though. If you have n spare disks, you can survive n+1 disks failing, provided they don't fail at once.

Greetings
Marc

--
Marc Haber, Karlsruhe, Germany
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000 09:20:57 +0200, you wrote:

At 02:16 30.03.00, you wrote:

Hi... I have a RAID5 array using 4 IDE HDs. A few days ago the system hung: no reaction except ping from the host, nothing to see on the monitor. I rebooted the system and it told me 2 out of 4 disks were out of sync. 2 disks have an event counter of 0062, the two others 0064. I hope that there is a way to fix this. I searched through the mailing list and found one thread, but it did not help me.

Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures". You can recreate the superblocks of the RAID disks using mkraid;

I had that problem a week ago and chickened out after mkraid told me it would destroy my array. If, in this situation, destruction doesn't happen, this should be mentioned in the docs.

Greetings
Marc

--
Marc Haber, Karlsruhe, Germany
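The situation reported above (two disks at event counter 0062, two at 0064) can be modelled in a few lines to show why the array would not start on its own. This is a simplified sketch, not the kernel's actual logic: it assumes that only the members carrying the highest event counter count as up to date and that RAID 5 can run with at most one member missing. The device names are hypothetical and the counters are read as hexadecimal purely for the example.

```python
def freshest_devices(event_counts):
    """Return the devices whose event counter equals the highest one seen."""
    newest = max(event_counts.values())
    return sorted(dev for dev, count in event_counts.items() if count == newest)


def raid5_can_start(event_counts):
    """RAID 5 tolerates one stale or missing member, not two."""
    stale = len(event_counts) - len(freshest_devices(event_counts))
    return stale <= 1


if __name__ == "__main__":
    # Hypothetical device names; counters taken from the report above.
    counts = {"hda1": 0x62, "hdb1": 0x62, "hdc1": 0x64, "hdd1": 0x64}
    print("freshest:", freshest_devices(counts))
    print("can start unaided:", raid5_can_start(counts))
```

With two of four members stale, the model refuses to start the array, which matches the report. Rewriting the superblocks with mkraid, as the HOWTO section cited above describes, is what gets around this.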
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Martin Bene wrote:

At 02:16 30.03.00, you wrote:

Hi... I have a RAID5 array using 4 IDE HDs. A few days ago the system hung: no reaction except ping from the host, nothing to see on the monitor. I rebooted the system and it told me 2 out of 4 disks were out of sync. 2 disks have an event counter of 0062, the two others 0064. I hope that there is a way to fix this. I searched through the mailing list and found one thread, but it did not help me.

Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures". You can recreate the superblocks of the RAID disks using mkraid; if you explicitly mark one disk as failed in the raidtab, no automatic resync is started, so you get to check if all works and perhaps change something and retry.

Hey all,

I've been thinking about this for a different project: how bad would it be to set up RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such)?

Three words: Net block device

--
Bill Carlson, Systems Programmer, [EMAIL PROTECTED]
Virtual Hospital http://www.vh.org/
University of Iowa Hospitals and Clinics
Opinions are mine, not my employer's.
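Martin Bene's suggestion quoted above (mark one disk as failed in the raidtab so that mkraid does not start a resync) might look like the file generated by the sketch below. The directive names follow the raidtools raidtab format as described in the Software-RAID HOWTO; everything else (device names, chunk size, parity algorithm) is a placeholder and would have to match the original array exactly. The function only produces text and never runs mkraid.

```python
def render_raidtab(md_device, devices, failed_index,
                   chunk_size_kb=32, parity="left-symmetric"):
    """Build raidtab text for a RAID 5 array with one member marked failed.

    All arguments are illustrative; a real raidtab must describe the
    existing array exactly, including the order of the devices.
    """
    lines = [
        f"raiddev {md_device}",
        "    raid-level              5",
        f"    nr-raid-disks           {len(devices)}",
        "    nr-spare-disks          0",
        "    persistent-superblock   1",
        f"    parity-algorithm        {parity}",
        f"    chunk-size              {chunk_size_kb}",
    ]
    for index, device in enumerate(devices):
        # The member marked failed-disk is left out of the running array,
        # so no reconstruction starts when the array is created.
        role = "failed-disk" if index == failed_index else "raid-disk"
        lines.append(f"    device                  {device}")
        lines.append(f"    {role:<24}{index}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical device names for a 4-disk IDE array.
    print(render_raidtab("/dev/md0",
                         ["/dev/hda1", "/dev/hdb1", "/dev/hdc1", "/dev/hdd1"],
                         failed_index=3))
```

Which member to mark as failed is the real decision. Presumably it should be one of the stale disks, so that the array is assembled from the best data available and can be checked read-only before the last disk is added back.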
Re: Raid5 with two failed disks?
On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:

I've been thinking about this for a different project: how bad would it be to set up RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such)?

You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that). You can optionally do RAID 5+1, where you mirror partitions and then stripe across them, à la RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline.

--
Randomly Generated Tagline: "There are more ways to reduce friction in metals than there were release dates for Windows 95." - Quantum on TLC
Re: Raid5 with two failed disks?
Thanks to all, it worked!
Re: Raid5 with two failed disks?
Hi Bill,

Thursday, March 30, 2000, 4:36:52 PM, you wrote:

I've been thinking about this for a different project: how bad would it be to set up RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such)?

RAID 6 is exactly what you are looking for: RAID 5 with double parity info. You lose the capacity of 2 disks out of N. http://www.raid5.com/raid6.html

Or you may just take RAID 7: http://www.raid5.com/raid7.html ... Sounds great. :-)

Sven
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Theo Van Dinter wrote:

On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:

I've been thinking about this for a different project: how bad would it be to set up RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such)?

You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that). You can optionally do RAID 5+1, where you mirror partitions and then stripe across them, à la RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline.

1+5 would still fail on 2 drives if those 2 drives were both from the same RAID 1 set. The wasted space becomes more than N/2, but it might be worth it for the HA aspect. RAID 6 looks cleaner, but that would require someone to write an implementation, whereas you could do RAID 15 (51?) now.

My thought here is leading to a distributed file system that is server independent; it seems something like that would solve a lot of problems that things like NFS and Coda don't handle. From what I've read, GFS is supposed to do this, but it never hurts to attack a thing from a couple of directions.

Use the net block device, RAID 15 and go. Very tempting... :)

--
Bill Carlson, Systems Programmer, [EMAIL PROTECTED]
Virtual Hospital http://www.vh.org/
University of Iowa Hospitals and Clinics
Opinions are mine, not my employer's.
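The space trade-off mentioned in this exchange (RAID 6 costs two disks of N, while RAID 5 over mirrored pairs wastes more than N/2) can be made concrete with a small calculation. The sketch assumes equal-sized disks and counts capacity in whole disks; the layout names are labels invented for the example, not standard identifiers.

```python
def usable_disks(layout, n_disks):
    """Return how many disks' worth of capacity is usable for a layout."""
    if layout == "raid5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return n_disks - 1          # one disk's worth of parity
    if layout == "raid6":
        if n_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return n_disks - 2          # two disks' worth of parity
    if layout == "raid5-over-mirrors":
        if n_disks < 6 or n_disks % 2:
            raise ValueError("needs an even number of disks, at least 6")
        return n_disks // 2 - 1     # N/2 mirrored pairs, minus one for parity
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    n = 8
    for layout in ("raid5", "raid6", "raid5-over-mirrors"):
        usable = usable_disks(layout, n)
        print(f"{layout:>20}: {usable} usable, {n - usable} given up, of {n}")
```

For 8 disks, RAID 5 over mirrored pairs leaves 3 disks usable and gives up 5, which is N/2 + 1 and so indeed more than half, while RAID 6 gives up only 2.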
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Theo Van Dinter wrote:

On Thu, Mar 30, 2000 at 02:21:45PM -0600, Bill Carlson wrote:

1+5 would still fail on 2 drives if those 2 drives were both from the same RAID 1 set. The wasted space becomes more than N/2, but it might be worth it for the HA aspect. RAID 6 looks cleaner, but that would require someone to write an implementation, whereas you could do RAID 15 (51?) now.

2 drives failing in either RAID 1+5 or 5+1 results in a still available array:

Doh, you're right. Thanks for drawing me a picture... :)

--
Bill Carlson, Systems Programmer, [EMAIL PROTECTED]
Virtual Hospital http://www.vh.org/
University of Iowa Hospitals and Clinics
Opinions are mine, not my employer's.
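The point conceded above, that any two failed drives leave both nested layouts running and that it takes four to bring the array down, can be checked by brute force. The sketch below enumerates every set of failed disks for a 6-disk example. The disk numbering and function names are made up for illustration, and the model ignores spares and rebuild time.

```python
from itertools import combinations


def raid5_over_mirrors_alive(failed, n_pairs=3):
    """RAID 5 whose members are mirrored pairs (disks 2i and 2i+1)."""
    dead_members = sum(
        1 for i in range(n_pairs) if 2 * i in failed and 2 * i + 1 in failed
    )
    return dead_members <= 1        # RAID 5 survives one lost member


def mirror_of_raid5_alive(failed, disks_per_side=3):
    """Mirror of two RAID 5 arrays (disks 0..k-1 and k..2k-1)."""
    failed_a = sum(1 for disk in failed if disk < disks_per_side)
    failed_b = len(failed) - failed_a
    # Each side survives one failure; the mirror needs one side alive.
    return failed_a <= 1 or failed_b <= 1


def min_fatal_failures(alive, n_disks):
    """Smallest number of failed disks for which some combination is fatal."""
    for count in range(1, n_disks + 1):
        for failed in combinations(range(n_disks), count):
            if not alive(set(failed)):
                return count
    return None


if __name__ == "__main__":
    print("RAID 5 over mirrors:", min_fatal_failures(raid5_over_mirrors_alive, 6))
    print("mirror of RAID 5s:  ", min_fatal_failures(mirror_of_raid5_alive, 6))
```

Both layouts need a well-chosen set of four failures before they go offline. Losing both halves of one mirrored pair only removes a single member from the RAID 5 on top, which that level is designed to survive.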
failed disks
Hi,

I'm doing a series of bonnie tests along with a fair amount of file md5summing to determine the speed and reliability of a RAID 5 configuration. I have 5 drives on a TekRam 390U2W adapter. 3 of the drives are the same Seagate Barracuda 9.1 gig drive. The other two are 18 gig Barracudas.

Two of the nine gigs fail, consistently, when I run bonnie tests on them. One will get flagged as bad in one run and die out. This one I can confirm is bad because it fails on its own outside of the RAID array (it fails to be detected by Linux at all: no partitions are found and it can't be started). The other passes a badblocks -w test and appears to work. However, it ALWAYS fails when it's part of the array and a bonnie test is run.

Does this sound like a hardware fault? If so, why is it only occurring when RAID is used?

thanks
-sv
Re: failed disks
On Tue, 21 Mar 2000, Seth Vidal wrote:

Hi,

I'm doing a series of bonnie tests along with a fair amount of file md5summing to determine the speed and reliability of a RAID 5 configuration. I have 5 drives on a TekRam 390U2W adapter. 3 of the drives are the same Seagate Barracuda 9.1 gig drive. The other two are 18 gig Barracudas.

Two of the nine gigs fail, consistently, when I run bonnie tests on them. One will get flagged as bad in one run and die out. This one I can confirm is bad because it fails on its own outside of the RAID array (it fails to be detected by Linux at all: no partitions are found and it can't be started). The other passes a badblocks -w test and appears to work. However, it ALWAYS fails when it's part of the array and a bonnie test is run.

Does this sound like a hardware fault? If so, why is it only occurring when RAID is used?

You can most likely trigger it too if you run non-RAID I/O on all the disks simultaneously. It sounds like you have a SCSI bus problem: bad cabling, termination, etc.

--
Jakob Østergaard
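The test suggested above, non-RAID I/O on all the disks at once, could be approximated with a script that reads from every device in parallel. This is a minimal sketch: the paths passed in are whatever the tester chooses (the ones in the example are hypothetical), it only reads and never writes, and on a real system reads may be served from the page cache rather than the bus, so it is a starting point rather than a substitute for a proper stress tool.

```python
import threading


def read_fully(path, chunk_size, results):
    """Read path to the end, recording bytes read or the error hit."""
    total = 0
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        results[path] = total
    except OSError as error:
        # An I/O error on one path under concurrent load is the symptom
        # this test is looking for.
        results[path] = error


def read_all_concurrently(paths, chunk_size=1024 * 1024):
    """Start one reader thread per path and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=read_fully, args=(path, chunk_size, results))
        for path in paths
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical device names; reading raw devices normally needs root.
    for path, outcome in read_all_concurrently(["/dev/sda", "/dev/sdb"]).items():
        print(path, outcome)
```

If a disk that passes when tested alone starts returning errors only when all readers run together, that would be consistent with the bus-level problem suggested in the reply above rather than with a fault in that one drive.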