Re: Raid5 with two failed disks?

2000-04-09 Thread Michael T. Babcock

It's a nicely complicated case of semaphores in threaded (multi-process?) systems ...

... one system needs to be aware that the other system isn't ready yet, without
causing incompatibilities.  With RAID, would it be possible for the MD driver to
accept the mount request but block the calling process until the driver is ready
to actually deliver data?
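
A toy user-space analogy of that idea, in Python rather than kernel code. The class and method names (ToyBlockDevice, mark_ready, read) are made up for illustration and have nothing to do with the real md driver; the sketch only shows a caller being put to sleep until the "device" signals that it is ready.

```python
import threading


class ToyBlockDevice:
    """Hypothetical stand-in for a device that accepts requests before it is ready."""

    def __init__(self):
        self._ready = threading.Event()
        self._data = None

    def mark_ready(self, data):
        # Called by whoever finishes bringing the device up.
        self._data = data
        self._ready.set()

    def read(self, timeout=None):
        # The request is accepted immediately, but the caller sleeps here
        # until the device is ready (or the timeout expires).
        if not self._ready.wait(timeout):
            raise TimeoutError("device did not become ready")
        return self._data


def demo():
    dev = ToyBlockDevice()
    # Simulate the device becoming ready a little later, from another thread.
    timer = threading.Timer(0.05, dev.mark_ready, args=(b"data",))
    timer.start()
    result = dev.read(timeout=2.0)
    timer.join()
    return result
```

The timeout matters: a caller that blocks forever on a device that never starts would be worse than a clean failure at mount time.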

Jakob Østergaard wrote:

I think my situation is the same as this "two failed disks" one but I
  haven't been following the thread carefully and I just want to double check.
 
I have a mirrored RAID-1 setup between 2 disks with no spare disks.
  Inadvertently the machine got powered down without a proper shutdown,
  apparently causing the RAID to become unhappy. It would boot to the point
  where it needed to mount root and then would fail, saying that it couldn't
  access /dev/md1 because the two RAID disks were out of sync.
Anyway, given this situation, how can I rebuild my array? Is all it takes
  another mkraid (given the raidtab is identical to the real setup,
  etc)? If so, since I'm also booting off of RAID, how do I do this for the
  boot partition? I can boot up using one of the individual disks (e.g.
  /dev/sda1) instead of the RAID device (/dev/md1), but if I do that, will I be
  able to do a mkraid on an in-use partition? If not, how do I resolve this
  (boot from floppy?).
Finally, is there any way to automate this recovery process? That is, if
  the machine is improperly powered down again, can I have it automatically
  rebuild itself the next time it comes up?

 As others already pointed out, this doesn't make sense.  The boot sequence uses
 the mount command to mount your fs, and mount doesn't know that your md device
 is in any way different from other block devices.

 Only if the md device doesn't start will the mount program be unable to
 ask the kernel to mount the device.

 We definitely need log output in order to tell what happened and why.


--
   _/~-=##=-~\_
   -=+0+=- Michael T. Babcock -=+0+=-
   ~\_-=##=-_/~
http://www.linuxsupportline.com/~pgp/ ICQ: 4835018






Re: Raid5 with two failed disks?

2000-04-04 Thread Jakob Østergaard

On Mon, 03 Apr 2000, Rainer Mager wrote:

 Hi all,
 
   I think my situation is the same as this "two failed disks" one but I
 haven't been following the thread carefully and I just want to double check.
 
   I have a mirrored RAID-1 setup between 2 disks with no spare disks.
 Inadvertently the machine got powered down without a proper shutdown,
 apparently causing the RAID to become unhappy. It would boot to the point
 where it needed to mount root and then would fail, saying that it couldn't
 access /dev/md1 because the two RAID disks were out of sync.
   Anyway, given this situation, how can I rebuild my array? Is all it takes
 another mkraid (given the raidtab is identical to the real setup,
 etc)? If so, since I'm also booting off of RAID, how do I do this for the
 boot partition? I can boot up using one of the individual disks (e.g.
 /dev/sda1) instead of the RAID device (/dev/md1), but if I do that, will I be
 able to do a mkraid on an in-use partition? If not, how do I resolve this
 (boot from floppy?).
   Finally, is there any way to automate this recovery process? That is, if
 the machine is improperly powered down again, can I have it automatically
 rebuild itself the next time it comes up?

As others already pointed out, this doesn't make sense.  The boot sequence uses
the mount command to mount your fs, and mount doesn't know that your md device
is in any way different from other block devices.

Only if the md device doesn't start will the mount program be unable to
ask the kernel to mount the device.

We definitely need log output in order to tell what happened and why.
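
One quick thing to look at besides the boot log is whether the md device was started at all, which /proc/mdstat reports. A minimal Python sketch follows; the sample text is a made-up illustration of the usual "mdN : active ..." line, and the exact layout of /proc/mdstat differs between kernel versions, so treat the pattern as an assumption.

```python
import re

# Hypothetical sample; a real system would read the text from /proc/mdstat.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sdb1[1] sda1[0]
      8385856 blocks [2/2] [UU]

unused devices: <none>
"""


def md_is_active(mdstat_text, name):
    """Return True if a line such as 'md1 : active ...' is present for this device."""
    pattern = re.compile(r"^%s\s*:\s*active\b" % re.escape(name), re.MULTILINE)
    return bool(pattern.search(mdstat_text))


if __name__ == "__main__":
    print(md_is_active(SAMPLE_MDSTAT, "md1"))
```

If the device is not listed as active, mount never had a chance, and the interesting messages are the ones the md driver printed before that point.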

-- 

: [EMAIL PROTECTED]  : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...:



Re: Raid5 with two failed disks?

2000-04-02 Thread Jakob Østergaard

On Sun, 02 Apr 2000, Marc Haber wrote:

 On Sat, 1 Apr 2000 12:44:49 +0200, you wrote:
 It _is_ in the docs.
 
 Which docs do you refer to? I must have missed this.

Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

Didn't you actually mention it yourself ?   :)
(don't remember - someone mentioned it at least...)

-- 

: [EMAIL PROTECTED]  : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...:



Re: Raid5 with two failed disks?

2000-04-02 Thread Marc Haber

On Sun, 2 Apr 2000 15:28:28 +0200, you wrote:
On Sun, 02 Apr 2000, Marc Haber wrote:
 On Sat, 1 Apr 2000 12:44:49 +0200, you wrote:
 It _is_ in the docs.
 
 Which docs do you refer to? I must have missed this.

Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/

Didn't you actually mention it yourself ?   :)

Yes, I did. However, I'd add a sentence mentioning that in this case
mkraid probably won't be destructive to the HOWTO. After the mkraid
warning, I aborted the procedure and started asking. I think this
should be avoided in the future.

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber  |   " Questions are the | Mailadresse im Header
Karlsruhe, Germany  | Beginning of Wisdom " | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29



Re: Raid5 with two failed disks?

2000-04-02 Thread Jakob Østergaard

On Sun, 02 Apr 2000, Marc Haber wrote:

[snip]
 Yes, I did. However, I'd add a sentence mentioning that in this case
 mkraid probably won't be destructive to the HOWTO. After the mkraid
 warning, I aborted the procedure and started asking. I think this
 should be avoided in the future.

I have added this to my FIX file for the next revision of the HOWTO.

Thanks,
-- 

: [EMAIL PROTECTED]  : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...:



RE: Raid5 with two failed disks?

2000-04-02 Thread Rainer Mager

Hi all,

I think my situation is the same as this "two failed disks" one but I
haven't been following the thread carefully and I just want to double check.

I have a mirrored RAID-1 setup between 2 disks with no spare disks.
Inadvertently the machine got powered down without a proper shutdown,
apparently causing the RAID to become unhappy. It would boot to the point
where it needed to mount root and then would fail, saying that it couldn't
access /dev/md1 because the two RAID disks were out of sync.
Anyway, given this situation, how can I rebuild my array? Is all it takes
another mkraid (given the raidtab is identical to the real setup,
etc)? If so, since I'm also booting off of RAID, how do I do this for the
boot partition? I can boot up using one of the individual disks (e.g.
/dev/sda1) instead of the RAID device (/dev/md1), but if I do that, will I be
able to do a mkraid on an in-use partition? If not, how do I resolve this
(boot from floppy?).
Finally, is there any way to automate this recovery process? That is, if
the machine is improperly powered down again, can I have it automatically
rebuild itself the next time it comes up?

Thanks in advance,

--Rainer




RE: Raid5 with two failed disks?

2000-04-02 Thread Michael Robinton

On Mon, 3 Apr 2000, Rainer Mager wrote:

   I think my situation is the same as this "two failed disks" one but I
 haven't been following the thread carefully and I just want to double check.
 
   I have a mirrored RAID-1 setup between 2 disks with no spare disks.
 Inadvertently the machine got powered down without a proper shutdown,
 apparently causing the RAID to become unhappy. It would boot to the point
 where it needed to mount root and then would fail, saying that it couldn't
 access /dev/md1 because the two RAID disks were out of sync.
   Anyway, given this situation, how can I rebuild my array? Is all it takes
 another mkraid (given the raidtab is identical to the real setup,
 etc)? If so, since I'm also booting off of RAID, how do I do this for the
 boot partition? I can boot up using one of the individual disks (e.g.
 /dev/sda1) instead of the RAID device (/dev/md1), but if I do that, will I be
 able to do a mkraid on an in-use partition? If not, how do I resolve this
 (boot from floppy?).
   Finally, is there any way to automate this recovery process? That is, if
 the machine is improperly powered down again, can I have it automatically
 rebuild itself the next time it comes up?
 
Whether or not the array is in sync should not make a difference to the
boot process. I have both RAID 1 and RAID 5 systems that run root RAID and
will boot quite nicely and resync automatically after a "dumb" shutdown
that leaves them out of sync.

Do you have your kernel built for RAID autostart, and the partitions marked
as type "fd"?
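
For reference, "fd" is the partition type byte 0xFD (Linux raid autodetect) in the classic MBR partition table. A small Python sketch of reading the four primary type bytes from a 512-byte boot sector follows. The helper build_test_sector is purely illustrative and builds a fake sector in memory; pointing this at a real disk would mean reading its first 512 bytes as root, which the sketch deliberately does not do.

```python
RAID_AUTODETECT = 0xFD      # partition type "fd"
TABLE_OFFSET = 446          # start of the MBR partition table
ENTRY_SIZE = 16             # bytes per partition entry
TYPE_OFFSET = 4             # position of the type byte inside an entry


def partition_types(sector):
    """Return the type bytes of the four primary partition entries."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    return [sector[TABLE_OFFSET + ENTRY_SIZE * i + TYPE_OFFSET] for i in range(4)]


def build_test_sector(types):
    """Illustrative helper: an otherwise empty sector with the given type bytes."""
    sector = bytearray(512)
    for i, ptype in enumerate(types):
        sector[TABLE_OFFSET + ENTRY_SIZE * i + TYPE_OFFSET] = ptype
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    fake = build_test_sector([0xFD, 0x82, 0x00, 0x00])
    for number, ptype in enumerate(partition_types(fake), start=1):
        marker = "raid autodetect" if ptype == RAID_AUTODETECT else ""
        print(number, hex(ptype), marker)
```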

You can reconstruct your existing array by booting with a kernel that
supports RAID and with the raid tools on the rescue system. I do it all the
time.

Michael



RE: Raid5 with two failed disks?

2000-04-02 Thread Rainer Mager

Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have
the logs in front of me, but I do remember it saying that it couldn't mount
/dev/md1 and therefore had a panic during boot. My solution was to specify
the root device as /dev/sda1 instead of the configured /dev/md1 from the
lilo prompt.

The disk is marked to auto raid start and marked as fd. And, it booted just
fine until the "dumb" shutdown.

As for a rescue disk I'll put one together. Thanks for the advice.

--Rainer







RE: Raid5 with two failed disks?

2000-04-02 Thread Michael Robinton

 Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have
 the logs in front of me, but I do remember it saying that it couldn't mount
 /dev/md1 and therefore had a panic during boot. My solution was to specify
 the root device as /dev/sda1 instead of the configured /dev/md1 from the
 lilo prompt.

Hmm, the only time I've seen this message has been when using initrd
with an out-of-sync /dev/md or when the raidtab in the initrd was bad or
missing. This was without autostart.

Michael


 
 The disk is marked to auto raid start and marked as fd. And, it booted just
 fine until the "dumb" shutdown.
 
 As for a rescue disk I'll put one together. Thanks for the advice.
 
 --Rainer
 
 
 
 



Re: Raid5 with two failed disks?

2000-04-01 Thread Jakob Østergaard

On Fri, 31 Mar 2000, Marc Haber wrote:

 On Thu, 30 Mar 2000 09:20:57 +0200, you wrote:
 At 02:16 30.03.00, you wrote:
 Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system
 hung, no reaction, except ping from the host, nothing to see on the
 monitor. I rebooted the system and it told me, 2 out of 4 disks were out
 of sync. 2 Disks have an event counter of 0062, the two others
 0064. I hope, that there is a way to fix this. I searched through the
 mailing-list and found one thread, but it did not help me.
 
 Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures".
 
 You can recreate the superblocks of the raid disks using mkraid;
 
 I had that problem a week ago and chickened out after mkraid told me
 it would destroy my array. If, in this situation, destruction doesn't
 happen, this should be mentioned in the docs.

It _is_ in the docs.  But the message from the mkraid tool is still sane,
because it actually _will_ destroy your data *if you do not know what you are
doing*.   So, for the average Joe-user just playing with his tools as root
(*ouch!*), this message is a life saver.  For people who actually need to
re-write the superblocks for good reasons, well they have read the docs so they
know the message doesn't apply to them - if they don't make mistakes.

mkraid'ing an existing array is inherently dangerous if you're not careful or
don't know what you're doing.  It's perfectly safe otherwise.  Having the tool tell
the user that ``here be dragons'' is perfectly sensible IMHO.

-- 

: [EMAIL PROTECTED]  : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...:



Re: Raid5 with two failed disks?

2000-03-31 Thread Marc Haber

On Thu, 30 Mar 2000 10:17:06 -0500, you wrote:
On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:
 I've been thinking about this for a different project, how bad would it be
 to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this
 handled under a different class of RAID (ignoring things like RAID 5 over
 mirrored disks and such).

You just can't do that with RAID5.  I seem to remember that there's a RAID 6
or 7 that handles 2 disk failures (multiple parity devices or something like
that.)

You can optionally do RAID 5+1 where you mirror partitions and then stripe
across them ala RAID 0+1.  You'd have to lose 4 disks minimally before the
array goes offline.

How about a RAID 5 with a single spare disk? You are dead if two disks
fail within the time it takes to resync, though. If you have n spare
disks, you can survive n+1 disks failing, provided they don't fail at
once.
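
The argument can be put into a few lines of Python. This is a deliberately simplified model with assumed names (raid5_survives, resync_time): one disk fails per event, a spare is pulled in immediately, the rebuild takes a fixed time, and spares themselves never fail.

```python
def raid5_survives(failure_times, spares, resync_time):
    """Return True if the array stays online through all listed failures.

    failure_times: times at which one active disk fails.
    spares:        hot spares available at the start.
    resync_time:   time needed to rebuild onto a spare (assumed constant).
    """
    degraded_until = None  # end of the current rebuild, None while fully redundant
    for t in sorted(failure_times):
        if degraded_until is not None and t < degraded_until:
            return False  # a second disk died while the array was still degraded
        if spares > 0:
            spares -= 1
            degraded_until = t + resync_time
        else:
            degraded_until = float("inf")  # no spare left: degraded for good
    return True


if __name__ == "__main__":
    # One spare: two well-separated failures are fine, a third is not.
    print(raid5_survives([0, 10], spares=1, resync_time=5))
    print(raid5_survives([0, 10, 20], spares=1, resync_time=5))
    # One spare, but the second failure arrives during the resync.
    print(raid5_survives([0, 3], spares=1, resync_time=5))
```

Under these assumptions the three calls print True, False and False, which matches the n+1 rule for n = 1.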

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber  |   " Questions are the | Mailadresse im Header
Karlsruhe, Germany  | Beginning of Wisdom " | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29



Re: Raid5 with two failed disks?

2000-03-31 Thread Marc Haber

On Thu, 30 Mar 2000 09:20:57 +0200, you wrote:
At 02:16 30.03.00, you wrote:
Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system
hung, no reaction, except ping from the host, nothing to see on the
monitor. I rebooted the system and it told me, 2 out of 4 disks were out
of sync. 2 Disks have an event counter of 0062, the two others
0064. I hope, that there is a way to fix this. I searched through the
mailing-list and found one thread, but it did not help me.

Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures".

You can recreate the superblocks of the raid disks using mkraid;

I had that problem a week ago and chickened out after mkraid told me
it would destroy my array. If, in this situation, destruction doesn't
happen, this should be mentioned in the docs.

Greetings
Marc

-- 
-- !! No courtesy copies, please !! -
Marc Haber  |   " Questions are the | Mailadresse im Header
Karlsruhe, Germany  | Beginning of Wisdom " | Fon: *49 721 966 32 15
Nordisch by Nature  | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29



Re: Raid5 with two failed disks?

2000-03-30 Thread Bill Carlson

On Thu, 30 Mar 2000, Martin Bene wrote:

 At 02:16 30.03.00, you wrote:
 Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system
 hung, no reaction, except ping from the host, nothing to see on the
 monitor. I rebooted the system and it told me, 2 out of 4 disks were out
 of sync. 2 Disks have an event counter of 0062, the two others
 0064. I hope, that there is a way to fix this. I searched through the
 mailing-list and found one thread, but it did not help me.
 
 Yes I do. Check Jakob's RAID HOWTO, section "recovering from multiple failures".
 
 You can recreate the superblocks of the raid disks using mkraid; if you 
 explicitly mark one disk as failed in the raidtab, no automatic resync is 
 started, so you get to check if all works and perhaps change something and 
 retry.


Hey all,

I've been thinking about this for a different project, how bad would it be
to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this
handled under a different class of RAID (ignoring things like RAID 5 over
mirrored disks and such).

Three words: Net block device 

Bill Carlson

Systems Programmer[EMAIL PROTECTED]|  Opinions are mine,
Virtual Hospital  http://www.vh.org/|  not my employer's.
University of Iowa Hospitals and Clinics|





Re: Raid5 with two failed disks?

2000-03-30 Thread Theo Van Dinter

On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:
 I've been thinking about this for a different project, how bad would it be
 to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this
 handled under a different class of RAID (ignoring things like RAID 5 over
 mirrored disks and such).

You just can't do that with RAID5.  I seem to remember that there's a RAID 6
or 7 that handles 2 disk failures (multiple parity devices or something like
that.)

You can optionally do RAID 5+1 where you mirror partitions and then stripe
across them ala RAID 0+1.  You'd have to lose 4 disks minimally before the
array goes offline.
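
The "4 disks minimum" figure can be checked by brute force. The Python sketch below assumes a six-disk example, either three mirror pairs under one RAID 5 or two three-disk RAID 5 sets mirrored against each other; the disk numbering and function names are invented for the illustration.

```python
from itertools import combinations


def raid5_over_mirrors_alive(failed, mirrors):
    """RAID 5 across mirror pairs: online while at most one mirror is fully dead."""
    dead = sum(1 for pair in mirrors if all(d in failed for d in pair))
    return dead <= 1


def mirror_of_raid5_alive(failed, raid5_sets):
    """RAID 1 across RAID 5 sets: online while some set has at most one failed disk."""
    return any(sum(d in failed for d in s) <= 1 for s in raid5_sets)


def min_fatal_failures(disks, alive):
    """Smallest number of simultaneous failures that can take the array offline."""
    for k in range(1, len(disks) + 1):
        if any(not alive(set(c)) for c in combinations(disks, k)):
            return k
    return None


DISKS = list(range(6))
MIRRORS = [(0, 1), (2, 3), (4, 5)]      # assumed layout for RAID 5 over mirrors
RAID5_SETS = [(0, 1, 2), (3, 4, 5)]     # assumed layout for mirrored RAID 5 sets

if __name__ == "__main__":
    print(min_fatal_failures(DISKS, lambda f: raid5_over_mirrors_alive(f, MIRRORS)))
    print(min_fatal_failures(DISKS, lambda f: mirror_of_raid5_alive(f, RAID5_SETS)))
```

Both layouts print 4: no combination of two or three failed disks takes either array offline, and only some unlucky combinations of four do.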

-- 
Randomly Generated Tagline:
"There are more ways to reduce friction in metals then there were
 release dates for Windows 95."- Quantum on TLC



Re: Raid5 with two failed disks?

2000-03-30 Thread Tmm

Thanks to all, it worked!





Re: Raid5 with two failed disks?

2000-03-30 Thread Sven Kirmess

 Hi Bill,

Thursday, March 30, 2000, 4:36:52 PM, you wrote:

 I've been thinking about this for a different project, how bad would
 it be to setup RAID 5 to allow for 2 (or more) failures in an array?
 Or is this handled under a different class of RAID (ignoring things
 like RAID 5 over mirrored disks and such).

Raid 6 is exactly what you are looking for. Raid 5 with double parity
info. You lose 2 disks of N.

http://www.raid5.com/raid6.html


Or you may just take Raid 7 http://www.raid5.com/raid7.html ... Sounds
great. :-)



 Sven





Re: Raid5 with two failed disks?

2000-03-30 Thread Bill Carlson

On Thu, 30 Mar 2000, Theo Van Dinter wrote:

 On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote:
  I've been thinking about this for a different project, how bad would it be
  to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this
  handled under a different class of RAID (ignoring things like RAID 5 over
  mirrored disks and such).
 
 You just can't do that with RAID5.  I seem to remember that there's a RAID 6
 or 7 that handles 2 disk failures (multiple parity devices or something like
 that.)
 
 You can optionally do RAID 5+1 where you mirror partitions and then stripe
 across them ala RAID 0+1.  You'd have to lose 4 disks minimally before the
 array goes offline.

1+5 would still fail on 2 drives if those 2 drives were both from the
same RAID 1 set. The wasted space becomes more than N/2, but it might be
worth it for the HA aspect. RAID 6 looks cleaner, but that would require
someone to write an implementation, whereas you could do RAID 15 (51?)
now.
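
The space overhead is easy to put in numbers. A small Python sketch, assuming N equally sized disks and the conventional minimum disk counts; the layout names are just labels chosen for this example.

```python
def usable_disks(layout, n):
    """Usable capacity, in units of one disk, for n equally sized disks."""
    if layout == "raid5":
        if n < 3:
            raise ValueError("RAID 5 is assumed to need at least 3 disks")
        return n - 1            # one disk's worth of parity
    if layout == "raid6":
        if n < 4:
            raise ValueError("RAID 6 is assumed to need at least 4 disks")
        return n - 2            # two disks' worth of parity
    if layout == "raid1+5":
        if n < 6 or n % 2:
            raise ValueError("RAID 5 over mirror pairs needs an even n >= 6")
        return n // 2 - 1       # n/2 mirrors, one of them spent on parity
    raise ValueError("unknown layout: %r" % layout)


if __name__ == "__main__":
    n = 8
    for layout in ("raid5", "raid6", "raid1+5"):
        usable = usable_disks(layout, n)
        print(layout, "usable:", usable, "overhead:", n - usable)
```

For 8 disks the overhead is 1, 2 and 5 disks respectively, so RAID 5 over mirrors does give up more than N/2.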

My thought here is leading to a distributed file system that is server
independent, it seems something like that would solve a lot of problems
that things like NFS and Coda don't handle. From what I've read GFS is
supposed to do this, never hurts to attack a thing from a couple of
directions.

Use the net block device, RAID 15 and go. Very tempting...:)

Bill Carlson

Systems Programmer[EMAIL PROTECTED]|  Opinions are mine,
Virtual Hospital  http://www.vh.org/|  not my employer's.
University of Iowa Hospitals and Clinics|





Re: Raid5 with two failed disks?

2000-03-30 Thread Bill Carlson

On Thu, 30 Mar 2000, Theo Van Dinter wrote:

 On Thu, Mar 30, 2000 at 02:21:45PM -0600, Bill Carlson wrote:
  1+5 would still fail on 2 drives if those 2 drives were both from the
  same RAID 1 set. The wasted space becomes more than N/2, but it might be
  worth it for the HA aspect. RAID 6 looks cleaner, but that would require
  someone to write an implementation, whereas you could do RAID 15 (51?)
  now.
 
 2 drives failing in either RAID 1+5 or 5+1 results in a still available
 array:

Doh, you're right. Thanks for drawing me a picture...:)

Bill Carlson

Systems Programmer[EMAIL PROTECTED]|  Opinions are mine,
Virtual Hospital  http://www.vh.org/|  not my employer's.
University of Iowa Hospitals and Clinics|





failed disks

2000-03-21 Thread Seth Vidal

Hi,
 I'm doing a series of bonnie tests along with a fair amount of file
md5summing to determine speed and reliability of a raid5 configuration.
I have 5 drives on a TekRam 390U2W adapter. 3 of the drives are the same
Seagate Barracuda 9.1 gig drive. The other two are 18 gig Barracudas.

Two of the nine gigs fail - consistently - when I run bonnie tests on
them. One will get flagged as bad in one run and die out. This one I can
confirm is bad b/c it fails on its own outside of the raid array (it
fails to be detected by linux at all - no partitions are found and it
can't be started) - the other passes a badblocks -w test and appears to
work. However it ALWAYS fails when it's part of the array and a bonnie
test is run.

Does this sound like a hardware fault? If so why is it only occurring when
raid is used?

thanks
-sv





Re: failed disks

2000-03-21 Thread Jakob Østergaard

On Tue, 21 Mar 2000, Seth Vidal wrote:

 Hi,
  I'm doing a series of bonnie tests along with a fair amount of file
 md5summing to determine speed and reliability of a raid5 configuration.
 I have 5 drives on a TekRam 390U2W adapter. 3 of the drives are the same
 Seagate Barracuda 9.1 gig drive. The other two are 18 gig Barracudas.
 
 Two of the nine gigs fail - consistently - when I run bonnie tests on
 them. One will get flagged as bad in one run and die out. This one I can
 confirm is bad b/c it fails on its own outside of the raid array (it
 fails to be detected by linux at all - no partitions are found and it
 can't be started) - the other passes a badblocks -w test and appears to
 work. However it ALWAYS fails when it's part of the array and a bonnie
 test is run.
 
 Does this sound like a hardware fault? If so why is it only occurring when
 raid is used?

You can most likely trigger it too if you run non-RAID I/O on all the disks
simultaneously.
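
A rough way to generate that kind of load without RAID in the picture is to read from all the disks in parallel. The Python sketch below is only an outline: stress_read and read_and_hash are made-up names, and on a real machine the paths would be the raw disk devices (read as root), which are not filled in here.

```python
import hashlib
import threading


def read_and_hash(path, results, chunk_size=1 << 20):
    """Read one file or device to the end, recording its MD5 or the I/O error."""
    digest = hashlib.md5()
    try:
        with open(path, "rb") as handle:
            while True:
                block = handle.read(chunk_size)
                if not block:
                    break
                digest.update(block)
        results[path] = digest.hexdigest()
    except OSError as exc:
        results[path] = exc


def stress_read(paths):
    """Read all paths concurrently, one thread per path, and return the results."""
    results = {}
    threads = [threading.Thread(target=read_and_hash, args=(p, results)) for p in paths]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Running it twice and comparing the checksums per disk would show silent corruption as well as outright I/O errors. This is read-only load; bonnie also writes, so a clean run here does not fully clear the bus.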

It sounds like you have a SCSI bus problem, bad cabling / termination etc.

-- 

: [EMAIL PROTECTED]  : And I see the elder races, :
:.: putrid forms of man:
:   Jakob Østergaard  : See him rise and claim the earth,  :
:OZ9ABN   : his downfall is at hand.   :
:.:{Konkhra}...: