Hello...

I've been working with the RAID code and how it handles two-disk
failures.  You may have noticed my earlier patch that cleans things up a
bit when this happens... at least now you can umount the filesystem and
reboot.

I have a couple of questions about how raid5 handles this, and I
thought I'd solicit some feedback before patching a couple of other
things.

First:  Attempting to access beyond the end of the RAID array results
in disks being kicked out of the array.  I noticed this when I had some
file system corruption, and others have reported the same thing
recently after a (non-RAID) buffer was trashed.  As it stands now,
raid5 tries to read a full stripe that is beyond the capacity of all
the drives, gets errors back, and kicks out all or most of the drives.
I'm planning on adding a check to see if we're asking for more than we
have, and if so just return an I/O error without actually kicking the
disks.  Is there any reason the disks should be kicked?
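
Something along these lines is what I have in mind.  To be clear, this
is just a stand-alone sketch of the policy with made-up names
(handle_request, array_sectors, and so on); it isn't the actual raid5
request path:

/* Stand-alone sketch of the proposed bounds check.  All names here
 * are invented for illustration; in the driver this would sit at the
 * top of raid5's request handling. */
#include <stdio.h>

typedef unsigned long long sector_t;

struct request {
        sector_t start;         /* first sector of the request */
        sector_t nr_sectors;    /* length in 512-byte sectors  */
};

/* Fail out-of-range requests with an I/O error instead of letting
 * the reads hit every member disk, fail, and kick the disks out. */
int handle_request(const struct request *req, sector_t array_sectors)
{
        if (req->start + req->nr_sectors > array_sectors) {
                fprintf(stderr, "request beyond end of array: EIO\n");
                return -1;      /* error the I/O, keep the disks */
        }
        /* ... normal stripe handling would go here ... */
        return 0;
}

int main(void)
{
        /* A 1000-sector array; this request starts right at the end. */
        struct request bad = { .start = 1000, .nr_sectors = 8 };

        if (handle_request(&bad, 1000) < 0)
                printf("request rejected, disks left alone\n");
        return 0;
}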

Second: When a second disk fails in a raid5 array, you can't use the
array any more.  With that in mind, I wonder how useful it is to have
the failure of the second disk trigger a re-write of the superblocks.
My thought is that the fault could have been caused by a transient
problem, or by a single bad sector on the disk.  In either case, it
would be nice for the raid array to keep trying to access this second
failed disk...  If it's really dead, you can just pass the errors
back.. and if it's only partially dead... you still have a partially
working system, whereas before you would have been completely stuck.
This would also allow transient failures (like power problems) to come
up clean on a reboot without operator involvement, whereas now you have
to re-run mkraid, which is a daunting task to the uninitiated.  Any
thoughts?
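
To make the idea concrete, here's a rough sketch of the failure policy
I'm suggesting.  Again, everything here (raid5_array, handle_disk_error
and friends) is invented for illustration; it isn't the actual raid5
error path:

/* Sketch of the suggested policy: the first failure degrades the
 * array as usual, a second one just errors the request and leaves
 * the disk in place.  All names are made up for illustration. */
#include <stdio.h>

#define NR_DISKS 10

struct raid5_array {
        int failed_disks;           /* disks already marked faulty */
        int faulty[NR_DISKS];       /* per-disk faulty flag        */
};

/* Called when a read or write to member 'disk' errors out. */
int handle_disk_error(struct raid5_array *conf, int disk)
{
        if (conf->failed_disks < 1) {
                /* First failure: mark the disk faulty, rewrite the
                 * superblocks, reconstruct from parity as usual. */
                conf->faulty[disk] = 1;
                conf->failed_disks++;
                return 0;
        }

        /* Second (or later) failure: don't rewrite superblocks or
         * kick the disk -- the fault may be transient or a single
         * bad sector.  Pass the error back for this request and
         * keep trying the disk for future I/O.  If it's truly dead
         * every access fails anyway; if it's only partially dead we
         * keep a partially working array, and a transient fault can
         * come back clean on reboot without a re-mkraid. */
        return -5;      /* -EIO */
}

int main(void)
{
        struct raid5_array conf = { 0 };

        handle_disk_error(&conf, 3);    /* first failure: degrade  */
        printf("second failure -> %d\n",
               handle_disk_error(&conf, 7));   /* just an error    */
        return 0;
}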

Third: I've noticed that a raid5 array, when powered off without a
sync, almost always has one disk that isn't up to date and has an
events counter that is behind the rest of the array, so it'll start a
rebuild.  That's fine, but what I can't figure out is why: if you have
10 disks, and 9 get the updated events counter, why doesn't the 10th?
Is this intentional, to force a rebuild?  It seems to me the best
approach would be to keep the whole array marked as in-sync... unless
the failure happens half-way through the writes, in which case things
will obviously be out of sync.  Does anyone agree?
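
My guess at the mechanism: if the superblocks are written out one disk
at a time, a power loss in the middle of the loop leaves the tail of
the array with the stale event count.  A toy model of that (all names
invented; this is not the real superblock code):

/* Toy model of sequential superblock updates, showing why a crash
 * mid-loop leaves a disk with a stale events counter. */
#include <stdio.h>

#define NR_DISKS 10

static unsigned int sb_events[NR_DISKS];   /* on-disk event counters */

/* Pretend to write disk i's superblock; fails once the simulated
 * power loss hits. */
static int write_super(int i, unsigned int events, int die_after)
{
        if (i >= die_after)
                return -1;              /* power lost mid-update */
        sb_events[i] = events;
        return 0;
}

int main(void)
{
        unsigned int events = 42;       /* the new event count */
        int i;

        /* Superblocks go out one at a time; power dies after 9. */
        for (i = 0; i < NR_DISKS; i++)
                if (write_super(i, events, 9) < 0)
                        break;

        for (i = 0; i < NR_DISKS; i++)
                printf("disk %d: events=%u%s\n", i, sb_events[i],
                       sb_events[i] != events
                       ? "  <-- stale, triggers rebuild" : "");
        return 0;
}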

Thanks!

Tom
