From: "David Harris" <[EMAIL PROTECTED]>

   ...
   What I need to do is restore a superblock that will force it to view sdb
   (the second scsi disk when both are in) as the up-to-date version, and have
   it sync sda. I'll look into using "mkraid --force-resync" or mucking with
   the superblock myself.

   In the future, I think I'm going to keep superblock backups. Three for
   each device: one with both disks in sync, one with a correct and b out of
   date, and one with b correct and a out of date. That way I can restore
   the superblocks and force a sync in whichever direction I want.
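For what it's worth, a small sketch of where such a backup would be read from. This assumes the classic 0.90 superblock format, which reserves a 64K area at the end of each device, aligned down to a 64K boundary; check your kernel's md headers before relying on the exact offset.

```python
# Sketch only: locating the v0.90 md superblock so a copy can be saved
# off with dd. Assumes the classic layout: the superblock sits in a 64K
# reserved area at the end of the device, aligned down to 64K.
MD_RESERVED_BYTES = 64 * 1024

def superblock_offset(device_size_bytes):
    # Align the device size down to a 64K boundary, then step back one
    # reserved area: that is where the superblock starts.
    return (device_size_bytes & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES

# A backup could then look something like (hypothetical device names):
#   dd if=/dev/sda1 of=sda1.sb bs=64k skip=$((offset / 65536)) count=1
```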

Enclosed is a copy of mingo's comments on a similar situation.

> While on the subject of RAID0, I also wondered about a single
> superblock for the autostart information.  Since I've experienced
> superblock corruption on a Solaris system, and recovered from it by
> utilizing one of the superblock copies, could this be a single point
> failure location?  Can the RAID code stash a superblock copy
> elsewhere, or is it easily recreated from the /etc/raid0.conf file?

the RAID architecture deals with this issue and has several mechanisms to
protect against superblock lossage:

        - every disk in the array has its own superblock. The whole RAID
        state can be restored from a single superblock.

        - there is a per-superblock 32-bit checksum to detect corruption
        or partial writes. (the superblock is multiple sectors, so it
        might happen that an update gets executed only partially) 

        - there is a 'last update timestamp'. If a disk is offline
        temporarily and the array is reconfigured, the timestamp helps us
        avoid using the 'old' superblock.

        [ - if you maintain /etc/raidtab properly (so that it really
        represents the array) it might be used to recreate the array _if
        there is no failed disk in the array_, without data loss.]

(actually, an invalid checksum doesn't mean we do not consider the
superblock, it just means that we consider it to be a 'very old'
superblock. So if everything else fails, we even try such a superblock)
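The checksum and timestamp rules above amount to a simple ordering over the per-disk superblock copies, which can be sketched like this (illustrative Python, not the kernel's actual code):

```python
# Sketch of the selection rule described above: prefer the copy with a
# valid checksum and the newest update timestamp; a copy with a bad
# checksum is not discarded, it is merely treated as "very old", so it
# only wins when every other copy is also invalid.
def pick_superblock(candidates):
    # candidates: list of (checksum_ok, utime) pairs, one per disk.
    def effective_time(c):
        checksum_ok, utime = c
        return utime if checksum_ok else 0  # invalid -> "very old"
    return max(candidates, key=effective_time)
```

So a stale-but-valid copy still beats a fresher copy whose checksum fails, which is exactly the partial-write protection being described.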

and as a generic rule, we never write superblocks back if an array has a
fatal failure. (this policy makes no difference to a completely failed
array, because a lost array is a lost array. But it might save us if some
other hardware component has failed; after a system restart we might have
a working array)

if there is anything else that should be added as a protection mechanism,
let me know. we can never be paranoid enough about RAID arrays ;)

-- mingo
