"Brothers, John" <[EMAIL PROTECTED]> writes:
> Boot Sector Errors:
> if the power glitches, for example, and the CPU starts to
> write garbage to the drives, how well does
> the system recover - I have seen some comments on the
> root-raid HOWTO, but nothing has clarified
Could you please stick to shorter lines?
Anyway, no RAID is protection against a CPU gone mad, or against a
power glitch. RAID protects you from a drive crash. To protect against
power glitches, you need a good UPS. To protect against a CPU gone
mad, well, you need a redundant cluster system where every CPU watches
the others; the IBM AS/400 might be a good thing to take a look at.
If you're referring to the ability to boot when the first drive goes
down, that is still somewhat of an unsolved issue with software RAID,
since LILO knows nothing about RAID configurations, and the BIOS even
less.
> Hot-Swap and automatic rebuild
> The last document I read talked about Hot-Swap being
> pre-alpha - but that document seemed to be
> out of date - has anything changed there?
Hot-swap would probably work if your SCSI controller could handle it
(the aic7xxx driver doesn't much like it when a drive drops dead in
the middle of a transaction).
> I have two empty 5 1/4" bays in my system - if I get the proper RAID
> SCSI controller, and hook it into two hard drives in removable chassis
> in those bays, is that a cheap hot-swap system - or do you need
> three drives? (I'm assuming RAID level 5 is the right level)
You can't do RAID-5 with two drives. Well, you can, theoretically, but
since the parity of a single data block is just a copy of that block,
it's really only a rather weird RAID-1 setup then. In any case, with a
hardware RAID controller, any drives hooked up to it look like one
drive to the system, and the controller itself deals with hot-swap
detection and the like. That's the solution you want to go with if you
want to be able to boot and operate even with a drive going down.
And unless you're prepared to install a new drive in a failed system
within hours of a drive failure alert, you'd better base your setup on
hardware with room for a spare drive (a "hot spare") that the RAID
system can bring into operation on its own to replace a broken one.
Which level is right for you depends on how much reliability and
storage space you need. RAID-1 (mirroring) is the most reliable
method, since you only need one working drive in the entire
array. RAID-5 can tolerate only one drive failing, and naturally it
has less storage overhead because of that. A setup with two active
disks and one spare disk in a RAID-1 array would appear to be a good
choice for you, since it doesn't seem that your system would need huge
amounts of storage.
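To put numbers on the overhead difference, here's a back-of-the-envelope
sketch in Python (the drive sizes are made-up examples, and the
functions are just illustrations, not any real tool):

  # Usable capacity per RAID level; sizes in whatever unit you like.
  def raid1_capacity(drive_sizes):
      # Every drive holds a full copy, so you get the smallest one.
      return min(drive_sizes)

  def raid5_capacity(drive_sizes):
      # One drive's worth of space goes to parity.
      return min(drive_sizes) * (len(drive_sizes) - 1)

  drives = [9, 9, 9]   # say, three 9 GB disks
  print('RAID-1:', raid1_capacity(drives), 'GB usable')   # -> 9
  print('RAID-5:', raid5_capacity(drives), 'GB usable')   # -> 18

With three disks, mirroring gives you one disk's worth of space, and
RAID-5 gives you two.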
>
> Lastly, remote failure detection is very important - as I understand
> it, these SCSI-to-SCSI controllers mostly just 'beep' when they have
> a failing drive. Is it fundamentally impossible to discover that a
> drive has failed via an audit, or is it just a matter of getting the
> right driver interface?
All RAID systems have some way of notifying about a drive failure. For
Linux software RAID you can see it in the /proc/mdstat output;
hardware controllers report it through their respective drivers. The
simplest solution is to poll this information periodically and trigger
the hotel's fire alarm if something goes wrong. ;)
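For example, a minimal polling sketch in Python (the [UU]-style status
markers are an assumption about your kernel's /proc/mdstat format;
look at your own output and adjust the check):

  import re
  import time

  STATUS = re.compile(r'\[[U_]+\]')  # e.g. [UU] healthy, [U_] degraded

  def degraded_arrays():
      failed = []
      device = None
      with open('/proc/mdstat') as f:
          for line in f:
              if line.startswith('md'):
                  device = line.split()[0]   # e.g. "md0"
              match = STATUS.search(line)
              if match and '_' in match.group(0) and device:
                  failed.append(device)
      return failed

  while True:
      for md in degraded_arrays():
          # Replace the print with mail, a pager, or that fire alarm.
          print('WARNING: %s is running degraded!' % md)
      time.sleep(60)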
--
Osma Ahvenlampi