Hi all,

        Does anyone know what types of disk failures the RAID code can handle?
What I mean is this. Yesterday I decided to actually put our RAID-1 array to
a test. While the machine was running I unplugged the SCSI cable from one of
the drives (yeah, yeah, I know, probably not very safe). Anyway, the system
didn't die (even though we have swap space on that drive as well), but it
didn't exactly continue to work either. The kernel printed messages about
running in degraded mode, but the system wasn't really usable.
        For example, my 'top' process continued to run, but I couldn't run any
commands from the shell. My guess is that any attempt at disk access hung.
        Anyway, after about 4-5 minutes of this I plugged the SCSI cable back in
(yes, still live with power flowing), and things came back almost
immediately. The RAID status showed the array running in degraded mode (as
one would expect).
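
        In case it's useful: a quick way to spot the degraded state
programmatically, assuming the usual /proc/mdstat layout where a healthy
two-disk RAID-1 shows [UU] and a degraded one shows something like [U_], is
a little Python along these lines (just a sketch, the parsing is not meant
to be robust):

import sys

def degraded_arrays(path="/proc/mdstat"):
    """Return the md devices whose status line shows a missing member."""
    degraded = []
    current = None
    with open(path) as f:
        for line in f:
            if line.startswith("md") and " : " in line:
                current = line.split()[0]               # e.g. "md0"
            elif current and "[" in line and "_" in line.rsplit("[", 1)[-1]:
                degraded.append(current)                # e.g. "[U_]" => one mirror gone
                current = None
    return degraded

if __name__ == "__main__":
    bad = degraded_arrays()
    print("degraded arrays:", ", ".join(bad) if bad else "none")
    sys.exit(1 if bad else 0)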
        My next test was to unplug the power from one of the drives. That produced
a few kernel messages, but then the system was happy and continued running.

        So, what types of failures will the RAID code successfully overcome?
Although in an operational environment the SCSI cable isn't likely to come
loose, couldn't an internal drive failure "look like" that? Would the RAID
be OK in such a case?

        Oh, BTW, the system has dual-channel SCSI (Adaptec AIC-7xxx) with one
drive on each channel, so when I unplugged the SCSI cable it only affected
that one channel. And since we're using Ultra2 SCSI (or whatever the proper
term is), the cable is actively terminated with a circuit board on its end,
so I don't think it is a termination issue.


Any insights would be appreciated.

Thanks,

--Rainer
