On Fri, 31 Dec 1999, Jim Ford wrote:
> I've been experimenting with my 4-disk Raid5 setup for a few weeks now and have been
>impressed - so far.
>
> Today, I succumbed to the temptation of simulating a disk failure (or more
>accurately a power supply failure to a disk), by 'hot' unplugging its power lead.
>Nothing appeared to happen at first - mdstat reported that all disks were working.
>Then the system stopped responding to console commands - not even to a
>shutdown. 'Never mind,' I thought. I'll power cycle and when it fires up again the
>array will get reconstructed and all will be fine. However, on restarting, md
>reported that _2_ disks were non-fresh and were kicked from the array. This left only
>2 disks for reconstruction - and the system gave up with a kernel panic. I'm now left
>with a system that I can't do anything with except a reinstallation from scratch.
>
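The reason the array could not start is that RAID5 stores a single parity block per stripe, so it can rebuild at most one missing member. With two of four members kicked out as non-fresh, there is nothing left to rebuild from. The toy model below is a sketch only. It is not md's code, and it assumes a fixed parity position, whereas real RAID5 rotates parity across the disks. The helper names (`xor_blocks`, `make_stripe`, `rebuild`) are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (toy model of RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """One stripe across N+1 'disks': the data blocks plus one XOR parity block.
    Simplification: parity always sits on the last disk here."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, missing):
    """Rebuild a stripe given the indices of missing members.
    A single parity block can recover only one lost member."""
    if len(missing) > 1:
        raise ValueError("cannot rebuild: more than one member missing")
    restored = list(stripe)
    if missing:
        survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
        # XOR of all surviving members (data + parity) yields the lost block.
        restored[missing[0]] = xor_blocks(survivors)
    return restored

# Four "disks": three data blocks and one parity block.
stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
print(rebuild(stripe, [1])[1])   # one disk lost: b'efgh' is recovered
try:
    rebuild(stripe, [1, 2])      # two disks lost, as in the report above
except ValueError as err:
    print(err)
```

A spare disk does not change this arithmetic. It only lets a rebuild start automatically after the *first* failure, before a second member is lost.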
If you have to simulate a disk failure, do NOT disconnect the power.
Disconnect the bus instead, or, if you must disconnect the power, do
not disconnect the ground wire. All of this is unsafe, but
disconnecting the whole power cord can do bad things to your
SCSI bus, and might even burn out your host adapter.

> Where did I go wrong - what strategy should be adopted in such a situation? It seems
>to me that cutting the power to a disk is a reasonable test - simulating a faulty
>connection. Should I have waited longer before shutting the power off to the system -
>and was this the reason that the 2nd disk went down? If I'd had a spare disk in the
>array, would it have reconstructed O.K.?
>
> Now - this is the interesting one - if the system had _not_ been Raid5 I could
>probably have done the same thing and still ended up with a usable system. Most
>likely, all that would have happened is that the fs would have been marked as
>dirty and an fsck carried out at the next reboot. This suggests that Raid5 _can_ be more
>fragile than a non Raid setup.
>
I have simulated failures a few times in a raid5 array by disconnecting
the bus from one of the drives, and I didn't notice any of this
behaviour. Perhaps the SCSI host card makes a difference.
> Regards: Jim Ford
>