[Ian Morgan]
> I can raidhotremove the (simulated-)faulty disk, and then physically remove
> it. Next, I put the disk back in physically. I then want to run raidhotadd to
> add the disk back into the array and begin reconstruction.
>
> Problem is, when I run raidhotadd, the system totally locks up solid. I've
> tried giving it time to come back to life, but nothing happens even after
> several minutes, and the system is so dead that the software watchdog is
> also toast.
In my experience, any drive manipulation (in terms of what's attached,
what's seen by the kernel, etc.) that locks up the machine has been
strictly a device driver problem. Assuming this is SCSI, it may
help to issue the add-single-device/remove-single-device commands
(written to /proc/scsi/scsi) as per drivers/scsi/scsi.c lines 2389
and 2447 respectively (2.2.15 src).
If the initial detachment didn't propagate the device removal up through
the driver, the reattachment may have caused some problems (creating
data structures that are already there and populated, scribbling over
valid values... who knows). Just a guess.
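
To make that suggestion concrete, here is a rough sketch of the sequence I
have in mind, written as a small Python helper that only prints the steps
(a dry run). The command strings follow the "scsi add-single-device" /
"scsi remove-single-device" format parsed in scsi.c. The array and
partition names (/dev/md0, /dev/sdb1) and the host/channel/id/lun values
(0 0 1 0) are placeholders I made up, so substitute your own. Untested
against your setup.

```python
#!/usr/bin/env python3
"""Dry-run sketch of a SCSI-aware RAID disk swap (device names are hypothetical)."""

PROC_SCSI = "/proc/scsi/scsi"  # control file handled by drivers/scsi/scsi.c


def scsi_command(action, host, channel, target, lun):
    """Build the string the SCSI layer expects for adding or removing one device."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    return "scsi %s-single-device %d %d %d %d" % (action, host, channel, target, lun)


def replacement_steps(md_dev, part, host, channel, target, lun):
    """Return the shell commands, in order, for swapping one array member."""
    remove = scsi_command("remove", host, channel, target, lun)
    add = scsi_command("add", host, channel, target, lun)
    return [
        # 1. Take the faulty member out of the array.
        "raidhotremove %s %s" % (md_dev, part),
        # 2. Tell the SCSI layer the device is going away, before pulling it.
        "echo '%s' > %s" % (remove, PROC_SCSI),
        # 3. (Physically swap the disk here.)
        # 4. Tell the SCSI layer to probe for the device again.
        "echo '%s' > %s" % (add, PROC_SCSI),
        # 5. Only now hand the partition back to the array for reconstruction.
        "raidhotadd %s %s" % (md_dev, part),
    ]


if __name__ == "__main__":
    # Placeholder values: array /dev/md0, member /dev/sdb1 at host 0, channel 0, id 1, lun 0.
    for step in replacement_steps("/dev/md0", "/dev/sdb1", 0, 0, 1, 0):
        print(step)
```

The point of the ordering is that the driver hears about the removal before
the disk is pulled, and about the new disk before raidhotadd touches it.
Whether that avoids the lockup is, again, a guess.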
> kernel: 2.2.16pre2 SMP
Is this reproducible on 2.2.15 proper?
> raid: mingo's raid-2.2.15-A0
> tools: raidtools-19990824-0.90
>
> Is this a known problem? Am I using the right procedure to replace a faulty
> disk? Would a raidstop/raidstart work? Isn't there a way to replace a drive
> without taking the array down? The HOWTO is not very detailed in this area
> of reconstruction. It makes it sound like this should all be a no-brainer.
James