Hi!

I set up a system with RAID1 (mirroring) using kernel 2.2.13-ac3 and
raidtools 0.90.990824 (the current Debian potato package, compiled for slink).

My main goal in building this raid is stability and fault-tolerance. And
indeed, the Linux software RAID system looks very promising -- I just
found one problem.

Normal operation is fine. When I simulate a failure by shutting down the
system, removing one drive and powering it up again, it works as
expected (i.e. boots, complains that the mirror is missing but
continues). At this point, I can use raidhotadd to add a fresh partition
to the set and the kernel automatically starts creating the mirror. This
is perfect.

However, I want to be able to deliberately remove one disk from the
mirror, hot-remove it from the SCSI bus (provided the hardware handles this,
of course), attach a new drive and re-create the mirror there. From what
I gather from the documentation, this should work like this:

raidsetfaulty /dev/mdX /dev/sdXX
raidhotremove /dev/mdX /dev/sdXX
(swap disks)
raidhotadd /dev/mdX /dev/sdXX
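
To make the intended sequence explicit, here is a minimal shell sketch of
the steps above. It only wraps the three raidtools commands already named;
the helper names (run, replace_member) and the DRY_RUN switch are made up
for illustration, and the device names are placeholders. It is a sketch of
what I am trying to do, not a tested procedure.

```shell
#!/bin/sh
# Sketch only: wraps the raidtools commands listed above.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_member MD OLD NEW   (hypothetical helper)
#   MD  - the md device, e.g. /dev/mdX
#   OLD - the member partition to take out of the mirror
#   NEW - the partition to add afterwards (may be the same name
#         if the new disk shows up under the old device node)
replace_member() {
    md=$1
    old=$2
    new=$3

    # Mark the member as failed, then detach it from the array.
    run raidsetfaulty "$md" "$old" || return 1
    run raidhotremove "$md" "$old" || return 1

    # Wait for the physical swap; skipped in dry-run mode.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        printf 'Swap disks now, then press Enter: '
        read -r _answer
    fi

    # Add the replacement; the kernel should start rebuilding the mirror.
    run raidhotadd "$md" "$new" || return 1
}
```

Calling replace_member with DRY_RUN=1 set just prints the three commands in
order, which is a harmless way to check the arguments before touching a
real array.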

Since I have no hot-swap hardware yet, I tried the above without the
third step. Up to the "raidhotremove", this also seems to work fine --
i.e. /proc/mdstat shows that the /dev/sdXX is no longer part of
/dev/mdX. However, about three times out of four I get a kernel panic
when I use "raidhotadd" either to re-add the drive I just removed or to
add a fresh partition. The panic is always a NULL pointer dereference at
virtual address 00000000, and so far it has occurred in both kupdate and
mdrecoveryd. I can provide exact details (i.e. a report run through
ksymoops) if this is of any interest.



Now for the question: Am I doing something totally stupid, or is this a
real bug in the RAID code? And, if it is the former, how is one
supposed to "migrate" a mirror onto a different device while the system
is running?

-- 
Andreas Trottmann <[EMAIL PROTECTED]>
