Walter Haidinger writes:
> First of all: Thanks for replying to an issue with a
> non-generic kernel! I really appreciate that!

That it was a non-generic kernel didn't even cross my mind... it was
an issue w/ RAIDframe, and that's why I responded...

> On Thu, 29 Jun 2006, Greg Oster wrote:
>
> > > Adding a spare did work:
> > >
> > > # raidctl -a /dev/wd1g raid1
> >
> > Isn't that the spare you used for raid2 ?
>
> Sorry, cut&paste error, should have been wd1f.
>
> > Hmm.. where are the lines saying reconstruction is "n% complete"?
> > (they aren't pretty, but in this case they'd be useful)
>
> I'm sorry, I did not record those. Reconstructing did take some time,
> though, I recall checking the progress, nothing suspicious there,

So did the reconstruction actually complete?

> > > raidctl -F subsequently fails with "error rewriting parity".
> >
> > That error will only come from attempting to check parity on a RAID
> > set with a failed component. It has nothing to do with "raidctl -F".
>
> Oh yes, of course! Should have mentioned that I've tried raidctl -P
> after raidctl -F ...

Ok... so the big question is still: how far along was the
reconstruction? "raidctl -P" would fail even if the reconstruct was
still in progress.

> > How long did you wait for the reconstruction to finish?
> > For the above output, note that it still says "reconstructing"
> > for component1... When that finishes, it will say "spared".
>
> And what about the spare? Shouldn't it replace component1?

It won't replace it in the output of 'raidctl -s', but it will
replace component1 for all accesses and what-not.. (and will take
its proper place (with autoconfig turned on) after a reboot (well...
sans a known bug in rf_reconstruct.c where this line:

  c_label.partitionSize = raidPtr->Disks[srow][scol].partitionSize;

should be added to where it says:

  /* XXXX MORE NEEDED HERE. */

))

> That never happened. Instead, component1 sequence was:
> failed -> reconstructing -> failed.

Hmm... I think you should see failed->reconstructing->spared
(that's what you'd see if 'component1' was a "normal" disk...)
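
A rough sketch of how one could pull the state of a single component
out of 'raidctl -s' output, to watch for the failed -> reconstructing
-> spared sequence. The function name component_state is hypothetical,
and the "<name>: <state>" line format is an assumption about what
'raidctl -s' prints, so adjust it to what you actually see:

```shell
#!/bin/sh
# Hypothetical helper: print the state of one component, reading
# `raidctl -s` output on stdin.  Assumes each status line looks like
# "<name>: <state>" (possibly indented), e.g. "component1: reconstructing".
component_state() {
    awk -v name="$1" '
        { sub(/^[ \t]+/, "") }            # drop leading indentation
        index($0, name ": ") == 1 {       # line begins with "<name>: "
            print substr($0, length(name) + 3)
            exit                          # first match only
        }
    '
}

# Possible usage (raid1 and component1 are the names from this thread):
#   raidctl -s raid1 | component_state component1
#   raidctl -S raid1      # should report reconstruction progress
```

If the state goes back to "failed" instead of "spared", that would
point at the reconstruction having aborted part way through.
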
You might want to check /var/log/messages* for some indication as to
why the reconstruction failed... (as well, there should be something
in there indicating the reconstruction completed, if it did...)

Later...

Greg Oster