On Thursday 18 November 2010, Uwe Dippel wrote:
> Just had a problem with softraid on a 4.6 box. No, I don't ask to solve
> it, it needed urgent replacement, and so I did.
> What I would like to ask for, is advice on best practices for softraid
> under OpenBSD, to prevent similar things from happening again; getting
> hints on how to set it up better, and mostly: how to recover it better.
>
> What happened, was that some slices in a softraid simply went away after
> some power surge.
> In detail: sd1 and sd2 were set to RAID, and the ensuing RAID1 (sd3)
> sliced up into a number of /usr/, /var, /home/, /var/www, /var/mail, swap.
> After the reboot after a power surge, two of the slices (/var/mail, sd3g
> and /home, sd3h) were simply unavailable, couldn't be 'mount -a'-ed at
> reboot, and the system fell back to '/' only being mounted (on sd0).
> Strangely, though, disklabel sd3 showed the slices, as sd3g, sd3h. But
> they could not be accessed at all; and were not visible under /dev/.

The /dev directory is not populated automatically. If sd3g and sd3h were not
present there, it sounds like their device nodes were missing. Running MAKEDEV
for sd3 would probably have resolved this.
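
To check which nodes are missing before recreating them, something along
these lines could be used. It is only a sketch: missing_nodes is a made-up
helper name, not an OpenBSD tool, and the partition letters are the ones
from your report.

```shell
# missing_nodes DEVDIR DISK LETTER...
# Print each partition node of DISK that has no entry in DEVDIR.
# (Hypothetical helper for illustration, not part of OpenBSD.)
missing_nodes() {
    devdir=$1
    disk=$2
    shift 2
    for letter in "$@"; do
        # -e is true for any existing file, including device nodes
        [ -e "$devdir/$disk$letter" ] || echo "$disk$letter"
    done
}

# Example for the volume described above:
#   missing_nodes /dev sd3 g h
# If that prints anything, recreate the nodes as root:
#   cd /dev && sh MAKEDEV sd3
```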

> Still, an unexpected behaviour as far as I am concerned, even more so
> since sysctl and bioctl showed an 'OK' and 'Online' softraid.

This means that the softraid metadata was intact and that the volume was 
correctly assembled.

> I tried a few things, like fsck_ffs on these two disappeared slices, as
> well as the 'good' ones. The good ones were good, also with fsck_ffs -f.
> But the two gone missing were just not available (as devices). Then I
> made, I guess, a big mistake, and instead of ripping out one of the
> drives, I bioctl -d -ed sd3; leaving 2 drives with RAID file system on
> them. Over.

The `bioctl -d` command is non-destructive for softraid. It only detaches the
softraid volume, and you should be able to assemble it again from the same
chunks with `bioctl -c`.
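
As a rough sketch, assuming the two chunks are sd1a and sd2a (the partition
letters are not given above, so substitute the real RAID partitions), the
detach and reassembly would look something like the following. The
reassemble_cmd function is only an illustrative dry-run helper: it prints
the command rather than running it.

```shell
# reassemble_cmd LEVEL CHUNK...
# Print (do not run) the bioctl command that would bring a softraid
# volume back from its chunks. Hypothetical helper for illustration.
reassemble_cmd() {
    level=$1
    shift
    chunks=
    for c in "$@"; do
        # bioctl -l takes a comma-separated list of chunk devices
        chunks="${chunks:+$chunks,}$c"
    done
    echo "bioctl -c $level -l $chunks softraid0"
}

# Detach the volume first (non-destructive, as noted above):
#   bioctl -d sd3
# Then, as root, run what this prints (chunk names are assumed):
reassemble_cmd 1 /dev/sd1a /dev/sd2a
```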

> Now, please, any suggestions on how to do better next time something
> like this happens?

Without further information it is hard to tell what actually occurred. If
you did indeed lose device nodes, then it is likely that fsck reported making
corrections to sd0a, the root filesystem that holds /dev (possibly even after
asking you to confirm those changes). Regardless of any hardware or software
RAID, you can still face filesystem-level problems after a sudden power
outage. The best solution is probably to get yourself a good UPS.
-- 

   "Stop assuming that systems are secure unless demonstrated insecure;
    start assuming that systems are insecure unless designed securely."
          - Bruce Schneier
