"Walter Haidinger" writes:
> Hi!
> 
> Summary: raid set reconstruction fails with "error rewriting parity"
> for sets with non-root autoconfigure enabled, works when disabled.
> It seems as if there is a bug when reading the component label.
> 
> Details:
> I'm running an OpenBSD 3.9 GENERIC kernel with RAID enabled.
> That is, no other changes but the ones from raid(4):
>    pseudo-device raid 4
>    option    RAID_AUTOCONFIG
> 
> I'm running a raid1 mirror of both ide channel master devices.
> After a complete disk failure of wd1, I replaced the faulty drive
> and rebooted (came up in degraded mode on wd0 just fine) and did
> fdisk/disklabel to match wd0 layout which looks as in
> "Auto-configuration and Root on RAID" of raidctl(8):
> wd[01]a: minimum OpenBSD install (/bsd is RAID capable kernel)
> wd[01]e: raid0  (raid0a is /)
> wd[01]f: raid1  (raid1b is swap)
> wd[01]g: raid2  (raid2d is /usr, raid2e is /var, ...)
> All raid sets are set to autoconfigure, raid0 as root autoconfigure.
> 
> Then I tried to resync using the method from raidctl(8), bottom of
> "Dealing with Component Failures", i.e.:
> # raidctl -a /dev/wd1e raid0
> # raidctl -F component1 raid0
> # raidctl -a /dev/wd1f raid1
> # raidctl -F component1 raid1
> # raidctl -a /dev/wd1g raid2
> # raidctl -F component1 raid2
> 
> Only rebuilding the root-autoconfigured raid0 set succeeded.
> Non-root sets raid1 and raid2 failed with
> raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL).
> 
> Adding a spare did work:
> 
> # raidctl -a /dev/wd1g raid1

Isn't that the spare you used for raid2 ?

> # raidctl -vs raid1
> raid1 Components:
>            /dev/wd0f: optimal
>           component1: failed
> Spares:
>            /dev/wd1f: spare

Oh.. but here it's correct..

> Component label for /dev/wd0f:
>    Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
>    Version: 2, Serial Number: 298644, Mod Counter: 657
>    Clean: No, Status: 0
>    sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
>    Queue size: 100, blocksize: 512, numBlocks: 1024000
>    RAID Level: 1
>    Autoconfig: Yes
>    Root partition: No
>    Last configured as: raid1
> component1 status is: failed.  Skipping label.
> /dev/wd1f status is: spare.  Skipping label.
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> 
> However, failure and immediate reconstruction did not work:
> 
> # raidctl -F component1 raid1
> # raidctl -vs raid1
> raid1 Components:
>            /dev/wd0f: optimal
>           component1: reconstructing
> Spares:
>            /dev/wd1f: used_spare
> Component label for /dev/wd0f:
>    Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
>    Version: 2, Serial Number: 298644, Mod Counter: 658
>    Clean: No, Status: 0
>    sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
>    Queue size: 100, blocksize: 512, numBlocks: 1024000
>    RAID Level: 1
>    Autoconfig: Yes
>    Root partition: No
>    Last configured as: raid1
> component1 status is: reconstructing.  Skipping label.
> raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed

Hmm.. where are the lines saying reconstruction is "n% complete"?
(they aren't pretty, but in this case they'd be useful)

> raidctl -F subsequently fails with "error rewriting parity".

That error will only come from attempting to check parity on a RAID 
set with a failed component.  It has nothing to do with "raidctl -F".

How long did you wait for the reconstruction to finish?  
For the above output, note that it still says "reconstructing" 
for component1...  When that finishes, it will say "spared".  
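
If you'd rather not poll by hand, a rough sketch along these lines might 
help.  It assumes the "raidctl -s" layout shown in the output above; the 
helper names component_state and wait_for_rebuild are made up for 
illustration and are not part of raidctl(8):

```shell
#!/bin/sh
# Hypothetical helpers, not part of raidctl(8).

# Read `raidctl -s` output on stdin and print the state of one entry
# from the "Components:" section (e.g. "optimal", "reconstructing").
component_state() {
    awk -v name="$1" '
        /Components:/ { in_components = 1; next }
        /^Spares:/    { in_components = 0 }
        in_components && $1 == (name ":") { print $2; exit }
    '
}

# Poll until the named component is no longer "reconstructing".
# $1 = RAID set, $2 = component name, $3 = poll interval in seconds.
wait_for_rebuild() {
    while [ "$(raidctl -s "$1" | component_state "$2")" = "reconstructing" ]; do
        sleep "${3:-60}"
    done
}
```

Usage would be something like "wait_for_rebuild raid1 component1" 
followed by "raidctl -s raid1", at which point component1 should show 
"spared".  Note the loop also ends if raidctl prints nothing at all, so 
check the final status rather than assuming success.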

Later...

Greg Oster
