Hi!

Summary: raid set reconstruction fails with "error rewriting parity"
for sets with non-root autoconfigure enabled, works when disabled.
It seems as if there is a bug when reading the component label.

Details:
I'm running an OpenBSD 3.9 GENERIC kernel with RAID enabled.
That is, no other changes but the ones from raid(4):
   pseudo-device raid 4
   option    RAID_AUTOCONFIG

I'm running a raid1 mirror of both IDE channel master devices.
After a complete disk failure of wd1, I replaced the faulty drive
and rebooted (it came up in degraded mode on wd0 just fine), then ran
fdisk/disklabel to match the wd0 layout, which follows
"Auto-configuration and Root on RAID" in raidctl(8):
wd[01]a: minimal OpenBSD install (/bsd is RAID capable kernel)
wd[01]e: raid0  (raid0a is /)
wd[01]f: raid1  (raid1b is swap)
wd[01]g: raid2  (raid2d is /usr, raid2e is /var, ...)
All raid sets are set to autoconfigure, raid0 as root autoconfigure.

Then I tried to resync using the method from raidctl(8), bottom of
"Dealing with Component Failures", i.e.:
# raidctl -a /dev/wd1e raid0
# raidctl -F component1 raid0
# raidctl -a /dev/wd1f raid1
# raidctl -F component1 raid1
# raidctl -a /dev/wd1g raid2
# raidctl -F component1 raid2
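
For reference, the six commands above follow one pattern per set: add
the new partition as a spare, then fail component1 and reconstruct.
A minimal sketch of that pattern as a dry-run sh loop follows. It only
prints the commands and runs nothing. print_rebuild_cmds is a made-up
helper name, and the partition-to-set mapping is the one listed above.

```shell
#!/bin/sh
# Dry run: print the add-spare / fail-and-reconstruct commands per set.
# print_rebuild_cmds is a hypothetical helper, not part of raidctl.
print_rebuild_cmds() {
    disk=$1                 # replacement disk, e.g. wd1
    shift
    for pair in "$@"; do
        part=${pair%%:*}    # partition letter, e.g. e
        rset=${pair##*:}    # raid set name, e.g. raid0
        echo "raidctl -a /dev/${disk}${part} ${rset}"
        echo "raidctl -F component1 ${rset}"
    done
}

# Mapping taken from the layout above: wd1e -> raid0, wd1f -> raid1, wd1g -> raid2
print_rebuild_cmds wd1 e:raid0 f:raid1 g:raid2
```

Printing instead of executing keeps the sketch safe to try on a
machine without RAIDframe; it does not change the failure described
below.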

Only rebuilding the root-autoconfigured raid0 set succeeded.
The non-root sets raid1 and raid2 failed with
raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed

Adding a spare did work:

# raidctl -a /dev/wd1f raid1
# raidctl -vs raid1
raid1 Components:
           /dev/wd0f: optimal
          component1: failed
Spares:
           /dev/wd1f: spare
Component label for /dev/wd0f:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 298644, Mod Counter: 657
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1024000
   RAID Level: 1
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid1
component1 status is: failed.  Skipping label.
/dev/wd1f status is: spare.  Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

However, failing component1 and reconstructing onto the spare did not work:

# raidctl -F component1 raid1
# raidctl -vs raid1
raid1 Components:
           /dev/wd0f: optimal
          component1: reconstructing
Spares:
           /dev/wd1f: used_spare
Component label for /dev/wd0f:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 298644, Mod Counter: 658
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 1024000
   RAID Level: 1
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid1
component1 status is: reconstructing.  Skipping label.
raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed

raidctl -F subsequently fails with "error rewriting parity".
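
To compare the two status dumps above at a glance, the component and
spare states can be pulled out with a small filter. This is only a
sketch: component_states is a made-up helper name, and it assumes the
"raidctl -vs" output layout exactly as pasted above (other releases
may format it differently).

```shell
#!/bin/sh
# component_states: hypothetical helper, reads "raidctl -vs" text on stdin
# and prints "<name> <state>" for every component and spare line.
component_states() {
    awk '
        /Components:$/                  { in_list = 1; next }  # start of the list
        /^Component label/              { in_list = 0 }        # labels follow, stop
        in_list && NF == 2 && $1 ~ /:$/ { sub(/:$/, "", $1); print $1, $2 }
    '
}

# Sample input copied from the second status dump above.
component_states <<'EOF'
raid1 Components:
           /dev/wd0f: optimal
          component1: reconstructing
Spares:
           /dev/wd1f: used_spare
Component label for /dev/wd0f:
   RAID Level: 1
EOF
# prints:
#   /dev/wd0f optimal
#   component1 reconstructing
#   /dev/wd1f used_spare
```

On a live system the input would come from "raidctl -vs raid1 |
component_states" instead of the here-document.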

To work around the problem, I disabled autoconfiguration for raid1
and raid2 with raidctl -A no raid[12] and rebooted into the
rescue installation (boot -a, selected wd0a).
There, raidctl -F component1 raid1 succeeded, and so did the same
command for raid2.

After reconstruction/parity-rewrite, I did _not_ re-enable auto-
configuration for raid1 and raid2, i.e. only raid0 is configured as:
raidctl -A root raid0
The reboot succeeded and all raid sets are fine, except that only raid0
is autoconfigured now.
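
The resulting autoconfigure state of a set can be read back from the
"Autoconfig:" and "Root partition:" lines of its component label, as
shown in the status dumps earlier. A minimal sketch, assuming that
label layout; autoconfig_state is a made-up helper name:

```shell
#!/bin/sh
# autoconfig_state: hypothetical helper, reads "raidctl -vs" text on stdin
# and reports the Autoconfig / Root partition fields of the component label.
# If several labels are present, the last one seen wins.
autoconfig_state() {
    awk -F': *' '
        /^ *Autoconfig:/     { auto = $2 }
        /^ *Root partition:/ { root = $2 }
        END { printf "autoconfig=%s root=%s\n", auto, root }
    '
}

# Sample input copied from the raid1 label shown earlier.
printf '%s\n' '   RAID Level: 1' '   Autoconfig: Yes' '   Root partition: No' |
    autoconfig_state
# prints: autoconfig=Yes root=No
```

On a live system this would be fed from "raidctl -vs raid1 |
autoconfig_state".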

Is this correct behaviour/setup? raidctl(8) says about autoconfiguration:
"RAID sets raid0, raid1, and raid2 are all marked as auto-configurable."

I hope this description is reasonably complete.
If any additional information is required, I'd be happy to provide it.

Regards, Walter
