Hi all, I'm trying to simulate a disk failure and replacement in
a raidz array, and failing. What am I doing wrong? Here's
a transcript with interspersed commentary:

r...@file:~# zpool status
  pool: raid
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:20:06 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
r...@file:~# zpool offline raid ad12

reboot
dd if=/dev/zero of=/dev/ad12 ..

r...@file:~# zpool replace raid ad12
cannot replace ad12 with ad12: ad12 is busy
r...@file:~# zpool replace -f raid ad12
cannot replace ad12 with ad12: ad12 is busy

        The handbook suggests 'replace', but I guess that applies only
        if the disk is physically replaced and gets a new identifier?
        Trying with 'online' instead:

r...@file:~# zpool online raid ad12
r...@file:~# zpool status
  pool: raid
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat Nov 27 13:29:14 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0  15.5K resilvered
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

        The output stays like this. Is this normal?

r...@file:~# zpool scrub raid
r...@file:~# zpool status
  pool: raid
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:37 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0 2.11K  87.7M repaired
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
r...@file:~# zpool scrub raid
r...@file:~# zpool status
  pool: raid
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:30:55 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0 2.11K
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors

        These are checksum errors? So the disk hasn't been integrated
        properly?
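
        A rough sketch for pulling the non-zero CKSUM counters out of
        output like the above, so successive scrubs can be compared at
        a glance. The name cksum_errors is made up for illustration,
        and the function assumes the NAME/STATE/READ/WRITE/CKSUM column
        layout shown in this transcript. It only reads text on stdin
        and never touches the pool.

```shell
# Print "device count" for every config row whose CKSUM column is non-zero.
# Reads `zpool status` text on stdin (hypothetical helper, not part of zpool).
cksum_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }
        # A blank line marks its end.
        in_config && NF == 0 { in_config = 0 }
        # Compare as strings so values such as "2.11K" count as non-zero.
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Possible usage on a live system:
#   zpool status raid | cksum_errors
```

        Fed the second scrub output above, this should print only the
        ad12 row.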

r...@file:~# zpool clear raid ad12
r...@file:~# zpool status
  pool: raid
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:39:09 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0

errors: No known data errors
r...@file:~# zpool status -x
all pools are healthy
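
        A minimal sketch of making that check scriptable. It assumes
        the healthy message is exactly the string shown above, and
        is_healthy is a hypothetical helper name. The text is passed as
        an argument so the function itself never runs zpool.

```shell
# Succeed only if the given text is the "healthy" message from `zpool status -x`.
# $1: output captured from `zpool status -x` (assumed wording, see transcript).
is_healthy() {
    [ "$1" = "all pools are healthy" ]
}

# Possible usage on a live system:
#   is_healthy "$(zpool status -x)" || echo "pool needs attention"
```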

        To make sure this is the case, I fail a different disk:

r...@file:~# zpool offline raid ad6
r...@file:~# zpool status
  pool: raid
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: scrub completed after 0h0m with 0 errors on Sat Nov 27 13:40:52 2010
config:

        NAME        STATE     READ WRITE CKSUM
        raid        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            ad12    ONLINE       0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     OFFLINE      0     0     0

errors: No known data errors

        On reboot the status changes:

r...@file:~# zpool status
  pool: raid
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-72
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid        FAULTED      0     0     1  corrupted data
          raidz1    DEGRADED     0     0     6
            ad12    OFFLINE      0     0     0
            ad13    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     1


The same happens if I recreate the array and try again.
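
To compare the device states before and after the reboot, something
like the following sketch may help. It lists every row of the config
table whose STATE is not ONLINE. The name not_online is hypothetical,
and the column layout is assumed to match the output above. Note that
in the two listings above it is ad6 that was taken offline, yet ad12
is the one reported OFFLINE after the reboot.

```shell
# Print "name state" for every config row whose STATE column is not ONLINE.
# Reads `zpool status` text on stdin (hypothetical helper, not part of zpool).
not_online() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line marks its end.
        in_config && NF == 0 { in_config = 0 }
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Possible usage on a live system:
#   zpool status raid | not_online
```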
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
