On Sun, 18 Sep 2016 10:05:52 -0600
Alan Somers wrote:
> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
> wrote:
> >
> > Hi all,
> >
> > due to two bad cables, I had two drives drop from my striped raidz2
> > pool (built on top of geli encrypted drives). I replaced one of the
> > drives before I realized that the cabling was at fault - that's the
> > drive which is being replaced in the output of zpool status below.
> >
> > I have just installed the new cables and all sata errors are gone.
> > However, the resilver of the pool keeps restarting.
> >
> > I see no errors in /var/log/messages, but zpool history -i says:
> >
> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
> >
> > I assume that "scan done complete=0" means that the resilver didn't
> > finish?
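
A quick way to see how often the resilver is being cut short is to count the "scan done" records by their complete flag. The following is only a rough sketch: it assumes the record format shown in the log above, and it treats complete=0 as "did not finish", which is my reading of the field rather than documented behaviour. The function name summarize_scans and the SAMPLE text are made up for illustration.

```python
import re

# One "scan done" record, in the format seen in the `zpool history -i` log above.
DONE = re.compile(
    r"(\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) \[txg:(\d+)\] scan done complete=(\d+)"
)

def summarize_scans(history_text):
    """Split 'scan done' records into (aborted, completed) lists.

    Assumption: complete=0 marks a scan that stopped before finishing.
    """
    # Mail clients rewrap these records, so collapse all whitespace first.
    flat = " ".join(history_text.split())
    aborted, completed = [], []
    for stamp, txg, complete in DONE.findall(flat):
        target = aborted if complete == "0" else completed
        target.append((stamp, int(txg)))
    return aborted, completed

# Hypothetical input: the first four records from the log above.
SAMPLE = """\
2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:20 [txg:1219509] scan done complete=0
"""

aborted, completed = summarize_scans(SAMPLE)
print(len(aborted), "aborted,", len(completed), "completed")
```

Fed the full history, a steadily growing aborted list with an empty completed list would confirm that the scan is being restarted roughly every 30 seconds rather than finishing.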
> >
> > pool layout is the following:
> >
> > pool: pool
> > state: DEGRADED
> > status: One or more devices is currently being resilvered. The pool
> > will continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> > scan: resilver in progress since Sun Sep 18 14:51:39 2016
> > 235G scanned out of 9.81T at 830M/s, 3h21m to go
> > 13.2M resilvered, 2.34% done
> > config:
> >
> > NAME                        STATE    READ WRITE CKSUM
> > pool                        DEGRADED    0     0     0
> >   raidz2-0                  ONLINE      0     0     0
> >     da6.eli                 ONLINE      0     0     0
> >     da7.eli                 ONLINE      0     0     0
> >     ada1.eli                ONLINE      0     0     0
> >     ada2.eli                ONLINE      0     0     0
> >     da10.eli                ONLINE      0     0     2
> >     da11.eli                ONLINE      0     0     0
> >     da12.eli                ONLINE      0     0     0
> >     da13.eli                ONLINE      0     0     0
> >   raidz2-1                  DEGRADED    0     0     0
> >     da0.eli                 ONLINE      0     0     0
> >     da1.eli                 ONLINE      0     0     0
> >     da2.eli                 ONLINE      0     0     1  (resilvering)
> >     replacing-3             DEGRADED    0     0     1
> >       10699825708166646100  UNAVAIL     0     0     0  was /dev/da3.eli
> >       da4.eli               ONLINE      0     0     0  (resilvering)
> >     da3.eli                 ONLINE      0     0     0
> >     da5.eli                 ONLINE      0     0     0
> >     da8.eli                 ONLINE      0     0     0
> >     da9.eli                 ONLINE      0     0     0
> >
> > errors: No known data errors
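
Since the error counters in the status output above are the main clue to fresh damage, it can help to pull out just the devices with non-zero counts and compare them between restarts. This is a minimal sketch, assuming the usual five-column NAME/STATE/READ/WRITE/CKSUM layout of zpool status; the function name nonzero_counters, the STATES set, and the SAMPLE text are illustrative, not part of any ZFS tool.

```python
# Device states that can appear in the STATE column (assumed list).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

def nonzero_counters(status_text):
    """Map device name -> (read, write, cksum) for rows with any error."""
    hits = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A config row is: name, state, then three counters (notes may follow).
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        counts = fields[2:5]
        # Abbreviated counts such as "1.2K" are skipped by this simple check.
        if not all(c.isdigit() for c in counts):
            continue
        read, write, cksum = (int(c) for c in counts)
        if read or write or cksum:
            hits[fields[0]] = (read, write, cksum)
    return hits

# Hypothetical input: a few rows taken from the status output above.
SAMPLE = """\
NAME             STATE    READ WRITE CKSUM
pool             DEGRADED    0     0     0
  raidz2-0       ONLINE      0     0     0
    da10.eli     ONLINE      0     0     2
  raidz2-1       DEGRADED    0     0     0
    da2.eli      ONLINE      0     0     1  (resilvering)
    replacing-3  DEGRADED    0     0     1
"""

for name, counts in nonzero_counters(SAMPLE).items():
    print(name, counts)
```

Running it on two snapshots taken a minute apart and diffing the results would show whether a counter is still climbing on a particular drive.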
> >
> > system is
> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> > Mon Sep 15 22:34:05 CEST 2014
> > root@xxx:/usr/obj/usr/src/sys/xxx amd64
> >
> > controller is
> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
> >
> > Drives are connected via four four-port sata cables.
> >
> > Should I upgrade to 10.3-RELEASE, or did I make some sort of
> > configuration error / overlook something?
> >
> > Thanks in advance!
> >
> > Cheers,
> > Marc
>
> Resilver will start over anytime there's new damage. In your case,
> with two failed drives, resilver should've begun after you replaced
> the first drive, and restarted after you replaced the second. Have
> you seen it restart more than that? If so, keep an eye on the error
> counters in "zpool status"; they might give you a clue. You could
> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
> what gets lo