Re: zfs resilver keeps restarting

2016-09-19 Thread Marc UBM Bocklet via freebsd-stable
On Sun, 18 Sep 2016 11:41:54 -0600
Alan Somers  wrote:

> On Sun, Sep 18, 2016 at 10:46 AM, Marc UBM Bocklet via freebsd-stable
>  wrote:
> > On Sun, 18 Sep 2016 10:05:52 -0600
> > Alan Somers  wrote:
> >
> >> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
> >>  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > due to two bad cables, I had two drives drop from my striped raidz2
> >> > pool (built on top of geli encrypted drives). I replaced one of the
> >> > drives before I realized that the cabling was at fault - that's the
> >> > drive which is being replaced in the output of zpool status below.
> >> >
> >> > I have just installed the new cables and all sata errors are gone.
> >> > However, the resilver of the pool keeps restarting.
> >> >
> >> > I see no errors in /var/log/messages, but zpool history -i says:
> >> >
> >> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> >> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> >> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> >> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> >> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> >> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> >> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> >> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> >> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> >> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> >> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
> >> >
> >> > I assume that "scan done complete=0" means that the resilver didn't
> >> > finish?
> >> >
> >> > pool layout is the following:
> >> >
> >> >   pool: pool
> >> >  state: DEGRADED
> >> > status: One or more devices is currently being resilvered.  The pool
> >> >         will continue to function, possibly in a degraded state.
> >> > action: Wait for the resilver to complete.
> >> >   scan: resilver in progress since Sun Sep 18 14:51:39 2016
> >> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
> >> >         13.2M resilvered, 2.34% done
> >> > config:
> >> >
> >> > NAME                        STATE     READ WRITE CKSUM
> >> > pool                        DEGRADED     0     0     0
> >> >   raidz2-0                  ONLINE       0     0     0
> >> >     da6.eli                 ONLINE       0     0     0
> >> >     da7.eli                 ONLINE       0     0     0
> >> >     ada1.eli                ONLINE       0     0     0
> >> >     ada2.eli                ONLINE       0     0     0
> >> >     da10.eli                ONLINE       0     0     2
> >> >     da11.eli                ONLINE       0     0     0
> >> >     da12.eli                ONLINE       0     0     0
> >> >     da13.eli                ONLINE       0     0     0
> >> >   raidz2-1                  DEGRADED     0     0     0
> >> >     da0.eli                 ONLINE       0     0     0
> >> >     da1.eli                 ONLINE       0     0     0
> >> >     da2.eli                 ONLINE       0     0     1  (resilvering)
> >> >     replacing-3             DEGRADED     0     0     1
> >> >       10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
> >> >       da4.eli               ONLINE       0     0     0  (resilvering)
> >> >     da3.eli                 ONLINE       0     0     0
> >> >     da5.eli                 ONLINE       0     0     0
> >> >     da8.eli                 ONLINE       0     0     0
> >> >     da9.eli                 ONLINE       0     0     0
> >> >
> >> > errors: No known data errors
> >> >
> >> > system is
> >> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> >> > Mon Sep 15 22:34:05 CEST 2014
> >> > root@xxx:/usr/obj/usr/src/sys/xxx  amd64
> >> >
> >> > 

Re: zfs resilver keeps restarting

2016-09-18 Thread Marc UBM Bocklet via freebsd-stable
On Sun, 18 Sep 2016 10:05:52 -0600
Alan Somers  wrote:

> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
>  wrote:
> >
> > Hi all,
> >
> > due to two bad cables, I had two drives drop from my striped raidz2
> > pool (built on top of geli encrypted drives). I replaced one of the
> > drives before I realized that the cabling was at fault - that's the
> > drive which is being replaced in the output of zpool status below.
> >
> > I have just installed the new cables and all sata errors are gone.
> > However, the resilver of the pool keeps restarting.
> >
> > I see no errors in /var/log/messages, but zpool history -i says:
> >
> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
> >
> > I assume that "scan done complete=0" means that the resilver didn't
> > finish?
> >
> > pool layout is the following:
> >
> >   pool: pool
> >  state: DEGRADED
> > status: One or more devices is currently being resilvered.  The pool
> >         will continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> >   scan: resilver in progress since Sun Sep 18 14:51:39 2016
> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
> >         13.2M resilvered, 2.34% done
> > config:
> >
> > NAME                        STATE     READ WRITE CKSUM
> > pool                        DEGRADED     0     0     0
> >   raidz2-0                  ONLINE       0     0     0
> >     da6.eli                 ONLINE       0     0     0
> >     da7.eli                 ONLINE       0     0     0
> >     ada1.eli                ONLINE       0     0     0
> >     ada2.eli                ONLINE       0     0     0
> >     da10.eli                ONLINE       0     0     2
> >     da11.eli                ONLINE       0     0     0
> >     da12.eli                ONLINE       0     0     0
> >     da13.eli                ONLINE       0     0     0
> >   raidz2-1                  DEGRADED     0     0     0
> >     da0.eli                 ONLINE       0     0     0
> >     da1.eli                 ONLINE       0     0     0
> >     da2.eli                 ONLINE       0     0     1  (resilvering)
> >     replacing-3             DEGRADED     0     0     1
> >       10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
> >       da4.eli               ONLINE       0     0     0  (resilvering)
> >     da3.eli                 ONLINE       0     0     0
> >     da5.eli                 ONLINE       0     0     0
> >     da8.eli                 ONLINE       0     0     0
> >     da9.eli                 ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> > system is
> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> > Mon Sep 15 22:34:05 CEST 2014
> > root@xxx:/usr/obj/usr/src/sys/xxx  amd64
> >
> > controller is
> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
> >
> > Drives are connected via four four-port sata cables.
> >
> > Should I upgrade to 10.3-release or did I make some sort of
> > configuration error / overlook something?
> >
> > Thanks in advance!
> >
> > Cheers,
> > Marc
> 
> Resilver will start over anytime there's new damage.  In your case,
> with two failed drives, resilver should've begun after you replaced
> the first drive, and restarted after you replaced the second.  Have
> you seen it restart more than that?  If so, keep an eye on the error
> counters in "zpool status"; they might give you a clue.  You could
> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
> what gets lo