> On Jul 9, 2018, at 2:39 PM, Ken Merry <k...@freebsd.org> wrote:
> 
> Hi ZFS folks,
> 
> We (Spectra Logic) have seen some odd behavior with resilvers in RAIDZ3 pools.
> 
> The codebase in question is FreeBSD stable/11 from July 2017, at 
> approximately FreeBSD SVN version 321310.
> 
> We have customer systems with (sometimes) hundreds of SMR drives in RAIDZ3 
> vdevs in a large pool.  (A typical arrangement is a 23-drive RAIDZ3, and some 
> customers will put everything in one giant pool made up of a number of 
> 23-drive RAIDZ3 arrays.)
> 
> The SMR drives in question have a bug that sometimes causes them to go off 
> the SAS bus for up to two minutes.  (They’re usually gone a lot less than 
> that, up to 10 seconds.)  Once they come back online, zfsd puts the drive 
> back in the pool and makes it online.

ouch

> 
> If a resilver is active on a different drive, once the drive that temporarily 
> left comes back, the resilver apparently starts over from the beginning.
> 
> This leads to resilvers that take forever to complete, especially on systems 
> with high load.
> 
> Is this expected behavior?

Scans/resilvers are at the DSL layer. The scan thread goes through each dataset
and starts at the txg needed (a full scan effectively starts at txg 0).
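
To make the txg point concrete, here is a toy model, not ZFS code. The
names (Block, ToyPoolScan, device_returned) are made up for illustration,
and the restart rule is an assumption based on the behavior you describe:
with a single scan, covering a second device means beginning a new pass
from the lowest txg any device still needs.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    birth_txg: int  # txg in which the block was written


class ToyPoolScan:
    """Hypothetical model: one scan per pool, no saved progress on restart."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.min_txg = None   # lowest txg the current pass must cover
        self.pending = []     # blocks the current pass has to visit
        self.visited = 0      # how many of them have been visited so far
        self.restarts = 0

    def start(self, min_txg):
        # A pass only needs blocks born at or after min_txg;
        # min_txg == 0 is effectively a full scan.
        self.min_txg = min_txg
        self.pending = [b for b in self.blocks if b.birth_txg >= min_txg]
        self.visited = 0

    def step(self, count):
        # Visit up to `count` more blocks of the current pass.
        self.visited = min(self.visited + count, len(self.pending))

    def device_returned(self, missed_from_txg):
        # Assumption: the single scan must now cover both devices, so it
        # begins a new pass from the lower txg and discards its progress.
        self.restarts += 1
        self.start(min(self.min_txg, missed_from_txg))


blocks = [Block(f"b{i}", txg) for i, txg in enumerate([5, 120, 300, 450, 900])]
scan = ToyPoolScan(blocks)
scan.start(min_txg=0)                      # replaced drive needs everything
scan.step(4)                               # four of five blocks done
scan.device_returned(missed_from_txg=880)  # another drive dropped off briefly
print(scan.visited, len(scan.pending), scan.restarts)
```

In this model the returning drive only missed writes from txg 880 onward,
but because the replaced drive still needs everything, the new pass covers
all five blocks again and the earlier progress is lost.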

> 
> It seems that only one scan can be active on a pool at any given time.  Is 
> that correct?  If so, is that true for an entire top level pool, or just a 
> given redundancy group?  (In this case, it would be the RAIDZ3 vdev.)

There is one scan thread.

> 
> Is there anything we can do to make sure the resilvers complete in a 
> reasonable period of time or otherwise improve the behavior?  (Short of 
> putting in different drives…I have already suggested that.)

There are ways to change or tune the ZIO scheduler, but that won't make SMR 
drives any faster.
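
If you want to see what is currently set before changing anything, a
read-only sketch like the one below may help. The sysctl names are the
ones I would expect on a FreeBSD 11-era system and the one-line
descriptions are my own summaries, so treat both as assumptions and
confirm with `sysctl -a | grep vfs.zfs` on the actual box. The helper
names (read_commands, read_tunables) are made up for this example.

```python
import shutil
import subprocess

# Assumed FreeBSD 11-era tunable names; verify them on the target system.
SCAN_TUNABLES = [
    "vfs.zfs.resilver_delay",         # ticks to delay resilver I/O on a busy pool
    "vfs.zfs.resilver_min_time_ms",   # minimum ms spent resilvering per txg
    "vfs.zfs.scan_idle",              # idle window, in ticks, used by the throttle
    "vfs.zfs.top_maxinflight",        # max scan I/Os in flight per top-level vdev
    "vfs.zfs.vdev.scrub_min_active",  # ZIO scheduler: min active scrub I/Os per device
    "vfs.zfs.vdev.scrub_max_active",  # ZIO scheduler: max active scrub I/Os per device
]


def read_commands(names):
    """Build one read-only `sysctl -n <name>` command per tunable."""
    return [["sysctl", "-n", name] for name in names]


def read_tunables(names):
    """Return {name: value}; value is None if it could not be read."""
    if shutil.which("sysctl") is None:
        return {name: None for name in names}
    values = {}
    for name, argv in zip(names, read_commands(names)):
        result = subprocess.run(argv, capture_output=True, text=True)
        values[name] = result.stdout.strip() if result.returncode == 0 else None
    return values


if __name__ == "__main__":
    for name, value in read_tunables(SCAN_TUNABLES).items():
        print(f"{name} = {value}")
```

This only reads values. Any change is a trade-off against normal pool
I/O, and none of it helps if the resilver keeps restarting.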
 -- richard

> 
> Thanks,
> 
> Ken
>  —
> Ken Merry
> k...@freebsd.org
> 

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/T2a7340f4c0c48fa9-M1557c2e89aed98caf806a17a