Could the freeing be hampered by intensive random I/O on the HDDs? If a
resilver is running at the same time as a large free, the two can slow each
other down.
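
For example, one quick way to see whether the frees are progressing at all (a
simple sketch, using the pool name hcpdr03 from your output) is to sample the
freeing property over time and watch whether the value actually decreases
between txg syncs:

# while sleep 60; do zpool get -H -o value freeing hcpdr03; done

If the number never goes down while the resilver is running, the two are most
likely starving each other for I/O.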


>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)


So the resilver is also stuck?

> # iostat -xn |head
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    24.6 8910.7  115.4 56272.5 10.9 13.2    1.2    1.5   9  40 hcpdr01
>    28.9 1823.5  125.7 14127.5  3.0  2.1    1.6    1.1   4  34 hcpdr02
>   160.1 2279.3  687.9 21067.8  8.7  8.3    3.5    3.4   3  22 hcpdr03

Could you find the busiest devices by %b, and check wait, %w, and the other
columns? Do they show high values?
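
To narrow that down to individual disks (a sketch; -z just suppresses idle
devices), sampling per-device statistics at an interval, instead of the
since-boot averages a bare "iostat -xn" prints, would look like:

# iostat -xnz 10

Disks sitting near 100 %b or showing large wait/actv queues are the ones
worth looking at.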

Also it would be helpful to look at the output of:

"zpool iostat -vyl $pool 10"
and
"zpool iostat -vyq $pool 10"

———
Vitaliy Gusev




> On 2 Jun 2020, at 14:14, Schweiss, Chip <[email protected]> wrote:
> 
> Vitaliy,
> 
> Thanks for the details.  I wasn't aware of the 'freeing' property.  That is 
> very useful to see progress.
> 
> There's plenty of space on the pool both now and when the delete started.   
> No checkpoint, no dedup.  This is a raidz3 pool of 90 12TB disks.
> 
> I've been bumping zfs_free_min_time_ms but it has only had minor influence.
> It is currently set to 100000.  Should I keep bumping this by orders of
> magnitude?  I'd rather hobble the pool temporarily to work through this
> crippling problem.
> 
> # zpool list hcpdr03
> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
> hcpdr03  1.02P   631T   416T        -         -    10%    60%  1.00x  DEGRADED  -
> 
> # zpool get freeing hcpdr03
> NAME     PROPERTY  VALUE    SOURCE
> hcpdr03  freeing   125T     default
> 
> # zpool status hcpdr03|head
>   pool: hcpdr03
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon Jun  1 21:52:38 2020
>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>     0 resilvered, 0.00% done
> 
> It dropped a disk about two weeks ago and progress is almost non-existent.
> It was rebooted yesterday.  It was about 5% complete before the reboot.  
> Previously, this pool would resilver in 5-7 days.
> 
> I/O is relatively low for this pool:
> # zpool iostat hcpdr03
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> hcpdr03      631T   416T    118    553   507K  10.2M
> 
> # iostat -xn |head
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>    24.6 8910.7  115.4 56272.5 10.9 13.2    1.2    1.5   9  40 hcpdr01
>    28.9 1823.5  125.7 14127.5  3.0  2.1    1.6    1.1   4  34 hcpdr02
>   160.1 2279.3  687.9 21067.8  8.7  8.3    3.5    3.4   3  22 hcpdr03
> 
> -Chip
> 
> On Mon, Jun 1, 2020 at 11:03 PM Vitaliy Gusev <[email protected]> wrote:
> 1. Can you play with zfs_free_min_time_ms? The default value is 1/5 of the
> txg sync time (zfs_txg_timeout).
> 
>       unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */
> 
> Also it could be that reading metadata for the frees is slow (due to ARC
> constraints, heavy I/O, or a fragmented pool on HDDs). This can have the side
> effect that metadata cannot be read quickly enough to be ready within
> zfs_txg_timeout seconds, so the freeing of blocks is postponed to the next
> spa-sync.   Look at dsl_scan_async_block_should_pause() for details.
> 
> 2. Do you have a checkpoint set on the pool? It can block reclaiming if there
> is not enough space; look at spa_suspend_async_destroy() for more details.
> 
> 3. Do you have dedup enabled? In that case data blocks can still be
> referenced and will not be freed.
> 
> BTW, does "zpool get freeing $pool" show 150TB?
> 
> ———
> Vitaliy Gusev
> 
