I've continued increasing zfs_free_min_time_ms by orders of magnitude. At 1,000,000 it is finally making some progress on this large delete queue.
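For reference, on illumos this tunable can be changed on the live kernel with mdb and persisted across reboots in /etc/system; a minimal sketch, assuming the stock in-kernel ZFS module (the 0t prefix marks a decimal value):

# echo zfs_free_min_time_ms/W0t1000000 | mdb -kw     # set the running value
# echo zfs_free_min_time_ms/D | mdb -k               # read it back, in decimal

and in /etc/system, so the setting survives a reboot:

set zfs:zfs_free_min_time_ms = 1000000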
# zpool get freeing hcpdr03
NAME     PROPERTY  VALUE   SOURCE
hcpdr03  freeing   103T    default

The I/O load is still relatively low on the pool. At least it will now complete before Christmas.

-Chip

On Tue, Jun 2, 2020 at 8:58 AM Vitaliy Gusev <[email protected]> wrote:

> Don't you think the freeing could be hampered by random, intensive I/O on
> the HDDs? If resilvering is running at the same time as a large freeing,
> they can affect each other.
>
>     1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>
> So resilvering is also stuck?
>
> # iostat -xn |head
>                     extended device statistics
>     r/s     w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    24.6  8910.7   115.4  56272.5  10.9  13.2    1.2    1.5   9  40 hcpdr01
>    28.9  1823.5   125.7  14127.5   3.0   2.1    1.6    1.1   4  34 hcpdr02
>   160.1  2279.3   687.9  21067.8   8.7   8.3    3.5    3.4   3  22 hcpdr03
>
> Could you find the busiest devices by %b, wait, %w, and the other columns?
> Do any show high values?
>
> It would also be helpful to look at the output of:
>
> "zpool iostat -vyl $pool 10"
>
> and
>
> "zpool iostat -vyq $pool 10"
>
> ———
> Vitaliy Gusev
>
>
> On 2 Jun 2020, at 14:14, Schweiss, Chip <[email protected]> wrote:
>
> Vitaliy,
>
> Thanks for the details. I wasn't aware of the 'freeing' property. That
> is very useful for seeing progress.
>
> There's plenty of space on the pool, both now and when the delete started.
> No checkpoint, no dedup. This is a raidz3 pool of 90 12TB disks.
>
> I've been bumping zfs_free_min_time_ms, but it has had only minor
> influence. It is currently set to 100000. Should I keep bumping it by
> orders of magnitude? I'd rather hobble the pool temporarily to work
> through this crippling problem.
>
> # zpool list hcpdr03
> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
> hcpdr03  1.02P   631T   416T        -         -    10%    60%  1.00x  DEGRADED  -
>
> # zpool get freeing hcpdr03
> NAME     PROPERTY  VALUE   SOURCE
> hcpdr03  freeing   125T    default
>
> # zpool status hcpdr03|head
>   pool: hcpdr03
>  state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon Jun 1 21:52:38 2020
>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>         0 resilvered, 0.00% done
>
> It dropped a disk about two weeks ago and progress is almost
> non-existent. It was rebooted yesterday and was about 5% complete before
> the reboot. Previously, this pool would resilver in 5-7 days.
>
> I/O is relatively low for this pool:
>
> # zpool iostat hcpdr03
>               capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> hcpdr03      631T   416T    118    553   507K  10.2M
>
> # iostat -xn |head
>                     extended device statistics
>     r/s     w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    24.6  8910.7   115.4  56272.5  10.9  13.2    1.2    1.5   9  40 hcpdr01
>    28.9  1823.5   125.7  14127.5   3.0   2.1    1.6    1.1   4  34 hcpdr02
>   160.1  2279.3   687.9  21067.8   8.7   8.3    3.5    3.4   3  22 hcpdr03
>
> -Chip
>
>
> On Mon, Jun 1, 2020 at 11:03 PM Vitaliy Gusev <[email protected]>
> wrote:
>
>> 1. Can you play with zfs_free_min_time_ms? The default value is 1/5 of
>> the txg sync time (zfs_txg_timeout).
>>
>> unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */
>>
>> It could also be that reading the metadata needed for freeing is slow
>> (due to ARC constraints, heavy I/O, or a fragmented pool on HDDs). That
>> has the side effect that metadata cannot be read quickly enough to be
>> ready within zfs_txg_timeout seconds, so the blocks' freeing is postponed
>> to the next spa sync. Look at dsl_scan_async_block_should_pause() for
>> details.
>>
>> 2. Do you have a checkpoint set on the pool? It can stall reclaiming if
>> there wasn't enough space; look at spa_suspend_async_destroy() for more
>> details.
>>
>> 3. Do you have dedup enabled? Data blocks can still be referenced in
>> that case and will not be freed.
>>
>> BTW, does "zpool get freeing $pool" show 150TB?
>>
>> ———
>> Vitaliy Gusev
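For anyone working through the same checklist, a minimal sketch of how the items above might be checked from the shell; the checkpoint property assumes a build with the pool-checkpoint feature, and the loop simply samples the freeing property to confirm it is shrinking:

# zpool get freeing,checkpoint hcpdr03     # pending async frees, and checkpoint size if one exists
# zpool get dedupratio hcpdr03             # 1.00x suggests dedup is not holding extra references
# zfs get -r dedup hcpdr03 | grep -v off   # list any datasets with dedup enabled
# while :; do date; zpool get freeing hcpdr03; sleep 300; done   # watch progress over time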
