Vitaliy,
Thanks for the details. I wasn't aware of the 'freeing' property. That is
very useful for seeing progress.
There's plenty of space on the pool both now and when the delete started.
No checkpoint, no dedup. This is a raidz3 pool of 90 12TB disks.
I've been bumping zfs_free_min_time_ms, but it has had only minor
influence. It is currently set to 100000. Should I keep bumping it by
orders of magnitude? I'd rather hobble the pool temporarily to work
through this crippling problem.
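For anyone following along, the usual illumos mechanics for bumping it are
a live write with mdb plus an /etc/system entry so it survives reboots (the
value is in milliseconds; 0t marks a decimal literal in mdb):

# echo "zfs_free_min_time_ms/W 0t100000" | mdb -kw

and in /etc/system:

set zfs:zfs_free_min_time_ms = 100000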
# zpool list hcpdr03
NAME      SIZE   ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH    ALTROOT
hcpdr03   1.02P  631T   416T  -        -         10%   60%  1.00x  DEGRADED  -
# zpool get freeing hcpdr03
NAME     PROPERTY  VALUE  SOURCE
hcpdr03  freeing   125T   default
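Something like this loop should show whether that number is actually
draining (-H/-p print a bare, exact byte count, so successive samples can
be diffed):

# while :; do date -u '+%Y-%m-%dT%H:%M:%SZ'; zpool get -Hp -o value freeing hcpdr03; sleep 300; done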
# zpool status hcpdr03|head
  pool: hcpdr03
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jun 1 21:52:38 2020
        1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
        0 resilvered, 0.00% done
It dropped a disk about two weeks ago and progress is almost non-existent.
It was rebooted yesterday. It was about 5% complete before the reboot.
Previously, this pool would resilver in 5-7 days.
I/O is relatively low for this pool:
# zpool iostat hcpdr03
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
hcpdr03      631T   416T    118    553   507K  10.2M
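Note that without an interval argument zpool iostat reports averages since
import; a sampled, per-vdev view gives a truer picture of current activity:

# zpool iostat -v hcpdr03 60 5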
# iostat -xn |head
                            extended device statistics
    r/s     w/s    kr/s     kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   24.6  8910.7   115.4  56272.5  10.9  13.2     1.2     1.5   9  40  hcpdr01
   28.9  1823.5   125.7  14127.5   3.0   2.1     1.6     1.1   4  34  hcpdr02
  160.1  2279.3   687.9  21067.8   8.7   8.3     3.5     3.4   3  22  hcpdr03
-Chip
On Mon, Jun 1, 2020 at 11:03 PM Vitaliy Gusev <[email protected]>
wrote:
> 1. Can you play with zfs_free_min_time_ms? The default value is 1/5 of the
> txg sync time (zfs_txg_timeout):
>
> unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */
>
> Also, it could be that reading the metadata needed for freeing is slow
> (due to ARC constraints, heavy I/O, or a fragmented pool on HDDs). The
> side effect is that the metadata cannot be read quickly enough to be ready
> within zfs_txg_timeout seconds, so the blocks' freeing is postponed to the
> next spa-sync. Look at dsl_scan_async_block_should_pause() for details.
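>
> For example, the current values can be read with mdb (variable names as in
> recent illumos dsl_scan.c/dsl_pool.c; they may differ on older bits):
>
> # echo "zfs_free_min_time_ms/U" | mdb -k
> # echo "zfs_txg_timeout/D" | mdb -k
> # echo "zfs_async_block_max_blocks/E" | mdb -k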
>
> 2. Do you have a checkpoint set on the pool? It can stall reclaiming if
> there is not enough space; look at spa_suspend_async_destroy() for more
> details.
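>
> A quick way to check:
>
> # zpool get checkpoint $pool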
>
> 3. Do you have dedup enabled? In that case data blocks can still be
> referenced and will not be freed.
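>
> Similarly, dedupratio stays at 1.00x if nothing was ever deduplicated:
>
> # zpool get dedupratio $pool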
>
> BTW, does "zpool get freeing $pool" show 150TB?
>
> ———
> Vitaliy Gusev
>
>
>
> On 1 Jun 2020, at 21:34, Schweiss, Chip <[email protected]> wrote:
>
> These are ZFS folders that were destroyed, snapshots and all.
>
> zfs destroy -r {folder}
>
> It is not instant. This too goes on the delete queue, which recycles
> blocks in the background.