I've continued increasing zfs_free_min_time_ms by orders of magnitude. At 1,000,000 it is finally making some progress on this large delete queue.
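For reference, on illumos this tunable can be changed on the live kernel with mdb and persisted across reboots in /etc/system; a minimal sketch, assuming the stock in-kernel ZFS module (the 0t prefix marks a decimal value):

# echo zfs_free_min_time_ms/W0t1000000 | mdb -kw     # set the running value
# echo zfs_free_min_time_ms/D | mdb -k               # read it back, in decimal

and in /etc/system, so the setting survives a reboot:

set zfs:zfs_free_min_time_ms = 1000000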
# zpool get freeing hcpdr03
NAME     PROPERTY  VALUE   SOURCE
hcpdr03  freeing   103T    default

The I/O load is still relatively low on the pool. At least it will now complete before Christmas.

-Chip

On Tue, Jun 2, 2020 at 8:58 AM Vitaliy Gusev <[email protected]> wrote:

> Don't you think the freeing could be hampered by random, intensive I/O on
> the HDDs? If resilvering is running at the same time as a large freeing,
> they can affect each other.
>
>     1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>
> So resilvering is also stuck?
>
> # iostat -xn |head
>                     extended device statistics
>     r/s     w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    24.6  8910.7   115.4  56272.5  10.9  13.2    1.2    1.5   9  40 hcpdr01
>    28.9  1823.5   125.7  14127.5   3.0   2.1    1.6    1.1   4  34 hcpdr02
>   160.1  2279.3   687.9  21067.8   8.7   8.3    3.5    3.4   3  22 hcpdr03
>
> Could you find the busiest devices by %b, wait, %w, and the other columns?
> Do any show high values?
>
> It would also be helpful to look at the output of:
>
> "zpool iostat -vyl $pool 10"
>
> and
>
> "zpool iostat -vyq $pool 10"
>
> ———
> Vitaliy Gusev
>
>
> On 2 Jun 2020, at 14:14, Schweiss, Chip <[email protected]> wrote:
>
> Vitaliy,
>
> Thanks for the details. I wasn't aware of the 'freeing' property. That
> is very useful for seeing progress.
>
> There's plenty of space on the pool, both now and when the delete started.
> No checkpoint, no dedup. This is a raidz3 pool of 90 12TB disks.
>
> I've been bumping zfs_free_min_time_ms, but it has had only minor
> influence. It is currently set to 100000. Should I keep bumping it by
> orders of magnitude? I'd rather hobble the pool temporarily to work
> through this crippling problem.
>
> # zpool list hcpdr03
> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
> hcpdr03  1.02P   631T   416T        -         -    10%    60%  1.00x  DEGRADED  -
>
> # zpool get freeing hcpdr03
> NAME     PROPERTY  VALUE   SOURCE
> hcpdr03  freeing   125T    default
>
> # zpool status hcpdr03|head
>   pool: hcpdr03
>  state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Mon Jun 1 21:52:38 2020
>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>         0 resilvered, 0.00% done
>
> It dropped a disk about two weeks ago and progress is almost
> non-existent. It was rebooted yesterday and was about 5% complete before
> the reboot. Previously, this pool would resilver in 5-7 days.
>
> I/O is relatively low for this pool:
>
> # zpool iostat hcpdr03
>               capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> hcpdr03      631T   416T    118    553   507K  10.2M
>
> # iostat -xn |head
>                     extended device statistics
>     r/s     w/s    kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
>    24.6  8910.7   115.4  56272.5  10.9  13.2    1.2    1.5   9  40 hcpdr01
>    28.9  1823.5   125.7  14127.5   3.0   2.1    1.6    1.1   4  34 hcpdr02
>   160.1  2279.3   687.9  21067.8   8.7   8.3    3.5    3.4   3  22 hcpdr03
>
> -Chip
>
>
> On Mon, Jun 1, 2020 at 11:03 PM Vitaliy Gusev <[email protected]>
> wrote:
>
>> 1. Can you play with zfs_free_min_time_ms? The default value is 1/5 of
>> the txg sync time (zfs_txg_timeout).
>>
>> unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */
>>
>> It could also be that reading the metadata needed for freeing is slow
>> (due to ARC constraints, heavy I/O, or a fragmented pool on HDDs). That
>> has the side effect that metadata cannot be read quickly enough to be
>> ready within zfs_txg_timeout seconds, so the blocks' freeing is postponed
>> to the next spa sync. Look at dsl_scan_async_block_should_pause() for
>> details.
>>
>> 2. Do you have a checkpoint set on the pool? It can stall reclaiming if
>> there wasn't enough space; look at spa_suspend_async_destroy() for more
>> details.
>>
>> 3. Do you have dedup enabled? Data blocks can still be referenced in
>> that case and will not be freed.
>>
>> BTW, does "zpool get freeing $pool" show 150TB?
>>
>> ———
>> Vitaliy Gusev
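For anyone working through the same checklist, a minimal sketch of how the items above might be checked from the shell; the checkpoint property assumes a build with the pool-checkpoint feature, and the loop simply samples the freeing property to confirm it is shrinking:

# zpool get freeing,checkpoint hcpdr03     # pending async frees, and checkpoint size if one exists
# zpool get dedupratio hcpdr03             # 1.00x suggests dedup is not holding extra references
# zfs get -r dedup hcpdr03 | grep -v off   # list any datasets with dedup enabled
# while :; do date; zpool get freeing hcpdr03; sleep 300; done   # watch progress over time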
