Chip,

> On 3 Jun 2020, at 14:07, Schweiss, Chip <[email protected]> wrote:
> 
> I've continued to increase zfs_free_min_time_ms exponentially. At 1,000,000 it 
> is making some progress on this large delete queue.
> 
> 

I assume that setting zfs_free_min_time_ms higher than (zfs_txg_timeout * 1000) 
should have no additional effect, since the freeing budget is capped by the txg 
sync interval.
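
If you want to double-check the live values, the usual illumos route is mdb (a 
sketch; tunable names taken from dsl_scan.c, verify on your build):

# read the current values
echo "zfs_txg_timeout/D" | mdb -k
echo "zfs_free_min_time_ms/D" | mdb -k

# write a new value on the live kernel (0t marks decimal)
echo "zfs_free_min_time_ms/W 0t1000000" | mdb -kw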


> # zpool get freeing hcpdr03
> NAME     PROPERTY  VALUE    SOURCE
> hcpdr03  freeing   103T     default

Previously you had 125T, so 22T was freed in one day - good progress. Has 
resilvering completed?
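
At ~22T/day, the remaining 103T works out to roughly 103/22, about 5 more days. 
One way to measure the current rate, sampling the parseable freeing counter 
twice (the 600-second interval is arbitrary):

f1=$(zpool get -Hp -o value freeing hcpdr03); sleep 600
f2=$(zpool get -Hp -o value freeing hcpdr03)
awk -v a="$f1" -v b="$f2" -v t=600 \
    'BEGIN { r = (a - b) / t; if (r > 0) printf("ETA: %.1f days\n", b / r / 86400) }'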


> The I/O load is still relatively low on the pool  
> 

 Does "zpool iostat -yl 5” show large values ? And what HDDs do you have (IOPs, 
latency, etc) ?

———
Vitaliy Gusev


> At least it will now complete before Christmas.
> 
> -Chip
> 
> On Tue, Jun 2, 2020 at 8:58 AM Vitaliy Gusev <[email protected]> wrote:
> Don't you think that freeing can be hampered by the intensive random I/O of 
> the HDDs? If resilvering runs at the same time as a large freeing, the two 
> can affect each other.
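> 
> Resilvering has its own per-txg minimum-time budget, so the two compete for 
> the same sync window. To compare them live (a sketch; tunable names assumed 
> from illumos dsl_scan.c, verify on your build):
> 
> # resilver budget vs. freeing budget, milliseconds per txg
> echo "zfs_resilver_min_time_ms/D" | mdb -k
> echo "zfs_free_min_time_ms/D" | mdb -k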
> 
> 
>>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
> 
> 
> So resilvering is also stuck?
> 
>> # iostat -xn |head
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>    24.6 8910.7  115.4 56272.5 10.9 13.2    1.2    1.5   9  40 hcpdr01
>>    28.9 1823.5  125.7 14127.5  3.0  2.1    1.6    1.1   4  34 hcpdr02
>>   160.1 2279.3  687.9 21067.8  8.7  8.3    3.5    3.4   3  22 hcpdr03
> 
> Could you identify the busiest devices by %b, wait, %w, and the other 
> columns? Do they show high values?
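> 
> A one-liner to flag the busy ones, assuming %b is still column 10 of 
> "iostat -xn" output:
> 
> # print only device lines that are more than 80% busy, sampled every 5 s
> iostat -xn 5 | awk '$10+0 > 80'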
> 
> Also, it would be helpful to look at the output of:
> 
> "zpool iostat -vyl $pool 10"
> and 
> "zpool iostat -vyq $pool 10"
> 
> ———
> Vitaliy Gusev
> 
> 
> 
> 
>> On 2 Jun 2020, at 14:14, Schweiss, Chip <[email protected]> wrote:
>> 
>> Vitaliy,
>> 
>> Thanks for the details. I wasn't aware of the 'freeing' property. That is 
>> very useful for seeing progress.
>> 
>> There's plenty of space on the pool both now and when the delete started.   
>> No checkpoint, no dedup.  This is a raidz3 pool of 90 12TB disks.
>> 
>> I've been bumping zfs_free_min_time_ms, but it has had only a minor 
>> influence. It is currently set to 100000. Should I keep bumping it by orders 
>> of magnitude? I'd rather hobble the pool temporarily to work through this 
>> crippling problem.
>> 
>> # zpool list hcpdr03
>> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
>> hcpdr03  1.02P   631T   416T        -         -    10%    60%  1.00x  DEGRADED  -
>> 
>> # zpool get freeing hcpdr03
>> NAME     PROPERTY  VALUE    SOURCE
>> hcpdr03  freeing   125T     default
>> 
>> # zpool status hcpdr03|head
>>   pool: hcpdr03
>>  state: DEGRADED
>> status: One or more devices is currently being resilvered.  The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Mon Jun  1 21:52:38 2020
>>         1 scanned out of 639T at 1/s, (scan is slow, no estimated time)
>>     0 resilvered, 0.00% done
>> 
>> It dropped a disk about two weeks ago and resilver progress is almost 
>> non-existent. It was rebooted yesterday; it was about 5% complete before the 
>> reboot. Previously, this pool would resilver in 5-7 days.
>> 
>> I/O is relatively low for this pool:
>> # zpool iostat hcpdr03
>>                capacity     operations    bandwidth
>> pool        alloc   free   read  write   read  write
>> ----------  -----  -----  -----  -----  -----  -----
>> hcpdr03      631T   416T    118    553   507K  10.2M
>> 
>> # iostat -xn |head
>>                     extended device statistics
>>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>    24.6 8910.7  115.4 56272.5 10.9 13.2    1.2    1.5   9  40 hcpdr01
>>    28.9 1823.5  125.7 14127.5  3.0  2.1    1.6    1.1   4  34 hcpdr02
>>   160.1 2279.3  687.9 21067.8  8.7  8.3    3.5    3.4   3  22 hcpdr03
>> 
>> -Chip
>> 
>> 
>> On Mon, Jun 1, 2020 at 11:03 PM Vitaliy Gusev <[email protected]> wrote:
>> 1. Can you play with zfs_free_min_time_ms? The default value is 1/5 of the 
>> txg sync time (zfs_txg_timeout):
>> 
>>      unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */
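>> 
>> To persist a change across reboots, the illumos convention is an /etc/system 
>> entry (a sketch; the zfs: module prefix is assumed, verify on your build):
>> 
>> * /etc/system: raise the per-txg freeing budget (milliseconds)
>> set zfs:zfs_free_min_time_ms = 100000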
>> 
>> Also, it could be that reading metadata for freeing is slow (due to ARC 
>> constraints, heavy I/O, or a fragmented pool on HDDs). That can lead to a 
>> side effect where the metadata cannot be read quickly enough to be ready 
>> within zfs_txg_timeout seconds, so the blocks' freeing is postponed to the 
>> next spa-sync. Look at dsl_scan_async_block_should_pause() for details.
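>> 
>> If ARC constraints are suspected, the arcstats kstats give a quick read (a 
>> sketch; stat names assumed from illumos arcstats):
>> 
>> # ARC current size vs. target vs. ceiling
>> kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max
>> # metadata demand hit/miss counters
>> kstat -p zfs:0:arcstats:demand_metadata_hits zfs:0:arcstats:demand_metadata_misses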
>> 
>> 2. Do you have a checkpoint set on the pool? It can block reclaiming when 
>> there is not enough space; look at spa_suspend_async_destroy() for more 
>> details.
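>> 
>> A quick way to check (assuming the pool's feature set includes checkpoints):
>> 
>> # a checkpoint shows up here and in the CKPOINT column of zpool list
>> zpool get checkpoint hcpdr03
>> # discarding it (only if one exists and is no longer needed) unblocks reclaim
>> zpool checkpoint -d hcpdr03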
>> 
>> 3. Do you have dedup enabled? In that case data blocks can still be 
>> referenced and will not be freed.
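>> 
>> One way to confirm it was never enabled on any dataset (a sketch):
>> 
>> # list any dataset where dedup is not "off"
>> zfs get -rH -o name,value dedup hcpdr03 | grep -vw off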
>> 
>> BTW, does "zpool get freeing $pool" show 150TB?
>> 
>> ———
>> Vitaliy Gusev
>> 
