On 2018/9/26 下午10:06, David Sterba wrote:
> On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
>> The base commit is v4.19-rc1 tag.
> 
> I want to merge this patchset to 4.20, it's been in for-next for some
> time and it addresses a problem that gets reported very often.
> 
> I'm still doing some tests, there are minor issues that are not
> blocking, more code review is always welcome.
> 
>> There are a lot of reports of system hang for balance on quota enabled
>> fs.
>> It's most obvious for large fs.
>>
>> The hang is caused by tons of unmodified extents marked as qgroup dirty.
>> Such unmodified/unrelated sources include:
>> 1) Unmodified subtree
>> 2) Subtree drop for reloc tree
>> (BTW, other sources includes unmodified file extent items)
>>
>> E.g.
>> OO = Old tree blocks from file tree
>> NN = New tree blocks from reloc tree
>>
>>         file tree                              reloc tree
>>            OO (a)                                  NN (a)
>>           /  \                                    /  \
>>     (b) OO    OO (c)                        (b) NN    NN (c)
>>        / \   / \                               / \   / \
>>      OO  OO OO  OO                           OO  OO OO  NN
>>     (d) (e) (f) (g)                         (d) (e) (f) (g)
>>
>> In above case, balance will modify nodeptr in OO(a) to point NN(b) and
>> NN(c), and modify NN(a) to point to OO(B) and OO(c).
>>
>> Before this patch, quota will mark the whole subtree from its parent
>> down to the leaves as dirty.
>> So btrfs quota need to trace all tree block from (a) to (g).
> 
> I find the use of 'trace' a bit confusing as there are the trace events,
> but as it's already used in the qgroup code I think it can stay as is.

I'm pretty open to rename the word "trace".

Any recommendation for a newer name?

> 
>> However tree blocks (d) (e) (f) are shared between both trees, thus
>> there is no need to trace those 3 tree blocks.
>>
>> This patchset will change how this work by only tracing modified tree
>> blocks in reloc tree, and their counter parts in file tree.
>>
>> With this patch, we could skip tree blocks OO(d)~OO(f) in above example,
>> thus reduce some some overhead caused by qgroup.
>>
>> The improvement is mostly related to metadata relocation.
>>
>> Also, for metadata relocation, we don't really need to trace data
>> extents as they're not modified at all.
>>
>>
>> [[Benchmark]]
> [...]
> 
> Thanks for the benchmarks, it would be good to add the information to
> the patch that's bringing the performance improvement.

Could do that.

However the current improved (from around 10 * nr_relocated to 6~7 *
nr_relocated) is the final result of the last 4 patches.

I'm not pretty sure if any single patch could bring obvious improvement.

> 
>> Hardware:
>>      VM 4G vRAM, 8 vCPUs,
>>      disk is using 'unsafe' cache mode,
>>      backing device is SAMSUNG 850 evo SSD.
>>      Host has 16G ram.
>>
>> Mkfs parameter:
>>      --nodesize 4K (To bump up tree size)
>>
>> Initial subvolume contents:
>>      4G data copied from /usr and /lib.
>>      (With enough regular small files)
>>
>> Snapshots:
>>      16 snapshots of the original subvolume.
>>      each snapshot has 3 random files modified.
>>
>> balance parameter:
>>      -m
>>
>> So the content should be pretty similar to a real world root fs layout.
>>
>>                      | v4.19-rc1    | w/ patchset    | diff (*)
>> ---------------------------------------------------------------
>> relocated extents    | 22874        | 22856          | +0.1%
>> qgroup dirty extents | 225251       | 140431         | -37.7%
>> time (sys)           | 40.161s      | 20.574s        | -48.7%
>> time (real)          | 42.163s      | 25.173s        | -40.3%
>>
>> *: (val_new - val_old) / val_old * 100%
>>
>> And the difference is becoming more and more obvious if more snapshots
>> are created.
>>
>> If we increase the number of snapshots to 64 (4 times the number of
>> references, and 64 snapshots is already not recommended to use with
>> quota)
>>
>>                      | v4.19-rc1    | w/ patchset    | diff (*)
>> ---------------------------------------------------------------
>> relocated extents    | 22462        | 22467          | +0.0%
>> qgroup dirty extents | 314380       | 140826         | -55.2%
>> time (sys)           | 158.033s     | 74.292s        | -53.0%
>> time (real)          | 197.395s     | 90.529s        | -67.6%
>>
>> For larger fs the saving should be even more obvious.
>>
>> Changelog:
>> v2:
>>   Rename "tree reloc tree" to "reloc tree".
>>   Add patch "Don't trace subtree if we're dropping reloc tree" into the
>>   patchset.
>>   Fix wrong btrfs_bin_search() call, which leads to unexpected ENOENT
>>   error for btrfs_qgroup_trace_extent_swap(). Now use dst_path->slots[]
>>   directly.
>>   
>> v3:
>>   Add new patch to avoid unnecessary data extents trace for metadata
>>   relocation.
>>   Better delayed ref time root owner detection to avoid unnecessary tree
>>   block tracing.
>>   Add benchmark result for the patchset.
>>
>> Qu Wenruo (7):
>>   btrfs: qgroup: Introduce trace event to analyse the number of dirty
>>     extents accounted
>>   btrfs: qgroup: Introduce function to trace two swaped extents
>>   btrfs: qgroup: Introduce function to find all new tree blocks of reloc
>>     tree
>>   btrfs: qgroup: Use generation aware subtree swap to mark dirty extents
>>   btrfs: qgroup: Don't trace subtree if we're dropping reloc tree
>>   btrfs: delayed-ref: Introduce new parameter for
>>     btrfs_add_delayed_tree_ref() to reduce unnecessary qgroup tracing
>>   btrfs: qgroup: Only trace data extents in leaves if we're relocating
>>     data block group
>>
>>  fs/btrfs/delayed-ref.c       |   5 +-
>>  fs/btrfs/delayed-ref.h       |   1 +
>>  fs/btrfs/extent-tree.c       |  25 ++-
>>  fs/btrfs/qgroup.c            | 347 +++++++++++++++++++++++++++++++++++
>>  fs/btrfs/qgroup.h            |  11 ++
>>  fs/btrfs/relocation.c        |  15 +-
>>  include/trace/events/btrfs.h |  21 +++
>>  7 files changed, 405 insertions(+), 20 deletions(-)
>>
>> -- 
>> 2.18.0

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to