On 2019/1/23 3:15 PM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree
> 
> Which is based on v5.0-rc1.
> 
> This patchset addresses the heavy subtree scan load by delaying the scan
> until we're going to modify the swapped tree block.
> 
> The overall workflow is:
> 
> 1) Record the subtree root blocks that get swapped.
> 
>    During subtree swap:
>    O = Old tree blocks
>    N = New tree blocks
>          reloc tree                         subvol tree X
>             Root                               Root
>            /    \                             /    \
>          NA     OB                          OA      OB
>        /  |     |  \                      /  |      |  \
>      NC  ND     OE  OF                   OC  OD     OE  OF
> 
>   In this case, NA and OA are going to be swapped, so record (NA, OA)
>   into subvol tree X.
> 
> 2) After subtree swap.
>          reloc tree                         subvol tree X
>             Root                               Root
>            /    \                             /    \
>          OA     OB                          NA      OB
>        /  |     |  \                      /  |      |  \
>      OC  OD     OE  OF                   NC  ND     OE  OF
> 
> 3a) CoW happens for OB
>     If we are going to CoW tree block OB, we check OB's bytenr against
>     tree X's swapped_blocks structure.
>     It doesn't match any record, so nothing happens.
> 
> 3b) CoW happens for NA
>     Check NA's bytenr against tree X's swapped_blocks, and get a hit.
>     Then we do a subtree scan on both subtrees, OA and NA.
>     This results in 6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).
> 
>     After that, no matter what we do to subvol tree X, the qgroup numbers
>     will still be correct.
>     Finally, NA's record gets removed from X's swapped_blocks.
> 
> 4) Transaction commit
>     Any remaining record in X's swapped_blocks gets removed. Since there
>     was no modification to those swapped subtrees, there is no need to
>     trigger the heavy qgroup subtree rescan for them.
> 
> [[Benchmark]] (*)
> Hardware:
>       VM 4G vRAM, 8 vCPUs,
>       disk is using 'unsafe' cache mode,
>       backing device is SAMSUNG 850 evo SSD.
>       Host has 16G ram.
> 
> Mkfs parameter:
>       --nodesize 4K (to bump up the tree size)
> 
> Initial subvolume contents:
>       4G data copied from /usr and /lib.
>       (with enough small regular files)
> 
> Snapshots:
>       16 snapshots of the original subvolume.
>       Each snapshot has 3 random files modified.
> 
> balance parameter:
>       -m
> 
> So the contents should be fairly similar to a real-world root fs layout.
> 
> After the file system is populated there is no other activity, so this
> should be the best-case scenario.
> 
>                      | v4.20-rc1            | w/ patchset    | diff
> -----------------------------------------------------------------------
> relocated extents    | 22615                | 22457          | -0.7%
> qgroup dirty extents | 163457               | 121606         | -25.6%
> time (sys)           | 22.884s              | 18.842s        | -17.6%
> time (real)          | 27.724s              | 22.884s        | -17.5%
> 
> *: Due to a bug in v5.0-rc1, balancing metadata with snapshots is
> unacceptably slow even with quota disabled, so the results are from
> v4.20-rc1.
> 
> changelog:
> v2:
> - Rebase to v4.20-rc1.
> 
> - Instead of committing a transaction after each reloc tree merge, delay
>   the commit until merge_reloc_roots() finishes.
>   This provides more natural behavior and reduces unnecessary
>   transaction commits.
> 
> v3:
> - Fix backref walk deadlock by not triggering it at all.
>   This also removes the need for the @exec_post refactor, replacing it
>   with a patch that allows @old_root to be unpopulated.
> 
> - Include the patch that fixes the unexpected data rsv free.
> 
> v3.1:
> - Rebased to v4.20-rc1.
>   Minor conflicts with some cleanup code.
> 
> v4:
> - Renaming members from "file_*" to "subv_*".
>   Members like "file_bytenr" are pretty confusing; renaming them to
>   "subv_bytenr" avoids the confusion.
> 
> - Use btrfs_root::reloc_dirty_list to replace dynamic memory allocation.
>   One less point of failure, and no need to worry about GFP_KERNEL/NOFS.
>   Furthermore, it's easier to manipulate a list than an rb tree.
> 
> v5:
> - Use Josef's superior qgroup deadlock fix.
>   No performance regression now.
> 
> - A new patch to allow delayed subtree rescan to insert empty old_roots.

I should have double-checked the cover letter.
This part is incorrect; please just ignore it.

Thanks,
Qu

> 
> - Fix a possible race caused by rb_tree node initialization being done
>   outside of the critical section.
> 
> - A lot of coding style fixes:
>   * naming change from "file"/"subv" to "subvol"
>   * {} for any else if branch
>   * avoid err/ret confusion by introducing "tmp_ret"
>   * proper errno for non-uptodate extent buffer
>   * struct member re-ordering to avoid unnecessary padding
>   * avoid single letter variable name
>   * less redundant emphasis
>   * move certain devel-only warnings under CONFIG_BTRFS_DEBUG
>   * replace cool-sounding 'hack' with 'optimization'
>   * remove unnecessary inline prefix for btrfs_qgroup_init_swapped_blocks
>   * keep an empty line before #endif
> 
> 
> Josef Bacik (1):
>   btrfs: honor path->skip_locking in backref code
> 
> Qu Wenruo (6):
>   btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head
>     to btrfs_qgroup_extent_record
>   btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
>   btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
>   btrfs: qgroup: Introduce per-root swapped blocks infrastructure
>   btrfs: qgroup: Use delayed subtree rescan for balance
>   btrfs: qgroup: Cleanup old subtree swap code
> 
>  fs/btrfs/backref.c           |  16 +-
>  fs/btrfs/ctree.c             |   8 +
>  fs/btrfs/ctree.h             |  29 +++
>  fs/btrfs/delayed-ref.c       |  15 +-
>  fs/btrfs/delayed-ref.h       |  11 --
>  fs/btrfs/disk-io.c           |   2 +
>  fs/btrfs/extent-tree.c       |   3 -
>  fs/btrfs/qgroup.c            | 339 +++++++++++++++++++++++++++--------
>  fs/btrfs/qgroup.h            | 120 +++++++++++--
>  fs/btrfs/relocation.c        | 101 ++++++++---
>  fs/btrfs/transaction.c       |   1 +
>  include/trace/events/btrfs.h |  29 ---
>  12 files changed, 502 insertions(+), 172 deletions(-)
> 
