On 5/4/18 1:59 AM, Nikolay Borisov wrote: > > > On 4.05.2018 01:27, Jeff Mahoney wrote: >> On 5/3/18 2:23 AM, Nikolay Borisov wrote: >>> >>> >>> On 3.05.2018 00:11, je...@suse.com wrote: >>>> From: Jeff Mahoney <je...@suse.com> >>>> >>>> Hi Dave - >>>> >>>> Here's the updated patchset for the rescan races. This fixes the issue >>>> where we'd try to start multiple workers. It introduces a new "ready" >>>> bool that we set during initialization and clear while queuing the worker. >>>> The queuer is also now responsible for most of the initialization. >>>> >>>> I have a separate patch set start that gets rid of the racy mess >>>> surrounding >>>> the rescan worker startup. We can handle it in btrfs_run_qgroups and >>>> just set a flag to start it everywhere else. >>> I'd be interested in seeing those patches. Some time ago I did send a >>> patch which cleaned up the way qgroup rescan was initiated. It was done >>> from "btrfs_run_qgroups" and I think this is messy. Whatever we do we >>> ought to really have well-defined semantics when qgroups rescan are run, >>> preferably we shouldn't be conflating rescan + run (unless there is >>> _really_ good reason to do). In the past the rescan from scan was used >>> only during qgroup enabling. >> >> I think btrfs_run_qgroups is the place to do it. Here's why: >> >> 2773 int >> 2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) >> 2775 { >> 2776 int ret = 0; >> 2777 struct btrfs_trans_handle *trans; >> 2778 >> 2779 ret = qgroup_rescan_init(fs_info, 0, 1); >> 2780 if (ret) >> 2781 return ret; >> 2782 >> 2783 /* >> 2784 * We have set the rescan_progress to 0, which means no more >> 2785 * delayed refs will be accounted by btrfs_qgroup_account_ref. >> 2786 * However, btrfs_qgroup_account_ref may be right after its call >> 2787 * to btrfs_find_all_roots, in which case it would still do the >> 2788 * accounting. >> 2789 * To solve this, we're committing the transaction, which will >> 2790 * ensure we run all delayed refs and only after that, we are >> 2791 * going to clear all tracking information for a clean start. >> 2792 */ >> 2793 >> 2794 trans = btrfs_join_transaction(fs_info->fs_root); >> 2795 if (IS_ERR(trans)) { >> 2796 fs_info->qgroup_flags &= >> ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; >> 2797 return PTR_ERR(trans); >> 2798 } >> 2799 ret = btrfs_commit_transaction(trans); >> 2800 if (ret) { >> 2801 fs_info->qgroup_flags &= >> ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; >> 2802 return ret; >> 2803 } >> 2804 >> 2805 qgroup_rescan_zero_tracking(fs_info); >> 2806 >> 2807 queue_rescan_worker(fs_info); >> 2808 return 0; >> 2809 } >> >> The delayed ref race should exist anywhere we initiate a rescan outside of >> initially enabling qgroups. We already zero the tracking and queue the >> rescan >> worker in btrfs_run_qgroups for when we enable qgroups. Why not just always >> queue the worker there so the initialization and execution has a clear >> starting point? > > This is no longer true in upstream as of commit 5d23515be669 ("btrfs: > Move qgroup rescan on quota enable to btrfs_quota_enable"). Hence my > asking about this. I guess if we make it unconditional it won't increase > the complexity, but the original code which was only run during qgroup > enable was rather iffy I Just don't want to repeat this.
Ah, ok. My repo is still using v4.16. How does this work with the race that is described in btrfs_qgroup_rescan? -Jeff -- Jeff Mahoney SUSE Labs
signature.asc
Description: OpenPGP digital signature