On 3/7/17 7:36 PM, Qu Wenruo wrote: > > > At 03/08/2017 03:21 AM, Jeff Mahoney wrote: >> On 2/27/17 2:10 AM, Qu Wenruo wrote: >>> [BUG] >>> The easist way to reproduce the bug is: >>> ------ >>> # mkfs.btrfs -f $dev -n 16K >>> # mount $dev $mnt -o inode_cache >>> # btrfs quota enable $mnt >>> # btrfs quota rescan -w $mnt >>> # btrfs qgroup show $mnt >>> qgroupid rfer excl >>> -------- ---- ---- >>> 0/5 32.00KiB 32.00KiB >>> ^^ Twice the correct value >>> ------ >>> >>> And fstests/btrfs qgroup test group can easily detect them with >>> inode_cache mount option. >>> Although some of them are false alerts since old test cases are using >>> fixed golden output. >>> While new test cases will use "btrfs check" to detect qgroup mismatch. >>> >>> [CAUSE] >>> Inode_cache mount option will make commit_fs_roots() to call >>> btrfs_save_ino_cache() to update fs/subvol trees, and generate new >>> delayed refs. >>> >>> However we call btrfs_qgroup_prepare_account_extents() too early, before >>> commit_fs_roots(). >>> This makes the "old_roots" for newly generated extents are always NULL. >>> For freeing extent case, this makes both new_roots and old_roots to be >>> empty, while correct old_roots should not be empty. >>> This causing qgroup numbers not decreased correctly. >>> >>> [FIX] >>> Modify the timing of calling btrfs_qgroup_prepare_account_extents() to >>> just before btrfs_qgroup_account_extents(), and add needed delayed_refs >>> handler. >>> So qgroup can handle inode_map mount options correctly. >>> >>> Signed-off-by: Qu Wenruo <[email protected]> >> >> Could we also fix this by excepting the free space inode from qgroups? >> It seems like this is something we'd want to do anyway unless we want to >> handle EDQUOT there too. > > I'm afraid it's not possible here.
Not within the context of this patch set, but I was thinking we could
just never do the tracing up front so there'd be nothing to check later.
I still think this is probably the right approach for non-fsroots but
it does seem like there's no way to do it easily on a per-inode basis.
> As qgroup accounts any tree and data blocks that belong to specified
> tree(fs and subvolumes), not caring the inode it belongs to.
>
> And the design of free inode space cache is to restore it in
> fs/subvolume tree, which has no difference with normal inode, except it
> doesn't have INODE_REF and its ino is FREE_INO.
> (Much the same behavior for space cache inode)
>
> The correct solution for caching free space and inode should be the new
> space cache tree, which puts all these info into their own tree, never
> affecting existing trees.
Agreed.
> The only good news is, inode_cache is not commonly used and IIRC has
> bugs. Maybe it will be good idea to depreciate the option?
I wouldn't object to that.
-Jeff
>>> ---
>>> fs/btrfs/transaction.c | 25 ++++++++++++++++++-------
>>> 1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>> index 6b3e0fc2fe7a..1ff3ec797356 100644
>>> --- a/fs/btrfs/transaction.c
>>> +++ b/fs/btrfs/transaction.c
>>> @@ -2130,13 +2130,6 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans)
>>> goto scrub_continue;
>>> }
>>>
>>> - /* Reocrd old roots for later qgroup accounting */
>>> - ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
>>> - if (ret) {
>>> - mutex_unlock(&fs_info->reloc_mutex);
>>> - goto scrub_continue;
>>> - }
>>> -
>>> /*
>>> * make sure none of the code above managed to slip in a
>>> * delayed item
>>> @@ -2179,6 +2172,24 @@ int btrfs_commit_transaction(struct
>>> btrfs_trans_handle *trans)
>>> btrfs_free_log_root_tree(trans, fs_info);
>>>
>>> /*
>>> + * commit_fs_roots() can call btrfs_save_ino_cache(), which
>>> generates
>>> + * new delayed refs. Must handle them or qgroup can be wrong.
>>> + */
>>> + ret = btrfs_run_delayed_refs(trans, fs_info, (unsigned long)-1);
>>> + if (ret) {
>>> + mutex_unlock(&fs_info->tree_log_mutex);
>>> + mutex_unlock(&fs_info->reloc_mutex);
>>> + goto scrub_continue;
>>> + }
>>> +
>>> + ret = btrfs_qgroup_prepare_account_extents(trans, fs_info);
>>> + if (ret) {
>>> + mutex_unlock(&fs_info->tree_log_mutex);
>>> + mutex_unlock(&fs_info->reloc_mutex);
>>> + goto scrub_continue;
>>> + }
>>> +
>>> + /*
>>> * Since fs roots are all committed, we can get a quite accurate
>>> * new_roots. So let's do quota accounting.
>>> */
>>>
>>
>>
>
>
>
--
Jeff Mahoney
SUSE Labs
signature.asc
Description: OpenPGP digital signature
