On 13.09.19 at 4:51, Qu Wenruo wrote:
> [BUG]
> The following script can cause btrfs qgroup data space leak:
>
> mkfs.btrfs -f $dev
> mount $dev -o nospace_cache $mnt
>
> btrfs subv create $mnt/subv
> btrfs quota en $mnt
> btrfs quota rescan -w $mnt
> btrfs qgroup limit 128m $mnt/subv
>
> for (( i = 0; i < 3; i++)); do
> 	# Create 3 64M holes for the later fallocate to fail
> 	truncate -s 192m $mnt/subv/file
> 	xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
> 	xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
> 	sync
>
> 	# it's supposed to fail, and each failure will leak at least 64M
> 	# data space
> 	xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
> 	rm $mnt/subv/file
> 	sync
> done
>
> # Shouldn't fail after we removed the file
> xfs_io -f -c "falloc 0 64m" $mnt/subv/file
>
> [CAUSE]
> Btrfs qgroup data reserve code allows multiple reserve happen on a
Nit: s/multiple reserve happen/multiple reservations to happen/
> single extent_changeset:
>
> The only usage is in btrfs_fallocate():
> struct extent_changeset *data_reserved = NULL;
> btrfs_qgroup_reserve_data(inode, &data_reserved,
> range_start, range_len);
> ...
> btrfs_qgroup_reserve_data(inode, &data_reserved,
> new_range_start, new_range_len);
> extent_changeset_free(data_reserved);
I take it you are referring to the while() loop in btrfs_fallocate. The code
above is really just a _VERY_ condensed version; extent_changeset_free
is at the end of the function. Instead of quoting random lines of code,
just state it explicitly, something along the lines of:
"The only such pattern is in btrfs_fallocate, in the main while loop of
that function".
>
> However in btrfs_qgroup_reserve_data(), if one of the call failed, it
> will cleanup all reserved space.
> The cleanup itself is OK, but it only cleans up all
> EXTENT_QGROUP_RESERVED flag, forget to release the reserved bytes.
>
> So if multiple btrfs_qgroup_reserve_data() get called, and the last one
> failed, then previously reserved data space will get leaked.
>
> And due to the fact that EXTENT_QGROUP_RESERVED flag is cleaned
> correctly, btrfs_qgroup_check_reserved_leak() won't catch the leakage.
How about rephrasing the above 3 paragraphs along the lines of:
"btrfs_qgroup_reserve_data's error handling has a bug: on error it
clears the EXTENT_QGROUP_RESERVED flag from every io_tree range recorded
in the changeset, but doesn't free the reserved bytes. This behavior has
a two-fold effect:
1. Clearing the EXTENT_QGROUP_RESERVED ranges prevents
btrfs_qgroup_check_reserved_leak from catching the leakage.
2. The previously reserved data bytes are leaked.
The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking the space reserved by the previous ones.
"
>
> [FIX]
> Also free previously reserved data bytes when btrfs_qgroup_reserve_data
> fails.
>
> Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
> Signed-off-by: Qu Wenruo <w...@suse.com>
> ---
> fs/btrfs/qgroup.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 64bdc3e3652d..59f6a9981087 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
>  	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
>  		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
>  				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0,
>  				 NULL);
> +	/* Also free data bytes of already reserved one */
> +	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
> +				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
>  	extent_changeset_release(reserved);
>  	return ret;
>  }
>