On 2021/2/19 上午12:14, Josef Bacik wrote:
On 1/8/21 12:36 AM, Qu Wenruo wrote:
There are several qgroup flush related bugs fixed recently, all of them
are caused by the fact that we can trigger qgroup metadata space
reservation holding a transaction handle.

Thankfully the only situation to trigger above reservation is
btrfs_dirty_inode().

Currently btrfs_dirty_inode() will try join transactio first, then
update the inode.
If btrfs_update_inode() fails with -ENOSPC, then it retry to start
transaction to reserve metadata space.

This not only forces us to reserve metadata space with a transaction
handle hold, but can't handle other errors like -EDQUOT.

This patch will make btrfs_dirty_inode() to call
btrfs_start_transaction() directly without first try joining then
starting, so that in try_flush_qgroup() we won't hold a trans handle.

This will slow down btrfs_dirty_inode() but my fstests doesn't show too
much different for most test cases, thus it may be worthy to skip such
performance "optimization".

Signed-off-by: Qu Wenruo <w...@suse.com>

I'm not interested in slowing down the !qgroups case just for qgroups.
We want to short circuit the start here because it has the potential to
be _very_ expensive, when we may very well have space already allocated
for the inode.

The best solution I can think of for this is to add a bool to indicate
that we don't want to attempt to make reservations.

The root problem here is, btrfs_dirty_inode() itself is an odd ball by
itself.

All other call sites are either reserving its metadata space by
btrfs_start_transaction(), or reserve metadata manually like delalloc.

Only btrfs_dirty_inode() is using join transaction and put all the
reserve logical into btrfs_delayed_inode_reserve_metadata().

  The only problem
here is if the inode doesn't have space allocated for it, if it doesn't
we need to fall back anyway.  The speed up comes from inodes that
already have the delayed inode setup.  So simply tell it to error out if
we're not already set up, and then we can fail back to
btrfs_start_transaction().  That'll keep us in line with our performance
for !qgroups and solve your qgroup related deadlock problems.  Thanks,

Can't we do the check earlier? Like before we start or join transaction?

I really hate to do all the error out check deep in
btrfs_delayed_inode_reserve_metadata().

Thanks,
Qu


Josef

Reply via email to