On 2/4/2021 6:11 PM, Filipe Manana wrote:
On Thu, Feb 4, 2021 at 8:48 AM Anand Jain <[email protected]> wrote:
On 2/3/2021 7:17 PM, [email protected] wrote:
From: Filipe Manana <[email protected]>
When we active a swap file, at btrfs_swap_activate(), we acquire the
exclusive operation lock to prevent the physical location of the swap
file extents to be changed by operations such as balance and device
replace/resize/remove. We also call there can_nocow_extent() which,
among other things, checks if the block group of a swap file extent is
currently RO, and if it is we can not use the extent, since a write
into it would result in COWing the extent.
However we have no protection against a scrub operation running after we
activate the swap file, which can result in the swap file extents to be
COWed while the scrub is running and operating on the respective block
group, because scrub turns a block group into RO before it processes it
and then back again to RW mode after processing it. That means an attempt
to write into a swap file extent while scrub is processing the respective
block group, will result in COWing the extent, changing its physical
location on disk.
Fix this by making sure that block groups that have extents that are used
by active swap files can not be turned into RO mode, therefore making it
not possible for a scrub to turn them into RO mode.
When a scrub finds a
block group that can not be turned to RO due to the existence of extents
used by swap files, it proceeds to the next block group and logs a warning
message that mentions the block group was skipped due to active swap
files - this is the same approach we currently use for balance.
It is better if this info is documented in the scrub man-page. IMO.
This ends up removing the need to call btrfs_extent_readonly() from
can_nocow_extent(), as btrfs_swap_activate() now checks if a block group
is RO through the new function btrfs_inc_block_group_swap_extents().
The only other caller of can_nocow_extent() is the direct IO write path,
There is a third caller. check_can_nocow() also calls
can_nocow_extent(), which I missed before. Any idea where does it get to
know that extent is RO in the threads using check_can_nocow()? I have to
back out the RB for this reason for now.
btrfs_get_blocks_direct_write(), but that already checks if a block group
is RO through the call to btrfs_inc_nocow_writers().
In fact, after this
change we end up optimizing the direct IO write path, since we no longer
iterate the block groups rbtree twice, once with btrfs_extent_readonly(),
through can_nocow_extent(), and once again with btrfs_inc_nocow_writers().
This can save time and reduce contention on the lock that protects the
rbtree (specially because it is a spinlock and not a read/write lock) on
very large filesystems, with several thousands of allocated block groups.
Fixes: ed46ff3d42378 ("Btrfs: support swap files")
Signed-off-by: Filipe Manana <[email protected]>
I am not sure about the optimization of direct IO part, but as such
changes looks good.
Clarifying about the optimization part (for both buffered and direct IO)
- After patch 1, and patch 2, now we check on the RO extents after the
functions btrfs_cross_ref_exist(), and csum_exist_in_range(), both of
them have search_slot, whereas, before patch 1, and patch 2, we used to
fail early (if the extent is RO) and avoided the search_slot, so I am
not sure if there is optimization.
Thanks, Anand