On 12/15/20 3:23 PM, Darrick J. Wong wrote:
On Mon, Dec 14, 2020 at 01:19:40PM -0500, Josef Bacik wrote:
Darrick reported a potential issue to me where we could allow mmap
writes after validating a page range matched in the case of dedupe.
Generally we rely on lock page -> lock extent with the ordered flush to
protect us, but this is done after we check the pages because we use the
generic helpers, so we could modify the page in between doing the check
and locking the range.
FWIW I only found that via code inspection because Matthew Wilcox asked
me about whether or not filesystems did the right thing. I wrote the
attached fstest to try to demonstrate the problem on btrfs but it
actually passes on xfs and btrfs. This means either that (a) btrfs is
doing it right through some other means because I don't understand its
locking or (b) the test is wrong.
The test /does/ explode as expected on ocfs2 because mmap io lol there.
The problem exists, it's just the window is super narrow. If you put a
schedule_timeout(HZ) right after btrfs_remap_file_range_prep() you'd hit the
problem every time. We basically have to mkwrite right between doing the
vfs_dedupe_file_range_compare() and calling btrfs_extent_same(), which is going
to be hard to do. You aren't missing something, it's just hard to hit. Thanks,
Josef