On Mon, May 06, 2019 at 04:44:02PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the
> inode) that happens to be ranged, which happens during a msync() or writes
> for files opened with O_SYNC for example, we can end up with a corrupt log,
> due to different file extent items representing ranges that overlap with
> each other, or hit some assertion failures.
> 
> When doing a ranged fsync we only flush delalloc and wait for ordered
> exents within that range. If while we are logging items from our inode
> ordered extents for adjacent ranges complete, we end up in a race that can
> make us insert the file extent items that overlap with others we logged
> previously and the assertion failures.
> 
> For example, if tree-log.c:copy_items() receives a leaf that has the
> following file extents items, all with a length of 4K and therefore there
> is an implicit hole in the range 68K to 72K - 1:
> 
>   (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ...
> 
> It copies them to the log tree. However due to the need to detect implicit
> holes, it may release the path, in order to look at the previous leaf to
> detect an implicit hole, and then later it will search again in the tree
> for the first file extent item key, with the goal of locking again the
> leaf (which might have changed due to concurrent changes to other inodes).
> 
> However when it locks again the leaf containing the first key, the key
> corresponding to the extent at offset 72K may not be there anymore since
> there is an ordered extent for that range that is finishing (that is,
> somewhere in the middle of btrfs_finish_ordered_io()), and it just
> removed the file extent item but has not yet replaced it with a new file
> extent item, so the part of copy_items() that does hole detection will
> decide that there is a hole in the range starting from 68K to 76K - 1,
> and therefore insert a file extent item to represent that hole, having
> a key offset of 68K. After that we now have a log tree with 2 different
> extent items that have overlapping ranges:
> 

Well this kind of sucks.  I wonder if we should be locking the extent range
while we're doing this in order to keep this problem from happening.  A fix for
another day though

Reviewed-by: Josef Bacik <jo...@toxicpanda.com>

Thanks,

Josef

Reply via email to