On Mon, May 06, 2019 at 04:44:02PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana <fdman...@suse.com> > > When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the > inode) that happens to be ranged, which happens during a msync() or writes > for files opened with O_SYNC for example, we can end up with a corrupt log, > due to different file extent items representing ranges that overlap with > each other, or hit some assertion failures. > > When doing a ranged fsync we only flush delalloc and wait for ordered > exents within that range. If while we are logging items from our inode > ordered extents for adjacent ranges complete, we end up in a race that can > make us insert the file extent items that overlap with others we logged > previously and the assertion failures. > > For example, if tree-log.c:copy_items() receives a leaf that has the > following file extents items, all with a length of 4K and therefore there > is an implicit hole in the range 68K to 72K - 1: > > (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ... > > It copies them to the log tree. However due to the need to detect implicit > holes, it may release the path, in order to look at the previous leaf to > detect an implicit hole, and then later it will search again in the tree > for the first file extent item key, with the goal of locking again the > leaf (which might have changed due to concurrent changes to other inodes). > > However when it locks again the leaf containing the first key, the key > corresponding to the extent at offset 72K may not be there anymore since > there is an ordered extent for that range that is finishing (that is, > somewhere in the middle of btrfs_finish_ordered_io()), and it just > removed the file extent item but has not yet replaced it with a new file > extent item, so the part of copy_items() that does hole detection will > decide that there is a hole in the range starting from 68K to 76K - 1, > and therefore insert a file extent item to represent that hole, having > a key offset of 68K. After that we now have a log tree with 2 different > extent items that have overlapping ranges: >
Well this kind of sucks. I wonder if we should be locking the extent range while we're doing this in order to keep this problem from happening. A fix for another day though Reviewed-by: Josef Bacik <jo...@toxicpanda.com> Thanks, Josef