The file hole detection logic during a file fsync wasn't correct,
because it didn't look back (in a previous leaf) for the last file
extent item that can be in a leaf to the left of our leaf and that
has a generation lower than the current transaction id. This made it
assume that a hole exists when it really doesn't exist in the file.

Such false positive hole detection happens in the following scenario:

* We have a file that has many file extent items, covering 3 or more
  btree leafs (the first leaf must contain non file extent items too).

* Two ranges of the file are modified, with their extent items being
  located at 2 different leafs and those leafs aren't consecutive.

* When processing the second leaf, we weren't checking if some file
  extent item exists that is located in some leaf that is between
  our 2 leafs, and therefore assumed the range defined between the
  last file extent item in first leaf and the first file extent item
  in the second leaf matched a hole.

Fortunately this didn't result in overriding the log with wrong data,
instead it made the last loop in copy_items() attempt to insert a
duplicated key (for a hole file extent item), which makes the file
fsync code return with -EEXIST to file.c:btrfs_sync_file() which in
turn ends up doing a full transaction commit.

I could trigger this issue with the following test for xfstests (which
never fails, either without or with this patch). The last fsync call
results in a full transaction commit, due to the -EEXIST error mentioned
above. I could also observe this behaviour happening frequently when
running xfstests/generic/075 in a loop.

Test:

    _cleanup()
    {
        _cleanup_flakey
        rm -fr $tmp
    }

    # get standard environment, filters and checks
    . ./common/rc
    . ./common/filter
    . ./common/dmflakey

    # real QA test starts here
    _supported_fs btrfs
    _supported_os Linux
    _require_scratch
    _require_dm_flakey
    _need_to_be_root

    rm -f $seqres.full

    # Create a file with many file extent items, each representing a 4Kb extent.
    # These items span 3 btree leaves, of 16Kb each (default mkfs.btrfs leaf 
size
    # as of btrfs-progs 3.12).
    _scratch_mkfs -l 16384 >/dev/null 2>&1
    _init_flakey
    SAVE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
    MOUNT_OPTIONS="$MOUNT_OPTIONS -o commit=999"
    _mount_flakey

    # First fsync, inode has BTRFS_INODE_NEEDS_FULL_SYNC flag set.
    $XFS_IO_PROG -f -c "pwrite -S 0x01 -b 4096 0 4096" -c "fsync" \
            $SCRATCH_MNT/foo | _filter_xfs_io

    # For any of the following fsync calls, inode doesn't have the flag
    # BTRFS_INODE_NEEDS_FULL_SYNC set.
    for ((i = 1; i <= 500; i++)); do
        OFFSET=$((4096 * i))
        LEN=4096
        $XFS_IO_PROG -c "pwrite -S 0x01 $OFFSET $LEN" -c "fsync" \
                $SCRATCH_MNT/foo | _filter_xfs_io
    done

    # Commit transaction and bump next transaction's id (to 7).
    sync

    # Truncate will set the BTRFS_INODE_NEEDS_FULL_SYNC flag in the btrfs's
    # inode runtime flags.
    $XFS_IO_PROG -c "truncate 2048000" $SCRATCH_MNT/foo

    # Commit transaction and bump next transaction's id (to 8).
    sync

    # Touch 1 extent item from the first leaf and 1 from the last leaf. The leaf
    # in the middle, containing only file extent items, isn't touched. So the
    # next fsync, when calling btrfs_search_forward(), won't visit that middle
    # leaf. First and 3rd leaf have generation 6, while middle one has 
generation 8.
    $XFS_IO_PROG \
        -c "pwrite -S 0xee -b 4096 0 4096" \
        -c "pwrite -S 0xff -b 4096 2043904 4096" \
        -c "fsync" \
        $SCRATCH_MNT/foo | _filter_xfs_io

    _load_flakey_table $FLAKEY_DROP_WRITES
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    _unmount_flakey

    _load_flakey_table $FLAKEY_ALLOW_WRITES
    # During mount, we'll replay the log created by the fsync above, and the 
file's
    # md5 digest should be the same we got before the unmount.
    _mount_flakey
    md5sum $SCRATCH_MNT/foo | _filter_scratch
    _unmount_flakey
    MOUNT_OPTIONS="$SAVE_MOUNT_OPTIONS"

    status=0
    exit

Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 fs/btrfs/tree-log.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index df332dd..5a917a6 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3296,7 +3296,7 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
        struct list_head ordered_sums;
        int skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
        bool has_extents = false;
-       bool need_find_last_extent = (*last_extent == 0);
+       bool need_find_last_extent = true;
        bool done = false;
 
        INIT_LIST_HEAD(&ordered_sums);
@@ -3350,8 +3350,7 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
                 */
                if (ins_keys[i].type == BTRFS_EXTENT_DATA_KEY) {
                        has_extents = true;
-                       if (need_find_last_extent &&
-                           first_key.objectid == (u64)-1)
+                       if (first_key.objectid == (u64)-1)
                                first_key = ins_keys[i];
                } else {
                        need_find_last_extent = false;
@@ -3425,6 +3424,16 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
        if (!has_extents)
                return ret;
 
+       if (need_find_last_extent && *last_extent == first_key.offset) {
+               /*
+                * We don't have any leafs between our current one and the one
+                * we processed before that can have file extent items for our
+                * inode (and have a generation number smaller than our current
+                * transaction id).
+                */
+               need_find_last_extent = false;
+       }
+
        /*
         * Because we use btrfs_search_forward we could skip leaves that were
         * not modified and then assume *last_extent is valid when it really
@@ -3535,7 +3544,7 @@ fill_holes:
                                               0, 0);
                if (ret)
                        break;
-               *last_extent = offset + len;
+               *last_extent = extent_end;
        }
        /*
         * Need to let the callers know we dropped the path so they should
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to