On Wed, Jun 19, 2019 at 01:05:39PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> In order to avoid searches on a log tree when unlinking an inode, we check
> if the inode being unlinked was logged in the current transaction, as well
> as the inode of its parent directory. When any of the inodes are logged,
> we proceed to delete directory items and inode reference items from the
> log, to ensure that if a subsequent fsync of only the inode being unlinked
> or only of the parent directory when the other is not fsync'ed as well,
> does not result in the entry still existing after a power failure.
> 
> That check however is not reliable when one of the inodes involved (the
> one being unlinked or its parent directory's inode) is evicted, since the
> logged_trans field is transient, that is, it is not stored on disk, so it
> is lost when the inode is evicted and loaded into memory again (which is
> set to zero on load). As a consequence the checks currently being done by
> btrfs_del_dir_entries_in_log() and btrfs_del_inode_ref_in_log() always
> return true if the inode was evicted before, regardless of the inode
> having been logged or not before (and in the current transaction), this
> results in the dentry being unlinked still existing after a log replay
> if after the unlink operation only one of the inodes involved is fsync'ed.
> 
> Example:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ mkdir /mnt/dir
>   $ touch /mnt/dir/foo
>   $ xfs_io -c fsync /mnt/dir/foo
> 
>   # Keep an open file descriptor on our directory while we evict inodes.
>   # We just want to evict the file's inode, the directory's inode must not
>   # be evicted.
>   $ ( cd /mnt/dir; while true; do :; done ) &
>   $ pid=$!
> 
>   # Wait a bit to give time to background process to chdir to our test
>   # directory.
>   $ sleep 0.5
> 
>   # Trigger eviction of the file's inode.
>   $ echo 2 > /proc/sys/vm/drop_caches
> 
>   # Unlink our file and fsync the parent directory. After a power failure
>   # we don't expect to see the file anymore, since we fsync'ed the parent
>   # directory.
>   $ rm -f $SCRATCH_MNT/dir/foo
>   $ xfs_io -c fsync /mnt/dir
> 
>   <power failure>
> 
>   $ mount /dev/sdb /mnt
>   $ ls /mnt/dir
>   foo
>   $
>    --> file still there, unlink not persisted despite explicit fsync on dir
> 
> Fix this by checking if the inode has the full_sync bit set in its runtime
> flags as well, since that bit is set everytime an inode is loaded from
> disk, or for other less common cases such as after a shrinking truncate
> or failure to allocate extent maps for holes, and gets cleared after the
> first fsync. Also consider the inode as possibly logged only if it was
> last modified in the current transaction (besides having the full_fsync
> flag set).
> 
> Fixes: 3a5f1d458ad161 ("Btrfs: Optimize btree walking while logging inodes")
> Signed-off-by: Filipe Manana <fdman...@suse.com>

Added to misc-next, thanks.

Reply via email to