On Thu, Apr 05, 2018 at 10:55:12PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> Currently if we allocate extents beyond an inode's i_size (through the
> fallocate system call) and then fsync the file, we log the extents but
> after a power failure we replay them and then immediately drop them.
> This behaviour has existed since about 2009, commit c71bf099abdd ("Btrfs:
> Avoid orphan inodes cleanup while replaying log"), because it marks
> the inode as an orphan instead of dropping any extents beyond i_size
> before replaying logged extents, so after the log replay, and while
> the mount operation is still ongoing, we find the inode marked as an
> orphan and then perform a truncation (drop extents beyond the inode's
> i_size). Because the processing of orphan inodes is still done
> right after replaying the log and before the mount operation finishes,
> the intention of that commit does not make any sense (at least as
> of today). However reverting that behaviour is not enough, because
> we cannot simply discard all extents beyond i_size and then replay
> logged extents, because we risk dropping extents beyond i_size created
> in past transactions, for example:
> 
>   add prealloc extent beyond i_size
>   fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
>   transaction commit
>   add another prealloc extent beyond i_size
>   fsync - triggers the fast fsync path
>   power failure
> 
> In that scenario, we would drop the first extent and then replay the
> second one. To fix this just make sure that all prealloc extents
> beyond i_size are logged, and if we find too many (which is far from
> a common case), fall back to a full transaction commit (like we do when
> logging regular extents in the fast fsync path).
> 
> Trivial reproducer:
> 
>  $ mkfs.btrfs -f /dev/sdb
>  $ mount /dev/sdb /mnt
>  $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
>  $ sync
>  $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
>  $ xfs_io -c "fsync" /mnt/foo
>  <power failure>
> 
>  # mount to replay log
>  $ mount /dev/sdb /mnt
>  # at this point the file only has one extent, at offset 0, size 256K
> 
> A test case for fstests follows soon, covering multiple scenarios that
> involve adding prealloc extents with previous shrinking truncates and
> without such truncates.
> 
> Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log")
> Signed-off-by: Filipe Manana <fdman...@suse.com>

Added to next, thanks.
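
For readers following the thread, the fast-fsync scenario in the commit
message can be illustrated with a small toy model. This is only a sketch
under stated assumptions, not the kernel code: the names Extent,
drop_beyond, replay_old, replay_drop_first and log_fast_fsync are
hypothetical, extents are modelled as plain (offset, length) records, and
"truncation" simply drops every extent that starts at or beyond i_size.

```python
from dataclasses import dataclass

K = 1024


@dataclass(frozen=True)
class Extent:
    """Toy extent record (hypothetical, not a btrfs structure)."""
    offset: int
    length: int
    prealloc: bool = False


def drop_beyond(extents, i_size):
    """Toy truncation: keep only extents that start before i_size."""
    return {e for e in extents if e.offset < i_size}


def replay_old(committed, logged, i_size):
    """Old behaviour: replay the log, then orphan processing truncates."""
    return drop_beyond(committed | logged, i_size)


def replay_drop_first(committed, logged, i_size):
    """Drop extents beyond i_size first, then replay the logged extents."""
    return drop_beyond(committed, i_size) | logged


def log_fast_fsync(all_extents, modified, i_size, log_all_prealloc):
    """Fast fsync logs modified extents; with the fix (as modelled here)
    it also logs every prealloc extent beyond i_size."""
    logged = set(modified)
    if log_all_prealloc:
        logged |= {e for e in all_extents
                   if e.prealloc and e.offset >= i_size}
    return logged


def offsets_kib(extents):
    """Sorted extent start offsets in KiB, for easy comparison."""
    return sorted(e.offset // K for e in extents)


i_size = 256 * K
data = Extent(0, 256 * K)
first = Extent(256 * K, 1024 * K, prealloc=True)    # fsynced, then committed
second = Extent(1280 * K, 1024 * K, prealloc=True)  # fast fsync only

committed = {data, first}          # state of the last transaction commit
everything = committed | {second}  # in-memory state before power failure

# Without the fix, only the newly added extent is logged.
log_without_fix = log_fast_fsync(everything, {second}, i_size, False)
# With the fix, all prealloc extents beyond i_size are logged.
log_with_fix = log_fast_fsync(everything, {second}, i_size, True)

old_result = offsets_kib(replay_old(committed, log_without_fix, i_size))
naive_result = offsets_kib(replay_drop_first(committed, log_without_fix, i_size))
fixed_result = offsets_kib(replay_drop_first(committed, log_with_fix, i_size))

print("old behaviour:", old_result)
print("drop first, old logging:", naive_result)
print("drop first, log all prealloc:", fixed_result)
```

In this model the old behaviour loses both prealloc extents, dropping
extents before replay without changing what is logged loses the first
one, and logging all prealloc extents beyond i_size keeps both.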