On Mon, Jul 30, 2018 at 12:28 PM, Filipe Manana <fdman...@gmail.com> wrote:
> On Mon, Jul 30, 2018 at 12:08 PM, Filipe Manana <fdman...@gmail.com> wrote:
>> On Mon, Jul 30, 2018 at 11:21 AM, robbieko <robbi...@synology.com> wrote:
>>> From: Robbie Ko <robbi...@synology.com>
>>>
>>> Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting")
>>> modified the nocow writeback mechanism, if you create a snapshot,
>>> it will always switch to cow writeback.
>>>
>>> This will cause data loss when there is no space, because
>>> when the space is full, the write will not reserve any space, only
>>> check if it can be nocow write.
>>
>> This is a bit vague.
>> You need to mention where space reservation does not happen (at the
>> time of the write syscall) and why, and that the snapshot happens
>> before flushing IO (running delalloc). Then when running delalloc we
>> fall back to COW and fail.
>>
>> You also need to mention that although the write syscall did not
>> return an error, the writeback will fail, but a subsequent fsync on
>> the file will return an error (ENOSPC) because the writeback set the
>> error on the inode's mapping. So it's not completely a silent data
>> loss, as for buffered writes there's no guarantee that if the write
>> syscall returns 0 the data will be persisted successfully (that can
>> only be guaranteed if a subsequent fsync call returns 0).
>>
>>>
>>> So fix this by first flush the nocow data, and then switch to the
>>> cow write.
>
> I'm also not seeing how what you've done is better than what we have
> now using the root->will_be_snapshotted atomic, which is essentially
> used the same way as the new atomic you are adding, and forces the
> writeback code to avoid nocow writes as well.

So what you have done can be made much simpler by flushing delalloc
before incrementing root->will_be_snapshotted instead of after
incrementing it:

https://friendpaste.com/2LY9eLAR9q0RoOtRK7VYmX

I just checked the code, and a failure to allocate space during
writeback after falling back to COW mode does indeed set AS_ENOSPC on
the inode's mapping, which makes fsync return ENOSPC (through
file_check_and_advance_wb_err() and filemap_check_wb_err()).

Since fsync reports the error, I'm not sure I would call this data loss.
It looks more like an optimization to avoid ENOSPC for nocow writes when
running low on space.

>
>>
>>
>> This seems easy to reproduce using deterministic steps.
>> Can you please write a test case for fstests?
>>
>> Also the subject "Btrfs: fix data lose with snapshot when nospace",
>> besides the typo (lose -> loss), should be clearer, for example
>> "Btrfs: fix data loss for nocow writes after snapshot when low on
>> data space".
>
> Also I'm not even sure if I would call it data loss.
> If there was no error returned from a subsequent fsync, I would
> definitely call it data loss.
>
> So unless the fsync is not returning ENOSPC, I don't see anything that
> needs to be fixed.
>
>>
>> Thanks.
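
To make the fsync point above concrete, here is a minimal userspace
sketch in C, assuming a Linux/POSIX environment. The helper name
write_and_fsync() is hypothetical and is not part of btrfs or fstests.
It only shows that a buffered write can be considered persisted once a
subsequent fsync returns 0, and that a writeback failure such as ENOSPC
would surface from fsync rather than from write.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write len bytes from buf to
 * path using buffered I/O, then fsync the file.
 * Returns 0 only if write, fsync and close all succeeded, otherwise a
 * negative errno value.
 */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int ret = 0;
	/* No O_TRUNC: existing extents are overwritten in place. */
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			ret = -errno;
			goto out;
		}
		if (n == 0) {
			/* Defensive: avoid looping forever on no progress. */
			ret = -EIO;
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/*
	 * A successful write() only means the data reached the page cache.
	 * Errors hit later during writeback (for example ENOSPC) are
	 * reported here, so this return value must be checked.
	 */
	if (fsync(fd) < 0)
		ret = -errno;
out:
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

A caller would treat any non-zero return, for example -ENOSPC, as "the
data may not be on disk", even though the write loop itself completed
without error.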
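
The ordering argument above (flush delalloc before incrementing
root->will_be_snapshotted rather than after) can also be illustrated
with a toy userspace model. This is not kernel code and it is not the
content of the patch linked above: struct toy_root, toy_flush_delalloc()
and both toy_snapshot_* functions are hypothetical names. The model
assumes, as described in this thread, that dirty nocow pages need no new
data space unless the root is marked as about to be snapshotted, in
which case writeback falls back to COW and needs one free block per
page.

```c
#include <errno.h>

/* Toy state, loosely named after the kernel fields (assumption). */
struct toy_root {
	int will_be_snapshotted;	/* > 0 forces writeback to use COW */
	int dirty_nocow_pages;		/* buffered nocow writes not yet flushed */
	int free_data_blocks;		/* unallocated data space */
};

/*
 * Write back every dirty page. A nocow write reuses its existing block,
 * a COW write needs a new block. Returns 0 or -ENOSPC if at least one
 * page could not be written.
 */
int toy_flush_delalloc(struct toy_root *root)
{
	int ret = 0;

	while (root->dirty_nocow_pages > 0) {
		root->dirty_nocow_pages--;
		if (root->will_be_snapshotted > 0) {
			if (root->free_data_blocks == 0) {
				/* COW fallback with no space: page is dropped. */
				ret = -ENOSPC;
				continue;
			}
			root->free_data_blocks--;
		}
	}
	return ret;
}

/* Ordering described as the current one: mark first, then flush. */
int toy_snapshot_flush_after(struct toy_root *root)
{
	int ret;

	root->will_be_snapshotted++;
	ret = toy_flush_delalloc(root);
	root->will_be_snapshotted--;
	return ret;
}

/* Suggested ordering: flush pending nocow writes first, then mark. */
int toy_snapshot_flush_before(struct toy_root *root)
{
	int ret = toy_flush_delalloc(root);

	root->will_be_snapshotted++;
	/* ... snapshot creation would happen here ... */
	root->will_be_snapshotted--;
	return ret;
}
```

With four dirty nocow pages and no free data blocks, the model returns
-ENOSPC for the mark-then-flush ordering and 0 for the flush-then-mark
ordering. It deliberately ignores writes that arrive between the flush
and the increment.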
>>>
>>> Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting")
>>> Signed-off-by: Robbie Ko <robbi...@synology.com>
>>> ---
>>>  fs/btrfs/ctree.h   |  1 +
>>>  fs/btrfs/disk-io.c |  1 +
>>>  fs/btrfs/inode.c   | 26 +++++---------------------
>>>  fs/btrfs/ioctl.c   |  6 ++++++
>>>  4 files changed, 13 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 118346a..663ce05 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1277,6 +1277,7 @@ struct btrfs_root {
>>>  	int send_in_progress;
>>>  	struct btrfs_subvolume_writers *subv_writers;
>>>  	atomic_t will_be_snapshotted;
>>> +	atomic_t snapshot_force_cow;
>>>
>>>  	/* For qgroup metadata reserved space */
>>>  	spinlock_t qgroup_meta_rsv_lock;
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 205092d..5573916 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -1216,6 +1216,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info,
>>>  	atomic_set(&root->log_batch, 0);
>>>  	refcount_set(&root->refs, 1);
>>>  	atomic_set(&root->will_be_snapshotted, 0);
>>> +	atomic_set(&root->snapshot_force_cow, 0);
>>>  	root->log_transid = 0;
>>>  	root->log_transid_committed = -1;
>>>  	root->last_log_commit = 0;
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index eba61bc..263b852 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -1275,7 +1275,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  	u64 disk_num_bytes;
>>>  	u64 ram_bytes;
>>>  	int extent_type;
>>> -	int ret, err;
>>> +	int ret;
>>>  	int type;
>>>  	int nocow;
>>>  	int check_prev = 1;
>>> @@ -1407,11 +1407,9 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  			 * if there are pending snapshots for this root,
>>>  			 * we fall into common COW way.
>>>  			 */
>>> -			if (!nolock) {
>>> -				err = btrfs_start_write_no_snapshotting(root);
>>> -				if (!err)
>>> -					goto out_check;
>>> -			}
>>> +			if (!nolock &&
>>> +			    unlikely(atomic_read(&root->snapshot_force_cow)))
>>> +				goto out_check;
>>>  			/*
>>>  			 * force cow if csum exists in the range.
>>>  			 * this ensure that csum for a given extent are
>>> @@ -1420,9 +1418,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  			ret = csum_exist_in_range(fs_info, disk_bytenr,
>>>  						  num_bytes);
>>>  			if (ret) {
>>> -				if (!nolock)
>>> -					btrfs_end_write_no_snapshotting(root);
>>> -
>>>  				/*
>>>  				 * ret could be -EIO if the above fails to read
>>>  				 * metadata.
>>> @@ -1435,11 +1430,8 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  				WARN_ON_ONCE(nolock);
>>>  				goto out_check;
>>>  			}
>>> -			if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
>>> -				if (!nolock)
>>> -					btrfs_end_write_no_snapshotting(root);
>>> +			if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
>>>  				goto out_check;
>>> -			}
>>>  			nocow = 1;
>>>  		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>>>  			extent_end = found_key.offset +
>>> @@ -1453,8 +1445,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  out_check:
>>>  		if (extent_end <= start) {
>>>  			path->slots[0]++;
>>> -			if (!nolock && nocow)
>>> -				btrfs_end_write_no_snapshotting(root);
>>>  			if (nocow)
>>>  				btrfs_dec_nocow_writers(fs_info, disk_bytenr);
>>>  			goto next_slot;
>>> @@ -1476,8 +1466,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  					     end, page_started, nr_written, 1,
>>>  					     NULL);
>>>  			if (ret) {
>>> -				if (!nolock && nocow)
>>> -					btrfs_end_write_no_snapshotting(root);
>>>  				if (nocow)
>>>  					btrfs_dec_nocow_writers(fs_info,
>>>  								disk_bytenr);
>>> @@ -1497,8 +1485,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  					  ram_bytes, BTRFS_COMPRESS_NONE,
>>>  					  BTRFS_ORDERED_PREALLOC);
>>>  			if (IS_ERR(em)) {
>>> -				if (!nolock && nocow)
>>> -					btrfs_end_write_no_snapshotting(root);
>>>  				if (nocow)
>>>  					btrfs_dec_nocow_writers(fs_info,
>>>  								disk_bytenr);
>>> @@ -1537,8 +1523,6 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>>>  					     EXTENT_CLEAR_DATA_RESV,
>>>  					     PAGE_UNLOCK | PAGE_SET_PRIVATE2);
>>>
>>> -		if (!nolock && nocow)
>>> -			btrfs_end_write_no_snapshotting(root);
>>>  		cur_offset = extent_end;
>>>
>>>  		/*
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index b077544..43674ef 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -761,6 +761,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>>  	struct btrfs_pending_snapshot *pending_snapshot;
>>>  	struct btrfs_trans_handle *trans;
>>>  	int ret;
>>> +	bool snapshot_force_cow = false;
>>>
>>>  	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>>  		return -EINVAL;
>>> @@ -787,6 +788,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>>  	if (ret)
>>>  		goto dec_and_free;
>>>
>>> +	atomic_inc(&root->snapshot_force_cow);
>>> +	snapshot_force_cow = true;
>>> +
>>>  	btrfs_wait_ordered_extents(root, U64_MAX, 0, (u64)-1);
>>>
>>>  	btrfs_init_block_rsv(&pending_snapshot->block_rsv,
>>> @@ -851,6 +855,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>>  fail:
>>>  	btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv);
>>>  dec_and_free:
>>> +	if (snapshot_force_cow)
>>> +		atomic_dec(&root->snapshot_force_cow);
>>>  	if (atomic_dec_and_test(&root->will_be_snapshotted))
>>>  		wake_up_var(&root->will_be_snapshotted);
>>>  free_pending:
>>> --
>>> 1.9.1
>>>

--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html