On Tue, Apr 23, 2019 at 12:43 PM Nikolay Borisov <nbori...@suse.com> wrote: > > BTRFS sports a mechanism to provide exclusion when a snapshot is about > to be created. This is implemented via btrfs_start_write_no_snapshotting > et al. Currently the implementation of that mechanism is some perverse > amalgamation of a percpu variable, an explicit waitqueue, an atomic_t > variable and an implicit wait bit on said atomic_t via wait_var_event > family of API. And for good measure there is a memory barrier thrown in > the mix... > > Astute reader should have concluded by now that it's bordering on > impossible to prove whether this scheme works. What's worse - all of > this is required to achieve something really simple - ensure certain > operations cannot run during snapshot creation. Let's simplify this by > relying on a single atomic_t used as a boolean flag.
Nop, can't work as a boolean, see below. > This commit changes > only the implementation and not the semantics of the existing mechanism. > > Now, if the atomic is 1 (snapshot is in progress) callers of > btrfs_start_write_no_snapshotting will get a ret val of 0 that should be > handled accordingly. > > btrfs_wait_for_snapshot_creation OTOH will block until snapshotting is > in progress and return when current snapshot in progress is finished and > will acquire the right to create a snapshot. > > Signed-off-by: Nikolay Borisov <nbori...@suse.com> > --- > fs/btrfs/extent-tree.c | 20 +++++--------------- > fs/btrfs/ioctl.c | 9 ++------- > 2 files changed, 7 insertions(+), 22 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 8f2b7b29c3fd..d9e2e35700fd 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -11333,25 +11333,15 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, > struct fstrim_range *range) > */ > void btrfs_end_write_no_snapshotting(struct btrfs_root *root) > { > - percpu_counter_dec(&root->subv_writers->counter); > - cond_wake_up(&root->subv_writers->wait); > + ASSERT(atomic_read(&root->will_be_snapshotted) == 1); > + if (atomic_dec_and_test(&root->will_be_snapshotted)) > + wake_up_var(&root->will_be_snapshotted); > } > > int btrfs_start_write_no_snapshotting(struct btrfs_root *root) > { > - if (atomic_read(&root->will_be_snapshotted)) > - return 0; > - > - percpu_counter_inc(&root->subv_writers->counter); > - /* > - * Make sure counter is updated before we check for snapshot creation. > - */ > - smp_mb(); > - if (atomic_read(&root->will_be_snapshotted)) { > - btrfs_end_write_no_snapshotting(root); > - return 0; > - } > - return 1; > + ASSERT(atomic_read(&root->will_be_snapshotted) >= 0); > + return atomic_add_unless(&root->will_be_snapshotted, 1, 1); > } So if two writes call btrfs_start_write_no_snapshotting(), we end up with root->will_be_snapshotted == 1. One task calls btrfs_end_write_no_snapshotting(), it decrements it to 1 - we wake up the snapshot creation task while there's still one nodatacow writer - this is incorrect. Now the second task calls btrfs_end_write_no_snapshotting(), sees root->will_be_snapshotted == 0, assertion failure. > > void btrfs_wait_for_snapshot_creation(struct btrfs_root *root) > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 8774d4be7c97..f9f66c8a5dad 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -794,11 +794,7 @@ static int create_snapshot(struct btrfs_root *root, > struct inode *dir, > * possible. This is to avoid later writeback (running dealloc) to > * fallback to COW mode and unexpectedly fail with ENOSPC. > */ > - atomic_inc(&root->will_be_snapshotted); > - smp_mb__after_atomic(); > - /* wait for no snapshot writes */ > - wait_event(root->subv_writers->wait, > - percpu_counter_sum(&root->subv_writers->counter) == 0); > + btrfs_wait_for_snapshot_creation(root); This naming is also confusing now. The task that creates a snapshot is calling btrfs_wait_for_snapshot_creation(), waiting for itself? > > ret = btrfs_start_delalloc_snapshot(root); > if (ret) > @@ -878,8 +874,7 @@ static int create_snapshot(struct btrfs_root *root, > struct inode *dir, > dec_and_free: > if (snapshot_force_cow) > atomic_dec(&root->snapshot_force_cow); > - if (atomic_dec_and_test(&root->will_be_snapshotted)) > - wake_up_var(&root->will_be_snapshotted); > + btrfs_end_write_no_snapshotting(root); Also confusing. We are not ending a write operation, we are ending snapshot creation. Thanks. > free_pending: > kfree(pending_snapshot->root_item); > btrfs_free_path(pending_snapshot->path); > -- > 2.17.1 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”