On 22.04.19 г. 18:44 ч., fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> Send always operates on read-only trees and always expected that while it
> is in progress, nothing changes in those trees. Due to that expectation
> and the fact that send is a read-only operation, it operates on commit
> roots and does not hold transaction handles. However relocation can COW
> nodes and leafs from read-only trees, which can cause unexpected failures
> and crashes (hitting BUG_ONs). while send using a node/leaf, it gets
> COWed, the transaction used to COW it is committed, a new transaction
> starts, the extent previously used for that node/leaf gets allocated,
> possibly for another tree, and the respective extent buffer' content
> changes while send is still using it. When this happens send normally
> fails with EIO being returned to user space and messages like the
> following are found in dmesg/syslog:
> 
>   [ 3408.699121] BTRFS error (device sdc): parent transid verify failed on 
> 58703872 wanted 250 found 253
>   [ 3441.523123] BTRFS error (device sdc): did not find backref in send_root. 
> inode=63211, offset=0, disk_byte=5222825984 found extent=5222825984
> 
> Other times, less often, we hit a BUG_ON() because an extent buffer that
> send is using used to be a node, and while send is still using it, it
> got COWed and got reused as a leaf while send is still using, producing
> the following trace:
> 
>  [ 3478.466280] ------------[ cut here ]------------
>  [ 3478.466282] kernel BUG at fs/btrfs/ctree.c:1806!
>  [ 3478.466965] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
>  [ 3478.467635] CPU: 0 PID: 2165 Comm: btrfs Not tainted 5.0.0-btrfs-next-46 
> #1
>  [ 3478.468311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
>  [ 3478.469681] RIP: 0010:read_node_slot+0x122/0x130 [btrfs]
>  (...)
>  [ 3478.471758] RSP: 0018:ffffa437826bfaa0 EFLAGS: 00010246
>  [ 3478.472457] RAX: ffff961416ed7000 RBX: 000000000000003d RCX: 
> 0000000000000002
>  [ 3478.473151] RDX: 000000000000003d RSI: ffff96141e387408 RDI: 
> ffff961599b30000
>  [ 3478.473837] RBP: ffffa437826bfb8e R08: 0000000000000001 R09: 
> ffffa437826bfb8e
>  [ 3478.474515] R10: ffffa437826bfa70 R11: 0000000000000000 R12: 
> ffff9614385c8708
>  [ 3478.475186] R13: 0000000000000000 R14: 0000000000000000 R15: 
> 0000000000000000
>  [ 3478.475840] FS:  00007f8e0e9cc8c0(0000) GS:ffff9615b6a00000(0000) 
> knlGS:0000000000000000
>  [ 3478.476489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [ 3478.477127] CR2: 00007f98b67a056e CR3: 0000000005df6005 CR4: 
> 00000000003606f0
>  [ 3478.477762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>  [ 3478.478385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000400
>  [ 3478.479003] Call Trace:
>  [ 3478.479600]  ? do_raw_spin_unlock+0x49/0xc0
>  [ 3478.480202]  tree_advance+0x173/0x1d0 [btrfs]
>  [ 3478.480810]  btrfs_compare_trees+0x30c/0x690 [btrfs]
>  [ 3478.481388]  ? process_extent+0x1280/0x1280 [btrfs]
>  [ 3478.481954]  btrfs_ioctl_send+0x1037/0x1270 [btrfs]
>  [ 3478.482510]  _btrfs_ioctl_send+0x80/0x110 [btrfs]
>  [ 3478.483062]  btrfs_ioctl+0x13fe/0x3120 [btrfs]
>  [ 3478.483581]  ? rq_clock_task+0x2e/0x60
>  [ 3478.484086]  ? wake_up_new_task+0x1f3/0x370
>  [ 3478.484582]  ? do_vfs_ioctl+0xa2/0x6f0
>  [ 3478.485075]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>  [ 3478.485552]  do_vfs_ioctl+0xa2/0x6f0
>  [ 3478.486016]  ? __fget+0x113/0x200
>  [ 3478.486467]  ksys_ioctl+0x70/0x80
>  [ 3478.486911]  __x64_sys_ioctl+0x16/0x20
>  [ 3478.487337]  do_syscall_64+0x60/0x1b0
>  [ 3478.487751]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  [ 3478.488159] RIP: 0033:0x7f8e0d7d4dd7
>  (...)
>  [ 3478.489349] RSP: 002b:00007ffcf6fb4908 EFLAGS: 00000202 ORIG_RAX: 
> 0000000000000010
>  [ 3478.489742] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 
> 00007f8e0d7d4dd7
>  [ 3478.490142] RDX: 00007ffcf6fb4990 RSI: 0000000040489426 RDI: 
> 0000000000000005
>  [ 3478.490548] RBP: 0000000000000005 R08: 00007f8e0d6f3700 R09: 
> 00007f8e0d6f3700
>  [ 3478.490953] R10: 00007f8e0d6f39d0 R11: 0000000000000202 R12: 
> 0000000000000005
>  [ 3478.491343] R13: 00005624e0780020 R14: 0000000000000000 R15: 
> 0000000000000001
>  (...)
>  [ 3478.493352] ---[ end trace d5f537302be4f8c8 ]---
> 
> Another possibility, much less likely to happen, is that send will not
> fail but the contents of the stream it produces may not be correct.
> 
> To avoid this, do not allow send and relocation (balance) to run in
> parallel. In the long term the goal is to allow for both to be able to
> run concurrently without any problems, but that will take a significant
> effort in development and testing.
> 
> Signed-off-by: Filipe Manana <fdman...@suse.com>
> ---
>  fs/btrfs/ctree.h   |  7 +++++++
>  fs/btrfs/disk-io.c |  2 ++
>  fs/btrfs/send.c    | 14 ++++++++++++++
>  fs/btrfs/volumes.c |  8 ++++++++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 753ff68a8e8f..e6284e353dee 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -785,6 +785,7 @@ enum {
>       /*
>        * Indicate that balance has been set up from the ioctl and is in the
>        * main phase. The fs_info::balance_ctl is initialized.
> +      * Set and cleared while holding fs_info::balance_mutex.
>        */
>       BTRFS_FS_BALANCE_RUNNING,
>  
> @@ -1167,6 +1168,12 @@ struct btrfs_fs_info {
>       spinlock_t swapfile_pins_lock;
>       struct rb_root swapfile_pins;
>  
> +     /*
> +      * Number of send operations in progress.
> +      * Updated while holding fs_info::balance_mutex.
> +      */
> +     int send_in_progress;
> +

Rather than introducing yet another variable and more state why not
piggy back on BTRFS_FS_EXCL_OP, this will prevent send running while
balance is running as well as while devices are being removed/added.

>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
>       spinlock_t ref_verify_lock;
>       struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 5216e7b3f9ad..4cde9a9654fe 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2785,6 +2785,8 @@ int open_ctree(struct super_block *sb,
>       spin_lock_init(&fs_info->swapfile_pins_lock);
>       fs_info->swapfile_pins = RB_ROOT;
>  
> +     fs_info->send_in_progress = 0;
> +
>       ret = btrfs_alloc_stripe_hash_table(fs_info);
>       if (ret) {
>               err = ret;
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index f91f474f14f6..5c32ac661519 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -6869,9 +6869,23 @@ long btrfs_ioctl_send(struct file *mnt_file, struct 
> btrfs_ioctl_send_args *arg)
>       if (ret)
>               goto out;
>  
> +     mutex_lock(&fs_info->balance_mutex);
> +     if (test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)) {
> +             mutex_unlock(&fs_info->balance_mutex);
> +             btrfs_warn_rl(fs_info,
> +           "Can not run send because a balance operation is in progress");
> +             ret = -EAGAIN;
> +             goto out;
> +     }
> +     fs_info->send_in_progress++;
> +     mutex_unlock(&fs_info->balance_mutex);
> +
>       current->journal_info = BTRFS_SEND_TRANS_STUB;
>       ret = send_subvol(sctx);
>       current->journal_info = NULL;
> +     mutex_lock(&fs_info->balance_mutex);
> +     fs_info->send_in_progress--;
> +     mutex_unlock(&fs_info->balance_mutex);
>       if (ret < 0)
>               goto out;
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index db934ceae9c1..8145b62e3912 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4203,6 +4203,14 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>                          get_raid_name(meta_index), 
> get_raid_name(data_index));
>       }
>  
> +     if (fs_info->send_in_progress) {
> +             btrfs_warn_rl(fs_info,
> +"Can not run balance while send operations are in progress (%d in progress)",
> +                           fs_info->send_in_progress);
> +             ret = -EAGAIN;
> +             goto out;
> +     }
> +
>       ret = insert_balance_item(fs_info, bctl);
>       if (ret && ret != -EEXIST)
>               goto out;
> 

Reply via email to