On Tue, Oct 30, 2018 at 05:14:42PM -0700, Omar Sandoval wrote:
>From: Omar Sandoval <osan...@fb.com>
>
>There's a race between close_ctree() and cleaner_kthread().
>close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it
>sees it set, but this is racy; the cleaner might have already checked
>the bit and could be cleaning stuff. In particular, if it deletes unused
>block groups, it will create delayed iputs for the free space cache
>inodes. As of "btrfs: don't run delayed_iputs in commit", we're no
>longer running delayed iputs after a commit. Therefore, if the cleaner
>creates more delayed iputs after delayed iputs are run in
>btrfs_commit_super(), we will leak inodes on unmount and get a busy
>inode crash from the VFS.
>
>Fix it by parking the cleaner before we actually close anything. Then,
>any remaining delayed iputs will always be handled in
>btrfs_commit_super(). This also ensures that the commit in close_ctree()
>is really the last commit, so we can get rid of the commit in
>cleaner_kthread().
>
>Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
>Signed-off-by: Omar Sandoval <osan...@fb.com>
>---
>We found this with a stress test that our containers team runs. I'm
>wondering if this same race could have caused any other issues other
>than this new iput thing, but I couldn't identify any.

I noticed an inode leak issue in generic/475, but whether dropping commit
30928e9baac2 ("btrfs: don't run delayed_iputs in commit") or applying
this patch, the issue still exists.

I have attached the dmesg.

>
> fs/btrfs/disk-io.c | 40 +++++++---------------------------------
> 1 file changed, 7 insertions(+), 33 deletions(-)
>
>diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>index b0ab41da91d1..7c17284ae3c2 100644
>--- a/fs/btrfs/disk-io.c
>+++ b/fs/btrfs/disk-io.c
>@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg)
>       struct btrfs_root *root = arg;
>       struct btrfs_fs_info *fs_info = root->fs_info;
>       int again;
>-      struct btrfs_trans_handle *trans;
> 
>-      do {
>+      while (1) {
>               again = 0;
> 
>               /* Make the cleaner go to sleep early. */
>@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg)
>                */
>               btrfs_delete_unused_bgs(fs_info);
> sleep:
>+              if (kthread_should_park())
>+                      kthread_parkme();
>+              if (kthread_should_stop())
>+                      return 0;
>               if (!again) {
>                       set_current_state(TASK_INTERRUPTIBLE);
>-                      if (!kthread_should_stop())
>-                              schedule();
>+                      schedule();
>                       __set_current_state(TASK_RUNNING);
>               }
>-      } while (!kthread_should_stop());
>-
>-      /*
>-       * Transaction kthread is stopped before us and wakes us up.
>-       * However we might have started a new transaction and COWed some
>-       * tree blocks when deleting unused block groups for example. So
>-       * make sure we commit the transaction we started to have a clean
>-       * shutdown when evicting the btree inode - if it has dirty pages
>-       * when we do the final iput() on it, eviction will trigger a
>-       * writeback for it which will fail with null pointer dereferences
>-       * since work queues and other resources were already released and
>-       * destroyed by the time the iput/eviction/writeback is made.
>-       */
>-      trans = btrfs_attach_transaction(root);
>-      if (IS_ERR(trans)) {
>-              if (PTR_ERR(trans) != -ENOENT)
>-                      btrfs_err(fs_info,
>-                                "cleaner transaction attach returned %ld",
>-                                PTR_ERR(trans));
>-      } else {
>-              int ret;
>-
>-              ret = btrfs_commit_transaction(trans);
>-              if (ret)
>-                      btrfs_err(fs_info,
>-                                "cleaner open transaction commit returned %d",
>-                                ret);
>       }
>-
>-      return 0;
> }
> 
> static int transaction_kthread(void *arg)
>@@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
>       int ret;
> 
>       set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
>+      kthread_park(fs_info->cleaner_kthread);

Since we are not going to call kthread_unpark, I am not sure why
kthread_park is used instead of kthread_stop here. It looks like there
is no significant difference between stopping instantly and delayed stop.

-- 
Thanks,
Lu

> 
>       /* wait for the qgroup rescan worker to stop */
>       btrfs_qgroup_wait_for_completion(fs_info, false);
>-- 
>2.19.1
>
>
>
>


[  366.955193] run fstests generic/475 at 2018-10-31 15:06:43
[  367.495791] BTRFS: device fsid 812f883c-40b2-4456-9769-b94ddf1cb07e devid 1 
transid 5 /dev/nvme0n1p2
[  367.624469] BTRFS info (device dm-3): disk space caching is enabled
[  367.627305] BTRFS info (device dm-3): has skinny extents
[  367.628771] BTRFS info (device dm-3): flagging fs with big metadata feature
[  367.637585] BTRFS info (device dm-3): enabling ssd optimizations
[  367.640079] BTRFS info (device dm-3): checking UUID tree
[  367.666290] btrfs_dev_stat_print_on_error: 86 callbacks suppressed
[  367.666293] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
1, rd 0, flush 0, corrupt 0, gen 0
[  367.670665] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
2, rd 0, flush 0, corrupt 0, gen 0
[  367.673332] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
3, rd 0, flush 0, corrupt 0, gen 0
[  367.674024] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
4, rd 0, flush 0, corrupt 0, gen 0
[  367.676072] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
5, rd 0, flush 0, corrupt 0, gen 0
[  367.677316] Buffer I/O error on dev dm-3, logical block 4718576, async page 
read
[  367.682713] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
6, rd 0, flush 0, corrupt 0, gen 0
[  367.685150] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
7, rd 0, flush 0, corrupt 0, gen 0
[  367.690053] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
8, rd 0, flush 0, corrupt 0, gen 0
[  367.690298] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
9, rd 0, flush 0, corrupt 0, gen 0
[  367.694166] BTRFS error (device dm-3): bdev /dev/mapper/error-test errs: wr 
10, rd 0, flush 0, corrupt 0, gen 0
[  367.700484] BTRFS: error (device dm-3) in btrfs_commit_transaction:2239: 
errno=-5 IO failure (Error while writing out transaction)
[  367.707185] BTRFS info (device dm-3): forced readonly
[  367.708627] BTRFS warning (device dm-3): Skipping commit of aborted 
transaction.
[  367.710526] BTRFS: error (device dm-3) in cleanup_transaction:1849: errno=-5 
IO failure
[  367.712330] BTRFS info (device dm-3): delayed_refs has NO entry
[  367.975623] BTRFS error (device dm-3): cleaner transaction attach returned 
-30
[  367.977998] BTRFS info (device dm-3): at unmount delalloc count 8192
[  367.980424] WARNING: CPU: 2 PID: 14918 at fs/btrfs/disk-io.c:3846 
btrfs_free_fs_root+0x94/0xa0 [btrfs]
[  367.983228] Modules linked in: btrfs xor zstd_decompress zstd_compress 
xxhash raid6_pq efivarfs xfs nvme nvme_core
[  367.986171] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Not tainted 
4.19.0-rc8+ #18
[  367.988154] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  367.990175] RIP: 0010:btrfs_free_fs_root+0x94/0xa0 [btrfs]
[  367.997036] RSP: 0018:ffffc9000744bd60 EFLAGS: 00010282
[  367.998595] RAX: ffff880354eb6270 RBX: ffff8803661c4548 RCX: 0000000000000000
[  368.000665] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[  368.002527] RBP: ffff88033cd70000 R08: 0000000000000000 R09: 0000000000000001
[  368.004426] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000744bd88
[  368.006248] R13: ffff88033cd700c0 R14: ffffc9000744bd80 R15: 0000000000000000
[  368.008082] FS:  00007f206293e780(0000) GS:ffff880373800000(0000) 
knlGS:0000000000000000
[  368.010068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.011654] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[  368.013440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.015232] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  368.016979] Call Trace:
[  368.017947]  btrfs_free_fs_roots+0xcf/0x160 [btrfs]
[  368.019317]  ? wait_for_completion+0x126/0x150
[  368.020617]  close_ctree+0x149/0x330 [btrfs]
[  368.021822]  generic_shutdown_super+0x6c/0x100
[  368.023043]  kill_anon_super+0x14/0x30
[  368.024203]  btrfs_kill_super+0x13/0x100 [btrfs]
[  368.025471]  deactivate_locked_super+0x2f/0x70
[  368.027647]  cleanup_mnt+0x3b/0x70
[  368.028712]  task_work_run+0x9e/0xd0
[  368.029875]  do_syscall_64+0x470/0x600
[  368.031009]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  368.032350]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  368.033620] RIP: 0033:0x7f2062ec1b5b
[  368.039127] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 
00000000000000a6
[  368.040906] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[  368.042564] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[  368.044250] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[  368.045862] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[  368.047503] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[  368.049089] irq event stamp: 0
[  368.049979] hardirqs last  enabled at (0): [<0000000000000000>]           
(null)
[  368.051674] hardirqs last disabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.053587] softirqs last  enabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.055550] softirqs last disabled at (0): [<0000000000000000>]           
(null)
[  368.058098] ---[ end trace 08fc3163d80dad81 ]---
[  368.061134] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:123 
btrfs_put_block_group+0x51/0x60 [btrfs]
[  368.063637] Modules linked in: btrfs xor zstd_decompress zstd_compress 
xxhash raid6_pq efivarfs xfs nvme nvme_core
[  368.066230] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G        W 
        4.19.0-rc8+ #18
[  368.068615] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  368.070406] RIP: 0010:btrfs_put_block_group+0x51/0x60 [btrfs]
[  368.076431] RSP: 0018:ffffc9000744bda0 EFLAGS: 00010206
[  368.077765] RAX: 0000000000000001 RBX: ffff88033cd70000 RCX: ffff8803414f76c0
[  368.079477] RDX: ffff8803414f76c0 RSI: 0000000000000001 RDI: ffff8803414f74e8
[  368.081141] RBP: ffff8803414f74e8 R08: 0000000000000000 R09: 0000000000000000
[  368.082830] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[  368.084565] R13: ffff88033cd70168 R14: ffff8803414f7680 R15: dead000000000100
[  368.087151] FS:  00007f206293e780(0000) GS:ffff880373800000(0000) 
knlGS:0000000000000000
[  368.089087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.090616] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[  368.092390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.094103] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  368.095812] Call Trace:
[  368.096700]  btrfs_free_block_groups+0x1da/0x3b0 [btrfs]
[  368.098065]  close_ctree+0x171/0x330 [btrfs]
[  368.099254]  generic_shutdown_super+0x6c/0x100
[  368.100486]  kill_anon_super+0x14/0x30
[  368.101554]  btrfs_kill_super+0x13/0x100 [btrfs]
[  368.102758]  deactivate_locked_super+0x2f/0x70
[  368.103986]  cleanup_mnt+0x3b/0x70
[  368.105002]  task_work_run+0x9e/0xd0
[  368.106038]  do_syscall_64+0x470/0x600
[  368.107123]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  368.108397]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  368.109677] RIP: 0033:0x7f2062ec1b5b
[  368.115270] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 
00000000000000a6
[  368.117866] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[  368.119595] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[  368.121368] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[  368.122983] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[  368.124680] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[  368.126330] irq event stamp: 0
[  368.127242] hardirqs last  enabled at (0): [<0000000000000000>]           
(null)
[  368.128874] hardirqs last disabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.130836] softirqs last  enabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.132742] softirqs last disabled at (0): [<0000000000000000>]           
(null)
[  368.134443] ---[ end trace 08fc3163d80dad82 ]---
[  368.140638] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 
btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[  368.143175] Modules linked in: btrfs xor zstd_decompress zstd_compress 
xxhash raid6_pq efivarfs xfs nvme nvme_core
[  368.145923] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G        W 
        4.19.0-rc8+ #18
[  368.148908] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  368.150766] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[  368.156851] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[  368.158212] RAX: ffff8803414f4620 RBX: ffff88033cd70000 RCX: 0000000000000000
[  368.159883] RDX: 0000000000000002 RSI: ffff88033cd71750 RDI: ffff8803414f7a38
[  368.161614] RBP: ffff8803414f0558 R08: 0000000000000000 R09: ffff8803414f7b70
[  368.163340] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88033cd70118
[  368.165029] R13: ffff8803414f4620 R14: 0000000000000000 R15: dead000000000100
[  368.166730] FS:  00007f206293e780(0000) GS:ffff880373800000(0000) 
knlGS:0000000000000000
[  368.168613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.170055] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[  368.171794] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.173501] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  368.175273] Call Trace:
[  368.176969]  close_ctree+0x171/0x330 [btrfs]
[  368.178203]  generic_shutdown_super+0x6c/0x100
[  368.179443]  kill_anon_super+0x14/0x30
[  368.180560]  btrfs_kill_super+0x13/0x100 [btrfs]
[  368.181794]  deactivate_locked_super+0x2f/0x70
[  368.182975]  cleanup_mnt+0x3b/0x70
[  368.184024]  task_work_run+0x9e/0xd0
[  368.185051]  do_syscall_64+0x470/0x600
[  368.186098]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  368.187392]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  368.188675] RIP: 0033:0x7f2062ec1b5b
[  368.194299] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 
00000000000000a6
[  368.195990] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[  368.197690] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[  368.199348] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[  368.200998] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[  368.202651] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[  368.204354] irq event stamp: 0
[  368.205236] hardirqs last  enabled at (0): [<0000000000000000>]           
(null)
[  368.207825] hardirqs last disabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.209804] softirqs last  enabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.211807] softirqs last disabled at (0): [<0000000000000000>]           
(null)
[  368.213501] ---[ end trace 08fc3163d80dad83 ]---
[  368.214836] BTRFS info (device dm-3): space_info 1 has 7708672 free, is not 
full
[  368.216595] BTRFS info (device dm-3): space_info total=8388608, used=585728, 
pinned=0, reserved=86016, may_use=8192, readonly=0
[  368.219751] WARNING: CPU: 2 PID: 14918 at fs/btrfs/extent-tree.c:9789 
btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[  368.222497] Modules linked in: btrfs xor zstd_decompress zstd_compress 
xxhash raid6_pq efivarfs xfs nvme nvme_core
[  368.225120] CPU: 2 PID: 14918 Comm: umount Kdump: loaded Tainted: G        W 
        4.19.0-rc8+ #18
[  368.227581] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  368.229402] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs]
[  368.235564] RSP: 0018:ffffc9000744bdb0 EFLAGS: 00010206
[  368.237770] RAX: ffff8803414f7b40 RBX: ffff88033cd70000 RCX: 000000000017000c
[  368.239647] RDX: 0000000000000002 RSI: 000000000017000c RDI: ffffffff8125fb74
[  368.241447] RBP: ffff8803414f47d0 R08: 0000000000000000 R09: 0000000000000000
[  368.243206] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803414f4848
[  368.245071] R13: ffff8803414f7b40 R14: 0000000000000000 R15: dead000000000100
[  368.246825] FS:  00007f206293e780(0000) GS:ffff880373800000(0000) 
knlGS:0000000000000000
[  368.248758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.250245] CR2: 000055d225976b70 CR3: 000000033f992004 CR4: 00000000003606e0
[  368.251982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.253714] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  368.255491] Call Trace:
[  368.256375]  close_ctree+0x171/0x330 [btrfs]
[  368.257584]  generic_shutdown_super+0x6c/0x100
[  368.258786]  kill_anon_super+0x14/0x30
[  368.259900]  btrfs_kill_super+0x13/0x100 [btrfs]
[  368.261180]  deactivate_locked_super+0x2f/0x70
[  368.262433]  cleanup_mnt+0x3b/0x70
[  368.263452]  task_work_run+0x9e/0xd0
[  368.264531]  do_syscall_64+0x470/0x600
[  368.265621]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  368.267805]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  368.269288] RIP: 0033:0x7f2062ec1b5b
[  368.275103] RSP: 002b:00007ffd75ebd748 EFLAGS: 00000246 ORIG_RAX: 
00000000000000a6
[  368.276887] RAX: 0000000000000000 RBX: 000055b8b9bdb2b0 RCX: 00007f2062ec1b5b
[  368.278598] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b8b9bdb490
[  368.280268] RBP: 0000000000000000 R08: 000055b8b9bdb4b0 R09: 00007ffd75ebbfa0
[  368.281917] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b8b9bdb490
[  368.283580] R13: 00007f20631e11a4 R14: 0000000000000000 R15: 00007ffd75ebd9b8
[  368.285271] irq event stamp: 0
[  368.286151] hardirqs last  enabled at (0): [<0000000000000000>]           
(null)
[  368.287851] hardirqs last disabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.289798] softirqs last  enabled at (0): [<ffffffff810cd810>] 
copy_process.part.55+0x3b0/0x1f00
[  368.291756] softirqs last disabled at (0): [<0000000000000000>]           
(null)
[  368.293419] ---[ end trace 08fc3163d80dad84 ]---
[  368.294684] BTRFS info (device dm-3): space_info 4 has 275857408 free, is 
not full
[  368.297278] BTRFS info (device dm-3): space_info total=276824064, 
used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536
[  368.300677] VFS: Busy inodes after unmount of dm-3. Self-destruct in 5 
seconds.  Have a nice day...

Reply via email to