On Tue, Oct 30, 2018 at 05:14:42PM -0700, Omar Sandoval wrote: >From: Omar Sandoval <osan...@fb.com> > >There's a race between close_ctree() and cleaner_kthread(). >close_ctree() sets btrfs_fs_closing(), and the cleaner stops when it >sees it set, but this is racy; the cleaner might have already checked >the bit and could be cleaning stuff. In particular, if it deletes unused >block groups, it will create delayed iputs for the free space cache >inodes. As of "btrfs: don't run delayed_iputs in commit", we're no >longer running delayed iputs after a commit. Therefore, if the cleaner >creates more delayed iputs after delayed iputs are run in >btrfs_commit_super(), we will leak inodes on unmount and get a busy
Since the assert added via commit e187831e1875 ("btrfs: assert on non-empty delayed iputs") wasn't triggered, it doesn't seem to be the cause of inode leak. -- Thanks, Lu >inode crash from the VFS. > >Fix it by parking the cleaner before we actually close anything. Then, >any remaining delayed iputs will always be handled in >btrfs_commit_super(). This also ensures that the commit in close_ctree() >is really the last commit, so we can get rid of the commit in >cleaner_kthread(). > >Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit") >Signed-off-by: Omar Sandoval <osan...@fb.com> >--- >We found this with a stress test that our containers team runs. I'm >wondering if this same race could have caused any other issues other >than this new iput thing, but I couldn't identify any. > > fs/btrfs/disk-io.c | 40 +++++++--------------------------------- > 1 file changed, 7 insertions(+), 33 deletions(-) > >diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >index b0ab41da91d1..7c17284ae3c2 100644 >--- a/fs/btrfs/disk-io.c >+++ b/fs/btrfs/disk-io.c >@@ -1664,9 +1664,8 @@ static int cleaner_kthread(void *arg) > struct btrfs_root *root = arg; > struct btrfs_fs_info *fs_info = root->fs_info; > int again; >- struct btrfs_trans_handle *trans; > >- do { >+ while (1) { > again = 0; > > /* Make the cleaner go to sleep early. */ >@@ -1715,42 +1714,16 @@ static int cleaner_kthread(void *arg) > */ > btrfs_delete_unused_bgs(fs_info); > sleep: >+ if (kthread_should_park()) >+ kthread_parkme(); >+ if (kthread_should_stop()) >+ return 0; > if (!again) { > set_current_state(TASK_INTERRUPTIBLE); >- if (!kthread_should_stop()) >- schedule(); >+ schedule(); > __set_current_state(TASK_RUNNING); > } >- } while (!kthread_should_stop()); >- >- /* >- * Transaction kthread is stopped before us and wakes us up. >- * However we might have started a new transaction and COWed some >- * tree blocks when deleting unused block groups for example. So >- * make sure we commit the transaction we started to have a clean >- * shutdown when evicting the btree inode - if it has dirty pages >- * when we do the final iput() on it, eviction will trigger a >- * writeback for it which will fail with null pointer dereferences >- * since work queues and other resources were already released and >- * destroyed by the time the iput/eviction/writeback is made. >- */ >- trans = btrfs_attach_transaction(root); >- if (IS_ERR(trans)) { >- if (PTR_ERR(trans) != -ENOENT) >- btrfs_err(fs_info, >- "cleaner transaction attach returned %ld", >- PTR_ERR(trans)); >- } else { >- int ret; >- >- ret = btrfs_commit_transaction(trans); >- if (ret) >- btrfs_err(fs_info, >- "cleaner open transaction commit returned %d", >- ret); > } >- >- return 0; > } > > static int transaction_kthread(void *arg) >@@ -3931,6 +3904,7 @@ void close_ctree(struct btrfs_fs_info *fs_info) > int ret; > > set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags); >+ kthread_park(fs_info->cleaner_kthread); > > /* wait for the qgroup rescan worker to stop */ > btrfs_qgroup_wait_for_completion(fs_info, false); >-- >2.19.1 > > > >