[PATCH 36/42] btrfs: wait on caching when putting the bg cache

2018-10-12 Thread Josef Bacik
While testing my backport I noticed a panic when running generic/416,
generic/417, and generic/418 in a row.  This happened to uncover a race
where we still had outstanding IO after destroying all of our
workqueues, and we would then try to queue the endio work on those
freed workqueues.  The problem is that we don't wait for the caching
threads to finish before freeing everything up.  Fix this by waiting
for any outstanding caching on a block group before freeing it, so
we're sure to be done with all IO by the time we get to
btrfs_stop_all_workers().  This fixes the panic I was seeing
consistently in testing.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:6112!
SMP PTI
Modules linked in:
CPU: 1 PID: 27165 Comm: kworker/u4:7 Not tainted 4.16.0-02155-g3553e54a578d-dirty #875
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: btrfs-cache btrfs_cache_helper
RIP: 0010:btrfs_map_bio+0x346/0x370
RSP: :c900061e79d0 EFLAGS: 00010202
RAX:  RBX: 880071542e00 RCX: 00533000
RDX: 88006bb74380 RSI: 0008 RDI: 88007816
RBP: 0001 R08: 8800781cd200 R09: 00503000
R10: 88006cd21200 R11:  R12: 
R13:  R14: 8800781cd200 R15: 880071542e00
FS:  () GS:88007fd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0817ffc4 CR3: 78314000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 btree_submit_bio_hook+0x8a/0xd0
 submit_one_bio+0x5d/0x80
 read_extent_buffer_pages+0x18a/0x320
 btree_read_extent_buffer_pages+0xbc/0x200
 ? alloc_extent_buffer+0x359/0x3e0
 read_tree_block+0x3d/0x60
 read_block_for_search.isra.30+0x1a5/0x360
 btrfs_search_slot+0x41b/0xa10
 btrfs_next_old_leaf+0x212/0x470
 caching_thread+0x323/0x490
 normal_work_helper+0xc5/0x310
 process_one_work+0x141/0x340
 worker_thread+0x44/0x3c0
 kthread+0xf8/0x130
 ? process_one_work+0x340/0x340
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x35/0x40
Code: ff ff 48 8b 4c 24 28 48 89 de 48 8b 7c 24 08 e8 d1 e5 04 00 89 c3 e9 08 ff ff ff 4d 89 c6 49 89 df e9 27 fe ff ff e8 5a 3a bb ff <0f> 0b 0f 0b e9 57 ff ff ff 48 8b 7c 24 08 4c 89 f9 4c 89 ea 48
RIP: btrfs_map_bio+0x346/0x370 RSP: c900061e79d0
---[ end trace 827eb13e50846033 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception

Signed-off-by: Josef Bacik 
Reviewed-by: Omar Sandoval 
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9b5366932298..4bacb5042d08 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9894,6 +9894,7 @@ void btrfs_put_block_group_cache(struct btrfs_fs_info *info)
 
 	block_group = btrfs_lookup_first_block_group(info, last);
 	while (block_group) {
+		wait_block_group_cache_done(block_group);
 		spin_lock(&block_group->lock);
 		if (block_group->iref)
 			break;
-- 
2.14.3