Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal
On 11/26/2014 10:28 AM, Filipe Manana wrote:
> If we remove a block group (because it became empty), we might have left
> a caching_ctl structure in fs_info->caching_block_groups that points to
> the block group and is accessed at transaction commit time. This results
> in accessing an invalid or incorrect block group. This issue became
> visible after Josef's patch "Btrfs: remove empty block groups
> automatically".
>
> So if the block group is removed make sure we don't leave a dangling
> caching_ctl in caching_block_groups.
>
> Sample crash trace:
>
> [58380.439449] BUG: unable to handle kernel paging request at 8801446eaeb8
> [58380.439707] IP: [a03f6d05] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
> [58380.440879] PGD 1acb067 PUD 23f5ff067 PMD 23f5db067 PTE 8001446ea060
> [58380.441220] Oops: [#1] SMP DEBUG_PAGEALLOC
> [58380.441486] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse processor i2c_piix4 parport_pc parport pcspkr serio_raw evdev i2ccore thermal_sys microcode button ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy ata_piix e1000 libata virtio_pci scsi_mod virtio_ring virtio [last unloaded: btrfs]
> [58380.443238] CPU: 3 PID: 25728 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1
> [58380.443238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [58380.443238] task: 88013ac82090 ti: 88013896c000 task.ti: 88013896c000
> [58380.443238] RIP: 0010:[a03f6d05] [a03f6d05] block_group_cache_done.isra.21+0xc/0x1c [btrfs]
> [58380.443238] RSP: 0018:88013896fdd8 EFLAGS: 00010283
> [58380.443238] RAX: 880222cae850 RBX: 880119ba74c0 RCX:
> [58380.443238] RDX: RSI: 880185e16800 RDI: 8801446eaeb8
> [58380.443238] RBP: 88013896fdd8 R08: 8801a9ca9fa8 R09: 88013896fc60
> [58380.443238] R10: 88013896fd28 R11: R12: 880222cae000
> [58380.443238] R13: 880222cae850 R14: 880222cae6b0 R15: 8801446eae00
> [58380.443238] FS: () GS:88023ed8() knlGS:
> [58380.443238] CS: 0010 DS: ES: CR0: 8005003b
> [58380.443238] CR2: 8801446eaeb8 CR3: 01811000 CR4: 06e0
> [58380.443238] Stack:
> [58380.443238] 88013896fe18 a03fe2d5 880222cae850 880185e16800
> [58380.443238] 88000dc41c20 8801a9ca9f00
> [58380.443238] 88013896fe80 a040fbcf 88018b0dcdb0 88013ac82090
> [58380.443238] Call Trace:
> [58380.443238] [a03fe2d5] btrfs_prepare_extent_commit+0x5a/0xd7 [btrfs]
> [58380.443238] [a040fbcf] btrfs_commit_transaction+0x45c/0x882 [btrfs]
> [58380.443238] [a040c058] transaction_kthread+0xf2/0x1a4 [btrfs]
> [58380.443238] [a040bf66] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
> [58380.443238] [8105966b] kthread+0xb7/0xbf
> [58380.443238] [810595b4] ? __kthread_parkme+0x67/0x67
> [58380.443238] [813ebeac] ret_from_fork+0x7c/0xb0
> [58380.443238] [810595b4] ? __kthread_parkme+0x67/0x67
>
> Signed-off-by: Filipe Manana <fdman...@suse.com>
> ---
>  fs/btrfs/ctree.h       |  1 +
>  fs/btrfs/extent-tree.c | 27 +++
>  2 files changed, 28 insertions(+)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index d3ccd09..7f40a65 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>  	unsigned int ro:1;
>  	unsigned int dirty:1;
>  	unsigned int iref:1;
> +	unsigned int has_caching_ctl:1;

Don't do this, just unconditionally call get_caching_control in
btrfs_remove_block_group, then if we get something we can do stuff,
otherwise we can just continue. Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
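The suggestion above (always ask for the caching control at removal time and act only if one comes back) could look roughly like the sketch below. It is a userspace model written for illustration, not the kernel code: `struct bg`, `struct ctl`, `lookup_ctl()`, `put_ctl()` and `remove_bg()` are hypothetical stand-ins for the block group, its caching control, `get_caching_control()` and `btrfs_remove_block_group()`. Locking is omitted and the reference count is a plain integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ctl {
	int refs;     /* reference count (plain int, no atomics in this model) */
	int on_list;  /* 1 while linked into the list of caching block groups */
};

struct bg {
	struct ctl *caching_ctl;  /* NULL when no caching control is attached */
};

/* Stand-in for the lookup: returns a referenced ctl, or NULL. */
struct ctl *lookup_ctl(struct bg *bg)
{
	if (bg->caching_ctl == NULL)
		return NULL;
	bg->caching_ctl->refs++;
	return bg->caching_ctl;
}

/* Drop one reference. */
void put_ctl(struct ctl *c)
{
	assert(c->refs > 0);
	c->refs--;
}

/*
 * Removal path without an extra flag: always ask for the ctl and do the
 * cleanup only when one comes back. Returns 1 if a ctl was unlinked.
 */
int remove_bg(struct bg *bg)
{
	struct ctl *c = lookup_ctl(bg);
	int unlinked = 0;

	if (c != NULL) {
		if (c->on_list) {
			c->on_list = 0;
			put_ctl(c);  /* drop the reference held by the list */
			unlinked = 1;
		}
		put_ctl(c);          /* drop the reference taken by the lookup */
	}
	return unlinked;
}
```

This only works if the lookup reliably reports every caching control that can still be reachable from the list, which is the point the rest of the thread turns on.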
Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal
On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jba...@fb.com> wrote:
> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>> [...]
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1277,6 +1277,7 @@ struct btrfs_block_group_cache {
>>  	unsigned int ro:1;
>>  	unsigned int dirty:1;
>>  	unsigned int iref:1;
>> +	unsigned int has_caching_ctl:1;
>
> Don't do this, just unconditionally call get_caching_control in
> btrfs_remove_block_group, then if we get something we can do stuff,
> otherwise we can just continue. Thanks,

That's what I initially thought too. However, get_caching_control only
returns us the caching_ctl if block_group->cached == BTRFS_CACHE_STARTED,
so it's not enough to use it exclusively.

The has_caching_ctl flag is just to avoid holding the semaphore and
searching through the list (since block_group->caching_ctl can be NULL
but a caching_ctl pointing to the block group can be in the list).

> Josef

--
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.
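To make the objection above concrete, here is a small userspace sketch of the situation as it is described in this message. Everything in it is an assumption made for illustration: the types, the enum values other than the idea of a "started" state, and the helper names `lookup_gated()`, `find_on_list()` and `find_ctl_for_removal()` are hypothetical, the list is a plain singly linked list, and the semaphore mentioned in the message is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache states; only "started" matters for this sketch. */
enum cache_state { CACHE_NO, CACHE_STARTED, CACHE_FINISHED };

struct bg;

struct ctl {
	struct bg *block_group;  /* block group this control belongs to */
	struct ctl *next;        /* next entry on the modelled list */
};

struct bg {
	enum cache_state cached;
	struct ctl *caching_ctl;          /* may already be NULL */
	unsigned int has_caching_ctl:1;   /* set once a ctl was ever queued */
};

/* Lookup gated on the state, as described in the message above. */
struct ctl *lookup_gated(const struct bg *bg)
{
	assert(bg != NULL);
	if (bg->cached != CACHE_STARTED)
		return NULL;
	return bg->caching_ctl;
}

/* Fallback: walk the list for an entry that points at bg. */
struct ctl *find_on_list(struct ctl *head, const struct bg *bg)
{
	struct ctl *c;

	for (c = head; c != NULL; c = c->next)
		if (c->block_group == bg)
			return c;
	return NULL;
}

/* Removal-time lookup: the flag lets us skip the list walk entirely. */
struct ctl *find_ctl_for_removal(struct ctl *head, const struct bg *bg)
{
	struct ctl *c;

	if (!bg->has_caching_ctl)
		return NULL;              /* nothing was ever queued */
	c = lookup_gated(bg);
	if (c == NULL)
		c = find_on_list(head, bg);  /* gated lookup missed it */
	return c;
}
```

In this model a block group whose state is no longer "started" and whose `caching_ctl` pointer is NULL is invisible to `lookup_gated()`, yet `find_on_list()` still finds the entry that points at it. The flag only decides whether the list walk is attempted at all.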
Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal
On 11/26/2014 11:09 AM, Filipe David Manana wrote:
> On Wed, Nov 26, 2014 at 3:57 PM, Josef Bacik <jba...@fb.com> wrote:
>> On 11/26/2014 10:28 AM, Filipe Manana wrote:
>>> [...]
>>> +	unsigned int has_caching_ctl:1;
>>
>> Don't do this, just unconditionally call get_caching_control in
>> btrfs_remove_block_group, then if we get something we can do stuff,
>> otherwise we can just continue. Thanks,
>
> That's what I initially thought too. However, get_caching_control only
> returns us the caching_ctl if block_group->cached == BTRFS_CACHE_STARTED,
> so it's not enough to use it exclusively.
>
> The has_caching_ctl flag is just to avoid holding the semaphore and
> searching through the list (since block_group->caching_ctl can be NULL
> but a caching_ctl pointing to the block group can be in the list).

Oh God that's not good, we need to change get_caching_control to return
if there is a caching control at all, since the other users want to wait
for the fast caching to finish too. So change that and then use it
unconditionally. I bet this has been causing us the random early ENOSPC
problems. Thanks,

Josef
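The change proposed above can be sketched by comparing two lookups side by side. This is a simplified userspace model, not the actual kernel function: `get_ctl_gated()` and `get_ctl_any()` are hypothetical names, `CACHE_FAST` is an assumed stand-in for the "fast caching" state mentioned in the message, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache states for this model. */
enum cache_state { CACHE_NO, CACHE_STARTED, CACHE_FAST, CACHE_FINISHED };

struct ctl {
	int refs;  /* reference count (plain int in this model) */
};

struct bg {
	enum cache_state cached;
	struct ctl *caching_ctl;
};

/* Before: a ctl is only handed out while the state is "started". */
struct ctl *get_ctl_gated(struct bg *bg)
{
	assert(bg != NULL);
	if (bg->cached != CACHE_STARTED || bg->caching_ctl == NULL)
		return NULL;
	bg->caching_ctl->refs++;
	return bg->caching_ctl;
}

/* After: any attached ctl is handed out, whatever the state is. */
struct ctl *get_ctl_any(struct bg *bg)
{
	assert(bg != NULL);
	if (bg->caching_ctl == NULL)
		return NULL;
	bg->caching_ctl->refs++;
	return bg->caching_ctl;
}
```

With the second variant, a caller that wants to wait for caching to finish gets a control to wait on even while the block group is in the assumed fast-caching state, and a removal path can call the lookup unconditionally.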
Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal
On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik <jba...@fb.com> wrote:
> On 11/26/2014 11:09 AM, Filipe David Manana wrote:
>> [...]
>> That's what I initially thought too. However, get_caching_control only
>> returns us the caching_ctl if block_group->cached == BTRFS_CACHE_STARTED,
>> so it's not enough to use it exclusively.
>>
>> The has_caching_ctl flag is just to avoid holding the semaphore and
>> searching through the list (since block_group->caching_ctl can be NULL
>> but a caching_ctl pointing to the block group can be in the list).
>
> Oh God that's not good, we need to change get_caching_control to return
> if there is a caching control at all, since the other users want to wait
> for the fast caching to finish too. So change that and then use it
> unconditionally. I bet this has been causing us the random early ENOSPC
> problems. Thanks,

Right, I think that's a separate change different from what I'm
Re: [PATCH 2/6] Btrfs: fix crash caused by block group removal
On 11/26/2014 11:34 AM, Filipe David Manana wrote:
> On Wed, Nov 26, 2014 at 4:24 PM, Josef Bacik <jba...@fb.com> wrote:
>> [...]
>> Oh God that's not good, we need to change get_caching_control to return
>> if there is a caching control at all, since the other users want to wait
>> for the fast caching to finish too. So change that and then use it
>> unconditionally. I bet this has been causing us the random early ENOSPC
>> problems. Thanks,
>
> Right, I think that's a separate change different from what I'm trying to fix. When caching_thread