Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held
On Wed, Mar 30, 2016 at 7:25 PM, Liu Bowrote: > On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote: >> Hi, >> >> We are hitting the attached lockup on a somewhat regular basis during >> nightly tests. Looks like a bunch of CPUs spin in find_free_extent() >> on btrfs_free_cluster::lock, which is held by writer, who seems to be >> stuck in an endless loop in btrfs_find_space_cluster(), trying to >> cleanup bitmaps list. Smells like a list corruption to me? > > My objdump shows that find_free_extent() may wait on > down_read(_info->groups_sem); It's spinning on a spinlock: 6228 if (last_ptr) { 6229 spin_lock(_ptr->lock); 6230 if (last_ptr->block_group) > > One possible thing is that there're too many entries in bitmap, and > list_for_each_entry just is just stuck there. I don't think so - look at the two journal_write splats. > > Are these stacks from "Blocked more than 120s"? No, these are all soft lockups. Sorry, I had to edit it to make it readable and concentrated on stack traces. I guess "BUG: soft lockup" were mixed up with modules and object code all over the place. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held
On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote: > Hi, > > We are hitting the attached lockup on a somewhat regular basis during > nightly tests. Looks like a bunch of CPUs spin in find_free_extent() > on btrfs_free_cluster::lock, which is held by writer, who seems to be > stuck in an endless loop in btrfs_find_space_cluster(), trying to > cleanup bitmaps list. Smells like a list corruption to me? My objdump shows that find_free_extent() may wait on down_read(_info->groups_sem); One possible thing is that there're too many entries in bitmap, and list_for_each_entry just is just stuck there. Are these stacks from "Blocked more than 120s"? Thanks, -liubo > > The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but > the surroundings look sufficiently similar to upstream and given recent > patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in > setup_cluster_bitmaps") I thought this might be relevant for upstream. > > Thanks, > > Ilya > [74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted > 3.13.0-83-generic #127-Ubuntu > [74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c > 09/07/2015 > [74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: > 88083c564000 > [74750.666936] RIP: 0010:[] [] > _raw_spin_lock+0x37/0x50 > [74750.675645] RSP: 0018:88083c565b50 EFLAGS: 0206 > [74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: > 09a0 > [74750.689126] RDX: 09a6 RSI: 09a6 RDI: > 88084f6671b0 > [74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: > 0001 > [74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: > 8807e829d4e0 > [74750.712140] R13: a02073ab R14: 88083c565ae0 R15: > 8807e829d4e0 > [74750.719811] FS: () GS:88087fd8() > knlGS: > [74750.728450] CS: 0010 DS: ES: CR0: 80050033 > [74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: > 001407e0 > [74750.742437] Stack: > [74750.745001] 88083c565c28 a01b07a3 88074873a000 > 880530131d20 > [74750.753033] 880791982000 88083c565cef 1000 > 00040002 > [74750.761065] a01a3a97 0020 880523f4a800 > 0001 > [74750.769105] Call Trace: > [74750.772137] [] find_free_extent+0x213/0xc30 [btrfs] > [74750.779259] [] ? btrfs_del_items+0x367/0x470 [btrfs] > [74750.786476] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] > [74750.793869] [] __btrfs_prealloc_file_range+0xe5/0x380 > [btrfs] > [74750.801876] [] > btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs] > [74750.810136] [] > btrfs_write_dirty_block_groups+0x4d3/0x620 [btrfs] > [74750.818469] [] commit_cowonly_roots+0x151/0x213 [btrfs] > [74750.825940] [] btrfs_commit_transaction+0x483/0x970 > [btrfs] > [74750.833765] [] transaction_kthread+0x1b5/0x240 [btrfs] > [74750.841156] [] ? btrfs_cleanup_transaction+0x550/0x550 > [btrfs] > [74750.849244] [] kthread+0xd2/0xf0 > [74750.854729] [] ? kthread_create_on_node+0x1c0/0x1c0 > [74750.861861] [] ret_from_fork+0x58/0x90 > [74750.867868] [] ? kthread_create_on_node+0x1c0/0x1c0 > > [74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted > 3.13.0-83-generic #127-Ubuntu > [74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c > 09/07/2015 > [74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15) > [74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: > 88078ad98000 > [74758.690802] RIP: 0010:[] [] > _raw_spin_lock+0x32/0x50 > [74758.701202] RSP: 0018:88078ad99728 EFLAGS: 0202 > [74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: > 09a0 > [74758.717910] RDX: 09a4 RSI: 09a4 RDI: > 88084f6671b0 > [74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: > 0001 > [74758.736311] R10: R11: ea001b38bc00 R12: > 88084e918770 > [74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: > 88078ad99698 > [74758.754665] FS: () GS:88087fd4() > knlGS: > [74758.764805] CS: 0010 DS: ES: CR0: 80050033 > [74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: > 001407e0 > [74758.781766] Stack: > [74758.785786] 88078ad99800 a01b07a3 8807e6bc5000 > 880853034e70 > [74758.795307] 88078ad997b8 88078ad998b7 880853034e40 > 00040002 > [74758.804838] 88078ad99778 0020 880523f4a800 > 0001 > [74758.814337] Call Trace: > [74758.818811] [] find_free_extent+0x213/0xc30 [btrfs] > [74758.827389] [] ? alloc_extent_state+0x21/0xc0 [btrfs] > [74758.836147] [] ? __lookup_extent_mapping+0xa0/0x150 > [btrfs] > [74758.845413] []
Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held
Hi, We are hitting the attached lockup on a somewhat regular basis during nightly tests. Looks like a bunch of CPUs spin in find_free_extent() on btrfs_free_cluster::lock, which is held by writer, who seems to be stuck in an endless loop in btrfs_find_space_cluster(), trying to cleanup bitmaps list. Smells like a list corruption to me? The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but the surroundings look sufficiently similar to upstream and given recent patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in setup_cluster_bitmaps") I thought this might be relevant for upstream. Thanks, Ilya [74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 3.13.0-83-generic #127-Ubuntu [74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 88083c564000 [74750.666936] RIP: 0010:[] [] _raw_spin_lock+0x37/0x50 [74750.675645] RSP: 0018:88083c565b50 EFLAGS: 0206 [74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 09a0 [74750.689126] RDX: 09a6 RSI: 09a6 RDI: 88084f6671b0 [74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 0001 [74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 8807e829d4e0 [74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 8807e829d4e0 [74750.719811] FS: () GS:88087fd8() knlGS: [74750.728450] CS: 0010 DS: ES: CR0: 80050033 [74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 001407e0 [74750.742437] Stack: [74750.745001] 88083c565c28 a01b07a3 88074873a000 880530131d20 [74750.753033] 880791982000 88083c565cef 1000 00040002 [74750.761065] a01a3a97 0020 880523f4a800 0001 [74750.769105] Call Trace: [74750.772137] [] find_free_extent+0x213/0xc30 [btrfs] [74750.779259] [] ? btrfs_del_items+0x367/0x470 [btrfs] [74750.786476] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [74750.793869] [] __btrfs_prealloc_file_range+0xe5/0x380 [btrfs] [74750.801876] [] btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs] [74750.810136] [] btrfs_write_dirty_block_groups+0x4d3/0x620 [btrfs] [74750.818469] [] commit_cowonly_roots+0x151/0x213 [btrfs] [74750.825940] [] btrfs_commit_transaction+0x483/0x970 [btrfs] [74750.833765] [] transaction_kthread+0x1b5/0x240 [btrfs] [74750.841156] [] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs] [74750.849244] [] kthread+0xd2/0xf0 [74750.854729] [] ? kthread_create_on_node+0x1c0/0x1c0 [74750.861861] [] ret_from_fork+0x58/0x90 [74750.867868] [] ? kthread_create_on_node+0x1c0/0x1c0 [74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 3.13.0-83-generic #127-Ubuntu [74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015 [74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15) [74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 88078ad98000 [74758.690802] RIP: 0010:[] [] _raw_spin_lock+0x32/0x50 [74758.701202] RSP: 0018:88078ad99728 EFLAGS: 0202 [74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 09a0 [74758.717910] RDX: 09a4 RSI: 09a4 RDI: 88084f6671b0 [74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 0001 [74758.736311] R10: R11: ea001b38bc00 R12: 88084e918770 [74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 88078ad99698 [74758.754665] FS: () GS:88087fd4() knlGS: [74758.764805] CS: 0010 DS: ES: CR0: 80050033 [74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 001407e0 [74758.781766] Stack: [74758.785786] 88078ad99800 a01b07a3 8807e6bc5000 880853034e70 [74758.795307] 88078ad997b8 88078ad998b7 880853034e40 00040002 [74758.804838] 88078ad99778 0020 880523f4a800 0001 [74758.814337] Call Trace: [74758.818811] [] find_free_extent+0x213/0xc30 [btrfs] [74758.827389] [] ? alloc_extent_state+0x21/0xc0 [btrfs] [74758.836147] [] ? __lookup_extent_mapping+0xa0/0x150 [btrfs] [74758.845413] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [74758.854224] [] cow_file_range+0x135/0x430 [btrfs] [74758.862593] [] run_delalloc_range+0x312/0x350 [btrfs] [74758.871310] [] ? find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs] [74758.881857] [] ? ata_scsi_queuecmd+0x133/0x400 [74758.889974] [] __extent_writepage+0x2f4/0x760 [btrfs] [74758.898703] [] ? btrfs_add_delayed_iput+0x61/0xc0 [btrfs] [74758.907763] [] ? __blk_run_queue+0x33/0x40 [74758.915511] [] ? find_get_pages_tag+0xd1/0x180 [74758.923600] [] ? kmem_cache_alloc_trace+0x3c/0x1f0 [74758.932051] []