Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov
On Wed, Mar 30, 2016 at 7:25 PM, Liu Bo  wrote:
> On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote:
>> Hi,
>>
>> We are hitting the attached lockup on a somewhat regular basis during
>> nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
>> on btrfs_free_cluster::lock, which is held by writer, who seems to be
>> stuck in an endless loop in btrfs_find_space_cluster(), trying to
>> cleanup bitmaps list.  Smells like a list corruption to me?
>
> My objdump shows that find_free_extent() may wait on 
> down_read(_info->groups_sem);

It's spinning on a spinlock:

6228 if (last_ptr) {
6229 spin_lock(_ptr->lock);
6230 if (last_ptr->block_group)

>
> One possible thing is that there're too many entries in bitmap, and
> list_for_each_entry just is just stuck there.

I don't think so - look at the two journal_write splats.

>
> Are these stacks from "Blocked more than 120s"?

No, these are all soft lockups.  Sorry, I had to edit it to make it
readable and concentrated on stack traces.  I guess "BUG: soft lockup"
were mixed up with modules and object code all over the place.

Thanks,

Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Liu Bo
On Wed, Mar 30, 2016 at 05:24:04PM +0200, Ilya Dryomov wrote:
> Hi,
> 
> We are hitting the attached lockup on a somewhat regular basis during
> nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
> on btrfs_free_cluster::lock, which is held by writer, who seems to be
> stuck in an endless loop in btrfs_find_space_cluster(), trying to
> cleanup bitmaps list.  Smells like a list corruption to me?

My objdump shows that find_free_extent() may wait on 
down_read(_info->groups_sem);

One possible thing is that there're too many entries in bitmap, and
list_for_each_entry just is just stuck there.

Are these stacks from "Blocked more than 120s"?

Thanks,

-liubo

> 
> The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but
> the surroundings look sufficiently similar to upstream and given recent
> patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in
> setup_cluster_bitmaps") I thought this might be relevant for upstream.
> 
> Thanks,
> 
> Ilya

> [74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 
> 3.13.0-83-generic #127-Ubuntu
> [74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
> 09/07/2015
> [74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 
> 88083c564000
> [74750.666936] RIP: 0010:[]  [] 
> _raw_spin_lock+0x37/0x50
> [74750.675645] RSP: 0018:88083c565b50  EFLAGS: 0206
> [74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 
> 09a0
> [74750.689126] RDX: 09a6 RSI: 09a6 RDI: 
> 88084f6671b0
> [74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 
> 0001
> [74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 
> 8807e829d4e0
> [74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 
> 8807e829d4e0
> [74750.719811] FS:  () GS:88087fd8() 
> knlGS:
> [74750.728450] CS:  0010 DS:  ES:  CR0: 80050033
> [74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 
> 001407e0
> [74750.742437] Stack:
> [74750.745001]  88083c565c28 a01b07a3 88074873a000 
> 880530131d20
> [74750.753033]  880791982000 88083c565cef 1000 
> 00040002
> [74750.761065]  a01a3a97 0020 880523f4a800 
> 0001
> [74750.769105] Call Trace:
> [74750.772137]  [] find_free_extent+0x213/0xc30 [btrfs]
> [74750.779259]  [] ? btrfs_del_items+0x367/0x470 [btrfs]
> [74750.786476]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
> [74750.793869]  [] __btrfs_prealloc_file_range+0xe5/0x380 
> [btrfs]
> [74750.801876]  [] 
> btrfs_prealloc_file_range_trans+0x30/0x40 [btrfs]
> [74750.810136]  [] 
> btrfs_write_dirty_block_groups+0x4d3/0x620 [btrfs]
> [74750.818469]  [] commit_cowonly_roots+0x151/0x213 [btrfs]
> [74750.825940]  [] btrfs_commit_transaction+0x483/0x970 
> [btrfs]
> [74750.833765]  [] transaction_kthread+0x1b5/0x240 [btrfs]
> [74750.841156]  [] ? btrfs_cleanup_transaction+0x550/0x550 
> [btrfs]
> [74750.849244]  [] kthread+0xd2/0xf0
> [74750.854729]  [] ? kthread_create_on_node+0x1c0/0x1c0
> [74750.861861]  [] ret_from_fork+0x58/0x90
> [74750.867868]  [] ? kthread_create_on_node+0x1c0/0x1c0
> 
> [74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 
> 3.13.0-83-generic #127-Ubuntu
> [74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
> 09/07/2015
> [74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15)
> [74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 
> 88078ad98000
> [74758.690802] RIP: 0010:[]  [] 
> _raw_spin_lock+0x32/0x50
> [74758.701202] RSP: 0018:88078ad99728  EFLAGS: 0202
> [74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 
> 09a0
> [74758.717910] RDX: 09a4 RSI: 09a4 RDI: 
> 88084f6671b0
> [74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 
> 0001
> [74758.736311] R10:  R11: ea001b38bc00 R12: 
> 88084e918770
> [74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 
> 88078ad99698
> [74758.754665] FS:  () GS:88087fd4() 
> knlGS:
> [74758.764805] CS:  0010 DS:  ES:  CR0: 80050033
> [74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 
> 001407e0
> [74758.781766] Stack:
> [74758.785786]  88078ad99800 a01b07a3 8807e6bc5000 
> 880853034e70
> [74758.795307]  88078ad997b8 88078ad998b7 880853034e40 
> 00040002
> [74758.804838]  88078ad99778 0020 880523f4a800 
> 0001
> [74758.814337] Call Trace:
> [74758.818811]  [] find_free_extent+0x213/0xc30 [btrfs]
> [74758.827389]  [] ? alloc_extent_state+0x21/0xc0 [btrfs]
> [74758.836147]  [] ? __lookup_extent_mapping+0xa0/0x150 
> [btrfs]
> [74758.845413]  [] 

Infinite loop in in btrfs_find_space_cluster() with btrfs_free_cluster::lock held

2016-03-30 Thread Ilya Dryomov
Hi,

We are hitting the attached lockup on a somewhat regular basis during
nightly tests.  Looks like a bunch of CPUs spin in find_free_extent()
on btrfs_free_cluster::lock, which is held by writer, who seems to be
stuck in an endless loop in btrfs_find_space_cluster(), trying to
cleanup bitmaps list.  Smells like a list corruption to me?

The kernel is ancient in btrfs terms - ubuntu's 3.13.0-83-generic, but
the surroundings look sufficiently similar to upstream and given recent
patches like 1b9b922a3a60 ("Btrfs: check for empty bitmap list in
setup_cluster_bitmaps") I thought this might be relevant for upstream.

Thanks,

Ilya
[74750.641965] CPU: 6 PID: 12768 Comm: btrfs-transacti Not tainted 
3.13.0-83-generic #127-Ubuntu
[74750.650976] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74750.658951] task: 88074b0e8000 ti: 88083c564000 task.ti: 
88083c564000
[74750.666936] RIP: 0010:[]  [] 
_raw_spin_lock+0x37/0x50
[74750.675645] RSP: 0018:88083c565b50  EFLAGS: 0206
[74750.681469] RAX: 18ca RBX: 88083c565b48 RCX: 09a0
[74750.689126] RDX: 09a6 RSI: 09a6 RDI: 88084f6671b0
[74750.696784] RBP: 88083c565b50 R08: 88083c565cef R09: 0001
[74750.704466] R10: a019c9e6 R11: ea001ff7f6c0 R12: 8807e829d4e0
[74750.712140] R13: a02073ab R14: 88083c565ae0 R15: 8807e829d4e0
[74750.719811] FS:  () GS:88087fd8() 
knlGS:
[74750.728450] CS:  0010 DS:  ES:  CR0: 80050033
[74750.734744] CR2: 7f4d2b6a9060 CR3: 01c0e000 CR4: 001407e0
[74750.742437] Stack:
[74750.745001]  88083c565c28 a01b07a3 88074873a000 
880530131d20
[74750.753033]  880791982000 88083c565cef 1000 
00040002
[74750.761065]  a01a3a97 0020 880523f4a800 
0001
[74750.769105] Call Trace:
[74750.772137]  [] find_free_extent+0x213/0xc30 [btrfs]
[74750.779259]  [] ? btrfs_del_items+0x367/0x470 [btrfs]
[74750.786476]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74750.793869]  [] __btrfs_prealloc_file_range+0xe5/0x380 
[btrfs]
[74750.801876]  [] btrfs_prealloc_file_range_trans+0x30/0x40 
[btrfs]
[74750.810136]  [] btrfs_write_dirty_block_groups+0x4d3/0x620 
[btrfs]
[74750.818469]  [] commit_cowonly_roots+0x151/0x213 [btrfs]
[74750.825940]  [] btrfs_commit_transaction+0x483/0x970 
[btrfs]
[74750.833765]  [] transaction_kthread+0x1b5/0x240 [btrfs]
[74750.841156]  [] ? btrfs_cleanup_transaction+0x550/0x550 
[btrfs]
[74750.849244]  [] kthread+0xd2/0xf0
[74750.854729]  [] ? kthread_create_on_node+0x1c0/0x1c0
[74750.861861]  [] ret_from_fork+0x58/0x90
[74750.867868]  [] ? kthread_create_on_node+0x1c0/0x1c0

[74758.651947] CPU: 5 PID: 13299 Comm: kworker/u16:1 Not tainted 
3.13.0-83-generic #127-Ubuntu
[74758.662554] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 
09/07/2015
[74758.672309] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-15)
[74758.681114] task: 880734153000 ti: 88078ad98000 task.ti: 
88078ad98000
[74758.690802] RIP: 0010:[]  [] 
_raw_spin_lock+0x32/0x50
[74758.701202] RSP: 0018:88078ad99728  EFLAGS: 0202
[74758.708662] RAX: 4bf5 RBX: 811f0610 RCX: 09a0
[74758.717910] RDX: 09a4 RSI: 09a4 RDI: 88084f6671b0
[74758.727114] RBP: 88078ad99728 R08: 88078ad998b7 R09: 0001
[74758.736311] R10:  R11: ea001b38bc00 R12: 88084e918770
[74758.745503] R13: 88084e9188c0 R14: 8114f8ee R15: 88078ad99698
[74758.754665] FS:  () GS:88087fd4() 
knlGS:
[74758.764805] CS:  0010 DS:  ES:  CR0: 80050033
[74758.772582] CR2: 7f4d2b695000 CR3: 01c0e000 CR4: 001407e0
[74758.781766] Stack:
[74758.785786]  88078ad99800 a01b07a3 8807e6bc5000 
880853034e70
[74758.795307]  88078ad997b8 88078ad998b7 880853034e40 
00040002
[74758.804838]  88078ad99778 0020 880523f4a800 
0001
[74758.814337] Call Trace:
[74758.818811]  [] find_free_extent+0x213/0xc30 [btrfs]
[74758.827389]  [] ? alloc_extent_state+0x21/0xc0 [btrfs]
[74758.836147]  [] ? __lookup_extent_mapping+0xa0/0x150 
[btrfs]
[74758.845413]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[74758.854224]  [] cow_file_range+0x135/0x430 [btrfs]
[74758.862593]  [] run_delalloc_range+0x312/0x350 [btrfs]
[74758.871310]  [] ? 
find_lock_delalloc_range.constprop.43+0x1b9/0x1f0 [btrfs]
[74758.881857]  [] ? ata_scsi_queuecmd+0x133/0x400
[74758.889974]  [] __extent_writepage+0x2f4/0x760 [btrfs]
[74758.898703]  [] ? btrfs_add_delayed_iput+0x61/0xc0 [btrfs]
[74758.907763]  [] ? __blk_run_queue+0x33/0x40
[74758.915511]  [] ? find_get_pages_tag+0xd1/0x180
[74758.923600]  [] ? kmem_cache_alloc_trace+0x3c/0x1f0
[74758.932051]  []