> On Jan 13, 2018, at 18:27, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> On 2018年01月14日 09:36, Johannes Ernst wrote:
>> Summary: frequent “hangs” of the system with dmesg saying:
>> 
>> task systemd:22229 blocked for more than 120 seconds.
>> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
>> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> this message.
>> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
>> [ 2948.928671] Call Trace:
>> [ 2948.928684]  ? __schedule+0x290/0x890
>> [ 2948.928690]  schedule+0x2f/0x90
>> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>> [ 2948.928752]  ? wait_woken+0x80/0x80
>> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
>> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
>> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
>> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
> 
> You're using qgroup, and the hang is while looking up the old_roots of one
> extent, which only searches commit roots.
> 
> Normally this shouldn't cause any problem, especially for commit roots.
> 
> Is there any special operation happening?

Nope. It appears to happen right when systemd-nspawn begins to run, and I am 
not executing any other commands.

I did not realize qgroups are involved … all I did was run mkfs.btrfs and 
systemd-nspawn :-) Perhaps the default should be qgroups off? (But I digress.)
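
(For the record, checking and disabling quota looks like this, if I have the
btrfs-progs syntax right — the mount point below is just a placeholder for
wherever the containers live:

    # prints the qgroup table if quota is enabled, errors out otherwise
    btrfs qgroup show /var/lib/machines
    # turn quota accounting off on that filesystem
    btrfs quota disable /var/lib/machines

As far as I can tell it was systemd's machined/importd that switched quota on
under /var/lib/machines; I never enabled it by hand.)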

>> That works well … but not for long. Often we don’t make it through the test 
>> suite, and starting new containers hangs. Other filesystem operations also 
>> hang. The above stacktrace, or something rather similar, shows up in dmesg 
>> (not in the journal, because that hangs too!). This repeats, but I don’t see 
>> any other relevant messages. Only a reboot seems to allow recovery.
> 
> So Qgroup is used to limit disk usage of each container?
> 
> Maybe it's related to subvolume deletion?
> 
> Anyway, if qgroup is not necessary, disabling qgroup should fix your
> problem.
> 
> That aside, does it really hang?
> Qgroup dramatically increases the overhead of deleting a subvolume or
> balancing the fs.
> Maybe it's just a little slow?
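
(For what it's worth, if it were just slow cleanup of deleted subvolumes, I
believe that would show up as pending deletions; the commands below are my
understanding of btrfs-progs, and the mount point is a placeholder:

    # list subvolumes that are deleted but not yet cleaned up
    btrfs subvolume list -d /build
    # block until cleanup of deleted subvolumes finishes
    btrfs subvolume sync /build
)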

I have waited for several hours and the system has not recovered (I walked 
away from the running tests and returned hours later).

Update: I executed “btrfs quota disable” on all relevant file systems (right?). 
Running the tests again, this morning I’m getting:

INFO: task systemd-journal:20876 blocked for more than 120 seconds.
[ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
[ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
[ 5037.962622] Call Trace:
[ 5037.962635]  ? __schedule+0x290/0x890
[ 5037.962640]  ? __slab_free+0x14e/0x300
[ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
[ 5037.962651]  schedule+0x2f/0x90
[ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
[ 5037.962713]  ? wait_woken+0x80/0x80
[ 5037.962739]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 5037.962767]  btrfs_search_slot+0x703/0x9f0 [btrfs]
[ 5037.962796]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
[ 5037.962841]  btrfs_insert_orphan_item+0x66/0xa0 [btrfs]
[ 5037.962880]  btrfs_orphan_add+0xa1/0x200 [btrfs]
[ 5037.962919]  btrfs_setattr+0x123/0x3b0 [btrfs]
[ 5037.962926]  notify_change+0x2fd/0x420
[ 5037.962933]  do_truncate+0x75/0xc0
[ 5037.962940]  do_sys_ftruncate.constprop.19+0xe7/0x100
[ 5037.962947]  do_syscall_64+0x55/0x110
[ 5037.962952]  entry_SYSCALL64_slow_path+0x25/0x25
[ 5037.962956] RIP: 0033:0x7fd9423697ba
[ 5037.962958] RSP: 002b:00007fff1179cc18 EFLAGS: 00000206 ORIG_RAX: 
000000000000004d
[ 5037.962962] RAX: ffffffffffffffda RBX: 000055bd48cbe9f0 RCX: 00007fd9423697ba
[ 5037.962965] RDX: 000055bd48cbe660 RSI: 0000000000640000 RDI: 000000000000000f
[ 5037.962966] RBP: 00007fff1179cc50 R08: 000055bd48cbc62c R09: 000055bd48cbea6c
[ 5037.962968] R10: 000055bd48cbe9f0 R11: 0000000000000206 R12: 00007fff1179cc48
[ 5037.962970] R13: 0000000000000003 R14: 000562b7234e71a9 R15: 000055bd47749ca0

and doing a simple “touch /build/tmp/foo” never returns. 10+ hours have passed 
since the previous command was issued.
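
(If it would help, next time it wedges I can capture more state; my
understanding is that these generic hung-task diagnostics apply here, assuming
sysrq is available and this is run as root:

    # dump kernel stacks of all blocked (D state) tasks to dmesg
    echo w > /proc/sysrq-trigger
    # or inspect one specific stuck process, e.g. the hung touch
    cat /proc/$(pgrep -x touch)/stack
)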

