On 22.04.19 at 23:37, Nathan Dehnel wrote:
> I have a raid10 volume that frequently locks up when I try to write to
> it or delete things. Any command that touches it will hang (and can't
> be killed) and I have to start a new ssh session to get into the
> computer again. Nothing fixes it besides a reboot, and the volume will
> fail to unmount while the computer is shutting down.
>
> [ 302.360912] sysrq: SysRq : Show Blocked State
> [ 302.360951] task PC stack pid father
> [ 302.360987] btrfs-transacti D 0 2187 2 0x80000000
> [ 302.360993] Call Trace:
> [ 302.361007] ? __schedule+0x59d/0x5f1
> [ 302.361012] schedule+0x6a/0x85
> [ 302.361019] btrfs_commit_transaction+0x219/0x7ac
> [ 302.361027] ? wait_woken+0x6d/0x6d
> [ 302.361031] transaction_kthread+0xc9/0x135
> [ 302.361036] ? btrfs_cleanup_transaction+0x4c7/0x4c7
> [ 302.361041] kthread+0x115/0x11d
> [ 302.361046] ? kthread_park+0x76/0x76
> [ 302.361050] ret_from_fork+0x35/0x40
The btrfs transaction kthread (PID 2187) is waiting to commit its transaction.
> [ 302.361064] nfsd D 0 2292 2 0x80000000
> [ 302.361067] Call Trace:
> [ 302.361072] ? __schedule+0x59d/0x5f1
> [ 302.361077] schedule+0x6a/0x85
> [ 302.361120] wait_current_trans+0x9b/0xd8
> [ 302.361126] ? wait_woken+0x6d/0x6d
> [ 302.361131] start_transaction+0x1ae/0x38e
> [ 302.361135] btrfs_create+0x59/0x1d0
> [ 302.361142] vfs_create+0xbf/0xef
> [ 302.361160] do_nfsd_create+0x2be/0x41d [nfsd]
> [ 302.361214] nfsd4_open+0x223/0x578 [nfsd]
> [ 302.361229] nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [ 302.361240] nfsd_dispatch+0xb9/0x16e [nfsd]
> [ 302.361258] svc_process+0x524/0x6e2 [sunrpc]
> [ 302.361270] ? nfsd_destroy+0x5f/0x5f [nfsd]
> [ 302.361278] nfsd+0xf9/0x150 [nfsd]
> [ 302.361284] kthread+0x115/0x11d
> [ 302.361289] ? kthread_park+0x76/0x76
> [ 302.361292] ret_from_fork+0x35/0x40
Here it seems btrfs is exported via NFS: a client requested a file to be
created, and nfsd is waiting for the current transaction to finish.
> [ 302.361297] nfsd D 0 2293 2 0x80000000
> [ 302.361300] Call Trace:
> [ 302.361305] ? __schedule+0x59d/0x5f1
> [ 302.361309] schedule+0x6a/0x85
> [ 302.361314] rwsem_down_write_failed+0x1af/0x210
> [ 302.361325] ? nfsd_permission+0xa3/0xe8 [nfsd]
> [ 302.361330] call_rwsem_down_write_failed+0x13/0x20
> [ 302.361335] down_write+0x20/0x2e
> [ 302.361345] nfsd_unlink+0xb1/0x16b [nfsd]
> [ 302.361359] nfsd4_remove+0x4e/0x10a [nfsd]
> [ 302.361371] nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [ 302.361381] nfsd_dispatch+0xb9/0x16e [nfsd]
> [ 302.361395] svc_process+0x524/0x6e2 [sunrpc]
> [ 302.361401] ? __mutex_unlock_slowpath.isra.6+0x1e8/0x20a
> [ 302.361410] ? nfsd_destroy+0x5f/0x5f [nfsd]
> [ 302.361419] nfsd+0xf9/0x150 [nfsd]
> [ 302.361424] kthread+0x115/0x11d
> [ 302.361428] ? kthread_park+0x76/0x76
> [ 302.361434] ret_from_fork+0x35/0x40
Here nfsd is waiting on a lock of its own, presumably held by PID 2292,
which in turn is waiting for the btrfs transaction kthread (PID 2187).
> [ 302.361441] rm D 0 2388 2334 0x00000004
> [ 302.361444] Call Trace:
> [ 302.361449] ? __schedule+0x59d/0x5f1
> [ 302.361453] schedule+0x6a/0x85
> [ 302.361457] wait_current_trans+0x9b/0xd8
> [ 302.361462] ? wait_woken+0x6d/0x6d
> [ 302.361466] start_transaction+0x1ae/0x38e
> [ 302.361471] btrfs_start_transaction_fallback_global_rsv+0x32/0x127
> [ 302.361475] btrfs_unlink+0x30/0xc0
> [ 302.361478] vfs_unlink+0xd2/0x147
> [ 302.361482] do_unlinkat+0x112/0x223
> [ 302.361488] do_syscall_64+0x7e/0x133
> [ 302.361492] entry_SYSCALL_64_after_hwframe+0x44/0xa9
This rm is likewise waiting for btrfs' current transaction to finish.
> [ 302.361496] RIP: 0033:0x7f681509b5d7
> [ 302.361504] Code: Bad RIP value.
> [ 302.361506] RSP: 002b:00007fffb1aed668 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000107
> [ 302.361510] RAX: ffffffffffffffda RBX: 000055672760c6c0 RCX:
> 00007f681509b5d7
> [ 302.361512] RDX: 0000000000000000 RSI: 000055672760b490 RDI:
> 00000000ffffff9c
> [ 302.361514] RBP: 0000000000000000 R08: 0000000000000003 R09:
> 0000000000000000
> [ 302.361516] R10: fffffffffffff12b R11: 0000000000000202 R12:
> 00007fffb1aed848
> [ 302.361518] R13: 000055672760b400 R14: 0000000000000002 R15:
> 0000000000000000
>
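For longer dumps it can help to reduce each blocked task to the first
function it is actually sleeping in. Below is a minimal Python sketch of
that idea, assuming the dmesg line format shown in the quoted output; the
name summarize_blocked and the two regular expressions are my own
illustration, not part of any kernel tooling.

```python
import re

# Task header as printed by "Show Blocked State":
#   name, state, stack, pid, father, flags
TASK_RE = re.compile(
    r"\]\s+(\S+)\s+([A-Z])\s+\d+\s+(\d+)\s+(\d+)\s+0x[0-9a-f]+\s*$"
)
# Stack frame: optional "? " marker (unreliable entry), then symbol+offset/size
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Scheduler entry points that say nothing about *why* the task sleeps
GENERIC = {"__schedule", "schedule"}


def summarize_blocked(dump):
    """Return (task name, pid, first non-scheduler frame) per blocked task."""
    tasks = []
    current = None
    for line in dump.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = {
                "name": header.group(1),
                "pid": int(header.group(3)),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip frames marked with "?", they are not reliable
        if frame and current is not None and not frame.group(1):
            current["frames"].append(frame.group(2))

    summary = []
    for task in tasks:
        blocked_in = next(
            (f for f in task["frames"] if f not in GENERIC), None
        )
        summary.append((task["name"], task["pid"], blocked_in))
    return summary
```

Fed the dump above, this should report btrfs_commit_transaction for PID
2187 and wait_current_trans for PIDs 2292 and 2388, which matches the
reading given in the inline comments.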
There isn't much that can be done with the information you have provided.
At the very least:
1. Provide a backtrace of all threads on the system via
"echo t > /proc/sysrq-trigger".
2. Provide the source code line number of
btrfs_commit_transaction+0x219/0x7ac. This can be done by running
./faddr2line[0] vmlinux btrfs_commit_transaction+0x219/0x7ac
3. State your kernel version.
Of course, you will need the unstripped vmlinux image of your kernel for
step 2.
[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/faddr2line
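
If it is easier to gather all three items in one go, something along these
lines might do. It is only a sketch: the function names and the default
paths (./faddr2line, vmlinux in the current directory) are assumptions, and
writing to /proc/sysrq-trigger needs root with sysrq enabled.

```python
import platform
import subprocess


def kernel_version():
    """Kernel release string, the same value `uname -r` prints."""
    return platform.release()


def dump_all_tasks(trigger="/proc/sysrq-trigger"):
    """Ask the kernel to log a backtrace of every task (sysrq 't').

    Needs root; the output lands in the kernel log (dmesg).
    """
    with open(trigger, "w") as f:
        f.write("t")


def faddr2line_command(vmlinux, address, script="./faddr2line"):
    """Build the faddr2line invocation for one func+offset/size address.

    `script` and `vmlinux` are assumed locations; adjust to your tree.
    """
    return [script, vmlinux, address]


def resolve_address(vmlinux, address, script="./faddr2line"):
    """Run faddr2line against an unstripped vmlinux and return its output."""
    cmd = faddr2line_command(vmlinux, address, script)
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout
```

Calling dump_all_tasks() as root and then saving the output of dmesg,
together with kernel_version() and the result of
resolve_address("vmlinux", "btrfs_commit_transaction+0x219/0x7ac"), would
cover the three points above.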