On 22.04.19 at 23:37, Nathan Dehnel wrote:
> I have a raid10 volume that frequently locks up when I try to write to
> it or delete things. Any command that touches it will hang (and can't
> be killed) and I have to start a new ssh session to get into the
> computer again. Nothing fixes it besides a reboot, and the volume will
> fail to unmount while the computer is shutting down.
> 
> [  302.360912] sysrq: SysRq : Show Blocked State
> [  302.360951]   task                        PC stack   pid father
> [  302.360987] btrfs-transacti D    0  2187      2 0x80000000
> [  302.360993] Call Trace:
> [  302.361007]  ? __schedule+0x59d/0x5f1
> [  302.361012]  schedule+0x6a/0x85
> [  302.361019]  btrfs_commit_transaction+0x219/0x7ac
> [  302.361027]  ? wait_woken+0x6d/0x6d
> [  302.361031]  transaction_kthread+0xc9/0x135
> [  302.361036]  ? btrfs_cleanup_transaction+0x4c7/0x4c7
> [  302.361041]  kthread+0x115/0x11d
> [  302.361046]  ? kthread_park+0x76/0x76
> [  302.361050]  ret_from_fork+0x35/0x40

The btrfs transaction kthread (pid 2187) is blocked inside
btrfs_commit_transaction, waiting to commit the current transaction.


> [  302.361064] nfsd            D    0  2292      2 0x80000000
> [  302.361067] Call Trace:
> [  302.361072]  ? __schedule+0x59d/0x5f1
> [  302.361077]  schedule+0x6a/0x85
> [  302.361120]  wait_current_trans+0x9b/0xd8
> [  302.361126]  ? wait_woken+0x6d/0x6d
> [  302.361131]  start_transaction+0x1ae/0x38e
> [  302.361135]  btrfs_create+0x59/0x1d0
> [  302.361142]  vfs_create+0xbf/0xef
> [  302.361160]  do_nfsd_create+0x2be/0x41d [nfsd]
> [  302.361214]  nfsd4_open+0x223/0x578 [nfsd]
> [  302.361229]  nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [  302.361240]  nfsd_dispatch+0xb9/0x16e [nfsd]
> [  302.361258]  svc_process+0x524/0x6e2 [sunrpc]
> [  302.361270]  ? nfsd_destroy+0x5f/0x5f [nfsd]
> [  302.361278]  nfsd+0xf9/0x150 [nfsd]
> [  302.361284]  kthread+0x115/0x11d
> [  302.361289]  ? kthread_park+0x76/0x76
> [  302.361292]  ret_from_fork+0x35/0x40

Here it seems btrfs is exported via NFS: a client asked for a file to be
created, and nfsd (pid 2292) is waiting for the current transaction to
finish.

> [  302.361297] nfsd            D    0  2293      2 0x80000000
> [  302.361300] Call Trace:
> [  302.361305]  ? __schedule+0x59d/0x5f1
> [  302.361309]  schedule+0x6a/0x85
> [  302.361314]  rwsem_down_write_failed+0x1af/0x210
> [  302.361325]  ? nfsd_permission+0xa3/0xe8 [nfsd]
> [  302.361330]  call_rwsem_down_write_failed+0x13/0x20
> [  302.361335]  down_write+0x20/0x2e
> [  302.361345]  nfsd_unlink+0xb1/0x16b [nfsd]
> [  302.361359]  nfsd4_remove+0x4e/0x10a [nfsd]
> [  302.361371]  nfsd4_proc_compound+0x44a/0x562 [nfsd]
> [  302.361381]  nfsd_dispatch+0xb9/0x16e [nfsd]
> [  302.361395]  svc_process+0x524/0x6e2 [sunrpc]
> [  302.361401]  ? __mutex_unlock_slowpath.isra.6+0x1e8/0x20a
> [  302.361410]  ? nfsd_destroy+0x5f/0x5f [nfsd]
> [  302.361419]  nfsd+0xf9/0x150 [nfsd]
> [  302.361424]  kthread+0x115/0x11d
> [  302.361428]  ? kthread_park+0x76/0x76
> [  302.361434]  ret_from_fork+0x35/0x40

Here nfsd (pid 2293) is waiting on a lock of its own, presumably held by
pid 2292, which in turn is waiting for the btrfs transaction kthread
(pid 2187).

> [  302.361441] rm              D    0  2388   2334 0x00000004
> [  302.361444] Call Trace:
> [  302.361449]  ? __schedule+0x59d/0x5f1
> [  302.361453]  schedule+0x6a/0x85
> [  302.361457]  wait_current_trans+0x9b/0xd8
> [  302.361462]  ? wait_woken+0x6d/0x6d
> [  302.361466]  start_transaction+0x1ae/0x38e
> [  302.361471]  btrfs_start_transaction_fallback_global_rsv+0x32/0x127
> [  302.361475]  btrfs_unlink+0x30/0xc0
> [  302.361478]  vfs_unlink+0xd2/0x147
> [  302.361482]  do_unlinkat+0x112/0x223
> [  302.361488]  do_syscall_64+0x7e/0x133
> [  302.361492]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This rm is, again, waiting for btrfs' current transaction to finish.
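
The blocked-task traces quoted in this message can be condensed to one
line per task, which makes the wait chain easier to see. The following
is only a rough sketch: summarize_blocked is a hypothetical helper name,
and it assumes the task header format shown in this thread
("name D stack pid ppid flags"); other kernel versions may print the
header differently.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): reads a "Show Blocked State"
# dump on stdin and prints, for each task in D state, the first frame
# that is not "?"-marked and is not the scheduler itself.
summarize_blocked() {
    awk '
    {
        line = $0
        # Drop an optional mail quote marker and the dmesg timestamp.
        sub(/^(> )?\[ *[0-9]+\.[0-9]+\] */, "", line)
        n = split(line, f, " ")

        # Task header as printed in this thread: name D stack pid ppid flags
        if (n >= 4 && f[2] == "D") {
            name = f[1]; pid = f[4]; want = 1
            next
        }

        # Reliable frames look like symbol+0xOFF/0xSIZE; "?" frames are skipped.
        if (want && f[1] != "?" && f[1] ~ /\+0x[0-9a-f]+\/0x[0-9a-f]+$/) {
            sym = f[1]
            sub(/\+0x.*/, "", sym)
            if (sym == "__schedule" || sym == "schedule") next
            printf "%s (pid %s) blocked in %s\n", name, pid, sym
            want = 0
        }
    }'
}
```

One way to use it would be "dmesg | summarize_blocked" after triggering
the dump with "echo w > /proc/sysrq-trigger" as root.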

> [  302.361496] RIP: 0033:0x7f681509b5d7
> [  302.361504] Code: Bad RIP value.
> [  302.361506] RSP: 002b:00007fffb1aed668 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000107
> [  302.361510] RAX: ffffffffffffffda RBX: 000055672760c6c0 RCX: 
> 00007f681509b5d7
> [  302.361512] RDX: 0000000000000000 RSI: 000055672760b490 RDI: 
> 00000000ffffff9c
> [  302.361514] RBP: 0000000000000000 R08: 0000000000000003 R09: 
> 0000000000000000
> [  302.361516] R10: fffffffffffff12b R11: 0000000000000202 R12: 
> 00007fffb1aed848
> [  302.361518] R13: 000055672760b400 R14: 0000000000000002 R15: 
> 0000000000000000
> 

There isn't a lot to be done with the information you have provided. At
the very least:

1. Provide a backtrace of all threads on the system via
"echo t > /proc/sysrq-trigger"

2. Provide the source code line number of
btrfs_commit_transaction+0x219/0x7ac. This can be done by running
./faddr2line[0] vmlinux btrfs_commit_transaction+0x219/0x7ac

3. State your kernel version

For step 2 you will of course need the unstripped vmlinux image of your
kernel.
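
The three items above could be gathered roughly as follows. This is a
sketch, not a tested recipe for your machine: the function names and the
default output file name are made up for illustration, the kernel source
tree and vmlinux paths are arguments you must supply, and writing to
/proc/sysrq-trigger requires root with sysrq enabled.

```shell
#!/bin/sh
# Hypothetical helper functions; names and defaults are assumptions.

# Item 1: ask the kernel to dump the stack of every task, then save the
# kernel log. $1 = output file (defaults to all-tasks.txt).
dump_all_tasks() {
    echo t > /proc/sysrq-trigger
    dmesg > "${1:-all-tasks.txt}"
}

# Item 2: map symbol+offset/size to a source file and line.
# $1 = kernel source tree containing scripts/faddr2line
# $2 = unstripped vmlinux of the running kernel
# $3 = address, e.g. btrfs_commit_transaction+0x219/0x7ac
resolve_frame() {
    "$1/scripts/faddr2line" "$2" "$3"
}

# Item 3: report the running kernel version.
kernel_version() {
    uname -r
}
```

For example, with the source tree in ./linux and the unstripped image at
./linux/vmlinux (both paths assumed), item 2 would be:
resolve_frame ./linux ./linux/vmlinux 'btrfs_commit_transaction+0x219/0x7ac'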

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/faddr2line
