> What kernel version? Is it reproducible with something current, e.g.
> 5.0.6 or ideally 5.1-rc6?

4.19.27-gentoo-r1; I haven't tried anything newer.

> And are these writes/deletes actually going through NFS as an
> intermediary to the Btrfs volume? I can't really tell from the call
> trace whether this is an issue in nfsd or a use-case-specific problem
> with NFS on Btrfs. Are you able to write/delete directly on this
> Btrfs volume?

This happens when writing directly to the volume locally. The volume
is also being used as an NFS share concurrently, though. Killing
nfs-server doesn't seem to have any effect.
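
(For what it's worth, the stuck tasks all show up in D state in the
trace below, i.e. uninterruptible sleep, which is presumably why they
can't be killed. A quick one-liner to list them, in case it's useful:

    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
)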

> I'm wondering if you can issue sysrq+t during the hang?

It happens randomly. If it happens again soon I'll try this.
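
For reference, here's roughly what I'd pre-type in the console
(assuming sysrq is enabled and I'm root):

    # allow all sysrq functions, including 't' (show task states)
    echo 1 > /proc/sys/kernel/sysrq
    # pre-type this, then hit return once the hang occurs
    echo t > /proc/sysrq-trigger
    # afterwards, pull the task dump from the kernel log
    journalctl -k -b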

On Mon, Apr 22, 2019 at 9:39 PM Chris Murphy <li...@colorremedies.com> wrote:
>
> On Mon, Apr 22, 2019 at 2:38 PM Nathan Dehnel <ncdeh...@gmail.com> wrote:
> >
> > I have a raid10 volume that frequently locks up when I try to write to
> > it or delete things. Any command that touches it will hang (and can't
> > be killed) and I have to start a new ssh session to get into the
> > computer again. Nothing fixes it besides a reboot, and the volume will
> > fail to unmount while the computer is shutting down.
> >
> > [  302.360912] sysrq: SysRq : Show Blocked State
> > [  302.360951]   task                        PC stack   pid father
> > [  302.360987] btrfs-transacti D    0  2187      2 0x80000000
> > [  302.360993] Call Trace:
> > [  302.361007]  ? __schedule+0x59d/0x5f1
> > [  302.361012]  schedule+0x6a/0x85
> > [  302.361019]  btrfs_commit_transaction+0x219/0x7ac
> > [  302.361027]  ? wait_woken+0x6d/0x6d
> > [  302.361031]  transaction_kthread+0xc9/0x135
> > [  302.361036]  ? btrfs_cleanup_transaction+0x4c7/0x4c7
> > [  302.361041]  kthread+0x115/0x11d
> > [  302.361046]  ? kthread_park+0x76/0x76
> > [  302.361050]  ret_from_fork+0x35/0x40
> > [  302.361064] nfsd            D    0  2292      2 0x80000000
> > [  302.361067] Call Trace:
> > [  302.361072]  ? __schedule+0x59d/0x5f1
> > [  302.361077]  schedule+0x6a/0x85
> > [  302.361120]  wait_current_trans+0x9b/0xd8
> > [  302.361126]  ? wait_woken+0x6d/0x6d
> > [  302.361131]  start_transaction+0x1ae/0x38e
> > [  302.361135]  btrfs_create+0x59/0x1d0
> > [  302.361142]  vfs_create+0xbf/0xef
> > [  302.361160]  do_nfsd_create+0x2be/0x41d [nfsd]
> > [  302.361214]  nfsd4_open+0x223/0x578 [nfsd]
> > [  302.361229]  nfsd4_proc_compound+0x44a/0x562 [nfsd]
> > [  302.361240]  nfsd_dispatch+0xb9/0x16e [nfsd]
> > [  302.361258]  svc_process+0x524/0x6e2 [sunrpc]
> > [  302.361270]  ? nfsd_destroy+0x5f/0x5f [nfsd]
> > [  302.361278]  nfsd+0xf9/0x150 [nfsd]
> > [  302.361284]  kthread+0x115/0x11d
> > [  302.361289]  ? kthread_park+0x76/0x76
> > [  302.361292]  ret_from_fork+0x35/0x40
> > [  302.361297] nfsd            D    0  2293      2 0x80000000
> > [  302.361300] Call Trace:
> > [  302.361305]  ? __schedule+0x59d/0x5f1
> > [  302.361309]  schedule+0x6a/0x85
> > [  302.361314]  rwsem_down_write_failed+0x1af/0x210
> > [  302.361325]  ? nfsd_permission+0xa3/0xe8 [nfsd]
> > [  302.361330]  call_rwsem_down_write_failed+0x13/0x20
> > [  302.361335]  down_write+0x20/0x2e
> > [  302.361345]  nfsd_unlink+0xb1/0x16b [nfsd]
> > [  302.361359]  nfsd4_remove+0x4e/0x10a [nfsd]
> > [  302.361371]  nfsd4_proc_compound+0x44a/0x562 [nfsd]
> > [  302.361381]  nfsd_dispatch+0xb9/0x16e [nfsd]
> > [  302.361395]  svc_process+0x524/0x6e2 [sunrpc]
> > [  302.361401]  ? __mutex_unlock_slowpath.isra.6+0x1e8/0x20a
> > [  302.361410]  ? nfsd_destroy+0x5f/0x5f [nfsd]
> > [  302.361419]  nfsd+0xf9/0x150 [nfsd]
> > [  302.361424]  kthread+0x115/0x11d
> > [  302.361428]  ? kthread_park+0x76/0x76
> > [  302.361434]  ret_from_fork+0x35/0x40
> > [  302.361441] rm              D    0  2388   2334 0x00000004
> > [  302.361444] Call Trace:
> > [  302.361449]  ? __schedule+0x59d/0x5f1
> > [  302.361453]  schedule+0x6a/0x85
> > [  302.361457]  wait_current_trans+0x9b/0xd8
> > [  302.361462]  ? wait_woken+0x6d/0x6d
> > [  302.361466]  start_transaction+0x1ae/0x38e
> > [  302.361471]  btrfs_start_transaction_fallback_global_rsv+0x32/0x127
> > [  302.361475]  btrfs_unlink+0x30/0xc0
> > [  302.361478]  vfs_unlink+0xd2/0x147
> > [  302.361482]  do_unlinkat+0x112/0x223
> > [  302.361488]  do_syscall_64+0x7e/0x133
> > [  302.361492]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  302.361496] RIP: 0033:0x7f681509b5d7
> > [  302.361504] Code: Bad RIP value.
> > [  302.361506] RSP: 002b:00007fffb1aed668 EFLAGS: 00000202 ORIG_RAX: 0000000000000107
> > [  302.361510] RAX: ffffffffffffffda RBX: 000055672760c6c0 RCX: 00007f681509b5d7
> > [  302.361512] RDX: 0000000000000000 RSI: 000055672760b490 RDI: 00000000ffffff9c
> > [  302.361514] RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
> > [  302.361516] R10: fffffffffffff12b R11: 0000000000000202 R12: 00007fffb1aed848
> > [  302.361518] R13: 000055672760b400 R14: 0000000000000002 R15: 0000000000000000
>
>
> What kernel version? Is it reproducible with something current, e.g.
> 5.0.6 or ideally 5.1-rc6?
>
> And are these writes/deletes actually going through NFS as an
> intermediary to the Btrfs volume? I can't really tell from the call
> trace whether this is an issue in nfsd or a use-case-specific problem
> with NFS on Btrfs. Are you able to write/delete directly on this
> Btrfs volume?
>
> Since you're getting some information out of the system when this
> happens (the call trace), I'm wondering if you can issue sysrq+t
> during the hang? What I find works is to set up sysrq and pre-type
> the trigger command in a console (either a tty if you have physical
> access, or netconsole), then reproduce the hang, and then hit return
> on the console with the pre-typed sysrq command. Sometimes the sysrq
> output is quite a lot for the kernel buffer and will overflow dmesg,
> so you'll either need to use the `log_buf_len=1M` boot parameter, or
> you can get the sysrq output from journalctl if it's a systemd
> system.
>
>
> --
> Chris Murphy
