Re: INFO: task hung in lock_sock_nested (2)
On Mon, Feb 24, 2020 at 09:44:28PM +0800, Hillf Danton wrote:
> On Mon, 24 Feb 2020 11:08:53 +0100 Stefano Garzarella wrote:
> > On Sun, Feb 23, 2020 at 03:50:25PM +0800, Hillf Danton wrote:
> > >
> > > Seems like vsock needs a word to track lock owner in an attempt to
> > > avoid trying to lock sock while the current is the lock owner.
> >
> > Thanks for this possible solution.
> > What about using sock_owned_by_user()?
>
> No chance for vsock_locked() if it works.
>
> > We should fix also hyperv_transport, because it could suffer from the
> > same problem.
>
> You're right. My diff is at most for introducing vsk's lock owner.

Sure, thanks for this!

> > At this point, it might be better to call vsk->transport->release(vsk)
> > always with the lock taken and remove it in the transports as in the
> > following patch.
> >
> > What do you think?
>
> Yes and ... please take a look at the output of grep
>
> 	grep -n lock_sock linux/net/vmw_vsock/af_vsock.c
>
> as it drove me mad. :-)

I'll go in this direction and I'll check all the cases. We should avoid
taking lock_sock in the transports when possible.

Thanks for the help,
Stefano
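To make the sock_owned_by_user() idea floated above concrete: a minimal
sketch, not taken from the thread, of how it might gate the re-lock in a
transport release path. The placement is assumed, and the caveat in the
comment is essentially why the thread converges on taking the lock in the
core instead:

/* Illustrative sketch only.  sk->sk_lock.owned is set while *any* process
 * context owns the lock, so this check cannot tell "current already holds
 * it" apart from "another task holds it"; sock_owned_by_user() is also
 * meant to be called by the lock holder itself.
 */
void virtio_transport_release(struct vsock_sock *vsk)
{
	struct sock *sk = &vsk->sk;
	bool owned = sock_owned_by_user(sk);

	if (!owned)
		lock_sock_nested(sk, SINGLE_DEPTH_NESTING);

	/* ... close the connection and free queued packets ... */

	if (!owned)
		release_sock(sk);
}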
Re: INFO: task hung in lock_sock_nested (2)
On Tue, Feb 25, 2020 at 05:44:03AM +0000, Dexuan Cui wrote:
> > From: Stefano Garzarella
> > Sent: Monday, February 24, 2020 2:09 AM
> > ...
> > > > syz-executor280 D27912  9768   9766 0x00000000
> > > > Call Trace:
> > > >  context_switch kernel/sched/core.c:3386 [inline]
> > > >  __schedule+0x934/0x1f90 kernel/sched/core.c:4082
> > > >  schedule+0xdc/0x2b0 kernel/sched/core.c:4156
> > > >  __lock_sock+0x165/0x290 net/core/sock.c:2413
> > > >  lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
> > > >  virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
> > > >  vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
> > > >  vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
> > > >  __sys_connect_file+0x161/0x1c0 net/socket.c:1857
> > > >  __sys_connect+0x174/0x1b0 net/socket.c:1874
> > > >  __do_sys_connect net/socket.c:1885 [inline]
> > > >  __se_sys_connect net/socket.c:1882 [inline]
> > > >  __x64_sys_connect+0x73/0xb0 net/socket.c:1882
> > > >  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
>
> My understanding about the call trace is: in vsock_stream_connect(),
> after we call lock_sock(sk), we call vsock_assign_transport(), which may
> call vsk->transport->release(vsk), i.e. virtio_transport_release(), and in
> virtio_transport_release() we try to get the same lock and hang.

Yes, that's what I got.

> > > Seems like vsock needs a word to track lock owner in an attempt to
> > > avoid trying to lock sock while the current is the lock owner.
>
> I'm unfamiliar with the g2h/h2g :-)
> Here I'm wondering if it's acceptable to add an 'already_locked'
> parameter like this:
>
> 	bool already_locked = true;
> 	vsk->transport->release(vsk, already_locked) ?

Could be acceptable, but I prefer to avoid it.

> > Thanks for this possible solution.
> > What about using sock_owned_by_user()?
>
> > We should fix also hyperv_transport, because it could suffer from the
> > same problem.
>
> IIUC hyperv_transport doesn't support the h2g/g2h feature, so it should
> not suffer from the deadlock issue here?

The h2g/g2h is in the vsock core, and the hyperv_transport is one of the
g2h transports available.

If we have an L1 VM with hyperv_transport (g2h) to communicate with L0,
and a nested KVM VM (L2), we need to load also vhost_transport (h2g).

If the user creates a socket and tries the following:

  connect(fd, <2,1234>) - socket assigned to hyperv_transport (because the
                          user wants to reach the host using CID_HOST);
                          it fails

  connect(fd, <3,1234>) - socket must be reassigned to vhost_transport
                          (because the user wants to reach a nested guest)

So, I think in this case we can have the deadlock.

> > At this point, it might be better to call vsk->transport->release(vsk)
> > always with the lock taken and remove it in the transports as in the
> > following patch.
> >
> > What do you think?
> >
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 9c5b2a91baad..a073d8efca33 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -753,20 +753,18 @@ static void __vsock_release(struct sock *sk, int level)
> >  	vsk = vsock_sk(sk);
> >  	pending = NULL;	/* Compiler warning. */
> >  
> > -	/* The release call is supposed to use lock_sock_nested()
> > -	 * rather than lock_sock(), if a sock lock should be acquired.
> > -	 */
> > -	if (vsk->transport)
> > -		vsk->transport->release(vsk);
> > -	else if (sk->sk_type == SOCK_STREAM)
> > -		vsock_remove_sock(vsk);
> > -
> >  	/* When "level" is SINGLE_DEPTH_NESTING, use the nested
> >  	 * version to avoid the warning "possible recursive locking
> >  	 * detected". When "level" is 0, lock_sock_nested(sk, level)
> >  	 * is the same as lock_sock(sk).
> >  	 */
> >  	lock_sock_nested(sk, level);
> > +
> > +	if (vsk->transport)
> > +		vsk->transport->release(vsk);
> > +	else if (sk->sk_type == SOCK_STREAM)
> > +		vsock_remove_sock(vsk);
> > +
> >  	sock_orphan(sk);
> >  	sk->sk_shutdown = SHUTDOWN_MASK;
> >
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index 3492c021925f..510f25f4a856 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -529,9 +529,7 @@ static void hvs_release(struct vsock_sock *vsk)
> >  	struct sock *sk = sk_vsock(vsk);
> >  	bool remove_sock;
> >  
> > -	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
> >  	remove_sock = hvs_close_lock_held(vsk);
> > -	release_sock(sk);
> >  	if (remove_sock)
> >  		vsock_remove_sock(vsk);
> >  }
>
> This looks good to me, but do we know why vsk->transport->release(vsk)
> is called without holding the lock?
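To make the reassignment sequence above concrete, here is a minimal
userspace sketch (the port number, CID 3, and the lack of error handling
are illustrative) of the two connect() calls that force the transport to
be released and reassigned while connect() holds the sock lock:

#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

int main(void)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return 1;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_port = 1234;

	/* First connect: CID 2 (the host) selects the g2h transport
	 * (hyperv_transport in the L1 scenario above).  It may fail,
	 * but the transport stays assigned to the socket.
	 */
	addr.svm_cid = VMADDR_CID_HOST;
	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* Second connect: CID 3 (a nested guest) needs the h2g
	 * transport, so vsock_assign_transport() must release the old
	 * one -- while connect() already holds the sock lock.
	 */
	addr.svm_cid = 3;
	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

	close(fd);
	return 0;
}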
Re: INFO: task hung in lock_sock_nested (2)
On Mon, 24 Feb 2020 11:08:53 +0100 Stefano Garzarella wrote:
> On Sun, Feb 23, 2020 at 03:50:25PM +0800, Hillf Danton wrote:
> >
> > Seems like vsock needs a word to track lock owner in an attempt to
> > avoid trying to lock sock while the current is the lock owner.
>
> Thanks for this possible solution.
> What about using sock_owned_by_user()?

No chance for vsock_locked() if it works.

> We should fix also hyperv_transport, because it could suffer from the same
> problem.

You're right. My diff is at most for introducing vsk's lock owner.

> At this point, it might be better to call vsk->transport->release(vsk)
> always with the lock taken and remove it in the transports as in the
> following patch.
>
> What do you think?

Yes and ... please take a look at the output of grep

	grep -n lock_sock linux/net/vmw_vsock/af_vsock.c

as it drove me mad.

> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 9c5b2a91baad..a073d8efca33 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -753,20 +753,18 @@ static void __vsock_release(struct sock *sk, int level)
>  	vsk = vsock_sk(sk);
>  	pending = NULL;	/* Compiler warning. */
>  
> -	/* The release call is supposed to use lock_sock_nested()
> -	 * rather than lock_sock(), if a sock lock should be acquired.
> -	 */
> -	if (vsk->transport)
> -		vsk->transport->release(vsk);
> -	else if (sk->sk_type == SOCK_STREAM)
> -		vsock_remove_sock(vsk);
> -
>  	/* When "level" is SINGLE_DEPTH_NESTING, use the nested
>  	 * version to avoid the warning "possible recursive locking
>  	 * detected". When "level" is 0, lock_sock_nested(sk, level)
>  	 * is the same as lock_sock(sk).
>  	 */
>  	lock_sock_nested(sk, level);
> +
> +	if (vsk->transport)
> +		vsk->transport->release(vsk);
> +	else if (sk->sk_type == SOCK_STREAM)
> +		vsock_remove_sock(vsk);
> +
>  	sock_orphan(sk);
>  	sk->sk_shutdown = SHUTDOWN_MASK;
>  
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 3492c021925f..510f25f4a856 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -529,9 +529,7 @@ static void hvs_release(struct vsock_sock *vsk)
>  	struct sock *sk = sk_vsock(vsk);
>  	bool remove_sock;
>  
> -	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>  	remove_sock = hvs_close_lock_held(vsk);
> -	release_sock(sk);
>  	if (remove_sock)
>  		vsock_remove_sock(vsk);
>  }
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index d9f0c9c5425a..f3c4bab2f737 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -829,7 +829,6 @@ void virtio_transport_release(struct vsock_sock *vsk)
>  	struct sock *sk = &vsk->sk;
>  	bool remove_sock = true;
>  
> -	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>  	if (sk->sk_type == SOCK_STREAM)
>  		remove_sock = virtio_transport_close(vsk);
>  
> @@ -837,7 +836,6 @@ void virtio_transport_release(struct vsock_sock *vsk)
>  		list_del(&pkt->list);
>  		virtio_transport_free_pkt(pkt);
>  	}
> -	release_sock(sk);
>  
>  	if (remove_sock)
>  		vsock_remove_sock(vsk);
>
> Thanks,
> Stefano
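For context on the "level" argument the patch keeps: __vsock_release()
also releases a listener's not-yet-accepted children while still holding
the listener's own lock, and both locks share the lockdep class
sk_lock-AF_VSOCK. A condensed sketch of that shape, simplified from the
af_vsock.c code under discussion:

/* The parent is locked at level 0 (equivalent to lock_sock()); each
 * pending child is locked with SINGLE_DEPTH_NESTING so lockdep does not
 * report the second acquisition of the same lock class as recursive.
 */
static void __vsock_release(struct sock *sk, int level)
{
	struct sock *pending;

	lock_sock_nested(sk, level);

	/* ... orphan the socket, set sk_shutdown, clean up ... */

	while ((pending = vsock_dequeue_accept(sk)) != NULL) {
		__vsock_release(pending, SINGLE_DEPTH_NESTING);
		sock_put(pending);
	}

	release_sock(sk);
}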
Re: INFO: task hung in lock_sock_nested (2)
On Sun, Feb 23, 2020 at 03:50:25PM +0800, Hillf Danton wrote:
> On Sat, 22 Feb 2020 10:58:12 -0800
> > syzbot found the following crash on:
> >
> > HEAD commit:    2bb07f4e tc-testing: updated tdc tests for basic filter
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=122efdede0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=768cc3d3e277cc16
> > dashboard link: https://syzkaller.appspot.com/bug?extid=731710996d79d0d58fbc
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14887de9e0
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=149eec81e0
> >
> > The bug was bisected to:
> >
> > commit 408624af4c89989117bb2c6517bd50b7708a2fcd
> > Author: Stefano Garzarella
> > Date:   Tue Dec 10 10:43:06 2019 +0000
> >
> >     vsock: use local transport when it is loaded
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1011e27ee0
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=1211e27ee0
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1411e27ee0
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+731710996d79d0d58...@syzkaller.appspotmail.com
> > Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
> >
> > INFO: task syz-executor280:9768 blocked for more than 143 seconds.
> >       Not tainted 5.6.0-rc1-syzkaller #0
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor280 D27912  9768   9766 0x00000000
> > Call Trace:
> >  context_switch kernel/sched/core.c:3386 [inline]
> >  __schedule+0x934/0x1f90 kernel/sched/core.c:4082
> >  schedule+0xdc/0x2b0 kernel/sched/core.c:4156
> >  __lock_sock+0x165/0x290 net/core/sock.c:2413
> >  lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
> >  virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
> >  vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
> >  vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
> >  __sys_connect_file+0x161/0x1c0 net/socket.c:1857
> >  __sys_connect+0x174/0x1b0 net/socket.c:1874
> >  __do_sys_connect net/socket.c:1885 [inline]
> >  __se_sys_connect net/socket.c:1882 [inline]
> >  __x64_sys_connect+0x73/0xb0 net/socket.c:1882
> >  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x440209
> > Code: Bad RIP value.
> > RSP: 002b:00007ffdb9f67718 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> > RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440209
> > RDX: 0000000000000010 RSI: 0000000020000440 RDI: 0000000000000003
> > RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
> > R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000401a90
> > R13: 0000000000401b20 R14: 0000000000000000 R15: 0000000000000000
> >
> > Showing all locks held in the system:
> > 1 lock held by khungtaskd/951:
> >  #0: ffffffff89bac240 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5333
> > 1 lock held by rsyslogd/9652:
> >  #0: ffff8880a6533120 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110 fs/file.c:821
> > 2 locks held by getty/9742:
> >  #0: ffff8880a693f090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061bb2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9743:
> >  #0: ffff88809f7a1090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061b72e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9744:
> >  #0: ffff88809be3e090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061632e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9745:
> >  #0: ffff88808eb1e090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061bf2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9746:
> >  #0: ffff88808d33a090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061732e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9747:
> >  #0: ffff8880a6a0c090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061c32e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 2 locks held by getty/9748:
> >  #0: ffff8880a6e4d090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
> >  #1: ffffc900061332e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> > 1 lock held by syz-executor280/9768:
> >  #0: ffff8880987cb8d0 (sk_lock-AF_VSOCK){+.+.}, at: lock_sock include/net/sock.h:1516 [inline]
> >  #0: ffff8880987cb8d0 (sk_lock-AF_VSOCK){+.+.}, at: vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
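Hillf's diff itself is cut off in this archived copy. Purely as an
illustration of the "word to track lock owner" idea quoted above
(hypothetical field and helper names, not his actual patch):

/* Record which task holds the sock lock, so a nested release path can
 * skip re-locking when current is already the owner.
 */
struct vsock_sock {
	/* ... existing fields ... */
	struct task_struct *lock_owner;	/* hypothetical */
};

static bool vsock_locked_by_me(struct vsock_sock *vsk)
{
	return READ_ONCE(vsk->lock_owner) == current;
}

static void vsock_lock_sock(struct vsock_sock *vsk)
{
	lock_sock(sk_vsock(vsk));
	WRITE_ONCE(vsk->lock_owner, current);
}

static void vsock_release_sock(struct vsock_sock *vsk)
{
	WRITE_ONCE(vsk->lock_owner, NULL);
	release_sock(sk_vsock(vsk));
}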
Re: INFO: task hung in lock_sock_nested (2)
On Sat, 22 Feb 2020 10:58:12 -0800
> syzbot found the following crash on:
>
> HEAD commit:    2bb07f4e tc-testing: updated tdc tests for basic filter
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=122efdede0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=768cc3d3e277cc16
> dashboard link: https://syzkaller.appspot.com/bug?extid=731710996d79d0d58fbc
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14887de9e0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=149eec81e0
>
> The bug was bisected to:
>
> commit 408624af4c89989117bb2c6517bd50b7708a2fcd
> Author: Stefano Garzarella
> Date:   Tue Dec 10 10:43:06 2019 +0000
>
>     vsock: use local transport when it is loaded
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1011e27ee0
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=1211e27ee0
> console output: https://syzkaller.appspot.com/x/log.txt?x=1411e27ee0
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+731710996d79d0d58...@syzkaller.appspotmail.com
> Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
>
> INFO: task syz-executor280:9768 blocked for more than 143 seconds.
>       Not tainted 5.6.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor280 D27912  9768   9766 0x00000000
> Call Trace:
>  context_switch kernel/sched/core.c:3386 [inline]
>  __schedule+0x934/0x1f90 kernel/sched/core.c:4082
>  schedule+0xdc/0x2b0 kernel/sched/core.c:4156
>  __lock_sock+0x165/0x290 net/core/sock.c:2413
>  lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
>  virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
>  vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
>  vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
>  __sys_connect_file+0x161/0x1c0 net/socket.c:1857
>  __sys_connect+0x174/0x1b0 net/socket.c:1874
>  __do_sys_connect net/socket.c:1885 [inline]
>  __se_sys_connect net/socket.c:1882 [inline]
>  __x64_sys_connect+0x73/0xb0 net/socket.c:1882
>  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x440209
> Code: Bad RIP value.
> RSP: 002b:00007ffdb9f67718 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440209
> RDX: 0000000000000010 RSI: 0000000020000440 RDI: 0000000000000003
> RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
> R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000401a90
> R13: 0000000000401b20 R14: 0000000000000000 R15: 0000000000000000
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/951:
>  #0: ffffffff89bac240 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x279 kernel/locking/lockdep.c:5333
> 1 lock held by rsyslogd/9652:
>  #0: ffff8880a6533120 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110 fs/file.c:821
> 2 locks held by getty/9742:
>  #0: ffff8880a693f090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061bb2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9743:
>  #0: ffff88809f7a1090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061b72e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9744:
>  #0: ffff88809be3e090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061632e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9745:
>  #0: ffff88808eb1e090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061bf2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9746:
>  #0: ffff88808d33a090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061732e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9747:
>  #0: ffff8880a6a0c090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061c32e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9748:
>  #0: ffff8880a6e4d090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340
>  #1: ffffc900061332e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x220/0x1bf0 drivers/tty/n_tty.c:2156