On Mon, Apr 27, 2026 at 7:15 PM Stefano Garzarella <[email protected]> wrote: > > On Fri, Apr 24, 2026 at 08:33:10PM +0530, Deepanshu Kartikey wrote: > >Two bugs exist in virtio_transport_recv_listen(): > > Two bugs, two fixes, two patches usually. > > > > >1. On the transport assignment error path, sk_acceptq_added() is called > > but sk_acceptq_removed() is never called when vsock_assign_transport() > > fails or assigns a different transport than expected. This causes the > > parent listener's accept backlog counter to be permanently inflated, > > eventually causing sk_acceptq_is_full() to reject legitimate incoming > > connections. > > Wait, I can't see this issue. sk_acceptq_added() is called after the > vsock_assign_transport(), so why we should call sk_acceptq_removed() > in the error path of vsock_assign_transport()? > > Maybe I'm missing something. > > > > >2. There is a race between __vsock_release() and vsock_enqueue_accept(). > > __vsock_release() sets sk->sk_shutdown to SHUTDOWN_MASK and flushes > > the accept queue under the parent socket lock. However, > > virtio_transport_recv_listen() checks sk_shutdown and subsequently > > calls vsock_enqueue_accept() without holding the parent socket lock. > > Are you sure about this? > > virtio_transport_recv_listen() is called only by > virtio_transport_recv_pkt() after calling lock_sock(sk), so I'm really > confused. > > > This means a child socket can be enqueued after __vsock_release() has > > already flushed the queue, causing the child socket and its associated > > resources to leak > > permanently. The existing comment in the code hints at this race but > > the fix was never implemented. > > Are you referring to: > /* __vsock_release() might have already flushed accept_queue. > * Subsequent enqueues would lead to a memory leak. > */ > if (sk->sk_shutdown == SHUTDOWN_MASK) { > virtio_transport_reset_no_sock(t, skb, sock_net(sk)); > return -ESHUTDOWN; > } > > In this case I think we are saying that we are doing this check exactly > to avoid that issue. > > > > >Fix both issues: add sk_acceptq_removed() on the transport error path, > > Again, better to fix the 2 issues with 2 patches (same series is fine). > > >and re-check sk->sk_shutdown under the parent socket lock before calling > >vsock_enqueue_accept() to close the race window. The child socket lock > >is released before acquiring the parent socket lock to maintain correct > >lock ordering (parent before child). > > > > We are missing the Fixes tag, and I think we can target the `net` tree > with this patch (i.e. [PATCH net]), see: > https://www.kernel.org/doc/html/next/process/maintainer-netdev.html > > >Reported-by: [email protected] > >Closes: https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678 > >Tested-by: [email protected] > >Signed-off-by: Deepanshu Kartikey <[email protected]> > >--- > > net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > >diff --git a/net/vmw_vsock/virtio_transport_common.c > >b/net/vmw_vsock/virtio_transport_common.c > >index 416d533f493d..fad5fa4a4296 100644 > >--- a/net/vmw_vsock/virtio_transport_common.c > >+++ b/net/vmw_vsock/virtio_transport_common.c > >@@ -1578,6 +1578,7 @@ virtio_transport_recv_listen(struct sock *sk, struct > >sk_buff *skb, > > */ > > if (ret || vchild->transport != &t->transport) { > > release_sock(child); > >+ sk_acceptq_removed(sk); > > Ditto, are we sure about this? > > > virtio_transport_reset_no_sock(t, skb, sock_net(sk)); > > sock_put(child); > > return ret; > >@@ -1588,11 +1589,19 @@ virtio_transport_recv_listen(struct sock *sk, struct > >sk_buff *skb, > > child->sk_write_space(child); > > > > vsock_insert_connected(vchild); > >+ release_sock(child); > >+ lock_sock(sk); > > IMO this is a deadlock with the lock_sock(sk) called by the caller. > > Also a comment would be helpful here to explain why we're doing this. > > >+ if (sk->sk_shutdown == SHUTDOWN_MASK) { > >+ release_sock(sk); > >+ sk_acceptq_removed(sk); > >+ virtio_transport_reset_no_sock(t, skb, sock_net(sk)); > >+ sock_put(child); > >+ return -ESHUTDOWN; > > Since this is very similar to the error path of > vsock_assign_transport(), I think it would be better to start by > defining a common error path for the function and use labels to exit, so > we can avoid duplicating the code multiple times. > > >+ } > > vsock_enqueue_accept(sk, child); > >+ release_sock(sk); > > virtio_transport_send_response(vchild, skb); > > > >- release_sock(child); > >- > > TBH I'm really worried about this patch since both fixes are completely > wrong IMO. > > Thanks, > Stefano > > > sk->sk_data_ready(sk); > > return 0; > > } > >-- > >2.43.0 > > >
Hi Stefano, Thank you for the detailed review! You are correct on both points. I apologize for the confusion — I was looking at an older version of the code where sk_acceptq_added() was called BEFORE vsock_assign_transport(), which made the sk_acceptq_removed() fix appear necessary. In the current kernel, sk_acceptq_added() is already moved to after vsock_assign_transport(), so that issue no longer exists. Regarding the lock_sock(sk) fix — you are also correct that virtio_transport_recv_pkt() already holds lock_sock(sk) before calling virtio_transport_recv_listen(), so our second fix would indeed cause a deadlock. I missed that completely. I am still investigating the root cause of the memory leak reported by syzbot. The backtrace points to the vsock loopback path (vsock_loopback_work), so I am looking there next. I will send a v2 once I have a correct analysis and fix. Sorry again for the noise. Thanks, Deeranshu Kartikey

