On Mon, Apr 27, 2026 at 7:15 PM Stefano Garzarella <[email protected]> wrote:
>
> On Fri, Apr 24, 2026 at 08:33:10PM +0530, Deepanshu Kartikey wrote:
> >Two bugs exist in virtio_transport_recv_listen():
>
> Two bugs, two fixes, two patches usually.
>
> >
> >1. On the transport assignment error path, sk_acceptq_added() is called
> >   but sk_acceptq_removed() is never called when vsock_assign_transport()
> >   fails or assigns a different transport than expected. This causes the
> >   parent listener's accept backlog counter to be permanently inflated,
> >   eventually causing sk_acceptq_is_full() to reject legitimate incoming
> >   connections.
>
> Wait, I can't see this issue. sk_acceptq_added() is called after the
> vsock_assign_transport(), so why we should call sk_acceptq_removed()
> in the error path of vsock_assign_transport()?
>
> Maybe I'm missing something.
>
> >
> >2. There is a race between __vsock_release() and vsock_enqueue_accept().
> >   __vsock_release() sets sk->sk_shutdown to SHUTDOWN_MASK and flushes
> >   the accept queue under the parent socket lock. However,
> >   virtio_transport_recv_listen() checks sk_shutdown and subsequently
> >   calls vsock_enqueue_accept() without holding the parent socket lock.
>
> Are you sure about this?
>
> virtio_transport_recv_listen() is called only by
> virtio_transport_recv_pkt() after calling lock_sock(sk), so I'm really
> confused.
>
> >   This means a child socket can be enqueued after __vsock_release() has
> >   already flushed the queue, causing the child socket and its associated
> >   resources to leak
> >   permanently. The existing comment in the code hints at this race but
> >   the fix was never implemented.
>
> Are you referring to:
>         /* __vsock_release() might have already flushed accept_queue.
>          * Subsequent enqueues would lead to a memory leak.
>          */
>         if (sk->sk_shutdown == SHUTDOWN_MASK) {
>                 virtio_transport_reset_no_sock(t, skb, sock_net(sk));
>                 return -ESHUTDOWN;
>         }
>
> In this case I think we are saying that we are doing this check exactly
> to avoid that issue.
>
> >
> >Fix both issues: add sk_acceptq_removed() on the transport error path,
>
> Again, better to fix the 2 issues with 2 patches (same series is fine).
>
> >and re-check sk->sk_shutdown under the parent socket lock before calling
> >vsock_enqueue_accept() to close the race window. The child socket lock
> >is released before acquiring the parent socket lock to maintain correct
> >lock ordering (parent before child).
> >
>
> We are missing the Fixes tag, and I think we can target the `net` tree
> with this patch (i.e. [PATCH net]), see:
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html
>
> >Reported-by: [email protected]
> >Closes: https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678
> >Tested-by: [email protected]
> >Signed-off-by: Deepanshu Kartikey <[email protected]>
> >---
> > net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/vmw_vsock/virtio_transport_common.c 
> >b/net/vmw_vsock/virtio_transport_common.c
> >index 416d533f493d..fad5fa4a4296 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -1578,6 +1578,7 @@ virtio_transport_recv_listen(struct sock *sk, struct 
> >sk_buff *skb,
> >        */
> >       if (ret || vchild->transport != &t->transport) {
> >               release_sock(child);
> >+              sk_acceptq_removed(sk);
>
> Ditto, are we sure about this?
>
> >               virtio_transport_reset_no_sock(t, skb, sock_net(sk));
> >               sock_put(child);
> >               return ret;
> >@@ -1588,11 +1589,19 @@ virtio_transport_recv_listen(struct sock *sk, struct 
> >sk_buff *skb,
> >               child->sk_write_space(child);
> >
> >       vsock_insert_connected(vchild);
> >+      release_sock(child);
> >+      lock_sock(sk);
>
> IMO this is a deadlock with the lock_sock(sk) called by the caller.
>
> Also a comment would be helpful here to explain why we're doing this.
>
> >+      if (sk->sk_shutdown == SHUTDOWN_MASK) {
> >+              release_sock(sk);
> >+              sk_acceptq_removed(sk);
> >+              virtio_transport_reset_no_sock(t, skb, sock_net(sk));
> >+              sock_put(child);
> >+              return -ESHUTDOWN;
>
> Since this is very similar to the error path of
> vsock_assign_transport(), I think it would be better to start by
> defining a common error path for the function and use labels to exit, so
> we can avoid duplicating the code multiple times.
>
> >+      }
> >       vsock_enqueue_accept(sk, child);
> >+      release_sock(sk);
> >       virtio_transport_send_response(vchild, skb);
> >
> >-      release_sock(child);
> >-
>
> TBH I'm really worried about this patch since both fixes are completely
> wrong IMO.
>
> Thanks,
> Stefano
>
> >       sk->sk_data_ready(sk);
> >       return 0;
> > }
> >--
> >2.43.0
> >
>


Hi Stefano,

Thank you for the detailed review!

You are correct on both points. I apologize for the confusion — I was
looking at an older version of the code where sk_acceptq_added() was
called BEFORE vsock_assign_transport(), which made the
sk_acceptq_removed() fix appear necessary. In the current kernel,
sk_acceptq_added() is already moved to after vsock_assign_transport(),
so that issue no longer exists.

Regarding the lock_sock(sk) fix — you are also correct that
virtio_transport_recv_pkt() already holds lock_sock(sk) before calling
virtio_transport_recv_listen(), so our second fix would indeed cause a
deadlock. I missed that completely.

I am still investigating the root cause of the memory leak reported by
syzbot. The backtrace points to the vsock loopback path
(vsock_loopback_work), so I am looking there next. I will send a v2
once I have a correct analysis and fix.

Sorry again for the noise.

Thanks,
Deeranshu Kartikey

Reply via email to