Re: [Devel] [PATCH RFC] vhost/vsock: Refuse the connection immediately when guest isn't ready

Konstantin Khorenko Tue, 12 May 2026 04:16:32 -0700

On 5/7/26 22:13, Polina Vishneva wrote:

From: "Denis V. Lunev" <[email protected]>


When the host initiates an AF_VSOCK connect() to a guest that has not
yet loaded the virtio-vsock transport (i.e. still booting), the caller
blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT (2 seconds), because
vhost_transport_do_send_pkt() silently exits when
vhost_vq_get_backend(vq) returns NULL.

If the guest doesn't start listening within this timeout, connect()
returns ETIMEDOUT.

This delay is usually pointless and it doesn't align well with our
behavior at other initialization stages: for example, if a connection is
attempted when the guest driver is already loaded, but when nothing is
listening yet, it returns ECONNRESET immediately without any wait.

Fix this by checking the RX virtqueue backend in
vhost_transport_send_pkt() before queuing. If the backend is NULL,
return -ECONNREFUSED immediately.

Signed-off-by: Denis V. Lunev <[email protected]>
Co-authored-by: Polina Vishneva <[email protected]>
Signed-off-by: Polina Vishneva <[email protected]>
---
  drivers/vhost/vsock.c | 17 ++++++++++++++---
  1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 1d8ec6bed53e..e6de1e23121b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -302,6 +302,20 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net)
                return -ENODEV;
        }

+       /* If the guest has not yet initialized the RX virtqueue, fail
+        * immediately rather than queueing the packet and letting the
+        * caller wait for VSOCK_DEFAULT_CONNECT_TIMEOUT.
+        *
+        * Reading private_data without vq->mutex is a deliberate racy
+        * check: if the backend is NULL the guest driver is definitely
+        * not ready; if it becomes NULL right after, the worker
+        * (do_send_pkt) rechecks under the mutex. */
+       if (!READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data)) {
+               rcu_read_unlock();
+               kfree_skb(skb);
+               return -ECONNREFUSED;


I'm a bit hesitant about the proper error code to return here.
Who eventually receives this error code, and how do they handle it?

I mean, we are in the middle of a VM start, and the guest has not been fully
initialized yet.
But we believe it will be initialized soon, so I'd expect the attempt to be
repeated after a while.

On the other hand, I'm not sure that a process which gets -ECONNREFUSED will
definitely retry the attempt.

Maybe use -EAGAIN here? That error code is definitely the expected one when a
new attempt is expected.

AI also suggests -EHOSTUNREACH (and by the way, AI does not recommend EAGAIN,
he-he :))) ).

  EHOSTUNREACH as the error code for "guest transport not ready"

Semantics: EHOSTUNREACH means "the destination host cannot be reached" - the peer exists conceptually but the communication path to it is currently unavailable. This maps precisely to the situation: the guest VM exists, QEMU has opened the vhost-vsock device and assigned a CID, but the guest has not yet loaded its virtio-vsock driver, so the transport path is not established.

  Existing usage in vsock subsystem:

• vmci_transport.c:95 - VMCI_ERROR_INVALID_RESOURCE is mapped to EHOSTUNREACH. This is the case where the VMCI endpoint for the peer cannot be located - the peer's transport resource does not exist yet or has been destroyed.

• vmci_transport_notify.c:436,525 - returned when send_waiting_read() / send_waiting_write() fails, meaning the notification could not reach the peer. The peer is considered unreachable.

Both cases share the same pattern: the peer is known to exist (has a CID, was previously connected, etc.) but the transport layer cannot deliver data to it right now.

  Why it fits better than ECONNREFUSED:

• ECONNREFUSED implies the peer received the request and actively rejected it (e.g., nothing listening on that port). Here the guest never sees the request at all - the virtqueue backend is NULL, so the packet cannot even enter the guest.

• EHOSTUNREACH implies the packet could not be routed/delivered to the destination. This is exactly what happens - the RX virtqueue has no backend, so delivery is impossible.

  Userspace behavior:

• Programs and retry frameworks commonly treat EHOSTUNREACH as a transient condition worth retrying (the host may come up), whereas ECONNREFUSED is typically treated as "service does not exist at this address" and not retried.

• For the specific use case (host connecting to a guest that is still booting), retry is the correct behavior - the guest will eventually load its driver and become reachable.

It is a standard connect() error code - unlike EAGAIN, which is not expected from connect() and would confuse most userspace socket code.
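
To make the userspace question concrete, here is a minimal sketch of a retry policy a host-side caller might use. The function names and the backoff constants are hypothetical and not taken from any existing tool. The classification only encodes the assumption discussed above: ECONNREFUSED is treated as final, while EHOSTUNREACH and ETIMEDOUT are retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy, for illustration only: decide whether a failed
 * connect() to a guest that may still be booting is worth retrying. */
static bool vsock_connect_should_retry(int err)
{
	switch (err) {
	case EHOSTUNREACH:	/* path to the guest is not up yet */
	case ETIMEDOUT:		/* what callers see without the patch */
		return true;
	case ECONNREFUSED:	/* assumed to mean "no such service" */
	default:
		return false;
	}
}

/* Hypothetical backoff: 100 ms doubled per attempt, capped at 1600 ms. */
static unsigned int vsock_retry_delay_ms(int attempt)
{
	assert(attempt >= 0);
	if (attempt > 4)
		attempt = 4;
	return 100u << attempt;
}
```

A caller written this way would give up on the first ECONNREFUSED, which is exactly the concern with the error code chosen in the patch.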

+       }
+
        if (virtio_vsock_skb_reply(skb))
                atomic_inc(&vsock->queued_replies);

@@ -624,9 +638,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)

                mutex_unlock(&vq->mutex);
        }

-       /* Some packets may have been queued before the device was started,
-        * let's kick the send worker to send them.
-        */
        vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);


I think the vhost_vq_work_queue() call should be removed here as well, not only
the comment.


  Before the patch: packets accumulate while backend is NULL

  Timeline from the QEMU/host perspective:

1. QEMU opens /dev/vhost-vsock - struct vhost_vsock is created, but the virtqueue backend (private_data) is still NULL.

2. QEMU issues ioctl(VHOST_VSOCK_SET_GUEST_CID) - sets vsock->guest_cid, inserts vsock into vhost_vsock_hash. From this point vhost_vsock_get(cid) can find it.

3. Guest is still booting, virtio-vsock driver not loaded yet. But the vsock is already discoverable by CID lookup.


  4. Host calls connect() - the packet gets queued but cannot be delivered:

  connect(fd, {AF_VSOCK, guest_cid, port})
    vsock_connect()                                [af_vsock.c:1650]
      transport->connect(vsk)                      [af_vsock.c:1730]
        virtio_transport_connect()                 [virtio_transport_common.c:1076]
          virtio_transport_send_pkt_info()         [virtio_transport_common.c:328]
            t_ops->send_pkt(skb, net)
              vhost_transport_send_pkt()           [vsock.c:289]
                vhost_vsock_get(dst_cid) -> found  (CID already in hash)
                virtio_vsock_skb_queue_tail()      ← PACKET QUEUED
                vhost_vq_work_queue()              ← WORKER KICKED
                return len                         ← SUCCESS (positive)

  Worker wakes up but cannot deliver:

  vhost_transport_send_pkt_work()
    vhost_transport_do_send_pkt(vsock, vq)         [vsock.c:107]
      mutex_lock(&vq->mutex)
      vhost_vq_get_backend(vq) == NULL             ← guest not ready
      goto out                                     ← PACKET STAYS IN QUEUE
      mutex_unlock(&vq->mutex)

Back in vsock_connect() - transport->connect() returned success (len > 0), so the code enters the wait loop:


      sk->sk_state = TCP_SYN_SENT;
      err = transport->connect(vsk);     → returns len (success)
      if (err < 0) goto out;             → NOT taken
      ...
      while (sk->sk_state != TCP_ESTABLISHED && ...) {
          timeout = schedule_timeout(timeout);     ← SLEEPS 2 SECONDS
          if (timeout == 0) {
              err = -ETIMEDOUT;                    ← GIVES UP
          }
      }

The guest never receives the CONNECT request (it is stuck in the queue), so no response arrives, and connect() returns ETIMEDOUT after 2 seconds.

5. Later the guest finishes booting, loads the virtio-vsock driver, negotiates virtqueues. QEMU issues ioctl(VHOST_VSOCK_SET_RUNNING, 1) which calls vhost_vsock_start():


  vhost_vsock_start()                              [vsock.c:609]
    for each vq:
      mutex_lock(&vq->mutex)
      vhost_vq_set_backend(vq, vsock)              ← backend becomes NON-NULL
      mutex_unlock(&vq->mutex)
    vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX],  ← KICKS WORKER AGAIN
                        &vsock->send_pkt_work)

Worker wakes up, now vhost_vq_get_backend(vq) != NULL, delivers the queued packet to the guest. But it is too late - connect() on the host side already timed out.

Why the kick in vhost_vsock_start() is essential here: between steps 4 and 5 nobody else will wake the worker. The kick from step 4 already fired and did nothing (backend was NULL). No new packets are coming - the only connect() caller is sleeping. Without this kick the packet would remain in the queue forever.
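
The pre-patch sequence above can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: the struct, the model_* names, the queue capacity, and the direct function call standing in for the work-queue kick are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QUEUE_CAP 8	/* arbitrary bound for the toy queue */

/* Toy stand-in for struct vhost_vsock; field names are illustrative. */
struct model_vsock {
	bool backend_set;	/* models vq->private_data != NULL */
	int queued;		/* models the length of send_pkt_queue */
	int delivered;		/* packets handed over to the guest */
};

/* Models vhost_transport_do_send_pkt(): do nothing while no backend. */
static void model_worker(struct model_vsock *v)
{
	if (!v->backend_set)
		return;		/* packets stay in the queue */
	v->delivered += v->queued;
	v->queued = 0;
}

/* Models the pre-patch vhost_transport_send_pkt(): always queue, kick
 * the worker, and report success to the caller. */
static int model_send_pkt_old(struct model_vsock *v, int len)
{
	assert(v->queued < MODEL_QUEUE_CAP);
	v->queued++;
	model_worker(v);
	return len;
}

/* Models vhost_vsock_start(): set the backend, optionally kick. */
static void model_start(struct model_vsock *v, bool kick)
{
	v->backend_set = true;
	if (kick)
		model_worker(v);
}
```

In this model, a packet sent before model_start() is delivered only if start is called with kick set to true; with kick set to false it stays queued until some later send wakes the worker.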


  ────────────────────────────────────────

  After the patch: packets no longer accumulate

  Same initial conditions - QEMU has set the CID, guest is still booting.

  Host calls connect():

  connect(fd, {AF_VSOCK, guest_cid, port})
    vsock_connect()                                [af_vsock.c:1650]
      transport->connect(vsk)                      [af_vsock.c:1730]
        virtio_transport_connect()                 [virtio_transport_common.c:1076]
          virtio_transport_send_pkt_info()         [virtio_transport_common.c:328]
            t_ops->send_pkt(skb, net)
              vhost_transport_send_pkt()           [vsock.c:289]
                vhost_vsock_get(dst_cid) -> found
                READ_ONCE(vsock->vqs[VSOCK_VQ_RX].private_data) == NULL
                kfree_skb(skb)                     ← PACKET FREED
                return -ECONNREFUSED               ← ERROR RETURNED

  The error propagates back immediately:

          virtio_transport_send_pkt_info():
            ret = t_ops->send_pkt(skb, net)  → -ECONNREFUSED
            if (ret < 0) break               → breaks out
        virtio_transport_connect() returns -ECONNREFUSED
      vsock_connect():
        err = transport->connect(vsk)        → -ECONNREFUSED
        if (err < 0) goto out                → TAKEN, skips wait loop
    connect() returns ECONNREFUSED to userspace immediately

The packet never enters send_pkt_queue. When vhost_vsock_start() runs later, the queue is guaranteed to be empty - there is nothing for the worker kick to flush.
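
The post-patch path can be sketched the same way. Again this is a toy userspace model under stated assumptions: the sim_* names and fields are hypothetical, the packet length is arbitrary, and the model deliberately ignores the lockless race mentioned in the patch comment as well as any stop/restart of the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct vhost_vsock; field names are illustrative. */
struct sim_vsock {
	bool backend_set;	/* models vq->private_data != NULL */
	int queued;		/* models the length of send_pkt_queue */
	int freed;		/* models kfree_skb() calls */
};

/* Models the patched vhost_transport_send_pkt(): refuse at the entry
 * point when the RX virtqueue has no backend. */
static int sim_send_pkt_new(struct sim_vsock *v, int len)
{
	assert(v != NULL);
	if (!v->backend_set) {
		v->freed++;
		return -ECONNREFUSED;
	}
	v->queued++;
	return len;
}

/* Models the relevant part of vsock_connect(): the wait loop is entered
 * only if the transport accepted the request. */
static int sim_connect(struct sim_vsock *v, bool *waited)
{
	int err = sim_send_pkt_new(v, 44);

	*waited = false;
	if (err < 0)
		return err;	/* skips the wait loop */
	*waited = true;		/* would sleep for the connect timeout */
	return 0;
}
```

Within the limits of this model, nothing is queued while the backend is unset, so a later start has nothing to flush.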


  ────────────────────────────────────────

Summary: SET_GUEST_CID makes the vsock discoverable, SET_RUNNING actually enables the virtqueues. Between these two ioctls there is a window where packets are accepted into the queue but cannot be delivered. The kick in vhost_vsock_start() existed to drain this backlog. The patch closes the window at the entry point instead - refusing packets outright - so the backlog can never form.

mutex_unlock(&vsock->dev.mutex);


_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel
