On 9/15/25 11:42 AM, David Marchand wrote:
On Thu, 11 Sept 2025 at 10:36, Maxime Coquelin
<[email protected]> wrote:

Add workaround to poll virtqueue ready states before starting device
when VIRTIO_DEVICE_STATUS_DRIVER_OK is set in vduse_events_handler().

For each virtqueue, poll using VDUSE_VQ_GET_INFO ioctl to check
vq_info->ready state with configurable retry limit. This addresses
timing issues where device start was attempted before all virtqueues
were properly initialized and ready.

A notification mechanism will be introduced in the next version of
the VDUSE uAPI. When it lands, we would only apply this workaround
when the kernel does not support it.

Fixes: a9120db8b98b ("vhost: add VDUSE device startup")
Cc: [email protected]

Signed-off-by: Maxime Coquelin <[email protected]>
---
  lib/vhost/vduse.c | 62 +++++++++++++++++++++++++++++++++++++++++++++--
  1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/vduse.c b/lib/vhost/vduse.c
index 9de7f04a4f..5a6025d702 100644
--- a/lib/vhost/vduse.c
+++ b/lib/vhost/vduse.c
@@ -272,6 +272,56 @@ vduse_vring_cleanup(struct virtio_net *dev, unsigned int index)
         vq->last_avail_idx = 0;
  }

+

Nit: no need for double empty lines.

+/*
+ * Tests show that it succeeds at the first retry at worst,

it?

Changing to:
"Tests show that virtqueues get ready at the first retry at worst..."


+ * but let's be on the safe side and allow more retries.
+ */
+#define VDUSE_VQ_READY_POLL_MAX_RETRIES 100
+
+static int
+vduse_wait_for_virtqueues_ready(struct virtio_net *dev)
+{
+       struct vduse_vq_info vq_info;
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < dev->nr_vring; i++) {
+               int retry_count = 0;
+
+               while (retry_count < VDUSE_VQ_READY_POLL_MAX_RETRIES) {
+                       vq_info.index = i;

It is not clear from the uAPI header which parts of the vduse_vq_info
structure are r/o, r/w or w/o.
I see that vduse_vring_setup() does nothing more than setting index.
I am probably paranoid but do we need an explicit reset of the whole
vq_info on retry?

Moving the definition of vq_info in this loop (right before setting
vq_info.index) seems better on that topic.


The kernel side only looks at the index field (for now at least), but I agree that could change, so vq_info should be zeroed.

I will also send a separate patch for vduse_vring_setup().
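
For illustration only, here is a minimal standalone sketch of the per-retry reset discussed above. It is not the patch itself: `struct vq_info_sketch`, `vq_query_fn`, `wait_for_vq_ready()` and `fake_query()` are hypothetical names, and the callback stands in for the VDUSE_VQ_GET_INFO ioctl so the loop can be exercised without a VDUSE device. The delay between retries is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct vduse_vq_info (only the fields used here). */
struct vq_info_sketch {
	unsigned int index;
	unsigned char ready;
};

/* Hypothetical query callback, standing in for the VDUSE_VQ_GET_INFO ioctl. */
typedef int (*vq_query_fn)(struct vq_info_sketch *info, void *ctx);

/*
 * Poll one virtqueue until it reports ready.
 * Returns the number of retries needed, or -1 on query error or timeout.
 */
static int
wait_for_vq_ready(unsigned int index, unsigned int max_retries,
		vq_query_fn query, void *ctx)
{
	unsigned int retry;

	assert(query != NULL);

	for (retry = 0; retry < max_retries; retry++) {
		/* Declared and zeroed on every attempt: no stale fields are reused. */
		struct vq_info_sketch info;

		memset(&info, 0, sizeof(info));
		info.index = index;

		if (query(&info, ctx) != 0)
			return -1;
		if (info.ready)
			return (int)retry;
		/* The real code would sleep here before retrying. */
	}

	return -1;
}

/* Test double: the queue becomes ready after 'ready_after' failed queries. */
struct fake_vq {
	unsigned int calls;
	unsigned int ready_after;
};

static int
fake_query(struct vq_info_sketch *info, void *ctx)
{
	struct fake_vq *f = ctx;

	f->calls++;
	info->ready = f->calls > f->ready_after;
	return 0;
}
```

Scoping the structure inside the loop, as suggested, means a later kernel change that writes to more fields cannot leak state from one attempt into the next.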

+                       ret = ioctl(dev->vduse_dev_fd, VDUSE_VQ_GET_INFO, &vq_info);
+                       if (ret) {
+                               VHOST_CONFIG_LOG(dev->ifname, ERR,
+                                       "Failed to get VQ %u info while polling ready state: %s",
+                                       i, strerror(errno));
+                               return -1;
+                       }
+
+                       if (vq_info.ready) {
+                               VHOST_CONFIG_LOG(dev->ifname, DEBUG,
+                                       "VQ %u is ready after %u retries", i, retry_count);
+                               break;
+                       }
+
+                       retry_count++;
+                       /* Small delay between retries */

I would remove this Lapalissade comment.


+                       usleep(1000);
+               }
+
+               if (retry_count >= VDUSE_VQ_READY_POLL_MAX_RETRIES) {
+                       VHOST_CONFIG_LOG(dev->ifname, ERR,
+                               "VQ %u ready state polling timeout after %u retries",
+                               i, VDUSE_VQ_READY_POLL_MAX_RETRIES);
+                       return -1;
+               }
+       }
+
+       VHOST_CONFIG_LOG(dev->ifname, INFO, "All virtqueues are ready after polling");
+       return 0;
+}
+
  static void
  vduse_device_start(struct virtio_net *dev, bool reconnect)
  {
@@ -414,10 +464,18 @@ vduse_events_handler(int fd, void *arg, int *close __rte_unused)
         }

         if ((old_status ^ dev->status) & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
-               if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK)
+               if (dev->status & VIRTIO_DEVICE_STATUS_DRIVER_OK) {
+                       /* Poll virtqueues ready states before starting device */
+                       ret = vduse_wait_for_virtqueues_ready(dev);
+                       if (ret < 0) {
+                               VHOST_CONFIG_LOG(dev->ifname, ERR,
+                                       "Failed to wait for virtqueues ready, aborting device start");
+                               return;
+                       }
                         vduse_device_start(dev, false);
-               else
+               } else {
                         vduse_device_stop(dev);
+               }
         }

         VHOST_CONFIG_LOG(dev->ifname, INFO, "Request %s (%u) handled successfully",
--
2.51.0


Aside from those nits, it looks like an acceptable workaround for now.
Reviewed-by: David Marchand <[email protected]>


