> From: "Michael S. Tsirkin"<[email protected]>
> Date:  Mon, Mar 2, 2026, 16:52
> Subject:  [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
> To: <[email protected]>
> Cc: "ShuangYu"<[email protected]>, "Stefano 
> Garzarella"<[email protected]>, "Stefan Hajnoczi"<[email protected]>, 
> "Jason Wang"<[email protected]>, "Eugenio Pérez"<[email protected]>, 
> <[email protected]>, <[email protected]>, 
> <[email protected]>
> vhost_get_avail_idx is supposed to report whether it has updated
> vq->avail_idx. Instead, it reports whether any unconsumed entries
> remain in the ring, which is usually the same thing. But not always:
> in drivers/vhost/net.c, when mergeable buffers are enabled, the
> driver checks whether the combined available entries are big enough
> to store an incoming packet. If they are not, the driver re-enables
> notifications with those entries still in the ring. The incorrect
> return value from vhost_get_avail_idx then propagates through
> vhost_enable_notify and livelocks the host when the guest makes no
> progress: vhost immediately disables notifications again and retries
> with the same insufficient entries.
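> 
> To illustrate, the receive path looks roughly like this (a simplified
> sketch of handle_rx() in drivers/vhost/net.c; locking, batching and
> error handling omitted):
> 
>         for (;;) {
>                 /* Gather enough entries for one packet; on a partial
>                  * allocation get_rx_bufs() rewinds the consumed
>                  * entries with vhost_discard_vq_desc() and returns 0.
>                  */
>                 headcount = get_rx_bufs(vq, ...);
>                 if (!headcount) {
>                         if (vhost_enable_notify(&net->dev, vq)) {
>                                 /* Pre-patch, this branch is also taken
>                                  * when only the old, insufficient
>                                  * entries are still in the ring, so
>                                  * the worker spins here.
>                                  */
>                                 vhost_disable_notify(&net->dev, vq);
>                                 continue;
>                         }
>                         break;  /* sleep until the guest kicks */
>                 }
>                 /* ... receive one packet into headcount entries ... */
>         }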
> 
> The obvious fix is to make vhost_get_avail_idx do what the comment
> says it does and report whether new entries have been added.
> 
> Reported-by: ShuangYu <[email protected]>
> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
> Cc: Stefano Garzarella <[email protected]>
> Cc: Stefan Hajnoczi <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> 
> Lightly tested, posting early to simplify testing for the reporter.
> 
>  drivers/vhost/vhost.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2f2c45d20883..db329a6f6145 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
>  static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
>  {
>          __virtio16 idx;
> +        u16 avail_idx;
>          int r;
>  
>          r = vhost_get_avail(vq, idx, &vq->avail->idx);
> @@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
>          }
>  
>          /* Check it isn't doing very strange thing with available indexes */
> -        vq->avail_idx = vhost16_to_cpu(vq, idx);
> -        if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
> +        avail_idx = vhost16_to_cpu(vq, idx);
> +        if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
>                  vq_err(vq, "Invalid available index change from %u to %u",
> -                       vq->last_avail_idx, vq->avail_idx);
> +                       vq->last_avail_idx, avail_idx);
>                  return -EINVAL;
>          }
>  
>          /* We're done if there is nothing new */
> -        if (vq->avail_idx == vq->last_avail_idx)
> +        if (avail_idx == vq->avail_idx)
>                  return 0;
>  
> +        vq->avail_idx = avail_idx;
> +
>          /*
>           * We updated vq->avail_idx so we need a memory barrier between
>           * the index read above and the caller reading avail ring entries.
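> 
> For reference, the resulting function reads as follows (reconstructed
> from the hunks above; lines outside the quoted context are assumed
> unchanged from the current upstream code):
> 
>         static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
>         {
>                 __virtio16 idx;
>                 u16 avail_idx;
>                 int r;
> 
>                 r = vhost_get_avail(vq, idx, &vq->avail->idx);
>                 if (unlikely(r)) {
>                         /* error reporting elided */
>                         return r;
>                 }
> 
>                 avail_idx = vhost16_to_cpu(vq, idx);
>                 if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
>                         vq_err(vq, "Invalid available index change from %u to %u",
>                                vq->last_avail_idx, avail_idx);
>                         return -EINVAL;
>                 }
> 
>                 /* Nothing new: report 0 without touching vq->avail_idx */
>                 if (avail_idx == vq->avail_idx)
>                         return 0;
> 
>                 vq->avail_idx = avail_idx;
> 
>                 /* Order the index read against the caller's reads of
>                  * the avail ring entries.
>                  */
>                 smp_rmb();
>                 return 1;
>         }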
> -- 
> MST
> 

Hi,

Sorry for the delay; it took me some time to build a reliable test environment.

I've verified the patch on 6.18.10. With 16 concurrent TCP flows over
virtio-net (TSO enabled, vCPU throttled to 1%, 300s duration):

  Before the patch (bpftrace + CPU monitor):
    - vhost_discard_vq_desc: up to 3,716,272/3s
    - vhost_enable_notify false positives: up to 3,716,278/3s
      (nearly 1:1 with discards; every partial allocation triggers a re-loop)
    - vhost worker CPU: 0-99%, frequently 50-99%
    - successful receives: as low as 137/3s

  After the patch:
    - vhost_discard_vq_desc: 9-33/3s
    - vhost_enable_notify false positives: 0 (all 100 sample windows)
    - vhost worker CPU: 0% (all 149 sample points)
    - successful receives: 873-2,667/3s

The partial allocations still occur under load, but after the fix
vhost_get_avail_idx correctly returns 0 when nothing new has been
added, so the worker sleeps instead of spinning.
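
In other words, the decision at the end of vhost_get_avail_idx changed
schematically from

        /* before: "are there unconsumed entries?" */
        return vq->avail_idx != vq->last_avail_idx;

to

        /* after: "did the guest add anything since the last read?" */
        return avail_idx != vq->avail_idx;

(the effective predicate, not the literal kernel code). Thanks.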

Tested-by: ShuangYu <[email protected]>

Best regards,
ShuangYu
