Thanks for the patch!
Yes something to improve:

On Sun, Jun 30, 2024 at 02:36:15PM +0800, Wencheng Yang wrote:
> From: thomas <east.moutain.y...@gmail.com>
> 
> Patch 06b12970174 ("virtio-net: fix network stall under load")
> added double-check to test whether the available buffer size
> can satisfy the request or not, in case the guest has added
> some buffers to the avail ring simultaneously after the first
> check. It will be lucky if the available buffer size becomes
> okay after the double-check, then the host can send the packet
> to the guest. If the buffer size still can't satisfy the request,
> even if the guest has added some buffers, viritio-net would
> stall at the host side forever.
> 
> The patch checks whether the guest has added some buffers
> after last check of avail idx when the available buffers are
> not sufficient, if so then recheck the available buffers
> in the loop.
> 
> The patch also reverts patch "06b12970174".
> 
> The case below can reproduce the stall.
> 
>                                        Guest 0
>                                      +--------+
>                                      | iperf  |
>                     ---------------> | server |
>          Host       |                +--------+
>        +--------+   |                    ...
>        | iperf  |----
>        | client |----                  Guest n
>        +--------+   |                +--------+
>                     |                | iperf  |
>                     ---------------> | server |
>                                      +--------+
> 
> Boot many guests from qemu with virtio network:
>  qemu ... -netdev tap,id=net_x \
>     -device virtio-net-pci-non-transitional,\
>     iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
> 
> Each guest acts as iperf server with commands below:
>  iperf3 -s -D -i 10 -p 8001
>  iperf3 -s -D -i 10 -p 8002
> 
> The host as iperf client:
>  iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
>  iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000
> 
> After some time, the host loses connection to the guest,
> the guest can send packet to the host, but can't receive
> packet from host.
> 
> It's more likely to happen if SWIOTLB is enabled in the guest,
> allocating and freeing bounce buffer takes some CPU ticks,
> copying from/to bounce buffer takes more CPU ticks, compared
> with that there is no bounce buffer in the guest.
> Once the rate of producing packets from the host approximates
> the rate of receiveing packets in the guest, the guest would
> loop in NAPI.
> 
>          receive packets    ---
>                |             |
>                v             |
>            free buf      virtnet_poll
>                |             |
>                v             |
>      add buf to avail ring  ---
>                |
>                |  need kick the host?
>                |  NAPI continues
>                v
>          receive packets    ---
>                |             |
>                v             |
>            free buf      virtnet_poll
>                |             |
>                v             |
>      add buf to avail ring  ---
>                |
>                v
>               ...           ...
> 
> On the other hand, the host fetches free buf from avail
> ring, if the buf in the avail ring is not enough, the
> host notifies the guest the event by writing the avail
> idx read from avail ring to the event idx of used ring,
> then the host goes to sleep, waiting for the kick signal
> from the guest.
> 
> Once the guest finds the host is waiting for kick singal
> (in virtqueue_kick_prepare_split()), it kicks the host.
> 
> The host may stall forever at the sequences below:
> 
>          Host                        Guest
>      ------------                 -----------
>  fetch buf, send packet           receive packet ---
>          ...                          ...         |
>  fetch buf, send packet             add buf       |
>          ...                        add buf   virtnet_poll
>     buf not enough      avail idx-> add buf       |
>     read avail idx                  add buf       |
>                                     add buf      ---
>                                   receive packet ---
>     write event idx                   ...         |
>     waiting for kick                 add buf   virtnet_poll
>                                       ...         |
>                                                  ---
>                                  no more packet, exit NAPI
> 
> In the first loop of NAPI above, indicated in the range of
> virtnet_poll above, the host is sending packets while the
> guest is receiving packets and adding buf.
>  step 1: The buf is not enough, for example, a big packet
>          needs 5 buf, but the available buf count is 3.
>          The host read current avail idx.
>  step 2: The guest adds some buf, then checks whether the
>          host is waiting for kick signal, not at this time.
>          The used ring is not empty, the guest continues
>          the second loop of NAPI.
>  step 3: The host writes the avail idx read from avail
>          ring to used ring as event idx via
>          virtio_queue_set_notification(q->rx_vq, 1).
>  step 4: At the end of the second loop of NAPI, recheck
>          whether kick is needed, as the event idx in the
>          used ring written by the host is beyound the
>          range of kick condition, the guest will not
>          send kick signal to the host.

Put changelog after --- please.

> Changelog:
> v6:
> - Take packed queue into cosideration
> - Adjust function sequence to fix compilation issue
> 
> v5:
> - Modify return type of virtio_queue_set_notification() to
>   bool to indicate whether the guest has added some buffers
>   after last check of avail idx
> - Loop in virtio_net_has_buffers() if the available buffers
>   are not sufficient and the guest has added some buffers.
> - Revert patch "06b12970174"
> - Update the subject
> 
> v4:
> - Correct spelling mistake in the subject
> - Describe the issue that virtio-net is blocked at host side
> 
> v3:
> - Add virtio-net tag in the subject
> - Refine commit log
> 
> v2:
> - Add SOB tag at the end of the commit message
> - Place Fixes tag at the end of the commit message
> 
> v1:
> - Initial patch
> 
> Fixes: 06b12970174 ("virtio-net: fix network stall under load")
> Signed-off-by: Wencheng Yang <east.moutain.y...@gmail.com>
> ---
>  hw/net/virtio-net.c        |  19 ++---
>  hw/virtio/virtio.c         | 157 ++++++++++++++++++++-----------------
>  include/hw/virtio/virtio.h |   2 +-
>  3 files changed, 96 insertions(+), 82 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 9c7e85caea..13affc1f35 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1642,18 +1642,15 @@ static bool virtio_net_can_receive(NetClientState *nc)
>  static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
>  {
>      VirtIONet *n = q->n;
> -    if (virtio_queue_empty(q->rx_vq) ||
> -        (n->mergeable_rx_bufs &&
> -         !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> -        virtio_queue_set_notification(q->rx_vq, 1);
> -
> -        /* To avoid a race condition where the guest has made some buffers
> -         * available after the above check but before notification was
> -         * enabled, check for available buffers again.
> -         */
> -        if (virtio_queue_empty(q->rx_vq) ||
> +
> +    while (virtio_queue_empty(q->rx_vq) ||
>              (n->mergeable_rx_bufs &&
> -             !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +            !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
> +        /* guest may have made some buf, try again */
> +        if (virtio_queue_set_notification(q->rx_vq, 1)) {
> +            virtio_queue_set_notification(q->rx_vq, 0);
> +            continue;
> +        } else {
>              return 0;
>          }
>      }
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..24eaf4d88d 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -507,76 +507,6 @@ static inline void vring_set_avail_event(VirtQueue *vq, 
> uint16_t val)
>      address_space_cache_invalidate(&caches->used, pa, sizeof(val));
>  }
>  
> -static void virtio_queue_split_set_notification(VirtQueue *vq, int enable)
> -{
> -    RCU_READ_LOCK_GUARD();
> -
> -    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> -        vring_set_avail_event(vq, vring_avail_idx(vq));
> -    } else if (enable) {
> -        vring_used_flags_unset_bit(vq, VRING_USED_F_NO_NOTIFY);
> -    } else {
> -        vring_used_flags_set_bit(vq, VRING_USED_F_NO_NOTIFY);
> -    }
> -    if (enable) {
> -        /* Expose avail event/used flags before caller checks the avail idx. 
> */
> -        smp_mb();
> -    }
> -}
> -
> -static void virtio_queue_packed_set_notification(VirtQueue *vq, int enable)
> -{
> -    uint16_t off_wrap;
> -    VRingPackedDescEvent e;
> -    VRingMemoryRegionCaches *caches;
> -
> -    RCU_READ_LOCK_GUARD();
> -    caches = vring_get_region_caches(vq);
> -    if (!caches) {
> -        return;
> -    }
> -
> -    vring_packed_event_read(vq->vdev, &caches->used, &e);
> -
> -    if (!enable) {
> -        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> -    } else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> -        off_wrap = vq->shadow_avail_idx | vq->shadow_avail_wrap_counter << 
> 15;
> -        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> -        /* Make sure off_wrap is wrote before flags */
> -        smp_wmb();
> -        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> -    } else {
> -        e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
> -    }
> -
> -    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> -    if (enable) {
> -        /* Expose avail event/used flags before caller checks the avail idx. 
> */
> -        smp_mb();
> -    }
> -}
> -
> -bool virtio_queue_get_notification(VirtQueue *vq)
> -{
> -    return vq->notification;
> -}
> -
> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> -{
> -    vq->notification = enable;
> -
> -    if (!vq->vring.desc) {
> -        return;
> -    }
> -
> -    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> -        virtio_queue_packed_set_notification(vq, enable);
> -    } else {
> -        virtio_queue_split_set_notification(vq, enable);
> -    }
> -}
> -
>  int virtio_queue_ready(VirtQueue *vq)
>  {
>      return vq->vring.avail != 0;
> @@ -668,6 +598,93 @@ static inline bool is_desc_avail(uint16_t flags, bool 
> wrap_counter)
>      return (avail != used) && (avail == wrap_counter);
>  }
>  
> +static bool virtio_queue_split_set_notification(VirtQueue *vq, int enable)
> +{
> +    uint16_t shadow_idx;
> +
> +    RCU_READ_LOCK_GUARD();
> +
> +    shadow_idx = vq->shadow_avail_idx;
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> +        vring_set_avail_event(vq, vring_avail_idx(vq));
> +    } else if (enable) {
> +        vring_used_flags_unset_bit(vq, VRING_USED_F_NO_NOTIFY);
> +    } else {
> +        vring_used_flags_set_bit(vq, VRING_USED_F_NO_NOTIFY);
> +    }
> +    if (enable) {
> +        /* Expose avail event/used flags before caller checks the avail idx. 
> */
> +        smp_mb();
> +
> +        /* If shadow_avail_idx is changed, guest has added some bufs */
> +        return shadow_idx != vring_avail_idx(vq);
> +    }
> +
> +    return false;
> +}
> +
> +static bool virtio_queue_packed_set_notification(VirtQueue *vq, int enable)
> +{
> +    uint16_t off_wrap;
> +    VRingPackedDesc desc;
> +    VRingPackedDescEvent e;
> +    VRingMemoryRegionCaches *caches;
> +
> +    RCU_READ_LOCK_GUARD();
> +    caches = vring_get_region_caches(vq);
> +    if (!caches) {
> +        return false;
> +    }
> +
> +    vring_packed_event_read(vq->vdev, &caches->used, &e);
> +
> +    if (!enable) {
> +        e.flags = VRING_PACKED_EVENT_FLAG_DISABLE;
> +    } else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
> +        off_wrap = vq->shadow_avail_idx | vq->shadow_avail_wrap_counter << 
> 15;
> +        vring_packed_off_wrap_write(vq->vdev, &caches->used, off_wrap);
> +        /* Make sure off_wrap is wrote before flags */
> +        smp_wmb();
> +        e.flags = VRING_PACKED_EVENT_FLAG_DESC;
> +    } else {
> +        e.flags = VRING_PACKED_EVENT_FLAG_ENABLE;
> +    }
> +
> +    vring_packed_flags_write(vq->vdev, &caches->used, e.flags);
> +    if (enable) {
> +        /* Expose avail event/used flags before caller checks the avail idx. 
> */
> +        smp_mb();
> +
> +        /* If shadow_avail_idx becomes available, guest has added some bufs 
> */
> +        vring_packed_desc_read(vq->vdev, &desc, &caches->desc,
> +                               vq->shadow_avail_idx, true);
> +        if (is_desc_avail(desc.flags, vq->shadow_avail_wrap_counter))
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +bool virtio_queue_get_notification(VirtQueue *vq)
> +{
> +    return vq->notification;
> +}
> +
> +bool virtio_queue_set_notification(VirtQueue *vq, int enable)
> +{
> +    vq->notification = enable;
> +
> +    if (!vq->vring.desc) {
> +        return false;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtio_queue_packed_set_notification(vq, enable);
> +    } else {
> +        return virtio_queue_split_set_notification(vq, enable);
> +    }
> +}
> +

I think that the patch would be much smaller if you just
added virtio_queue_set_notification_and_check.
This way you are also not slowing down existing users
of virtio_queue_set_notification.


Also, I went over users of virtio_queue_set_notification:
blk,scsi all look ok, but I found at least another buggy one:
virtio_ballloon_get_free_page_hints 



I also wonder about the users in virtio.c:

static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
{
    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);

    /* Caller polls once more after this to catch requests that race with us */
    virtio_queue_set_notification(vq, 1);
}
    
void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
{   
    /*  
     * virtio_queue_aio_detach_host_notifier() can leave notifications disabled.
     * Re-enable them.  (And if detach has not been used before, notifications
     * being enabled is still the default state while a notifier is attached;
     * see virtio_queue_host_notifier_aio_poll_end(), which will always leave
     * notifications enabled once the polling section is left.)
     */ 
    if (!virtio_queue_get_notification(vq)) {
        virtio_queue_set_notification(vq, 1);
    }

    aio_set_event_notifier(ctx, &vq->host_notifier,
                           virtio_queue_host_notifier_read,
                           virtio_queue_host_notifier_aio_poll,
                           virtio_queue_host_notifier_aio_poll_ready);
    aio_set_event_notifier_poll(ctx, &vq->host_notifier,
                                virtio_queue_host_notifier_aio_poll_begin,
                                virtio_queue_host_notifier_aio_poll_end);

    /*
     * We will have ignored notifications about new requests from the guest
     * while no notifiers were attached, so "kick" the virt queue to process
     * those requests now.
     */
    event_notifier_set(&vq->host_notifier);
}

void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext 
*ctx)
{
    /* See virtio_queue_aio_attach_host_notifier() */
    if (!virtio_queue_get_notification(vq)) {
        virtio_queue_set_notification(vq, 1);
    }

    aio_set_event_notifier(ctx, &vq->host_notifier,
                           virtio_queue_host_notifier_read,
                           NULL, NULL);

    /*
     * See virtio_queue_aio_attach_host_notifier().
     * Note that this may be unnecessary for the type of virtqueues this
     * function is used for.  Still, it will not hurt to have a quick look into
     * whether we can/should process any of the virtqueue elements.
     */
    event_notifier_set(&vq->host_notifier);
}



These three look fundamentally racy:

Cc people who touched this area.

Stefan, Kevin, what do you guys think?


>  /* Fetch avail_idx from VQ memory only when we really need to know if
>   * guest has added some buffers.
>   * Called within rcu_read_lock().  */
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..e1a1621e2d 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -301,7 +301,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
> version_id);
>  void virtio_notify_config(VirtIODevice *vdev);
>  
>  bool virtio_queue_get_notification(VirtQueue *vq);
> -void virtio_queue_set_notification(VirtQueue *vq, int enable);
> +bool virtio_queue_set_notification(VirtQueue *vq, int enable);
>  
>  int virtio_queue_ready(VirtQueue *vq);
>  
> -- 
> 2.39.0


Reply via email to