On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin
<epere...@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasow...@redhat.com> wrote:
> >
> >
> > 在 2022/2/21 下午4:15, Eugenio Perez Martin 写道:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasow...@redhat.com> wrote:
> > >>
> > >> 在 2022/2/17 下午8:48, Eugenio Perez Martin 写道:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasow...@redhat.com> wrote:
> > >>>> 在 2022/2/1 下午7:25, Eugenio Perez Martin 写道:
> > >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasow...@redhat.com> 
> > >>>>> wrote:
> > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>>>>>> @@ -272,6 +590,28 @@ void 
> > >>>>>>> vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int 
> > >>>>>>> svq_kick_fd)
> > >>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>>>      {
> > >>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>>>>>> +
> > >>>>>>> +    if (!svq->vq) {
> > >>>>>>> +        return;
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    /* Send all pending used descriptors to guest */
> > >>>>>>> +    vhost_svq_flush(svq, false);
> > >>>>>> Do we need to wait for all the pending descriptors to be completed 
> > >>>>>> here?
> > >>>>>>
> > >>>>> No, this function does not wait, it only completes the forwarding of
> > >>>>> the *used* descriptors.
> > >>>>>
> > >>>>> The best example is the net rx queue in my opinion. This call will
> > >>>>> check SVQ's vring used_idx and will forward the last used descriptors
> > >>>>> if any, but all available descriptors will remain as available for
> > >>>>> qemu's VQ code.
> > >>>>>
> > >>>>> To skip it would miss those last rx descriptors in migration.
> > >>>>>
> > >>>>> Thanks!
> > >>>> So it's probably to not the best place to ask. It's more about the
> > >>>> inflight descriptors so it should be TX instead of RX.
> > >>>>
> > >>>> I can imagine the migration last phase, we should stop the vhost-vDPA
> > >>>> before calling vhost_svq_stop(). Then we should be fine regardless of
> > >>>> inflight descriptors.
> > >>>>
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page. Regarding tx this could cause repeated tx
> > >>> frames (one at source and other at destination), but never a missed
> > >>> buffer not transmitted. The "stop before" could be interpreted as "SVQ
> > >>> is not forwarding available buffers anymore". Would that work?
> > >>
> > >> Right, but this only work if
> > >>
> > >> 1) a flush to make sure TX DMA for inflight descriptors are all completed
> > >>
> > >> 2) just mark all inflight descriptor used
> > >>
> > > It currently trusts on the reverse: Buffers not marked as used (by the
> > > device) will be available in the destination, so expect
> > > retransmissions.
> >
> >
> > I may miss something but I think we do migrate last_avail_idx. So there
> > won't be a re-transmission, since we depend on qemu virtqueue code to
> > deal with vring base?
> >
>
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, vhost.c code set VirtQueue
> last_avail_idx == last_used_idx, and it's migrated after that if I'm
> not wrong.

Ok, I miss these details in the review. I suggest mentioning this in
the change log and add a comment in vhost_vdpa_get_vring_base().

>
> vhost kernel migrates last_avail_idx, but it makes rx buffers
> available on-demand, unlike SVQ. So it does not need to unwind buffers
> or anything like that. Because of how SVQ works with the rx queue,
> this is not possible, since the destination will find no available
> buffers for rx. And for tx you already have described the scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard:
> SVQ looks for total drain (as "make all guest's buffers available for
> the device ASAP") vs the vhost device which can live with a lot of
> available ones and it will use them on demand. Same problem as
> masking. So the difference in behavior is justified in my opinion, and
> it can be improved in the future with the vdpa in-flight descriptors.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?

To clarify, for inflight I meant from the device point of view, that
is [last_used_idx, last_avail_idx).

So for RX and SVQ, it should be as simple as stop forwarding buffers
since last_used_idx should be the same as last_avail_idx in this case.
(Though technically the rx buffer might be modified by the NIC).

> I didn't investigate
> that path too much, but I think the virtio-net emulated device does
> not support that at the moment. If I'm not wrong, we should copy
> something like the body of virtio_blk_load_device if we want to go
> that route.
>
> The current approach might be too net-centric, so let me know if this
> behavior is unexpected or we can do better otherwise.

It should be fine to start from a networking device. We can add more
in the future if it is needed.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Otherwise there could be buffers that is inflight forever.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +
> > >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>>>>>> +        if (elem) {
> > >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>>>>>> +        }
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>>>>>> +    if (next_avail_elem) {
> > >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>>>>>> +                                 next_avail_elem->len);
> > >>>>>>> +    }
> > >>>>>>>      }
> >
>


Reply via email to