Am 06.02.2024 um 17:44 hat Eugenio Perez Martin geschrieben:
> On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf <kw...@redhat.com> wrote:
> >
> > Am 05.02.2024 um 13:22 hat Eugenio Perez Martin geschrieben:
> > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf <kw...@redhat.com> wrote:
> > > >
> > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > > status flag is set; with the current API of the kernel module, it is
> > > > impossible to support the opposite order in our block export code because
> > > > userspace is not notified when a virtqueue is enabled.
> > > >
> > > > This requirement also matches the normal initialisation order as done by
> > > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > > this.
> > > >
> > > > This changes vdpa-dev to use the normal order again and use the standard
> > > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > > used with vdpa-dev again after this fix.
> > > >
> > > > Cc: qemu-sta...@nongnu.org
> > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > Signed-off-by: Kevin Wolf <kw...@redhat.com>
> > > > ---
> > > >  hw/virtio/vdpa-dev.c   |  5 +----
> > > >  hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > index eb9ecea83b..13e87f06f6 100644
> > > > --- a/hw/virtio/vdpa-dev.c
> > > > +++ b/hw/virtio/vdpa-dev.c
> > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
> > > >
> > > >      s->dev.acked_features = vdev->guest_features;
> > > >
> > > > -    ret = vhost_dev_start(&s->dev, vdev, false);
> > > > +    ret = vhost_dev_start(&s->dev, vdev, true);
> > > >      if (ret < 0) {
> > > >          error_setg_errno(errp, -ret, "Error starting vhost");
> > > >          goto err_guest_notifiers;
> > > >      }
> > > > -    for (i = 0; i < s->dev.nvqs; ++i) {
> > > > -        vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > > > -    }
> > > >      s->started = true;
> > > >
> > > >      /*
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index 3a43beb312..c4574d56c5 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
> > > >      return r;
> > > >  }
> > > >
> > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > > +{
> > > > +    struct vhost_vdpa *v = dev->opaque;
> > > > +    unsigned int i;
> > > > +    int ret;
> > > > +
> > > > +    for (i = 0; i < dev->nvqs; ++i) {
> > > > +        ret = vhost_vdpa_set_vring_ready(v, i);
> > > > +        if (ret < 0) {
> > > > +            return ret;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > >                                         int fd)
> > > >  {
> > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > > >          .vhost_set_features = vhost_vdpa_set_features,
> > > >          .vhost_reset_device = vhost_vdpa_reset_device,
> > > >          .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > > +        .vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > > >          .vhost_get_config  = vhost_vdpa_get_config,
> > > >          .vhost_set_config = vhost_vdpa_set_config,
> > > >          .vhost_requires_shm_log = NULL,
> > >
> > > vhost-vdpa net enables CVQ before the dataplane queues in order to
> > > configure all of the device at the destination of a live migration.
> > > Going back to this callback would cause the device to enable the
> > > dataplane before the virtqueues are configured again.
> >
> > Not that it makes a difference, but I don't think it would actually be
> > going back. Even before your commit 6c482547, we were not making use of
> > the generic callback but just called the function in a slightly
> > different place (but less different than after commit 6c482547).
> >
> > But anyway... Why don't the other vhost backends need the same for
> > vhost-net then? Do they just not support live migration?
> 
> They don't support the control virtqueue. More specifically, the
> control virtqueue is handled directly in QEMU.

So the network device already has to special case vdpa instead of using
the same code for all vhost backends? :-/

> > I don't know the code well enough to say where the problem is, but if
> > vhost-vdpa networking code relies on the usual vhost operations not
> > being implemented and bypasses VhostOps to replace it, that sounds like
> > a design problem to me.
> 
> I don't follow this. What vhost operation is expected not to be implemented?

You were concerned about implementing .vhost_set_vring_enable in
vdpa_ops like my patch does. So it seems that the networking code
requires that it is not implemented?

On the other hand, for vhost-vdpa, the callback seems to be called in
exactly the right place where virtqueues need to be enabled, like for
other vhost devices.
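
(For reference, this is the place I mean in the generic code. Simplified
from memory, so the real vhost_dev_start() in hw/virtio/vhost.c has more
error handling around it:)

    /* vhost_dev_start(hdev, vdev, vrings), after the rings are set up: */
    if (vrings) {
        /* ends up in vhost_ops->vhost_set_vring_enable(hdev, true) */
        r = vhost_dev_set_vring_enable(hdev, true);
        if (r) {
            goto fail_log;
        }
    }
    /* ...and only then DRIVER_OK via the backend's .vhost_dev_start */
    if (hdev->vhost_ops->vhost_dev_start) {
        r = hdev->vhost_ops->vhost_dev_start(hdev, true);
    }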

> > Maybe VhostOps needs a new operation to enable
> > just a single virtqueue that can be used by the networking code instead?
> >
> > > How does VDUSE userspace know how many queues it should enable? Can't
> > > the kernel perform the necessary actions after DRIVER_OK, like
> > > configuring the kick etc.?
> >
> > Not sure if I understand the question. The vdpa kernel interface always
> > enables individual queues, so the VDUSE userspace will enable whatever
> > queues it was asked to enable. The only restriction is that the queues
> > need to be enabled before setting DRIVER_OK.
> >
> > The interface that enables all virtqueues at once seems to be just
> > .vhost_set_vring_enable in QEMU.
> 
> It enables all virtqueues of the same vhost device in QEMU, but QEMU
> creates one vhost_dev for each vhost-net virtqueue pair and another
> one for CVQ. This goes back to the time when mq vhost-kernel net
> devices were implemented by merging many tap devices. net/vhost-vdpa.c
> only enables the last one, which corresponds to CVQ, and then enables
> the rest once all messages have been received.

So it's less about the granularity of .vhost_set_vring_enable, which
would be just right, and more about the order in which it is called for
the different vhost_devs?

Can we influence the order in some other way than by not implementing the
callback at all? I think net-specific weirdness should be contained in
the net implementation and not affect generic vdpa code.

> On the other hand, .vhost_set_vring_enable is also used for queue
> reset (vhost_net_virtqueue_reset and vhost_net_virtqueue_restart). In
> other words, it is called after DRIVER_OK is set. I guess it is fine
> for VDUSE as long as it does not offer vring reset capabilities, but
> future-wise, should we start going in that direction?

I don't actually know VDUSE very well, but that would probably make
sense.

For the moment, though, I simply tried to attach a VDUSE device and
failed, so all I'm trying to do here is fix the regression from your commit.

> But kernels without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK should be
> supported, right? Maybe we can add vhost_vdpa_set_vring_enable and
> call vhost_dev_start with the vrings parameter conditional on the
> feature flag? That way the caller can enable it later or just enable
> all the rings altogether.

Which specific caller do you mean here?

For vdpa-dev, I don't see any problem with just passing vrings=true
unconditionally. That is clearly the least problematic order even if
another order may in theory be allowed by the spec.

For vhost_net, I don't know. As far as I am concerned, there is no
reason to change its call and it could continue to pass vrings=false.

So vhost_dev_start() seems entirely unproblematic.
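
To spell it out, the two call sites in question would then look roughly
like this (simplified, and the vhost_net line is from memory, so please
double-check):

    /* hw/virtio/vdpa-dev.c with this patch: rings are enabled before
     * DRIVER_OK, which is the order VDUSE needs */
    ret = vhost_dev_start(&s->dev, vdev, true);

    /* hw/net/vhost_net.c stays as it is; the net code keeps enabling
     * the queues itself in the order it wants */
    r = vhost_dev_start(&net->dev, dev, false);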

The only part that changes for net is that vhost_set_vring_enable() now
does something where it didn't do anything before. If that is a problem
for vdpa-based network code, maybe we can just special-case vdpa there
to not call vhost_ops->vhost_set_vring_enable? (Despite its name, it's
a network-specific function; it should probably be called something
like vhost_net_set_vring_enable.)

With something like the below, net shouldn't see any difference any
more, right? (Happy to add a comment before it if you write it for me.)

Kevin

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..fb6d13bd69 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -543,6 +543,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)

     nc->vring_enable = enable;

+    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        return 0;
+    }
+
     if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
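
Actually, here is a first attempt at such a comment, based on your
explanation above. Please correct whatever I got wrong about the net
side:

    /*
     * vhost-vdpa net enables only the CVQ first so that it can restore
     * the device state at the destination of a live migration, and it
     * enables the data queues itself only after that has completed.
     * Therefore it must not have all of its vrings enabled by the
     * generic vhost_set_vring_enable() path here.
     */
    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
        return 0;
    }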

