On 06.02.2024 at 17:44, Eugenio Perez Martin wrote:
> On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf <kw...@redhat.com> wrote:
> >
> > On 05.02.2024 at 13:22, Eugenio Perez Martin wrote:
> > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf <kw...@redhat.com> wrote:
> > > >
> > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > > status flag is set; with the current API of the kernel module, it is
> > > > impossible to enable the opposite order in our block export code because
> > > > userspace is not notified when a virtqueue is enabled.
> > > >
> > > > This requirement also matches the normal initialisation order as done by
> > > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > > this.
> > > >
> > > > This changes vdpa-dev to use the normal order again and use the standard
> > > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > > used with vdpa-dev again after this fix.
> > > >
> > > > Cc: qemu-sta...@nongnu.org
> > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > Signed-off-by: Kevin Wolf <kw...@redhat.com>
> > > > ---
> > > >  hw/virtio/vdpa-dev.c   |  5 +----
> > > >  hw/virtio/vhost-vdpa.c | 17 +++++++++++++++++
> > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > index eb9ecea83b..13e87f06f6 100644
> > > > --- a/hw/virtio/vdpa-dev.c
> > > > +++ b/hw/virtio/vdpa-dev.c
> > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
> > > >
> > > >      s->dev.acked_features = vdev->guest_features;
> > > >
> > > > -    ret = vhost_dev_start(&s->dev, vdev, false);
> > > > +    ret = vhost_dev_start(&s->dev, vdev, true);
> > > >      if (ret < 0) {
> > > >          error_setg_errno(errp, -ret, "Error starting vhost");
> > > >          goto err_guest_notifiers;
> > > >      }
> > > > -    for (i = 0; i < s->dev.nvqs; ++i) {
> > > > -        vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > > > -    }
> > > >      s->started = true;
> > > >
> > > >      /*
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index 3a43beb312..c4574d56c5 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
> > > >      return r;
> > > >  }
> > > >
> > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > > +{
> > > > +    struct vhost_vdpa *v = dev->opaque;
> > > > +    unsigned int i;
> > > > +    int ret;
> > > > +
> > > > +    for (i = 0; i < dev->nvqs; ++i) {
> > > > +        ret = vhost_vdpa_set_vring_ready(v, i);
> > > > +        if (ret < 0) {
> > > > +            return ret;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > >                                        int fd)
> > > >  {
> > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > > >          .vhost_set_features = vhost_vdpa_set_features,
> > > >          .vhost_reset_device = vhost_vdpa_reset_device,
> > > >          .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > > +        .vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > > >          .vhost_get_config  = vhost_vdpa_get_config,
> > > >          .vhost_set_config = vhost_vdpa_set_config,
> > > >          .vhost_requires_shm_log = NULL,
> > >
> > > vhost-vdpa net enables CVQ before dataplane ones to configure all the
> > > device in the destination of a live migration.
> > > To go back again to this callback would cause the device to enable
> > > the dataplane before virtqueues are configured again.
> >
> > Not that it makes a difference, but I don't think it would actually be
> > going back. Even before your commit 6c482547, we were not making use of
> > the generic callback but just called the function in a slightly
> > different place (but less different than after commit 6c482547).
> >
> > But anyway... Why don't the other vhost backends need the same for
> > vhost-net then? Do they just not support live migration?
>
> They don't support control virtqueue. More specifically, control
> virtqueue is handled directly in QEMU.
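For context, the ordering that VDUSE insists on boils down to roughly the
following at the vhost-vdpa ioctl level. This is only an illustrative
sketch, not QEMU code; vdpa_fd, nvqs and status_so_far are placeholders
and error handling is omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

/* Illustrative sketch: enable every virtqueue first, then set DRIVER_OK. */
static void start_in_vduse_order(int vdpa_fd, unsigned int nvqs,
                                 uint8_t status_so_far)
{
    for (unsigned int i = 0; i < nvqs; i++) {
        struct vhost_vring_state state = { .index = i, .num = 1 };
        ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
    }

    /* Only now does the device see DRIVER_OK. */
    uint8_t status = status_so_far | VIRTIO_CONFIG_S_DRIVER_OK;
    ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
}

The vhost-vdpa net code described above effectively wants to enable CVQ
first and the dataplane queues only after the device state has been
restored through it, which is the tension being discussed here.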
So the network device already has to special case vdpa instead of using
the same code for all vhost backends? :-/

> > I don't know the code well enough to say where the problem is, but if
> > vhost-vdpa networking code relies on the usual vhost operations not
> > being implemented and bypasses VhostOps to replace it, that sounds like
> > a design problem to me.
>
> I don't follow this. What vhost operation is expected not to be implemented?

You were concerned about implementing .vhost_set_vring_enable in
vdpa_ops like my patch does. So it seems that the networking code
requires that it is not implemented? On the other hand, for vhost-vdpa,
the callback seems to be called in exactly the right place where
virtqueues need to be enabled, like for other vhost devices.

> > Maybe VhostOps needs a new operation to enable
> > just a single virtqueue that can be used by the networking code instead?
> >
> > > How does VDUSE userspace know how many queues it should enable? Can't
> > > the kernel perform the necessary actions after DRIVER_OK, like
> > > configuring the kick etc?
> >
> > Not sure if I understand the question. The vdpa kernel interface always
> > enables individual queues, so the VDUSE userspace will enable whatever
> > queues it was asked to enable. The only restriction is that the queues
> > need to be enabled before setting DRIVER_OK.
> >
> > The interface that enables all virtqueues at once seems to be just
> > .vhost_set_vring_enable in QEMU.
>
> It enables all virtqueues of the same vhost device in QEMU, but QEMU
> creates one vhost_dev per each vhost-net virtqueue pair and another
> one for CVQ. This goes back to the time when mq vhost-kernel net
> devices were implemented by merging many tap devices. net/vhost-vdpa.c
> only enables the last one, which corresponds to CVQ, and then enables
> the rest once all messages have been received.

So it's less about the granularity of .vhost_set_vring_enable, which
would be just right, and more about the order in which it is called for
the different vhost_devs? Can we influence the order in another way
than just not implementing the callback at all? I think net-specific
weirdness should be contained in the net implementation and not affect
generic vdpa code.

> On the other hand, .vhost_set_vring_enable is also used for queue
> reset (vhost_net_virtqueue_reset and vhost_net_virtqueue_restart). In
> other words, it is called after the set DRIVER_OK. I guess it is fine
> for VDUSE as long as it does not offer vring reset capabilities, but
> future-wise, should we start going in that direction?

I don't actually know VDUSE very well, but that would probably make
sense. Though for the moment, I just tried to simply attach a VDUSE
device and failed, so I'm just trying to fix the regression from your
commit.

> But kernels without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK should be
> supported, right. Maybe we can add vhost_vdpa_set_vring_enable and
> call vhost_dev_start with the vrings parameter conditional on the
> feature flag? That way the caller can enable it later or just enable
> all the rings altogether.

Which specific caller do you mean here? For vdpa-dev, I don't see any
problem with just passing vrings=true unconditionally. That is clearly
the least problematic order even if another order may in theory be
allowed by the spec.

For vhost_net, I don't know. As far as I am concerned, there is no
reason to change its call and it could continue to pass vrings=false.
So vhost_dev_start() seems entirely unproblematic.
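Just for reference, the vrings flag boils down to roughly this in the
generic vhost code. A simplified sketch of hw/virtio/vhost.c from
memory; error paths, notifier setup and backend-specific details are
omitted, so don't take it literally:

static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
{
    /* Backends that don't implement the callback are silently skipped. */
    if (!hdev->vhost_ops->vhost_set_vring_enable) {
        return 0;
    }

    return hdev->vhost_ops->vhost_set_vring_enable(hdev, enable);
}

int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
{
    int r;

    /* ... features, memory table, per-virtqueue setup ... */

    if (vrings) {
        /* vrings=true (what vdpa-dev passes with this patch): all queues
         * are enabled here, before the backend's own start callback runs
         * (which for vhost-vdpa is where DRIVER_OK ends up being set). */
        r = vhost_dev_set_vring_enable(hdev, true);
        if (r < 0) {
            return r;
        }
    }

    if (hdev->vhost_ops->vhost_dev_start) {
        r = hdev->vhost_ops->vhost_dev_start(hdev, true);
        if (r < 0) {
            return r;
        }
    }

    return 0;
}

With vrings=false, as vhost_net passes, nothing is enabled at this point
and the net code keeps full control over when each vhost_dev's queues
are enabled.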
The only part that changes for net is that vhost_set_vring_enable() now
does something where it didn't do anything before. If that is a problem
for vdpa based network code, maybe we can just special case vdpa there
to not call vhost_ops->vhost_set_vring_enable? (Despite its name, it's
a network specific function; it should probably be called something
like vhost_net_set_vring_enable.)

With something like the below, net shouldn't see any difference any
more, right? (Happy to add a comment before it if you write it for me.)

Kevin

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..fb6d13bd69 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -543,6 +543,10 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
 
     nc->vring_enable = enable;
 
+    if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        return 0;
+    }
+
     if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
         return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
     }
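Alternatively, if special-casing vdpa in vhost_set_vring_enable() feels
too ad hoc, the per-virtqueue VhostOps operation mentioned further up
could look roughly like this for vhost-vdpa. This is purely a
hypothetical sketch: neither the callback name nor a matching VhostOps
member exists today, it just reuses vhost_vdpa_set_vring_ready() from
the patch:

/*
 * Hypothetical only: enable a single virtqueue through a new VhostOps
 * member (say .vhost_set_vring_enable_one), so net/vhost-vdpa.c could
 * enable CVQ first and the dataplane queues later without depending on
 * .vhost_set_vring_enable being left unimplemented.
 */
static int vhost_vdpa_set_vring_enable_one(struct vhost_dev *dev,
                                           unsigned int idx, int enable)
{
    struct vhost_vdpa *v = dev->opaque;

    if (!enable) {
        /* vhost_vdpa_set_vring_ready() only enables a queue, so the
         * sketch is limited to that case. */
        return -ENOTSUP;
    }

    return vhost_vdpa_set_vring_ready(v, idx);
}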