On Wed, Feb 01, 2023 at 08:49:30AM +0100, Eugenio Perez Martin wrote:
> On Wed, Feb 1, 2023 at 4:29 AM Jason Wang <jasow...@redhat.com> wrote:
> >
> > On Wed, Feb 1, 2023 at 3:11 AM Eugenio Perez Martin
> > <epere...@redhat.com> wrote:
> > >
> > > On Tue, Jan 31, 2023 at 8:10 PM Eugenio Perez Martin
> > > <epere...@redhat.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > The current approach of offering an emulated CVQ to the guest and
> > > > mapping the commands to vhost-user does not scale well:
> > > > * Some devices already offer it, so the transformation is redundant.
> > > > * There is no support for commands with variable length (RSS?).
> > > >
> > > > We can solve both of these by offering it through vhost-user the
> > > > same way vhost-vdpa does. With this approach qemu needs to track
> > > > the commands, for similar reasons as vhost-vdpa: qemu needs to
> > > > track the device status for live migration. vhost-user should use
> > > > the same SVQ code for this, so we avoid duplication.
> > > >
> > > > One of the challenges here is to know which virtqueue to shadow /
> > > > isolate. The vhost-user device may not have the same number of
> > > > queues as the device frontend:
> > > > * The first depends on the actual vhost-user device, and qemu
> > > >   fetches it with VHOST_USER_GET_QUEUE_NUM at the moment.
> > > > * The qemu device frontend's is set by the netdev queues= cmdline
> > > >   parameter in qemu.
> > > >
> > > > For the device, the CVQ is the last one it offers, but for the
> > > > guest it is the last one offered in config space.
> > > >
> > > > Creating a new vhost-user command to decrease that maximum number
> > > > of queues may be an option, but we can do it without adding more
> > > > commands, by remapping the CVQ index at virtqueue setup. I think
> > > > it should be doable using (struct vhost_dev).vq_index and maybe a
> > > > few adjustments here and there.
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks!
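FWIW, the remap you describe could be fairly small. A minimal sketch of
the idea (illustrative only, not actual QEMU code: `remap_vq_index`,
`guest_pairs` and `backend_pairs` are made-up names; it assumes the
guest sees `guest_pairs` data queue pairs followed by its CVQ, while
the backend exposes `backend_pairs` pairs with its CVQ as the last
queue, per VHOST_USER_GET_QUEUE_NUM):

```c
#include <assert.h>

/*
 * Hypothetical helper: remap a guest-visible virtqueue index to the
 * backend's index.  Data queues map one to one; only the guest's CVQ
 * (the queue right after its data queues) is moved to the backend's
 * last queue.  Names are illustrative, not QEMU API.
 */
static int remap_vq_index(int guest_idx, int guest_pairs, int backend_pairs)
{
    int guest_cvq = 2 * guest_pairs;      /* CVQ follows the data queues */
    int backend_cvq = 2 * backend_pairs;  /* backend CVQ is its last vq  */

    if (guest_idx == guest_cvq) {
        return backend_cvq;
    }
    return guest_idx;
}
```

In the real code this would presumably feed into (struct
vhost_dev).vq_index at virtqueue setup, as you suggest.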
> > >
> > > (Starting a separate thread for the vhost-vdpa related use case)
> > >
> > > This could also work for vhost-vdpa if we ever decide to honor the
> > > netdev queues= argument. It is totally ignored now, as opposed to
> > > the rest of the backends:
> > > * vhost-kernel, whose tap device has the requested number of queues.
> > > * vhost-user, which errors out with ("you are asking more queues
> > >   than supported") if the vhost-user parent device has fewer queues
> > >   than requested (via the vhost-user msg VHOST_USER_GET_QUEUE_NUM).
> > >
> > > One of the reasons for this is that the device configuration space
> > > is totally passthrough, with the values for mtu, rss conditions,
> > > etc. This is not ideal, as qemu cannot check src and destination
> > > equivalence, and they can change under the feet of the guest in the
> > > event of a migration.
> >
> > This looks like the responsibility not of qemu but of the upper
> > layer (to provision the same config/features in src/dst).
>
> I think both share it. Or, at least, it is inconsistent that QEMU is
> in charge of checking / providing consistency for virtio features,
> but not for the virtio-net config space.
>
> If we follow that to the extreme, we could simply delete the feature
> checks, right?
>
> > > External tools are needed for this, duplicating part of the
> > > effort.
> > >
> > > Starting to intercept config space accesses and offering an
> > > emulated one to the guest with this kind of adjustments is
> > > beneficial, as it makes vhost-vdpa more similar to the rest of the
> > > backends, making the surprise on a change much lower.
> >
> > This probably needs more thought, since vDPA already provides a kind
> > of emulation in the kernel. My understanding is that it would be
> > sufficient to add checks to make sure the config that guests see is
> > consistent with what the host provisioned?
>
> With "host provisioned", do you mean with the "vdpa" tool or with
> qemu?
Also, we > need a way to communicate the guest values to it If those checks are > added in the kernel. > > The reasoning here is the same as above: QEMU already filters features > with its own emulated layer, so the operator can specify a feature > that will never appear to the guest. It has other uses (abstract > between transport for example), but feature filtering is definitely a > thing there. > > A feature set to off in a VM (or that does not exist in that > particular qemu version) will never appear as on even in the case of > migration to modern qemu versions. > > We don't have the equivalent protection for device config space. QEMU > could assure a consistent MTU, number of queues, etc for the guest in > virtio_net_get_config (and equivalent for other kinds of devices). > QEMU already has some transformations there. It shouldn't take a lot > of code.
I think I agree. It's the easiest way to ensure migration consistency
without trouble.

> Having said that:
> * I'm ok with starting just with checks there instead of
>   transformations like the queue remap proposed here.
> * If we choose not to implement it, I'm not proposing to actually
>   delete the feature checks, as I see them as useful :).
>
> Thanks!