On Tue, Aug 15, 2023 at 12:28 AM Stefan Hajnoczi <stefa...@redhat.com>
wrote:

> On Fri, Aug 11, 2023 at 10:31:51AM +0800, Yong Huang wrote:
> > Hi, Stefan, thank you for your interest in this series.
> > I'm trying to explain my point,  if you think my explanation
> > doesn't stand up, please let me know.
> >
> > On Fri, Aug 11, 2023 at 2:33 AM Stefan Hajnoczi <stefa...@redhat.com>
> wrote:
> >
> > > On Thu, Aug 10, 2023 at 07:07:09AM +0000, ~hyman wrote:
> > > > Ping,
> > > >
> > > > This version is a copy of version 1 and is rebased
> > > > on the master. No functional changes.
> > > >
> > > > A 1:1 virtqueue:vCPU mapping implementation for virtio-*-pci disks
> > > > has been available since QEMU 5.2.0 and improves IO performance
> > > > remarkably. To enable this feature for existing running VMs without
> > > > service interruption, the common solution is to migrate the VMs from
> > > > the lower-version hypervisor to the upgraded hypervisor, then wait
> > > > for the next cold reboot of each VM to enable the feature. That is
> > > > how the "discard" and "write-zeroes" features work.
> > > >
> > > > Automatic multi-queue disk allocation, however, is a little
> > > > different: in the case of live migration, the destination allocates
> > > > queues to match the number of vCPUs automatically by default, while
> > > > the VMs on the source side keep 1 queue by default. This mismatch
> > > > causes the migration to fail because the disk VMState is loaded
> > > > incorrectly on the destination side.
> > >
> > > Are you using QEMU's versioned machine types to freeze the VM
> > > configuration?
> > >
> > > If not, then live migration won't work reliably because you're
> > > migrating between two potentially different VM configurations. This
> > > issue is not specific to num-queues; it affects all device properties.
> > >
> > > In commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
> > > default num_queues to -smp N") the num_queues property is set to 1 for
> > > versioned machine types <=5.1:
> > >
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 9ee2aa0f7b..7f65fa8743 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -31,6 +31,7 @@
> > >  GlobalProperty hw_compat_5_1[] = {
> > >      { "vhost-scsi", "num_queues", "1"},
> > >      { "vhost-user-scsi", "num_queues", "1"},
> > > +    { "virtio-blk-device", "num-queues", "1"},
> > >      { "virtio-scsi-device", "num_queues", "1"},
> > >  };
> > >  const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
> > >
> > > Live migration works when the source and destination QEMU are launched
> > > with the same versioned machine type. You can check the "info qtree"
> > > output to confirm that starting a VM with -smp 4 -M pc-q35-5.1 results
> > > in num-queues=1 while -smp 4 -M pc-q35-5.2 results in num-queues=4.
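
[Inline note: the check described above can be run roughly as follows.
The drive and image details are illustrative; only -smp and -M matter
for the num-queues default.]

```
# Old versioned machine type: hw_compat_5_1 pins the default to 1 queue
qemu-system-x86_64 -smp 4 -M pc-q35-5.1 \
    -drive if=none,id=d0,file=test.img,format=raw \
    -device virtio-blk-pci,drive=d0 \
    -monitor stdio
(qemu) info qtree          # virtio-blk-pci shows num-queues = 1

# New versioned machine type: the default follows the vCPU count
qemu-system-x86_64 -smp 4 -M pc-q35-5.2 \
    -drive if=none,id=d0,file=test.img,format=raw \
    -device virtio-blk-pci,drive=d0 \
    -monitor stdio
(qemu) info qtree          # virtio-blk-pci shows num-queues = 4
```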
> > >
> > > > This issue requires QEMU to provide a hint indicating that
> > > > multi-queue disk allocation is automatically supported, which allows
> > > > upper-layer applications, e.g., libvirt, to recognize this
> > > > hypervisor capability. The upper-layer applications can then ensure
> > > > that the same num-queues value is allocated on the destination side,
> > > > avoiding migration failure.
> > > >
> > > > To fix the issue, we introduce the auto-num-queues property for
> > > > virtio-*-pci as a solution, which can be probed by applications
> > > > such as libvirt by querying QEMU's device properties. When launching
> > > > a live migration, libvirt sends the auto-num-queues property as a
> > > > migration cookie to the destination, so the destination knows
> > > > whether the source side supports auto-num-queues. If it does not,
> > > > the destination switches the feature off by building the command
> > > > line with "auto-num-queues=off" when preparing the incoming VM
> > > > process. The following libvirt patches show roughly how it works:
> > > >
> > >
> https://github.com/newfriday/libvirt/commit/ce2bae2e1a6821afeb80756dc01f3680f525e506
> > > >
> > >
> https://github.com/newfriday/libvirt/commit/f546972b009458c88148fe079544db7e9e1f43c3
> > > >
> > >
> https://github.com/newfriday/libvirt/commit/5ee19c8646fdb4d87ab8b93f287c20925268ce83
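
[Inline note: the destination-side decision described above can be
sketched as below. This is a simplified, hypothetical model of the
management-layer logic, not actual libvirt code; the struct, field, and
function names are invented for illustration.]

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical migration-cookie field: whether the source QEMU
 * supports the proposed auto-num-queues property. */
struct mig_cookie {
    int src_has_auto_num_queues;
};

/* Build the virtio-blk-pci device argument for the incoming
 * (destination) VM process. If the source lacks auto-num-queues,
 * switch it off explicitly so the destination keeps the source's
 * single-queue layout instead of auto-allocating one queue per vCPU. */
static void build_disk_arg(const struct mig_cookie *c,
                           char *buf, size_t len)
{
    if (c->src_has_auto_num_queues) {
        snprintf(buf, len, "virtio-blk-pci,drive=d0");
    } else {
        snprintf(buf, len, "virtio-blk-pci,drive=d0,auto-num-queues=off");
    }
}
```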
> > > >
> > > > The smooth upgrade solution requires the introduction of the
> > > > auto-num-queues property on the QEMU side, which is what this patch
> > > > set does. I'm hoping for comments on the series.
> > >
> > > Please take a look at versioned machine types. I think auto-num-queues
> > > is not necessary if you use versioned machine types.
> > >
> > > If you do think auto-num-queues is needed, please explain the issue in
> > > more detail and state why versioned machine types don't help.
> >
> >
> > "Using versioned machine types" is indeed the standard way to ensure
> > that live migration functions properly.
> >
> > However, in our production environment, and probably in practically
> > all other production environments, staying on a stable version is
> > strongly advised. As a result, we must backport features like
> > automatic num-queues allocation while keeping the machine type the
> > same.
> >
> > This patch set was posted for that reason; the "feature backport"
> > scenario is its target. I'm not sure whether the upstream development
> > strategy should take this scenario into account; if it does, perhaps
> > this patch set can be of use. After all, the primary goal of this
> > patch set is to broaden the uses of this feature, and it makes
> > essentially no functional changes.
>
> Downstreams that backport only a subset of patches instead of following
> upstream linearly must define their own versioned machine types since
> upstream versioned machine types are not relevant downstream.
>
> For example, here is how CentOS Stream defines its own machine types:
>
> https://gitlab.com/redhat/centos-stream/src/qemu-kvm/-/commit/318178778db60b6475d1484509bee136317156d3
>
> I think it's nicer for downstreams to define custom versioned machine
> types than to add properties like auto-num-queues that are intended for
> downstream usage and don't serve a purpose upstream.
>
> My suggestion is for you to maintain your own custom versioned machine
> types so you can systematically control device properties across
> versions. Does that make sense?


This is a long-term approach for production, and it may well work for us.
We'll try the options out and keep in touch with upstream.

Anyway, I appreciate your interest and comments on this series. :)

Yong

>
> Stefan
>
> >
> >
> >
> >
> > > Thanks,
> > > Stefan
> > >
> > > >
> > > > Please review, thanks.
> > > > Yong
> > > >
> > > > Hyman Huang(黄勇) (3):
> > > >   virtio-scsi-pci: introduce auto-num-queues property
> > > >   virtio-blk-pci: introduce auto-num-queues property
> > > >   vhost-user-blk-pci: introduce auto-num-queues property
> > > >
> > > >  hw/block/vhost-user-blk.c          |  1 +
> > > >  hw/block/virtio-blk.c              |  1 +
> > > >  hw/scsi/vhost-scsi.c               |  2 ++
> > > >  hw/scsi/vhost-user-scsi.c          |  2 ++
> > > >  hw/scsi/virtio-scsi.c              |  2 ++
> > > >  hw/virtio/vhost-scsi-pci.c         | 11 +++++++++--
> > > >  hw/virtio/vhost-user-blk-pci.c     |  9 ++++++++-
> > > >  hw/virtio/vhost-user-scsi-pci.c    | 11 +++++++++--
> > > >  hw/virtio/virtio-blk-pci.c         |  9 ++++++++-
> > > >  hw/virtio/virtio-scsi-pci.c        | 11 +++++++++--
> > > >  include/hw/virtio/vhost-user-blk.h |  5 +++++
> > > >  include/hw/virtio/virtio-blk.h     |  5 +++++
> > > >  include/hw/virtio/virtio-scsi.h    |  5 +++++
> > > >  13 files changed, 66 insertions(+), 8 deletions(-)
> > > >
> > > > --
> > > > 2.38.5
> > > >
> > >
> >
> >
> > --
> > Best regards
>


-- 
Best regards
