On Wed, Oct 25, 2023 at 10:57:12AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berra...@redhat.com> writes:
> 
> > On Mon, Oct 23, 2023 at 05:35:45PM -0300, Fabiano Rosas wrote:
> >> Add a capability that allows the management layer to delegate to QEMU
> >> the decision of whether to pause a VM and perform a non-live
> >> migration. Depending on the type of migration being performed, this
> >> could bring performance benefits.
> >
> > I don't really see what problem this is solving.
> >
> 
> Well, this is the fruit of your discussion with Peter Xu in the previous
> version of the patch.
> 
> To recap: he thinks QEMU is doing useless work with file migrations
> because they are always asynchronous. He thinks we should always pause
> before doing fixed-ram migration. You said that libvirt would rather use
> fixed-ram for a broader set of savevm-style commands, so you'd rather
> not always pause. I'm trying to cater to both of your wishes. This new
> capability is the middle ground I came up with.
> 
> So fixed-ram would always pause the VM, because that is the primary
> use-case, but libvirt would be allowed to say: don't pause this time.

If the VM is going to be powered off immediately after saving
a snapshot then yes, you might as well pause it, but we can't
assume that will be the case.  An equally common use case
would be for saving periodic snapshots of a running VM. This
should be transparent such that the VM remains running the
whole time, except a narrow window at completion of RAM/state
saving where we flip the disk snapshots, so they are in sync
with the RAM snapshot.

IOW, save/restore to disk can imply paused, but snapshotting
should not imply paused. So I don't see an unambiguous
rationale that we should diverge when fixed-ram is set and
auto-pause the VM.

> > Mgmt apps are perfectly capable of pausing the VM before issuing
> > the migrate operation.
> >
> 
> Right. But would QEMU be allowed to just assume that if a VM is paused
> at the start of migration it can then go ahead and skip all dirty page
> mechanisms?

Skipping dirty page tracking would imply that the mgmt app cannot
resume CPUs without either letting the operation complete, or
aborting it.

That is probably a reasonable assumption, as I can't come up with
a use case for starting out paused and then later resuming, unless
there was a scenario where you needed to synchronize something
external with the start of migration.  Synchronizing storage, though,
is something that happens at the end of migration instead.

> Without pausing, we're basically doing *live* migration into a static
> file that will be kept on disk for who knows how long before being
> restored on the other side. We could release the src QEMU resources (a
> bit) earlier if we paused the VM beforehand.

Can we really release resources early?  If the save operation fails
right at the end, we want to be able to resume execution of CPUs,
which assumes all resources are still available, otherwise we have
a failure scenario where we've not successfully saved to disk and
also don't still have the running QEMU.

> We're basically talking about whether we want the VM to be usable in the
> (hopefully) very short time between issuing the migration command and
> the migration being finished. We might be splitting hairs here, but we
> need some sort of consensus.

The time may not be very short for large VMs.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
