On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> > Libvirt has multiple APIs where it currently uses its migrate-to-file
> > approach
> > 
> >   * virDomainManagedSave()
> > 
> >     This saves VM state to an libvirt managed file, stops the VM, and the
> >     file state is auto-restored on next request to start the VM, and the
> >     file deleted. The VM CPUs are stopped during both save + restore
> >     phase
> > 
> >   * virDomainSave/virDomainRestore
> > 
> >     The former saves VM state to a file specified by the mgmt app/user.
> >     A later call to virDomainRestore starts the VM using that saved
> >     state. The mgmt app / user can delete the file state, or re-use
> >     it many times as they desire. The VM CPUs are stopped during both
> >     save + restore phase
> > 
> >   * virDomainSnapshotXXX
> > 
> >     This family of APIs takes snapshots of the VM disks, optionally
> >     also including the full VM state to a separate file. The snapshots
> >     can later be restored. The VM CPUs remain running during the
> >     save phase, but are stopped during restore phase
> 
> For this one IMHO it'll be good if Libvirt can consider leveraging the new
> background-snapshot capability (QEMU 6.0+, so not very new..).  Or is there
> perhaps any reason why a generic migrate:fd approach is better?

I'm not sure I fully understand the implications of 'background-snapshot' ?

Based on what the QAPI comment says, it sounds potentially interesting,
as conceptually it would be nicer to have the memory / state snapshot
represent the VM at the point where we started the snapshot operation,
rather than where we finished the snapshot operation.

It would not solve the performance problems that the work in this thread
was intended to address though.  With large VMs (100's of GB of RAM),
saving all the RAM state to disk takes a very long time, regardless of
whether the VM vCPUs are paused or running.

Currently when doing this libvirt has a "libvirt_iohelper" process
that we use so that we can do writes with O_DIRECT set. This avoids
thrashing the host OS's I/O buffers/cache, and thus negatively
impacting performance of anything else on the host doing I/O. This
helper can't take advantage of multifd though, and even if it were
extended to do so, it would still impose extra data copies during the
save/restore paths.


So to speed up the above 3 libvirt APIs, we want QEMU to be able to
directly save/restore mem/vmstate to files, with parallelization and
O_DIRECT.


> > All these APIs end up calling the same code inside libvirt that uses
> > the libvirt-iohelper, together with QEMU migrate:fd driver.
> > 
> > IIUC, Suse's original motivation for the performance improvements was
> > wrt to the first case of virDomainManagedSave. From the POV of actually
> > supporting this in libvirt though, we need to cover all the scenarios
> > there. Thus we need this to work both when CPUs are running and stopped,
> > and if we didn't use migrate in this case, then we basically just end
> > up re-inventing migrate again which IMHO is undesirable both from
> > libvirt's POV and QEMU's POV.
> 
> Just to make sure we're on the same page - I always think it fine to use
> the QMP "migrate" command to do this.
> 
> Meanwhile, we can also reuse the migration framework if we think that's
> still the good way to go (even if I am not 100% sure on this... I still
> think _lots_ of the live migration framework as plenty of logics trying to
> take care of a "live" VM, IOW, those logics will become pure overheads if
> we reuse the live migration framework for vm suspend).
> 
> However could you help elaborate more on why it must support live mode for
> a virDomainManagedSave() request?  As I assume this is the core of the goal.

No, we've no need for live mode for virDomainManagedSave. Live mode is
needed for virDomainSnapshot* APIs.

The point I'm making is that all three of the above libvirt APIs run exactly
the same migration code in libvirt. The only difference in the APIs is how
the operation gets triggered and whether the CPUs are running or not.

We want the improved performance of having parallel save/restore-to-disk
and use of O_DIRECT to be available to all 3 APIs. To me it doesn't make
sense to provide different impls for these APIs when they all have the
same end goal - it would be extra work on QEMU side and libvirt side alike
to use different solutions for each.
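
For reference, the shared flow all three APIs drive today boils down to
a pair of QMP commands: pass a file descriptor over the QMP socket with
"getfd", then point "migrate" at it. A rough protocol fragment (the fd
name "savefd" is illustrative):

```json
{"execute": "getfd", "arguments": {"fdname": "savefd"}}
{"execute": "migrate", "arguments": {"uri": "fd:savefd"}}
```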

> IMHO virDomainManagedSave() is a good interface design, because it contains
> the target goal of what it wants to do (according to above).  To ask in
> another way, I'm curious whether virDomainManagedSave() will stop the VM
> before triggering the QMP "migrate" to fd: If it doesn't, why not?  If it
> does, then why we can't have that assumption also for QEMU?
> 
> That assumption is IMHO important for QEMU because non-live VM migration
> can avoid tons of overhead that a live migration will need.  I've mentioned
> this in the other reply, even if we keep using the migration framework, we
> can still optimize other things like dirty tracking.  We probably don't
> even need any bitmap at all because we simply scan over all ramblocks.
> 
> OTOH, if QEMU supports live mode for a "vm suspend" in the initial design,
> not only it doesn't sound right at all from interface level, it means QEMU
> will need to keep doing so forever because we need to be compatible with
> the old interfaces even on new binaries.  That's why I keep suggesting we
> should take "VM turned off" part of the cmd if that's what we're looking
> for.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|