On Fri, Apr 21, 2023 at 08:48:02AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 20, 2023 at 03:19:39PM -0400, Peter Xu wrote:
> > On Thu, Apr 20, 2023 at 10:02:43AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Apr 19, 2023 at 03:07:19PM -0400, Peter Xu wrote:
> > > > On Wed, Apr 19, 2023 at 06:12:05PM +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> > > > > > On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > > > > Libvirt has multiple APIs where it currently uses its
> > > > > > > migrate-to-file approach
> > > > > > >
> > > > > > >  * virDomainManagedSave()
> > > > > > >
> > > > > > >    This saves VM state to a libvirt managed file, stops the VM,
> > > > > > >    and the file state is auto-restored on the next request to
> > > > > > >    start the VM, and the file deleted. The VM CPUs are stopped
> > > > > > >    during both save + restore phases.
> > > > > > >
> > > > > > >  * virDomainSave/virDomainRestore
> > > > > > >
> > > > > > >    The former saves VM state to a file specified by the mgmt
> > > > > > >    app/user. A later call to virDomainRestore starts the VM
> > > > > > >    using that saved state. The mgmt app/user can delete the file
> > > > > > >    state, or re-use it many times as they desire. The VM CPUs
> > > > > > >    are stopped during both save + restore phases.
> > > > > > >
> > > > > > >  * virDomainSnapshotXXX
> > > > > > >
> > > > > > >    This family of APIs takes snapshots of the VM disks,
> > > > > > >    optionally also including the full VM state to a separate
> > > > > > >    file. The snapshots can later be restored. The VM CPUs remain
> > > > > > >    running during the save phase, but are stopped during the
> > > > > > >    restore phase.
> > > > > >
> > > > > > For this one IMHO it'll be good if Libvirt can consider leveraging
> > > > > > the new background-snapshot capability (QEMU 6.0+, so not very
> > > > > > new..). Or is there perhaps any reason why a generic migrate:fd
> > > > > > approach is better?
> > > > >
> > > > > I'm not sure I fully understand the implications of
> > > > > 'background-snapshot'?
> > > > >
> > > > > Based on what the QAPI comment says, it sounds potentially
> > > > > interesting, as conceptually it would be nicer to have the
> > > > > memory/state snapshot represent the VM at the point where we
> > > > > started the snapshot operation, rather than where we finished the
> > > > > snapshot operation.
> > > > >
> > > > > It would not solve the performance problems that the work in this
> > > > > thread was intended to address though. With large VMs (100's of GB
> > > > > of RAM), saving all the RAM state to disk takes a very long time,
> > > > > regardless of whether the VM vCPUs are paused or running.
> > > >
> > > > I think it solves the performance problem by only copying each guest
> > > > page once, even if the guest is running.
> > >
> > > I think we're talking about different performance problems.
> > >
> > > What you describe here is about ensuring the snapshot is of finite size
> > > and completes in linear time, by ensuring each page is written only
> > > once.
> > >
> > > What I'm talking about is being able to parallelize the writing of all
> > > RAM, so if a single thread can't saturate the storage, using multiple
> > > threads will make the overall process faster, even when we're only
> > > writing each page once.
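For context, that kind of parallelism is what the existing multifd support
already provides for socket-based migration. A minimal QMP sketch of
enabling it would be something like the following (the channel count is
only an example, and afaik multifd today assumes a socket transport rather
than the file/fd output being discussed in this thread):

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "multifd", "state": true } ] } }
  -> { "execute": "migrate-set-parameters",
       "arguments": { "multifd-channels": 8 } }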
> >
> > It depends on how much we want it. Here the live snapshot scenario could
> > probably leverage the same multi-threading framework as the vm suspend
> > case, because it can assume all the pages are static and only saved once.
> >
> > But I agree it's at least not there yet.. so we can directly leverage
> > multifd at least for now.
> >
> > > > Different from almost all the rest of the "migrate" use cases,
> > > > background snapshot does not use the generic dirty tracking at all
> > > > (for KVM that's get-dirty-log); instead it uses userfaultfd
> > > > wr-protects, so that when taking the snapshot all the guest pages
> > > > will be protected once.
> > >
> > > Oh, so that means this 'background-snapshot' feature only works on
> > > Linux, and only when permissions allow it. The migration parameter
> > > probably should be marked with 'CONFIG_LINUX' in the QAPI schema
> > > to make it clear this is a non-portable feature.
> >
> > Indeed, I can have a follow-up patch for this. But it'll be the same as
> > some other features, like postcopy (and all its sub-features including
> > postcopy-blocktime and postcopy-preempt)?
> >
> > > > It guarantees the best efficiency of creating a snapshot with the VM
> > > > running, afaict. I sincerely think Libvirt should have someone
> > > > investigate and see whether virDomainSnapshotXXX() can be implemented
> > > > with this cap rather than the default migration.
> > >
> > > Since the background-snapshot feature is not universally available,
> > > it will only ever be possible to use it as an optional enhancement
> > > with virDomainSnapshotXXX; we'll need the portable impl to be the
> > > default / fallback.
> >
> > I am actually curious how a live snapshot can be implemented correctly
> > without something like background snapshot. I raised this question in
> > another reply here:
> >
> > https://lore.kernel.org/all/ZDWBSuGDU9IMohEf@x1n/
> >
> > I was using fixed-ram and vm suspend as an example, but I assume it
> > applies to any live snapshot that is based on the current default
> > migration scheme.
> >
> > For a real live snapshot (not vm suspend), IIUC we have similar
> > challenges.
> >
> > The problem is that when migration completes (snapshot taken) the VM is
> > still running against a live disk image. How can we then take a disk
> > snapshot at exactly the same point in time at which the guest image was
> > mirrored in the vm dump? What guarantees that there are no IO changes
> > after the VM image is created but before we take the snapshot of the
> > disk image?
> >
> > In short, it's a question of how libvirt can make sure the VM image and
> > the disk snapshot image are taken at exactly the same point in time for
> > a live snapshot.
>
> It is just a matter of where you have the synchronization point.
>
> With background-snapshot, you have to snapshot the disks at the
> start of the migrate operation. Without background-snapshot
> you have to snapshot the disks at the end of the migrate
> operation. The CPUs are paused at the end of the migrate, so
> when the CPUs pause, initiate the storage snapshot in the
> background and then let the CPUs resume.
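To spell that ordering out for the non-background-snapshot case, a rough
QMP-level sketch would be the following (the node name, paths and the
exec: destination are only illustrative; libvirt would normally pass in
an fd instead):

  -> { "execute": "migrate",
       "arguments": { "uri": "exec:cat > /path/to/vmstate.img" } }
  (poll query-migrate until "status" is "completed"; the vCPUs are now
   paused)
  -> { "execute": "blockdev-snapshot-sync",
       "arguments": { "node-name": "disk0",
                      "snapshot-file": "/path/to/overlay.qcow2",
                      "format": "qcow2" } }
  -> { "execute": "cont" }

With background-snapshot the disk snapshot would instead be taken up
front, before issuing the migrate command, while the guest pages are
write-protected.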
Ah, indeed.  Thanks.

-- 
Peter Xu