On Fri, Apr 21, 2023 at 08:48:02AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 20, 2023 at 03:19:39PM -0400, Peter Xu wrote:
> > On Thu, Apr 20, 2023 at 10:02:43AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Apr 19, 2023 at 03:07:19PM -0400, Peter Xu wrote:
> > > > On Wed, Apr 19, 2023 at 06:12:05PM +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> > > > > > On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > > > > Libvirt has multiple APIs where it currently uses its
> > > > > > > migrate-to-file approach
> > > > > > >
> > > > > > >  * virDomainManagedSave()
> > > > > > >
> > > > > > >    This saves VM state to a libvirt managed file, stops the VM,
> > > > > > >    and the file state is auto-restored on the next request to
> > > > > > >    start the VM, and the file deleted. The VM CPUs are stopped
> > > > > > >    during both save + restore phases.
> > > > > > >
> > > > > > >  * virDomainSave/virDomainRestore
> > > > > > >
> > > > > > >    The former saves VM state to a file specified by the mgmt
> > > > > > >    app/user. A later call to virDomainRestore starts the VM
> > > > > > >    using that saved state. The mgmt app/user can delete the file
> > > > > > >    state, or re-use it many times as they desire. The VM CPUs
> > > > > > >    are stopped during both save + restore phases.
> > > > > > >
> > > > > > >  * virDomainSnapshotXXX
> > > > > > >
> > > > > > >    This family of APIs takes snapshots of the VM disks,
> > > > > > >    optionally also including the full VM state to a separate
> > > > > > >    file. The snapshots can later be restored. The VM CPUs remain
> > > > > > >    running during the save phase, but are stopped during the
> > > > > > >    restore phase.
> > > > > >
> > > > > > For this one IMHO it'll be good if Libvirt can consider leveraging
> > > > > > the new background-snapshot capability (QEMU 6.0+, so not very
> > > > > > new..). Or is there perhaps any reason why a generic migrate:fd
> > > > > > approach is better?
> > > > >
> > > > > I'm not sure I fully understand the implications of
> > > > > 'background-snapshot'?
> > > > >
> > > > > Based on what the QAPI comment says, it sounds potentially
> > > > > interesting, as conceptually it would be nicer to have the
> > > > > memory/state snapshot represent the VM at the point where we
> > > > > started the snapshot operation, rather than where we finished the
> > > > > snapshot operation.
> > > > >
> > > > > It would not solve the performance problems that the work in this
> > > > > thread was intended to address though. With large VMs (100's of GB
> > > > > of RAM), saving all the RAM state to disk takes a very long time,
> > > > > regardless of whether the VM vCPUs are paused or running.
> > > >
> > > > I think it solves the performance problem by only copying each guest
> > > > page once, even if the guest is running.
> > >
> > > I think we're talking about different performance problems.
> > >
> > > What you describe here is about ensuring the snapshot is of finite size
> > > and completes in linear time, by ensuring each page is written only
> > > once.
> > >
> > > What I'm talking about is being able to parallelize the writing of all
> > > RAM, so if a single thread can't saturate the storage, using multiple
> > > threads will make the overall process faster, even when we're only
> > > writing each page once.
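For context, that kind of parallelism is what the existing multifd support
already provides for socket-based migration. A minimal QMP sketch of
enabling it would be something like the following (the channel count is
only an example, and afaik multifd today assumes a socket transport rather
than the file/fd output being discussed in this thread):

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "multifd", "state": true } ] } }
  -> { "execute": "migrate-set-parameters",
       "arguments": { "multifd-channels": 8 } }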
> >
> > It depends on how much we want it. Here the live snapshot scenario could
> > probably leverage the same multi-threading framework as the vm suspend
> > case, because it can assume all the pages are static and only saved once.
> >
> > But I agree it's at least not there yet.. so we can directly leverage
> > multifd at least for now.
> >
> > > > Different from almost all the rest of the "migrate" use cases,
> > > > background snapshot does not use the generic dirty tracking at all
> > > > (for KVM that's get-dirty-log); instead it uses userfaultfd
> > > > wr-protects, so that when taking the snapshot all the guest pages
> > > > will be protected once.
> > >
> > > Oh, so that means this 'background-snapshot' feature only works on
> > > Linux, and only when permissions allow it. The migration parameter
> > > probably should be marked with 'CONFIG_LINUX' in the QAPI schema
> > > to make it clear this is a non-portable feature.
> >
> > Indeed, I can have a follow-up patch for this. But it'll be the same as
> > some other features, like postcopy (and all its sub-features including
> > postcopy-blocktime and postcopy-preempt)?
> >
> > > > It guarantees the best efficiency of creating a snapshot with the VM
> > > > running, afaict. I sincerely think Libvirt should have someone
> > > > investigate and see whether virDomainSnapshotXXX() can be implemented
> > > > with this cap rather than the default migration.
> > >
> > > Since the background-snapshot feature is not universally available,
> > > it will only ever be possible to use it as an optional enhancement
> > > with virDomainSnapshotXXX; we'll need the portable impl to be the
> > > default / fallback.
> >
> > I am actually curious how a live snapshot can be implemented correctly
> > without something like background snapshot. I raised this question in
> > another reply here:
> >
> > https://lore.kernel.org/all/ZDWBSuGDU9IMohEf@x1n/
> >
> > I was using fixed-ram and vm suspend as an example, but I assume it
> > applies to any live snapshot that is based on the current default
> > migration scheme.
> >
> > For a real live snapshot (not vm suspend), IIUC we have similar
> > challenges.
> >
> > The problem is that when migration completes (snapshot taken) the VM is
> > still running against a live disk image. How can we then take a disk
> > snapshot at exactly the same point in time at which the guest image was
> > mirrored in the vm dump? What guarantees that there are no IO changes
> > after the VM image is created but before we take the snapshot of the
> > disk image?
> >
> > In short, it's a question of how libvirt can make sure the VM image and
> > the disk snapshot image are taken at exactly the same point in time for
> > a live snapshot.
>
> It is just a matter of where you have the synchronization point.
>
> With background-snapshot, you have to snapshot the disks at the
> start of the migrate operation. Without background-snapshot
> you have to snapshot the disks at the end of the migrate
> operation. The CPUs are paused at the end of the migrate, so
> when the CPUs pause, initiate the storage snapshot in the
> background and then let the CPUs resume.
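To spell that ordering out for the non-background-snapshot case, a rough
QMP-level sketch would be the following (the node name, paths and the
exec: destination are only illustrative; libvirt would normally pass in
an fd instead):

  -> { "execute": "migrate",
       "arguments": { "uri": "exec:cat > /path/to/vmstate.img" } }
  (poll query-migrate until "status" is "completed"; the vCPUs are now
   paused)
  -> { "execute": "blockdev-snapshot-sync",
       "arguments": { "node-name": "disk0",
                      "snapshot-file": "/path/to/overlay.qcow2",
                      "format": "qcow2" } }
  -> { "execute": "cont" }

With background-snapshot the disk snapshot would instead be taken up
front, before issuing the migrate command, while the guest pages are
write-protected.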
Ah, indeed.  Thanks.

-- 
Peter Xu