On 6/23/26 9:27 PM, Maciej S. Szmigiero wrote:
> On 19.06.2026 12:55, Andrey Drobyshev wrote:
>> During a CPR-style migration, we pass additional CPR state via an aux
>> migration channel (the cpr-transfer case). That's done in cpr_state_save().
>> The main bulk of the migration is only sent afterwards - that's why we
>> call cpr_state_save() early on in qmp_migrate().
>>
>> Previous patch placed emitting the SETUP event before cpr_state_save(),
>> since devices must be properly shut down before we send their FDs to CPR
>> target. However, for the proper shut down to take place, we should also
>> stop operating them beforehand, i.e. stop the VM.
>>
>> Thus the desirable order for cpr-transfer case looks as follows:
>>
>>   SOURCE                                      TARGET
>>   ------                                      ------
>>                                               cpr_state_load() blocks
>>     |                                            |
>>     | 1. migration_stop_vm()                     |
>>     |    VM stopped, devices quiesced            |
>>     |                                            | Waiting for
>>     | 2. notifiers (SETUP)                       | FDs from source
>>     |    vhost_reset_owner() releases            |
>>     |    device ownership                        |
>>     |                                            |
>>     | 3. cpr_state_save() ---- FDs ------->      |
>>     |                                            |
>>     v                                            v
>>   postmigrate                                 Device init begins
>>                                               - cpr_find_fd()
>>                                               - vhost_dev_init()
>>                                               - VHOST_SET_OWNER
>>
>> So step 3 is the synchronization/cut-over point. Target proceeds immediately
>> upon receiving FDs, so steps 1-2 must complete successfully. Otherwise:
>>
>> * Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
>> * Race between source I/O and target device init
>>
>> Let's stop the VM early (before FD transfer) to prevent this race.
>> Unlike regular migration, CPR-transfer passes memory via FD (memfd)
>> rather than copying RAM, so early VM stop should have minimal downtime.
>
> Was this "minimum downtime impact" assertion actually tested/benchmark
> with VM under some serious load?
>
> I am especially curious how this interacts with VFIO devices that
> have a lot of data.

No, I don't have exact numbers here. My claim was based on the fact that
the early stop only adds the SETUP notifiers and the FD transfer to the
downtime, and neither of those scales with RAM size. We can reword the
commit message to loosen the claim.

>> Since we call migration_stop_vm() from qmp_migrate() (i.e. from the main
>> thread), we should also balance it out by fallback resume outside of the
>> migration thread - i.e. resume in migration_iteration_finish() is not
>> enough. One good place for this resume op is migration_cleanup().
>> However, in migration_iteration_finish() resume is gated by successful
>> block activation, so we additionally traverse the block graph nodes to
>> make sure activation did take place before doing vm_resume().
>>
>> This patch is a rework of the change originally proposed by Steve and
>> Ben at [0].
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/[email protected]
>>
>> Originally-by: Steve Sistare <[email protected]>
>> Originally-by: Ben Chaney <[email protected]>
>> Signed-off-by: Andrey Drobyshev <[email protected]>
>> ---
> Thanks,
> Maciej
>
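
To make the ordering argument from the quoted commit message concrete,
here is a minimal toy model in C. It is only a sketch: ToySource and all
toy_* helpers are hypothetical names made up for illustration, not QEMU
APIs. They merely stand in for migration_stop_vm(), the SETUP notifiers,
cpr_state_save() and the fallback resume in migration_cleanup(), and the
gating conditions are my simplification of what the patch describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or helpers exist in QEMU. */
typedef struct {
    bool vm_running;     /* guest is running, devices may do I/O */
    bool owner_released; /* stands in for vhost_reset_owner() having run */
    bool fds_sent;       /* stands in for cpr_state_save() having run */
    bool blocks_active;  /* stands in for block nodes being activated */
} ToySource;

/* Step 1: stop the VM so that devices are quiesced. */
static void toy_stop_vm(ToySource *s)
{
    s->vm_running = false;
}

/* Step 2: SETUP notifiers release device ownership.
 * Assumption: this is only safe once the VM is stopped. */
static bool toy_notify_setup(ToySource *s)
{
    if (s->vm_running) {
        return false;
    }
    s->owner_released = true;
    return true;
}

/* Step 3: send the FDs. This is the cut-over point, so it refuses to
 * run unless steps 1 and 2 have completed. */
static bool toy_state_save(ToySource *s)
{
    if (s->vm_running || !s->owner_released) {
        return false;
    }
    s->fds_sent = true;
    return true;
}

/* Run steps 1-3 in the order shown in the diagram. */
static bool toy_cpr_transfer_start(ToySource *s)
{
    toy_stop_vm(s);
    return toy_notify_setup(s) && toy_state_save(s);
}

/* Fallback resume on the cleanup path: resume only if the VM is
 * stopped and block activation did take place. */
static bool toy_cleanup_resume(ToySource *s)
{
    if (s->vm_running || !s->blocks_active) {
        return false;
    }
    s->vm_running = true;
    return true;
}
```

Calling toy_state_save() on a source that is still running fails in this
model, which mirrors the -EBUSY / race scenario listed in the commit
message; toy_cleanup_resume() refuses to resume when block activation has
not happened.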
