On 19.06.2026 12:55, Andrey Drobyshev wrote:
During a CPR-style migration, we pass additional CPR state via an aux
migration channel (the cpr-transfer case).  That's done in cpr_state_save().
The main bulk of the migration is only sent afterwards - that's why we
call cpr_state_save() early on in qmp_migrate().

The previous patch placed the emission of the SETUP event before
cpr_state_save(), since devices must be properly shut down before we send
their FDs to the CPR target.  However, for a proper shutdown to take place,
we must also stop operating them beforehand, i.e. stop the VM.

Thus the desirable order for cpr-transfer case looks as follows:

   SOURCE                                  TARGET
   ------                                  ------
                                       cpr_state_load() blocks
     |                                        |
     |  1. migration_stop_vm()                |
     |     VM stopped, devices quiesced       |
     |                                        | Waiting for
     |  2. notifiers (SETUP)                  | FDs from source
     |     vhost_reset_owner() releases       |
     |     device ownership                   |
     |                                        |
     |  3. cpr_state_save() ---- FDs -------> |
     |                                        |
     v                                        v
   postmigrate                        Device init begins
                                      - cpr_find_fd()
                                      - vhost_dev_init()
                                      - VHOST_SET_OWNER

So step 3 is the synchronization/cut-over point.  The target proceeds
immediately upon receiving the FDs, so steps 1-2 must have completed
successfully by then.  Otherwise:

   * Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
   * Race between source I/O and target device init

Let's stop the VM early (before FD transfer) to prevent this race.
Unlike regular migration, CPR-transfer passes memory via FD (memfd)
rather than copying RAM, so early VM stop should have minimal downtime.
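
The ordering in the diagram above can be modelled with a small standalone
C sketch.  This is not QEMU code: the ToySource struct and all toy_*
functions are hypothetical stand-ins for migration_stop_vm(), the SETUP
notifiers, cpr_state_save() and the target's VHOST_SET_OWNER, and they only
illustrate why the FD hand-off has to come last.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the source side; not QEMU's real data structures. */
typedef struct ToySource {
    bool vm_running;   /* guest is still running and issuing I/O */
    bool owns_device;  /* source still holds ownership of the device */
    bool fds_sent;     /* FDs were handed to the target (cut-over point) */
} ToySource;

/* Step 1: stands in for migration_stop_vm(). */
void toy_stop_vm(ToySource *s)
{
    assert(s != NULL);
    s->vm_running = false;
}

/* Step 2: stands in for the SETUP notifiers releasing device ownership. */
void toy_notify_setup(ToySource *s)
{
    assert(s != NULL);
    s->owns_device = false;
}

/* Step 3: stands in for cpr_state_save() sending the FDs. */
void toy_send_fds(ToySource *s)
{
    assert(s != NULL);
    s->fds_sent = true;
}

/* Source side in the desired order: stop, release, then hand over FDs. */
void toy_cpr_transfer(ToySource *s)
{
    toy_stop_vm(s);
    toy_notify_setup(s);
    toy_send_fds(s);
}

/* Target side: models VHOST_SET_OWNER failing while the source owns it. */
int toy_target_set_owner(const ToySource *s)
{
    assert(s != NULL);
    return s->owns_device ? -EBUSY : 0;
}
```

In this model, calling toy_send_fds() on its own leaves owns_device set, so
toy_target_set_owner() returns -EBUSY, whereas toy_cpr_transfer() lets the
target take ownership.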

Was this "minimum downtime impact" assertion actually tested/benchmarked
with a VM under some serious load?

I am especially curious how this interacts with VFIO devices that
have a lot of data.

Since we call migration_stop_vm() from qmp_migrate() (i.e. from the main
thread), we should also balance it with a fallback resume outside of the
migration thread, i.e. the resume in migration_iteration_finish() is not
enough.  One good place for this resume op is migration_cleanup().
However, in migration_iteration_finish() resume is gated by successful
block activation, so we additionally traverse the block graph nodes to
make sure activation did take place before doing vm_resume().
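
The gating described above can be sketched as a standalone C model.  The
ToyNode and ToyVM types and the toy_* functions are hypothetical and do not
correspond to QEMU's block layer or run-state API; the sketch only shows
the idea of resuming from a cleanup path when every block node is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block node: only tracks whether it has been activated. */
typedef struct ToyNode {
    bool active;
} ToyNode;

/* Hypothetical VM state; stopped_early marks a stop done before FD transfer. */
typedef struct ToyVM {
    bool running;
    bool stopped_early;
} ToyVM;

/* Walk all nodes and report whether every one of them is active. */
bool toy_all_nodes_active(const ToyNode *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].active) {
            return false;
        }
    }
    return true;
}

/*
 * Fallback resume from a cleanup path.  Resumes only if this path stopped
 * the VM earlier and all block nodes are active; returns true on resume.
 */
bool toy_cleanup_resume(ToyVM *vm, const ToyNode *nodes, size_t n)
{
    assert(vm != NULL);
    if (vm->running || !vm->stopped_early) {
        return false;   /* nothing to balance */
    }
    if (!toy_all_nodes_active(nodes, n)) {
        return false;   /* activation did not take place, stay stopped */
    }
    vm->running = true;
    vm->stopped_early = false;
    return true;
}
```

With one inactive node the model keeps the VM stopped; once all nodes are
active, a single call resumes it and later calls are no-ops.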

This patch is a rework of the change originally proposed by Steve and
Ben at [0].

[0] https://lore.kernel.org/qemu-devel/[email protected]

Originally-by: Steve Sistare <[email protected]>
Originally-by: Ben Chaney <[email protected]>
Signed-off-by: Andrey Drobyshev <[email protected]>
---
Thanks,
Maciej

