On 6/23/26 9:27 PM, Maciej S. Szmigiero wrote:
> On 19.06.2026 12:55, Andrey Drobyshev wrote:
>> During a CPR-style migration, we pass additional CPR state via an aux
>> migration channel (the cpr-transfer case).  That's done in cpr_state_save().
>> The main bulk of the migration is only sent afterwards - that's why we
>> call cpr_state_save() early on in qmp_migrate().
>>
>> The previous patch moved emission of the SETUP event before
>> cpr_state_save(), since devices must be properly shut down before we send
>> their FDs to the CPR target.  However, for a proper shutdown to take place,
>> we must also stop operating them beforehand, i.e. stop the VM.
>>
>> Thus the desirable order for cpr-transfer case looks as follows:
>>
>>    SOURCE                                  TARGET
>>    ------                                  ------
>>                                        cpr_state_load() blocks
>>      |                                        |
>>      |  1. migration_stop_vm()                |
>>      |     VM stopped, devices quiesced       |
>>      |                                        | Waiting for
>>      |  2. notifiers (SETUP)                  | FDs from source
>>      |     vhost_reset_owner() releases       |
>>      |     device ownership                   |
>>      |                                        |
>>      |  3. cpr_state_save() ---- FDs -------> |
>>      |                                        |
>>      v                                        v
>>    postmigrate                        Device init begins
>>                                       - cpr_find_fd()
>>                                       - vhost_dev_init()
>>                                       - VHOST_SET_OWNER
>>
>> So step 3 is the synchronization/cut-over point. Target proceeds immediately
>> upon receiving FDs, so steps 1-2 must complete successfully.  Otherwise:
>>
>>    * Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
>>    * Race between source I/O and target device init
>>
>> Let's stop the VM early (before FD transfer) to prevent this race.
>> Unlike regular migration, CPR-transfer passes memory via FD (memfd)
>> rather than copying RAM, so early VM stop should have minimal downtime.
> 
> Was this "minimum downtime impact" assertion actually tested/benchmarked
> with a VM under some serious load?
>
> I am especially curious how this interacts with VFIO devices that
> have a lot of data.

No, I don't have exact numbers here.  My claim was based on the fact
that the early stop only adds the SETUP notifiers and the FD transfer to
the downtime, and neither of those scales with RAM size.  We can reword
the commit message to loosen the claim.
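
To spell out what ends up inside the downtime window, here is a toy,
standalone model of the ordering from the commit message.  It is not QEMU
code: every sketch_* name is made up for illustration, the resume on a
failed FD transfer is my assumption about the fallback path, and the
block-activation check before resume is left out.

```c
/* Toy model of the cpr-transfer ordering; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_STEPS 8

static const char *steps[MAX_STEPS];   /* order in which steps ran */
static int n_steps;
static bool vm_running = true;

static void record(const char *step)
{
    assert(n_steps < MAX_STEPS);
    steps[n_steps++] = step;
}

/* Step 1: stop the VM so that devices are quiesced. */
static void sketch_stop_vm(void)
{
    vm_running = false;
    record("stop_vm");
}

/* Step 2: SETUP notifiers release device ownership. */
static void sketch_notify_setup(void)
{
    record("notify_setup");
}

/* Step 3: hand the FDs to the target; this is the cut-over point. */
static bool sketch_cpr_state_save(bool fail)
{
    record("cpr_state_save");
    return !fail;
}

/* Fallback path: resume the VM only if it is currently stopped. */
static void sketch_cleanup_resume(void)
{
    if (!vm_running) {
        vm_running = true;
        record("resume");
    }
}

/* Returns true if the cut-over happened, false if the source resumed. */
bool sketch_cpr_transfer(bool fail_save)
{
    n_steps = 0;
    vm_running = true;

    sketch_stop_vm();
    sketch_notify_setup();
    if (!sketch_cpr_state_save(fail_save)) {
        sketch_cleanup_resume();
        return false;
    }
    return true;
}
```

In this model the only work between the stop and the cut-over is steps 2
and 3, which is what the claim above rests on.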

>> Since we call migration_stop_vm() from qmp_migrate() (i.e. from the main
>> thread), we should also balance it with a fallback resume outside of the
>> migration thread - i.e. the resume in migration_iteration_finish() is not
>> enough.  One good place for this resume op is migration_cleanup().
>> However, in migration_iteration_finish() resume is gated by successful
>> block activation, so we additionally traverse the block graph nodes to
>> make sure activation did take place before doing vm_resume().
>>
>> This patch is a rework of the change originally proposed by Steve and
>> Ben at [0].
>>
>> [0] https://lore.kernel.org/qemu-devel/[email protected]
>>
>> Originally-by: Steve Sistare <[email protected]>
>> Originally-by: Ben Chaney <[email protected]>
>> Signed-off-by: Andrey Drobyshev <[email protected]>
>> ---
> Thanks,
> Maciej
> 

