On 6/16/26 9:19 AM, Dongli Zhang wrote:
>
>
> On 2026-06-12 9:51 AM, Andrey Drobyshev wrote:
>> For CPR-style migration, we need the VM to stop earlier on, before
>> cpr_state_save() fires. That is to prevent the race between source I/O
>> and target device init, as well as make sure VHOST_RESET_OWNER and
>> VHOST_SET_OWNER are fired in the right order (cpr-transfer case). Thus,
>> for CPR case migration, let's move the stop to qmp_migrate(), directly
>> before launching cpr_state_save().
>>
>> This patch is a rework of the change originally proposed by Steve and
>> Ben at [0].
>>
>> [0]
>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/[email protected]__;!!ACWV5N9M2RV99hQ!McDgSIuX3a-mCdNkiaf0TCITdQB-bEHCppcpig8fe6BPBWlU7qkvjW5FhLN0twb2fq9pC2qDYPH_FQZJYglJy4C71FMad5DMog$
>>
>>
>> Originally-by: Steve Sistare <[email protected]>
>> Originally-by: Ben Chaney <[email protected]>
>> Signed-off-by: Andrey Drobyshev <[email protected]>
>> ---
>> migration/migration.c | 73 +++++++++++++++++++++++++++++++++----------
>> 1 file changed, 56 insertions(+), 17 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 3c70467d314..c02f1da8b0e 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1380,10 +1380,13 @@ static void migration_cleanup(MigrationState *s)
>>
>> /*
>> * FAILED notification should have already happened. Notify DONE if
>> - * migration completed successfully.
>> + * migration completed successfully. On a failed/canceled path,
>> + * resume VM for the CPR case.
>> */
>> if (!migration_has_failed(s)) {
>> migration_call_notifiers(MIG_EVENT_DONE, NULL);
>> + } else if (migrate_mode_is_cpr()) {
>> + vm_resume(s->vm_old_state);
>
> In migration_iteration_finish(), the VM is resumed only after
> migration_block_activate() succeeds. With this patch, CPR failure handling is
> moved to migration_cleanup(), where vm_resume() is called *unconditionally*
> for
> failed CPR migrations.
>
You're right, thanks for noticing. Theoretically we could just call
migration_block_activate() once again here and gate the resume by its
return value, since for an already activated block graph this new
activation attempt should be a no-op. But I'd prefer a simple read-only
iteration over the graph, to check whether we've already done a
successful activation - and if we have, proceed with the resume.
>> }
>>
>> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>> @@ -2130,6 +2133,45 @@ void qmp_migrate(const char *uri, bool has_channels,
>> */
>> Error *local_err = NULL;
>>
>> + /*
>> + * CPR-transfer ordering:
>> + *
>> + * SOURCE TARGET
>> + * ------ ------
>> + * cpr_state_load() blocks
>> + * | |
>> + * | 1. migration_stop_vm() |
>> + * | VM stopped, devices quiesced |
>> + * | | Waiting for
>> + * | 2. notifiers (SETUP) | FDs from source
>> + * | vhost_reset_owner() releases |
>> + * | device ownership |
>> + * | |
>> + * | 3. cpr_state_save() ---- FDs -------> |
>> + * | |
>> + * v v
>> + * postmigrate Device init begins
>> + * - cpr_find_fd()
>> + * - vhost_dev_init()
>> + * - VHOST_SET_OWNER
>> + *
>> + * Step 3 is the synchronization/cut-over point. Target proceeds
>> immediately
>> + * upon receiving FDs, so steps 1-2 must complete otherwise:
>
> Complete successfully?
>
>> + * - Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
>> + * - Race between source I/O and target device init
>> + *
>> + * We stop the VM early (before FD transfer) to prevent this race.
>
> How about move some of comments to the commit message instead?
>
> When reading this comment in the final code, I found the race description a
> bit
> confusing.
>
> The vhost_dev_init change is introduced only in the next patch.
>
> With this patch applied, the VM is already stopped at step 1 before SETUP and
> before cpr_state_save(), so a future reader may wonder why there is still a
> race
> between source I/O and target device init.
>
> The race is the motivation for the change, not the behavior of the code after
> the change.
>
> It may be clearer to keep the comment focused on the required ordering.
>
> Thank you very much!
>
Agreed. Let's move the motivational part + the diagram to the commit
message.
Andrey
> Dongli Zhang
>
>> + * Unlike regular migration, CPR-transfer passes memory via FD (memfd)
>> + * rather than copying RAM, so early VM stop should have minimal
>> downtime.
>> + */
>> + if (migrate_mode_is_cpr()) {
>> + int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
>> + if (ret < 0) {
>> + error_setg(&local_err, "migration_stop_vm failed, error %d",
>> -ret);
>> + goto out;
>> + }
>> + }
>> +
>> /* Notify before starting migration thread and before starting CPR */
>> if (migration_call_notifiers(MIG_EVENT_SETUP, &local_err)) {
>> goto out;
>> @@ -3482,13 +3524,19 @@ static void
>> migration_iteration_finish(MigrationState *s)
>> */
>> migration_call_notifiers(MIG_EVENT_FAILED, NULL);
>>
>> - if (runstate_is_live(s->vm_old_state)) {
>> - if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>> - vm_start();
>> - }
>> - } else {
>> - if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
>> - runstate_set(s->vm_old_state);
>> + /*
>> + * For cpr the VM resume on failure is centralized in
>> + * migration_cleanup(), so don't resume here as well.
>> + */
>> + if (!migrate_mode_is_cpr()) {
>> + if (runstate_is_live(s->vm_old_state)) {
>> + if (!runstate_check(RUN_STATE_SHUTDOWN)) {
>> + vm_start();
>> + }
>> + } else {
>> + if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
>> + runstate_set(s->vm_old_state);
>> + }
>> }
>> }
>> break;
>> @@ -3903,7 +3951,6 @@ void migration_start_outgoing(MigrationState *s)
>> Error *local_err = NULL;
>> uint64_t rate_limit;
>> bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
>> - int ret;
>>
>> if (resume) {
>> /* This is a resumed migration */
>> @@ -3944,14 +3991,6 @@ void migration_start_outgoing(MigrationState *s)
>> return;
>> }
>>
>> - if (migrate_mode_is_cpr()) {
>> - ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
>> - if (ret < 0) {
>> - error_setg(&local_err, "migration_stop_vm failed, error %d",
>> -ret);
>> - goto fail;
>> - }
>> - }
>> -
>> /*
>> * Take a refcount to make sure the migration object won't get freed by
>> * the main thread already in migration_shutdown().
>