On 2026-06-12 9:51 AM, Andrey Drobyshev wrote:
> Devices register their own callbacks to MIG_EVENT_* notifications and
> process them accordingly.  But some devices, especially during CPR-style
> migration, might need to be aware and process SETUP event earlier on.  One
> such example is vhost-based devs, as they need to call RESET_OWNER ioctl
> before the target (in the cpr-transfer case) tries to call SET_OWNER.
> 
> Let's move SETUP notification to an earlier place in qmp_migrate().  Before
> this change, SETUP was firing shortly before migration thread creation, and
> FAILED notification balancing it was only fired from
> migration_iteration_finish() (ran in the thread).  But now by moving SETUP
> to earlier we widen the window.  Hence let's also add FAILED to
> migration_connect_error_propagate() and the special CPR-related case in
> migration_cancel(), to balance things out.
> 
> This patch is a rework of the change originally proposed by Steve and
> Ben at [0].
> 
> [0] 
> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/[email protected]__;!!ACWV5N9M2RV99hQ!PqTirI0SC9ktAapICP7w1YqQ7Fwlq0iPjHspIeoQU_M2xXPnQ1X5CJSCrLZEs5H-7rWU1DU_s-tNL6DD8riRAcMWPwp49pQVqQ$
>  
> 
> Originally-by: Steve Sistare <[email protected]>
> Originally-by: Ben Chaney <[email protected]>
> Signed-off-by: Andrey Drobyshev <[email protected]>
> ---
>  migration/migration.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 074d3f2c697..3c70467d314 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1471,6 +1471,13 @@ void migration_connect_error_propagate(MigrationState 
> *s, Error *error)
>      migrate_error_propagate(s, error);
>  
>      if (!resume) {
> +        /*
> +         * Emit FAILED for every pre-thread failure/cancel that ends up here,
> +         * to balance the SETUP sent in qmp_migrate().
> +         */
> +        if (migration_has_failed(s)) {
> +            migration_call_notifiers(MIG_EVENT_FAILED, NULL);
> +        }
>          migration_cleanup(s);
>      }
>  }
> @@ -1530,6 +1537,12 @@ void migration_cancel(void)
>      if (cpr_transfer_source_active(s)) {
>          assert(migrate_mode() == MIG_MODE_CPR_TRANSFER);
>          assert(setup && !s->to_dst_file);
> +        /*
> +         * This path bypasses both migration_iteration_finish() and
> +         * migration_connect_error_propagate(), so emit FAILED here to 
> balance
> +         * the SETUP sent in qmp_migrate(), before the VM is resumed.
> +         */
> +        migration_call_notifiers(MIG_EVENT_FAILED, NULL);
>          migration_cleanup(s);
>          /* Now all things should have been released */
>          assert(!cpr_transfer_source_active(s));
> @@ -2117,6 +2130,11 @@ void qmp_migrate(const char *uri, bool has_channels,
>       */
>      Error *local_err = NULL;
>  
> +    /* Notify before starting migration thread and before starting CPR */
> +    if (migration_call_notifiers(MIG_EVENT_SETUP, &local_err)) {
> +        goto out;
> +    }
> +

This affects not only CPR, but also regular live migration and postcopy recovery
(i.e. when resume=true). Even with resume=true, qmp_migrate() now triggers
MIG_EVENT_SETUP unconditionally at the beginning. If recovery fails,
migration_connect_error_propagate() treats it as the resume case and does not
trigger MIG_EVENT_FAILED.

Have you tested regular live migration with resume==true?

Thank you very much!

Dongli Zhang

>      if (!cpr_state_save(cpr_ch, &local_err)) {
>          goto out;
>      }
> @@ -3893,11 +3911,6 @@ void migration_start_outgoing(MigrationState *s)
>      } else {
>          /* This is a fresh new migration */
>          rate_limit = migrate_max_bandwidth();
> -
> -        /* Notify before starting migration thread */
> -        if (migration_call_notifiers(MIG_EVENT_SETUP, &local_err)) {
> -            goto fail;
> -        }
>      }
>  
>      migration_rate_set(rate_limit);


Reply via email to