On 2026-06-12 9:51 AM, Andrey Drobyshev wrote:
> Devices register their own callbacks to MIG_EVENT_* notifications and
> process them accordingly. But some devices, especially during CPR-style
> migration, might need to be aware and process SETUP event earlier on. One
> such example is vhost-based devs, as they need to call RESET_OWNER ioctl
> before the target (in the cpr-transfer case) tries to call SET_OWNER.
>
> Let's move SETUP notification to an earlier place in qmp_migrate(). Before
> this change, SETUP was firing shortly before migration thread creation, and
> FAILED notification balancing it was only fired from
> migration_iteration_finish() (ran in the thread). But now by moving SETUP
> to earlier we widen the window. Hence let's also add FAILED to
> migration_connect_error_propagate() and the special CPR-related case in
> migration_cancel(), to balance things out.
>
> This patch is a rework of the change originally proposed by Steve and
> Ben at [0].
>
> [0]
> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/[email protected]__;!!ACWV5N9M2RV99hQ!PqTirI0SC9ktAapICP7w1YqQ7Fwlq0iPjHspIeoQU_M2xXPnQ1X5CJSCrLZEs5H-7rWU1DU_s-tNL6DD8riRAcMWPwp49pQVqQ$
>
>
> Originally-by: Steve Sistare <[email protected]>
> Originally-by: Ben Chaney <[email protected]>
> Signed-off-by: Andrey Drobyshev <[email protected]>
> ---
> migration/migration.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 074d3f2c697..3c70467d314 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1471,6 +1471,13 @@ void migration_connect_error_propagate(MigrationState
> *s, Error *error)
> migrate_error_propagate(s, error);
>
> if (!resume) {
> + /*
> + * Emit FAILED for every pre-thread failure/cancel that ends up here,
> + * to balance the SETUP sent in qmp_migrate().
> + */
> + if (migration_has_failed(s)) {
> + migration_call_notifiers(MIG_EVENT_FAILED, NULL);
> + }
> migration_cleanup(s);
> }
> }
> @@ -1530,6 +1537,12 @@ void migration_cancel(void)
> if (cpr_transfer_source_active(s)) {
> assert(migrate_mode() == MIG_MODE_CPR_TRANSFER);
> assert(setup && !s->to_dst_file);
> + /*
> + * This path bypasses both migration_iteration_finish() and
> + * migration_connect_error_propagate(), so emit FAILED here to
> balance
> + * the SETUP sent in qmp_migrate(), before the VM is resumed.
> + */
> + migration_call_notifiers(MIG_EVENT_FAILED, NULL);
> migration_cleanup(s);
> /* Now all things should have been released */
> assert(!cpr_transfer_source_active(s));
> @@ -2117,6 +2130,11 @@ void qmp_migrate(const char *uri, bool has_channels,
> */
> Error *local_err = NULL;
>
> + /* Notify before starting migration thread and before starting CPR */
> + if (migration_call_notifiers(MIG_EVENT_SETUP, &local_err)) {
> + goto out;
> + }
> +
This affects not only CPR, but also regular live migration and postcopy recovery
(i.e. when resume=true). Even with resume=true, qmp_migrate() now triggers
MIG_EVENT_SETUP unconditionally at the beginning. If recovery fails,
migration_connect_error_propagate() treats it as the resume case and does not
trigger MIG_EVENT_FAILED.
Have you tested regular live migration with resume==true?
Thank you very much!
Dongli Zhang
> if (!cpr_state_save(cpr_ch, &local_err)) {
> goto out;
> }
> @@ -3893,11 +3911,6 @@ void migration_start_outgoing(MigrationState *s)
> } else {
> /* This is a fresh new migration */
> rate_limit = migrate_max_bandwidth();
> -
> - /* Notify before starting migration thread */
> - if (migration_call_notifiers(MIG_EVENT_SETUP, &local_err)) {
> - goto fail;
> - }
> }
>
> migration_rate_set(rate_limit);