On Mon, Sep 15, 2025 at 01:59:14PM +0200, Juraj Marcin wrote: > From: Juraj Marcin <[email protected]> > > Currently, there are two functions that are responsible for cleanup of > the incoming migration state. With successful precopy, it's the main > thread and with successful postcopy it's the listen thread. However, if > postcopy fails during in the device load, both functions will try to do > the cleanup. Moreover, when exit-on-error parameter was added, it was > applied only to precopy. > > This patch refactors common cleanup and exiting on error into a helper > function that can be started either from precopy or postcopy, reducing > the duplication. If the listen thread has been started (the postcopy > state is at least LISTENING), the listen thread is responsible for all > cleanup and exiting, otherwise it's the main thread's responsibility.
Looks almost good, thanks! Only nitpicks below. > > Signed-off-by: Juraj Marcin <[email protected]> > --- > migration/migration.c | 64 ++++++++++++++++++++++++------------------- > migration/migration.h | 1 + > migration/savevm.c | 48 +++++++++++--------------------- > 3 files changed, 53 insertions(+), 60 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 2c0b3a7229..7222e3de13 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -442,9 +442,19 @@ void > migration_incoming_transport_cleanup(MigrationIncomingState *mis) > void migration_incoming_state_destroy(void) > { > struct MigrationIncomingState *mis = migration_incoming_get_current(); > + PostcopyState ps = postcopy_state_get(); > > multifd_recv_cleanup(); > > + if (mis->have_listen_thread) { > + qemu_thread_join(&mis->listen_thread); > + mis->have_listen_thread = false; > + } Maybe this fits more to be in postcopy_ram_incoming_cleanup() below? > + > + if (ps != POSTCOPY_INCOMING_NONE) { > + postcopy_ram_incoming_cleanup(mis); > + } > + > /* > * RAM state cleanup needs to happen after multifd cleanup, because > * multifd threads can use some of its states (receivedmap). > @@ -809,6 +819,23 @@ static void qemu_start_incoming_migration(const char > *uri, bool has_channels, > cpr_state_close(); > } > > +void migration_incoming_finish(void) > +{ > + MigrationState *s = migrate_get_current(); > + MigrationIncomingState *mis = migration_incoming_get_current(); > + > + migration_incoming_state_destroy(); > + > + if (migration_has_failed(mis->state) && mis->exit_on_error) { If you agree on my comment in patch 2, we can keep checking against FAILED. > + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > + error_report_err(s->error); > + s->error = NULL; > + } > + > + exit(EXIT_FAILURE); > + } > +} > + > static void process_incoming_migration_bh(void *opaque) > { > MigrationIncomingState *mis = opaque; > @@ -861,7 +888,7 @@ static void process_incoming_migration_bh(void *opaque) > */ > migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, > MIGRATION_STATUS_COMPLETED); > - migration_incoming_state_destroy(); > + migration_incoming_finish(); This is exactly the same as before when we know it's succeeding, but I think I get your point, always using migration_incoming_finish() should be fine. > } > > static void coroutine_fn > @@ -888,23 +915,13 @@ process_incoming_migration_co(void *opaque) > > ps = postcopy_state_get(); > trace_process_incoming_migration_co_end(ret, ps); > - if (ps != POSTCOPY_INCOMING_NONE) { > - if (ps == POSTCOPY_INCOMING_ADVISE) { > - /* > - * Where a migration had postcopy enabled (and thus went to > advise) > - * but managed to complete within the precopy period, we can use > - * the normal exit. > - */ > - postcopy_ram_incoming_cleanup(mis); > - } else if (ret >= 0) { > - /* > - * Postcopy was started, cleanup should happen at the end of the > - * postcopy thread. > - */ > - trace_process_incoming_migration_co_postcopy_end_main(); > - goto out; > - } > - /* Else if something went wrong then just fall out of the normal > exit */ > + if (ps >= POSTCOPY_INCOMING_LISTENING) { > + /* > + * Postcopy was started, cleanup should happen at the end of the > + * postcopy thread. Postcopy has >1 threads, better mention "at the end of postcopy ram listen thread", that helps explain why checking >= POSTCOPY_INCOMING_LISTENING, because that event creates the ram listen thread. > + */ > + trace_process_incoming_migration_co_postcopy_end_main(); > + goto out; > } > > if (ret < 0) { > @@ -926,16 +943,7 @@ fail: > migrate_set_error(s, local_err); > error_free(local_err); > > - migration_incoming_state_destroy(); > - > - if (mis->exit_on_error) { > - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > - error_report_err(s->error); > - s->error = NULL; > - } > - > - exit(EXIT_FAILURE); > - } > + migration_incoming_finish(); > out: > /* Pairs with the refcount taken in qmp_migrate_incoming() */ > migrate_incoming_unref_outgoing_state(); > diff --git a/migration/migration.h b/migration/migration.h > index 2c2331f40d..67e3318467 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -518,6 +518,7 @@ void migrate_set_state(MigrationStatus *state, > MigrationStatus old_state, > void migration_fd_process_incoming(QEMUFile *f); > void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp); > void migration_incoming_process(void); > +void migration_incoming_finish(void); > > bool migration_has_all_channels(void); > > diff --git a/migration/savevm.c b/migration/savevm.c > index fabbeb296a..d7eb416d48 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -2069,6 +2069,11 @@ static int > loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis, > return 0; > } > > +static void postcopy_ram_listen_thread_bh(void *opaque) > +{ > + migration_incoming_finish(); > +} > + > /* > * Triggered by a postcopy_listen command; this thread takes over reading > * the input stream, leaving the main thread free to carry on loading the > rest > @@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaque) > "bitmaps may be lost, and present migrated dirty " > "bitmaps are correctly migrated and valid.", > __func__, load_res); > - load_res = 0; /* prevent further exit() */ > } else { > error_report("%s: loadvm failed: %d", __func__, load_res); > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > MIGRATION_STATUS_FAILED); > + goto out; > } > } > - if (load_res >= 0) { > - /* > - * This looks good, but it's possible that the device loading in the > - * main thread hasn't finished yet, and so we might not be in 'RUN' > - * state yet; wait for the end of the main thread. > - */ > - qemu_event_wait(&mis->main_thread_load_event); > - } > - postcopy_ram_incoming_cleanup(mis); > - > - if (load_res < 0) { > - /* > - * If something went wrong then we have a bad state so exit; > - * depending how far we got it might be possible at this point > - * to leave the guest running and fire MCEs for pages that never > - * arrived as a desperate recovery step. > - */ > - rcu_unregister_thread(); > - exit(EXIT_FAILURE); > - } > + /* > + * This looks good, but it's possible that the device loading in the > + * main thread hasn't finished yet, and so we might not be in 'RUN' > + * state yet; wait for the end of the main thread. > + */ > + qemu_event_wait(&mis->main_thread_load_event); > > migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, > MIGRATION_STATUS_COMPLETED); > - /* > - * If everything has worked fine, then the main thread has waited > - * for us to start, and we're the last use of the mis. > - * (If something broke then qemu will have to exit anyway since it's > - * got a bad migration state). > - */ > - bql_lock(); > - migration_incoming_state_destroy(); > - bql_unlock(); > > +out: > rcu_unregister_thread(); > - mis->have_listen_thread = false; > postcopy_state_set(POSTCOPY_INCOMING_END); > > object_unref(OBJECT(migr)); > > + migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL); > + > return NULL; > } > > @@ -2217,7 +2201,7 @@ static int > loadvm_postcopy_handle_listen(MigrationIncomingState *mis) > mis->have_listen_thread = true; > postcopy_thread_create(mis, &mis->listen_thread, > MIGRATION_THREAD_DST_LISTEN, > - postcopy_ram_listen_thread, QEMU_THREAD_DETACHED); > + postcopy_ram_listen_thread, QEMU_THREAD_JOINABLE); Nit once more: better mention this change in commit message. Thanks! > trace_loadvm_postcopy_handle_listen("return"); > > return 0; > -- > 2.51.0 > -- Peter Xu
