On Tue, Jun 02, 2026 at 12:26:15PM +0300, Avihai Horon wrote:
> Switchover ACK is checked only during precopy while the guest is still
> running. The last migration_can_switchover() decision and guest stop are
> not atomic, so a device may want to request another switchover ACK in
> the gap after switchover decision has been made but before the guest is
> stopped. Migration would then miss that request, which can increase
> downtime.
>
> Cover this case by failing the migration if a switchover-ack was
> requested during that time.
>
> Ideally, precopy iterations should be resumed in this case, however,
> VFIO doesn't support going back to precopy after being stopped, so
> implementing such logic would require non-trivial changes to the guest
> start/stop flow. Given the above and that this case should be rare,
> failing the migration seems reasonable.
>
> Signed-off-by: Avihai Horon <[email protected]>

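To make the race window concrete, here is a minimal standalone C11 sketch of the pattern the commit message describes: the precopy decision only allows switchover when no ACK is pending, and the code re-reads the pending counter after that decision but before stopping the guest, failing instead of stopping if a request slipped in. Everything here is hypothetical: `MigState`, `mig_state_init()`, `device_request_ack()`, `can_switchover()` and `switchover_is_acknowledged()` are illustrative stand-ins, plain `<stdatomic.h>` replaces QEMU's `qatomic_read()`, a string buffer replaces `Error **errp`, the legacy switchover-ack mode check is omitted, and it assumes that a device request simply increments the counter.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant MigrationState fields. */
typedef struct {
    bool switchover_ack_enabled;       /* switchover-ack capability is on */
    atomic_uint_least32_t ack_pending; /* number of outstanding ACK requests */
} MigState;

static void mig_state_init(MigState *s, bool ack_enabled)
{
    s->switchover_ack_enabled = ack_enabled;
    atomic_init(&s->ack_pending, 0);
}

/* Assumption: a device asking for another ACK bumps the counter. */
static void device_request_ack(MigState *s)
{
    atomic_fetch_add(&s->ack_pending, 1);
}

/* Precopy decision: switch over only when nobody is waiting for an ACK. */
static bool can_switchover(MigState *s)
{
    return !s->switchover_ack_enabled || atomic_load(&s->ack_pending) == 0;
}

/*
 * Re-check after the decision, before the guest is stopped.
 * Returns true if no ACK request arrived in the gap; otherwise writes an
 * error message into err and returns false so the caller can fail.
 */
static bool switchover_is_acknowledged(MigState *s, char *err, size_t len)
{
    uint32_t pending;

    if (!s->switchover_ack_enabled) {
        return true;
    }

    pending = (uint32_t)atomic_load(&s->ack_pending);
    if (pending > 0) {
        snprintf(err, len,
                 "Switchover ACK was requested by %" PRIu32
                 " devices during switchover",
                 pending);
        return false;
    }

    return true;
}
```

Calling `device_request_ack()` between a successful `can_switchover()` and `switchover_is_acknowledged()` makes the latter return false, which mirrors the failure path the patch adds; with the capability disabled, both checks pass regardless of the counter.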
Reviewed-by: Peter Xu <[email protected]>

One nit:

> ---
>  migration/migration.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 4bb649a467..6ee1c795ff 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2810,6 +2810,27 @@ static bool migration_switchover_prepare(MigrationState *s)
>      return s->state == MIGRATION_STATUS_DEVICE;
>  }
>
> +static bool migration_switchover_check_switchover_ack_pending(MigrationState *s,

The name is slightly confusing. It says "check if there is pending ack" but
then it returns true if there's no pending ACK.. Maybe
migration_switchover_is_acknowledged()?

> +                                                              Error **errp)
> +{
> +    uint32_t pending_num;
> +
> +    if (!migrate_switchover_ack() || migrate_switchover_ack_legacy()) {
> +        return true;
> +    }
> +
> +    pending_num = qatomic_read(&s->switchover_ack_pending_num);
> +    if (pending_num > 0) {
> +        error_setg(errp,
> +                   "Switchover ACK was requested by %" PRIu32
> +                   " devices during switchover",
> +                   pending_num);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  static bool migration_switchover_start(MigrationState *s, Error **errp)
>  {
>      ERRP_GUARD();
> @@ -2822,6 +2843,15 @@ static bool migration_switchover_start(MigrationState *s, Error **errp)
>
>      qemu_savevm_query_pending_final(s, &pending);
>
> +    /*
> +     * Switchover-ack requests done after switchover decision, are not allowed.
> +     * Fail the migration in this case since we currently don't support going
> +     * back to precopy.
> +     */
> +    if (!migration_switchover_check_switchover_ack_pending(s, errp)) {
> +        return false;
> +    }
> +
>      /* Inactivate disks except in COLO */
>      if (!migrate_colo()) {
>          /*
> --
> 2.40.1
>

-- 
Peter Xu
