On Tue, Jun 02, 2026 at 12:26:15PM +0300, Avihai Horon wrote:
> Switchover ACK is checked only during precopy while the guest is still
> running. The last migration_can_switchover() decision and guest stop are
> not atomic, so a device may want to request another switchover ACK in
> the gap after switchover decision has been made but before the guest is
> stopped. Migration would then miss that request, which can increase
> downtime.
> 
> Cover this case by failing the migration if a switchover-ack was
> requested during that time.
> 
> Ideally, precopy iterations should be resumed in this case, however,
> VFIO doesn't support going back to precopy after being stopped, so
> implementing such logic would require non-trivial changes to the guest
> start/stop flow. Given the above and that this case should be rare,
> failing the migration seems reasonable.
> 
> Signed-off-by: Avihai Horon <[email protected]>

Reviewed-by: Peter Xu <[email protected]>

One nit:

> ---
>  migration/migration.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 4bb649a467..6ee1c795ff 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2810,6 +2810,27 @@ static bool 
> migration_switchover_prepare(MigrationState *s)
>      return s->state == MIGRATION_STATUS_DEVICE;
>  }
>  
> +static bool migration_switchover_check_switchover_ack_pending(MigrationState 
> *s,

The name is slightly confusing.  It says "check if there is pending ack"
but then it returns true if there's no pending ACK..

Maybe migration_switchover_is_acknowledged()?

> +                                                              Error **errp)
> +{
> +    uint32_t pending_num;
> +
> +    if (!migrate_switchover_ack() || migrate_switchover_ack_legacy()) {
> +        return true;
> +    }
> +
> +    pending_num = qatomic_read(&s->switchover_ack_pending_num);
> +    if (pending_num > 0) {
> +        error_setg(errp,
> +                   "Switchover ACK was requested by %" PRIu32
> +                   " devices during switchover",
> +                   pending_num);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  static bool migration_switchover_start(MigrationState *s, Error **errp)
>  {
>      ERRP_GUARD();
> @@ -2822,6 +2843,15 @@ static bool migration_switchover_start(MigrationState 
> *s, Error **errp)
>  
>      qemu_savevm_query_pending_final(s, &pending);
>  
> +    /*
> +     * Switchover-ack requests done after switchover decision, are not 
> allowed.
> +     * Fail the migration in this case since we currently don't support going
> +     * back to precopy.
> +     */
> +    if (!migration_switchover_check_switchover_ack_pending(s, errp)) {
> +        return false;
> +    }
> +
>      /* Inactivate disks except in COLO */
>      if (!migrate_colo()) {
>          /*
> -- 
> 2.40.1
> 

-- 
Peter Xu


Reply via email to