On 6/3/2026 11:42 PM, Peter Xu wrote:
External email: Use caution opening links or attachments


On Tue, Jun 02, 2026 at 12:26:15PM +0300, Avihai Horon wrote:
Switchover ACK is checked only during precopy while the guest is still
running. The last migration_can_switchover() decision and guest stop are
not atomic, so a device may want to request another switchover ACK in
the gap after switchover decision has been made but before the guest is
stopped. Migration would then miss that request, which can increase
downtime.

Cover this case by failing the migration if a switchover-ack was
requested during that time.

Ideally, precopy iterations should be resumed in this case, however,
VFIO doesn't support going back to precopy after being stopped, so
implementing such logic would require non-trivial changes to the guest
start/stop flow. Given the above and that this case should be rare,
failing the migration seems reasonable.

Signed-off-by: Avihai Horon <[email protected]>
Reviewed-by: Peter Xu <[email protected]>

One nit:

---
  migration/migration.c | 30 ++++++++++++++++++++++++++++++
  1 file changed, 30 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 4bb649a467..6ee1c795ff 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2810,6 +2810,27 @@ static bool migration_switchover_prepare(MigrationState 
*s)
      return s->state == MIGRATION_STATUS_DEVICE;
  }

+static bool migration_switchover_check_switchover_ack_pending(MigrationState 
*s,
The name is slightly confusing.  It says "check if there is pending ack"
but then it returns true if there's no pending ACK..

Maybe migration_switchover_is_acknowledged()?

Sure, or maybe migration_have_new_switchover_ack_pending()?

I just noticed that I overlooked one corner case:
If guest is stopped during precopy and source has pending acks > 0 (e.g., virsh migrate with --timeout param), then we will fail in this check and unnecessarily abort migration.

I thought about doing the change below. It's not as clean as I'd want it to be, but I have no other ideas.

===========8<===========
diff --git a/migration/migration.c b/migration/migration.c
index 6ee1c795ff..6054453b18 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2810,37 +2810,47 @@ static bool migration_switchover_prepare(MigrationState *s)
     return s->state == MIGRATION_STATUS_DEVICE;
 }

-static bool migration_switchover_check_switchover_ack_pending(MigrationState *s,
- Error **errp)
+static bool migration_have_new_switchover_ack_pending(
+    MigrationState *s, uint32_t old_switchover_ack_pending_num, Error **errp)
 {
     uint32_t pending_num;

     if (!migrate_switchover_ack() || migrate_switchover_ack_legacy()) {
-        return true;
+        return false;
     }

     pending_num = qatomic_read(&s->switchover_ack_pending_num);
-    if (pending_num > 0) {
+    if (pending_num > old_switchover_ack_pending_num) {
         error_setg(errp,
                    "Switchover ACK was requested by %" PRIu32
                    " devices during switchover",
-                   pending_num);
-        return false;
+                   pending_num - old_switchover_ack_pending_num);
+        return true;
     }

-    return true;
+    return false;
 }

 static bool migration_switchover_start(MigrationState *s, Error **errp)
 {
     ERRP_GUARD();
     MigPendingData pending = {};
+    uint32_t old_switchover_ack_pending_num;

     if (!migration_switchover_prepare(s)) {
         error_setg(errp, "Switchover is interrupted");
         return false;
     }

+    /*
+     * If the guest was stopped during precopy (e.g., by management), we may get
+     * here with switchover_ack_pending_num > 0. Store the current
+     * switchover_ack_pending_num value before the final pending query to
+     * distinguish between existing pending ACKs and new ones.
+     */
+    old_switchover_ack_pending_num =
+        qatomic_read(&s->switchover_ack_pending_num);
+
     qemu_savevm_query_pending_final(s, &pending);

     /*
@@ -2848,7 +2858,8 @@ static bool migration_switchover_start(MigrationState *s, Error **errp)       * Fail the migration in this case since we currently don't support going
      * back to precopy.
      */
-    if (!migration_switchover_check_switchover_ack_pending(s, errp)) {
+    if (migration_have_new_switchover_ack_pending(
+            s, old_switchover_ack_pending_num, errp)) {
         return false;
     }

===========8<===========

Thanks.


+                                                              Error **errp)
+{
+    uint32_t pending_num;
+
+    if (!migrate_switchover_ack() || migrate_switchover_ack_legacy()) {
+        return true;
+    }
+
+    pending_num = qatomic_read(&s->switchover_ack_pending_num);
+    if (pending_num > 0) {
+        error_setg(errp,
+                   "Switchover ACK was requested by %" PRIu32
+                   " devices during switchover",
+                   pending_num);
+        return false;
+    }
+
+    return true;
+}
+
  static bool migration_switchover_start(MigrationState *s, Error **errp)
  {
      ERRP_GUARD();
@@ -2822,6 +2843,15 @@ static bool migration_switchover_start(MigrationState 
*s, Error **errp)

      qemu_savevm_query_pending_final(s, &pending);

+    /*
+     * Switchover-ack requests done after switchover decision, are not allowed.
+     * Fail the migration in this case since we currently don't support going
+     * back to precopy.
+     */
+    if (!migration_switchover_check_switchover_ack_pending(s, errp)) {
+        return false;
+    }
+
      /* Inactivate disks except in COLO */
      if (!migrate_colo()) {
          /*
--
2.40.1

--
Peter Xu


Reply via email to