Switchover ACK is checked only during precopy, while the guest is still
running. The last migration_can_switchover() decision and the guest stop
are not atomic, so a device may request another switchover ACK in the
window after the switchover decision has been made but before the guest
is stopped. Migration would then miss that request, which can increase
downtime.

Cover this case by failing the migration if a switchover ACK was
requested during that window.
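The race and the fix can be modeled with a small, self-contained C sketch. It is not QEMU code: it uses C11 <stdatomic.h> instead of QEMU's qatomic_read(), and the counter ack_pending and the helpers device_request_ack(), device_send_ack(), can_switchover() and check_ack_pending_after_stop() are hypothetical stand-ins for s->switchover_ack_pending_num and the checks in this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model, not QEMU code. ack_pending stands in for
 * s->switchover_ack_pending_num; static storage zero-initializes it.
 */
static atomic_uint ack_pending;

/* A device requests a (new) switchover ACK. */
static void device_request_ack(void)
{
    atomic_fetch_add(&ack_pending, 1);
}

/* A device sends its ACK; there must be a matching earlier request. */
static void device_send_ack(void)
{
    unsigned int prev = atomic_fetch_sub(&ack_pending, 1);

    assert(prev > 0);
    (void)prev; /* avoid an unused-variable warning under NDEBUG */
}

/* Precopy-time decision: switch over only when no ACK is pending. */
static bool can_switchover(void)
{
    return atomic_load(&ack_pending) == 0;
}

/*
 * Re-check once the guest is stopped. A non-zero count means a request
 * arrived between can_switchover() and the stop, so fail instead of
 * silently ignoring it.
 */
static bool check_ack_pending_after_stop(char *err, size_t len)
{
    unsigned int pending = atomic_load(&ack_pending);

    if (pending > 0) {
        snprintf(err, len,
                 "Switchover ACK was requested by %u devices during switchover",
                 pending);
        return false;
    }
    return true;
}
```

Calling device_request_ack() after can_switchover() has returned true, and then running check_ack_pending_after_stop(), reproduces the failure path: the re-check sees a pending count of 1 and reports an error rather than letting the late request go unnoticed.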

Ideally, precopy iterations would be resumed in this case. However,
VFIO doesn't support going back to precopy after being stopped, so
implementing such logic would require non-trivial changes to the guest
start/stop flow. Given the above, and since this case should be rare,
failing the migration seems reasonable.

Signed-off-by: Avihai Horon <[email protected]>
---
 migration/migration.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 4bb649a467..6ee1c795ff 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2810,6 +2810,27 @@ static bool migration_switchover_prepare(MigrationState *s)
     return s->state == MIGRATION_STATUS_DEVICE;
 }
 
+static bool migration_switchover_check_switchover_ack_pending(MigrationState *s,
+                                                              Error **errp)
+{
+    uint32_t pending_num;
+
+    if (!migrate_switchover_ack() || migrate_switchover_ack_legacy()) {
+        return true;
+    }
+
+    pending_num = qatomic_read(&s->switchover_ack_pending_num);
+    if (pending_num > 0) {
+        error_setg(errp,
+                   "Switchover ACK was requested by %" PRIu32
+                   " devices during switchover",
+                   pending_num);
+        return false;
+    }
+
+    return true;
+}
+
 static bool migration_switchover_start(MigrationState *s, Error **errp)
 {
     ERRP_GUARD();
@@ -2822,6 +2843,15 @@ static bool migration_switchover_start(MigrationState *s, Error **errp)
 
     qemu_savevm_query_pending_final(s, &pending);
 
+    /*
+     * Switchover ACK requests after the switchover decision are not allowed.
+     * Fail the migration in this case since we currently don't support going
+     * back to precopy.
+     */
+    if (!migration_switchover_check_switchover_ack_pending(s, errp)) {
+        return false;
+    }
+
     /* Inactivate disks except in COLO */
     if (!migrate_colo()) {
         /*
-- 
2.40.1

