QEMU provides an expected downtime for the whole system during
migration by remembering the total dirty RAM from the last sync and
dividing it by the estimated switchover bandwidth.

That was flawed when VFIO is taken into account: consider a VFIO GPU
device that contains GBs of data to migrate during the stop phase.  Those
bytes were not accounted for in this math.

Fix it by updating dirty_bytes_last_sync only when we go to the next
iteration, rather than hiding this update in the RAM code.  Meanwhile,
fetch the total (rather than RAM-only) amount of dirty bytes, so as to
include GPU device state too.

Update the comment of the field to reflect its new meaning.

After this change, the expected-downtime reported by query-migrate
should be accurate even with VFIO devices involved.

Signed-off-by: Peter Xu <[email protected]>
---
 migration/migration-stats.h | 10 +++++-----
 migration/migration.c       | 11 ++++++++---
 migration/ram.c             |  1 -
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 326ddb0088..14b2773beb 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -31,11 +31,11 @@
  */
 typedef struct {
     /*
-     * Number of bytes that were dirty last time that we synced with
-     * the guest memory.  We use that to calculate the downtime.  As
-     * the remaining dirty amounts to what we know that is still dirty
-     * since last iteration, not counting what the guest has dirtied
-     * since we synchronized bitmaps.
+     * Number of bytes that are still dirty after the last whole-system
+     * sync of dirty information.  We use it to calculate the expected
+     * downtime, as the remaining dirty amount is what we know is still
+     * dirty since the last iteration, not counting what the guest has
+     * dirtied since then in either RAM or device state.
      */
     uint64_t dirty_bytes_last_sync;
     /*
diff --git a/migration/migration.c b/migration/migration.c
index 23c78b3a2c..1c00572d14 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3240,18 +3240,23 @@ static void migration_iteration_go_next(MigPendingData *pending)
      */
     qemu_savevm_query_pending(pending, false);
 
+    /*
+     * Update the dirty information for the whole system for this
+     * iteration.  This value is used to calculate expected downtime.
+     */
+    qatomic_set(&mig_stats.dirty_bytes_last_sync, pending->total_bytes);
+
     /*
      * Boost dirty sync count to reflect we finished one iteration.
      *
      * NOTE: we need to make sure when this happens (together with the
      * event sent below) all modules have slow-synced the pending data
-     * above.  That means a write mem barrier, but qatomic_add() should be
-     * enough.
+     * above and updated corresponding fields (e.g. dirty_bytes_last_sync).
      *
      * It's because a mgmt could wait on the iteration event to query again
      * on pending data for policy changes (e.g. downtime adjustments).  The
      * ordering will make sure the query will fetch the latest results from
-     * all the modules.
+     * all the modules on everything.
      */
     qatomic_add(&mig_stats.dirty_sync_count, 1);
 
diff --git a/migration/ram.c b/migration/ram.c
index 29e9608715..1bdf121d16 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1148,7 +1148,6 @@ static void migration_bitmap_sync(RAMState *rs, bool last_stage)
             RAMBLOCK_FOREACH_NOT_IGNORED(block) {
                 ramblock_sync_dirty_bitmap(rs, block);
             }
-            qatomic_set(&mig_stats.dirty_bytes_last_sync, ram_bytes_remaining());
         }
     }
 
-- 
2.50.1

