QEMU provides an expected downtime for the whole system during
migration by remembering the total dirty RAM from the last sync and
dividing it by the estimated switchover bandwidth.
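
The arithmetic behind that estimate can be sketched as below. This is
only an illustration, not QEMU's code: estimate_downtime_ms() is a
hypothetical helper, and the choice of bytes per millisecond as the
bandwidth unit is an assumption made to keep the division simple.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): estimate the downtime in
 * milliseconds as the dirty bytes recorded at the last sync divided by
 * the estimated switchover bandwidth in bytes per millisecond.
 */
static uint64_t estimate_downtime_ms(uint64_t dirty_bytes_last_sync,
                                     uint64_t bw_bytes_per_ms)
{
    if (bw_bytes_per_ms == 0) {
        /* No bandwidth estimate yet: report "unknown" as the max value. */
        return UINT64_MAX;
    }
    return dirty_bytes_last_sync / bw_bytes_per_ms;
}
```

With 256 MiB recorded as dirty and a bandwidth of 1 MiB per millisecond,
the helper returns 256. Any dirty data left out of the first argument
makes the result smaller by the same proportion.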

This estimate is flawed when VFIO is taken into account: consider a
VFIO GPU device that holds GBs of data to migrate during the stop
phase. That data is not accounted for in this math.

Fix it by updating dirty_bytes_last_sync only when we move on to the
next iteration, rather than hiding the update in the RAM code.
Meanwhile, fetch the total (rather than the RAM-only) amount of dirty
bytes, so that GPU device state is included too.

Update the comment of the field to reflect its new meaning.

After this change, the expected-downtime value read from query-migrate
should be much more accurate even with VFIO devices involved.

Tested-by: Cédric Le Goater <[email protected]>
Reviewed-by: Juraj Marcin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
---
 migration/migration-stats.h |  8 +++-----
 migration/migration.c       | 11 ++++++++---
 migration/ram.c             |  1 -
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 326ddb0088..1775b916df 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -31,11 +31,9 @@
  */
 typedef struct {
     /*
-     * Number of bytes that were dirty last time that we synced with
-     * the guest memory. We use that to calculate the downtime. As
-     * the remaining dirty amounts to what we know that is still dirty
-     * since last iteration, not counting what the guest has dirtied
-     * since we synchronized bitmaps.
+     * Number of bytes that were reported dirty after the latest
+     * system-wise synchronization of dirty information. It is used to do
+     * best-effort estimation on expected downtime.
      */
     uint64_t dirty_bytes_last_sync;
     /*
diff --git a/migration/migration.c b/migration/migration.c
index d740d9df85..ab09dcbcf4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3244,18 +3244,23 @@ static void migration_iteration_go_next(MigPendingData *pending)
      */
     qemu_savevm_query_pending(pending, true);
 
+    /*
+     * Update the dirty information for the whole system for this
+     * iteration. This value is used to calculate expected downtime.
+     */
+    qatomic_set(&mig_stats.dirty_bytes_last_sync, pending->total_bytes);
+
     /*
      * Boost dirty sync count to reflect we finished one iteration.
      *
      * NOTE: we need to make sure when this happens (together with the
      * event sent below) all modules have slow-synced the pending data
-     * above. That means a write mem barrier, but qatomic_add() should be
-     * enough.
+     * above and updated corresponding fields (e.g. dirty_bytes_last_sync).
      *
      * It's because a mgmt could wait on the iteration event to query again
      * on pending data for policy changes (e.g. downtime adjustments). The
      * ordering will make sure the query will fetch the latest results from
-     * all the modules.
+     * all the modules on everything.
      */
     qatomic_add(&mig_stats.dirty_sync_count, 1);
 
diff --git a/migration/ram.c b/migration/ram.c
index ecd4b6165c..fc38ffbf8a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1148,7 +1148,6 @@ static void migration_bitmap_sync(RAMState *rs, bool last_stage)
             RAMBLOCK_FOREACH_NOT_IGNORED(block) {
                 ramblock_sync_dirty_bitmap(rs, block);
             }
-            qatomic_set(&mig_stats.dirty_bytes_last_sync, ram_bytes_remaining());
         }
     }
 
-- 
2.53.0
