On Tue, May 05, 2026 at 11:14:18AM +0300, Avihai Horon wrote:
> When precopy initial_bytes reaches zero VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> flag is sent to the destination to indicate that initial data has been
> sent, so destination can indicate back to source when it finished
> loading it.
>
> To get a more accurate estimation of initial_bytes, re-query precopy
> size before sending the flag. Extract the flag sending logic from
> vfio_save_iterate() to a new helper for clarity.
>
> This may prevent premature sending of VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> flag if, for example, the previously queried initial_bytes was lower
> than actually is. Additionally, it prevents sending the flag if
> vfio_query_precopy_size() failed.
>
> Signed-off-by: Avihai Horon <[email protected]>
> ---
> hw/vfio/migration.c | 37 ++++++++++++++++++++++++++++++++-----
> hw/vfio/trace-events | 1 +
> 2 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 2911583ee1..243624b5fe 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,6 +456,37 @@ static void vfio_update_estimated_pending_data(VFIOMigration *migration,
> data_size);
> }
>
> +/* Returns true if the init data flag was sent, false otherwise */
> +static bool vfio_send_init_data_flag(QEMUFile *f, VFIOMigration *migration)
> +{
> +    VFIODevice *vbasedev = migration->vbasedev;
> +    int ret;
> +
> +    if (!migrate_switchover_ack()) {
> +        return false;
> +    }
> +
> +    if (migration->precopy_init_size || migration->initial_data_sent) {
> +        return false;
> +    }
> +
> +    /*
> +     * precopy_init_size holds an estimation of the initial data size, re-query
> +     * precopy size to ensure it's really zero before sending init data flag.
> +     * Don't send the flag if query fails.
> +     */
> +    ret = vfio_query_precopy_size(migration);
> +    if (ret || migration->precopy_init_size) {
> +        return false;
> +    }

IIUC this chunk isn't necessary? If we don't expect REINIT to happen that
often (e.g. when a NIC reconfigures?), then we can still rely on the window
where the "new switchover ack" is requested later on, during the exact sync.
Relying on that seems slightly cleaner.

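
To make the suggestion concrete, here is a minimal standalone sketch of the
helper with the re-query chunk dropped, so that it keeps exactly the checks of
the old inline condition. MockMigration, mock_switchover_ack, flags_sent,
send_init_data_flag() and example() are hypothetical stand-ins for the QEMU
types and calls, not the real API. It only models the send-once flag logic and
says nothing about how a later exact sync would request a new ack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant VFIOMigration fields. */
typedef struct {
    uint64_t precopy_init_size;   /* last estimate of remaining initial data */
    bool initial_data_sent;       /* set once the flag has been sent */
} MockMigration;

static bool mock_switchover_ack = true;  /* models migrate_switchover_ack() */
static int flags_sent;                   /* counts modelled flag writes */

/* Same checks as the old inline condition, with no extra re-query. */
static bool send_init_data_flag(MockMigration *m)
{
    if (!mock_switchover_ack) {
        return false;
    }

    if (m->precopy_init_size || m->initial_data_sent) {
        return false;
    }

    flags_sent++;  /* models qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT) */
    m->initial_data_sent = true;

    return true;
}

/* Usage: the flag goes out once, and only when the estimate is zero. */
int example(void)
{
    MockMigration m = { .precopy_init_size = 4096, .initial_data_sent = false };

    assert(!send_init_data_flag(&m));  /* initial data still pending */
    m.precopy_init_size = 0;
    assert(send_init_data_flag(&m));   /* estimate reached zero: send */
    assert(!send_init_data_flag(&m));  /* never sent a second time */

    return flags_sent;
}
```

The caller would still fall back to VFIO_MIG_FLAG_END_OF_STATE whenever the
helper returns false, as in the vfio_save_iterate() hunk below.
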
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> +    migration->initial_data_sent = true;
> +    trace_vfio_send_init_data_flag(vbasedev->name);
> +
> +    return true;
> +}
> +
> static bool vfio_precopy_supported(VFIODevice *vbasedev)
> {
> VFIOMigration *migration = vbasedev->migration;
> @@ -664,11 +695,7 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
>
>      vfio_update_estimated_pending_data(migration, data_size);
>
> -    if (migrate_switchover_ack() && !migration->precopy_init_size &&
> -        !migration->initial_data_sent) {
> -        qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
> -        migration->initial_data_sent = true;
> -    } else {
> +    if (!vfio_send_init_data_flag(f, migration)) {
>          qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>      }
>
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index ab27ff5ea2..e91858354c 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -176,6 +176,7 @@ vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy
> vfio_save_iterate_start(const char *name) " (%s)"
> vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size %"PRIu64
> vfio_state_pending(const char *name, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size, bool exact) " (%s) stopcopy size %"PRIu64" precopy initial size %"PRIu64" precopy dirty size %"PRIu64 " exact %d"
> +vfio_send_init_data_flag(const char *name) " (%s)"
> vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>
> --
> 2.40.1
>
--
Peter Xu