On Thu, May 21, 2026 at 04:46:31PM +0300, Avihai Horon wrote:
> 
> On 5/19/2026 10:58 PM, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, May 05, 2026 at 11:14:18AM +0300, Avihai Horon wrote:
> > > When precopy initial_bytes reaches zero VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > > flag is sent to the destination to indicate that initial data has been
> > > sent, so destination can indicate back to source when it finished
> > > loading it.
> > > 
> > > To get a more accurate estimation of initial_bytes, re-query precopy
> > > size before sending the flag. Extract the flag sending logic from
> > > vfio_save_iterate() to a new helper for clarity.
> > > 
> > > This may prevent premature sending of VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> > > flag if, for example, the previously queried initial_bytes was lower
> > > than actually is. Additionally, it prevents sending the flag if
> > > vfio_query_precopy_size() failed.
> > > 
> > > Signed-off-by: Avihai Horon <[email protected]>
> > > ---
> > >   hw/vfio/migration.c  | 37 ++++++++++++++++++++++++++++++++-----
> > >   hw/vfio/trace-events |  1 +
> > >   2 files changed, 33 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > > index 2911583ee1..243624b5fe 100644
> > > --- a/hw/vfio/migration.c
> > > +++ b/hw/vfio/migration.c
> > > @@ -456,6 +456,37 @@ static void 
> > > vfio_update_estimated_pending_data(VFIOMigration *migration,
> > >                                            data_size);
> > >   }
> > > 
> > > +/* Returns true if the init data flag was sent, false otherwise */
> > > +static bool vfio_send_init_data_flag(QEMUFile *f, VFIOMigration 
> > > *migration)
> > > +{
> > > +    VFIODevice *vbasedev = migration->vbasedev;
> > > +    int ret;
> > > +
> > > +    if (!migrate_switchover_ack()) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (migration->precopy_init_size || migration->initial_data_sent) {
> > > +        return false;
> > > +    }

[1]

> > > +
> > > +    /*
> > > +     * precopy_init_size holds an estimation of the initial data size, 
> > > re-query
> > > +     * precopy size to ensure it's really zero before sending init data 
> > > flag.
> > > +     * Don't send the flag if query fails.
> > > +     */
> > > +    ret = vfio_query_precopy_size(migration);
> > > +    if (ret || migration->precopy_init_size) {
> > > +        return false;
> > > +    }
> > IIUC this chunk isn't necessary?  If we don't expect REINIT to happen that
> > much (when NIC reconfigures?), then we can still rely on the window where
> > the "new switchover ack" will be requested later on during the exact sync.
> > 
> > Relying on that seems slightly cleaner.
> 
> Not sure I follow.
> 
> New switchover ack is requested in exact sync if we see new init_bytes > 0
> (REINIT flag).
> This flow happens only after the new switchover ack is requested in exact
> sync, when init_bytes = 0 again.
> 
> So this chunk just makes sure we send the VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
> flag at the right time.

AFAIU, what this chunk does is, we may save one switchover-ack if REINIT
got here.  It doesn't provide much functional difference in reality.

With this code there, when it happens to see REINIT, instead of sending an
immediate VFIO_MIG_FLAG_DEV_INIT_DATA_SENT message, it falls back to send
init data in the next iteration loop, saving that flag, and saving a
"request switchover-ack" on src QEMU too.

If above code removed, IIUC VFIO will send VFIO_MIG_FLAG_DEV_INIT_DATA_SENT
immediately causing dest sends ACK.  vfio_query_precopy_size() will be
postponed until the next sync query (which must happen at some point before
final switchover), then it will be collected there, VFIO src will request
for switchover-ack, then another VFIO_MIG_FLAG_DEV_INIT_DATA_SENT is
expected.

Both should work, but what I meant is, I think we don't need this random
check, because it's optimistic, it's not functionally necessary, IIUC.

IOW, see the current code and how it can still race with a REINIT anyway:

             migration thread                                  some vfio driver 
thread

    ret = vfio_query_precopy_size(migration);
    if (ret || migration->precopy_init_size) {
        return false;
    }
                                                                   got 
reconfigured,
                                                                   set REINIT

    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_INIT_DATA_SENT);
    migration->initial_data_sent = true;
    trace_vfio_send_init_data_flag(vbasedev->name);

It's the same to me if e.g. we try to vfio_query_precopy_size() in VFIO's
iterative loops from time to time, it'll also work, it'll make sync more
frequent, but it's not needed.

Thanks,

-- 
Peter Xu


Reply via email to