On Thu, May 21, 2026 at 04:53:54PM +0300, Avihai Horon wrote:
> 
> On 5/19/2026 11:09 PM, Peter Xu wrote:
> > 
> > On Tue, May 05, 2026 at 11:14:09AM +0300, Avihai Horon wrote:
> > > Performance tests were done by migrating a single VM with:
> > > * 8 GB RAM
> > > * 4 mlx5 VFIO devices:
> > >    - One device with 1GB of device data (stopcopy data) that runs
> > >      workload during precopy so VFIO_PRECOPY_INFO_REINIT is exercised
> > >      (generate new initial_bytes chunks during precopy).
> > Could you elaborate a bit more on what workload is executed, and how that
> > will affect REINIT reportings (e.g. is only one REINIT generated, or it
> > keeps generating)?
> 
> Basically, I create and destroy RDMA resources (MRs, QPs, CQs, etc.) on the
> VFIO device in a loop for several iterations.
> This generates several REINITs.
> 
> > 
> > Can I understand it in this way: without REINIT, device is forced to put
> > those data into stopcopy size; then with REINIT, some stopcopy size is
> > essentially moved back to precopy phase?
> 
> Almost:
> Without REINIT, the device is forced to put this data in precopy
> dirty_bytes.
> With REINIT, this data can be put in precopy init_bytes (and do the
> switchover-ack dance again).

Hmm, then I don't understand why moving a chunk of data from dirty_bytes
to init_bytes helps downtime.

Essentially, QEMU makes the switchover decision based on the math of:

   init+dirty+stop
   --------------- <= downtime_limit
         bw

The minimum possible value of the above is:

        stop
   ---------------
         bw

Here, whether some data is accounted in the init or the dirty portion
shouldn't matter for the min downtime, since both portions can be
transferred during the precopy phase.

OTOH, if stop_bytes is unchanged, the min downtime is still the same before
and after supporting REINIT, if we try harder (e.g. with a lower
downtime_limit).
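
To make the arithmetic concrete, here is a rough sketch of that check.  It
is not QEMU code: the function names, units and byte counts are all made up
for illustration, and it only models the formula above.

```python
# Illustrative only: hypothetical names and made-up numbers, not QEMU code.

def expected_downtime_ms(init_bytes, dirty_bytes, stop_bytes, bw_bytes_per_ms):
    """Estimated downtime if switchover happened right now."""
    return (init_bytes + dirty_bytes + stop_bytes) / bw_bytes_per_ms


def can_switchover(init_bytes, dirty_bytes, stop_bytes, bw_bytes_per_ms,
                   downtime_limit_ms):
    """The switchover decision: estimated downtime must fit the limit."""
    estimate = expected_downtime_ms(init_bytes, dirty_bytes, stop_bytes,
                                    bw_bytes_per_ms)
    return estimate <= downtime_limit_ms


def min_downtime_ms(stop_bytes, bw_bytes_per_ms):
    """Lower bound: init and dirty fully drained during precopy."""
    return stop_bytes / bw_bytes_per_ms


MiB = 1 << 20
bw = 1 * MiB        # assumed bandwidth: 1 MiB per ms
stop = 256 * MiB    # assumed stopcopy size, identical in both cases
pending = 512 * MiB  # same pending data, only accounted differently

# Without REINIT the pending data is reported as dirty_bytes,
# with REINIT it is reported as init_bytes.
without_reinit = expected_downtime_ms(0, pending, stop, bw)
with_reinit = expected_downtime_ms(pending, 0, stop, bw)

print(without_reinit, with_reinit, min_downtime_ms(stop, bw))
```

In this model the estimate is the same whichever bucket the pending data
lands in, and the lower bound depends only on stop_bytes and bw.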

Say, with below testing results:

With VFIO_PRECOPY_INFO_REINIT:
  1335ms total (~520ms from the VFIO device running the workload).

Without VFIO_PRECOPY_INFO_REINIT:
  2352ms total (~1600ms from the VFIO device running the workload).

What is the downtime_limit you specified for both cases?  Have you tried
specifying a lower downtime_limit than that, so that both results become
even closer (until they become, statistically, identical)?

In general, I can understand that REINIT will stop the migration from
converging too early, but IIUC the same can be achieved by just making
downtime_limit smaller.  IOW, I may still be missing some important piece
of info on how this REINIT feature helps downtime.

Thanks,

-- 
Peter Xu

