On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote:
> On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > > Migration bandwidth is a very important value for live migration: it is
> > > one of the major factors on which we decide when to switch over to the
> > > destination in a precopy process.
> > 
> > To elaborate on this for those reading along...
> > 
> > QEMU takes the maximum downtime limit and multiplies it by its
> > estimate of bandwidth. This gives a figure for the amount of data
> > QEMU thinks it can transfer within the downtime period.
> > 
> > QEMU compares this figure to the amount of data that is still pending
> > at the end of an iteration.
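> > 
> > As a minimal sketch of that decision (illustrative names and
> > simplified logic, not the actual QEMU code):
> > 
> >     #include <stdbool.h>
> >     #include <stdint.h>
> > 
> >     /* Illustrative only: does the remaining data fit in the
> >      * downtime budget at the current estimated send rate? */
> >     static bool should_switchover(double bytes_sent, double secs_spent,
> >                                   double downtime_limit_secs,
> >                                   uint64_t pending_bytes)
> >     {
> >         double bandwidth = bytes_sent / secs_spent; /* bytes per second */
> >         uint64_t threshold = (uint64_t)(bandwidth * downtime_limit_secs);
> >         return pending_bytes <= threshold;
> >     }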
> > 
> > > This value is currently estimated by QEMU during the whole live migration
> > > process by monitoring how fast we send the data.  In an ideal world,
> > > where we are always feeding unlimited data to the migration channel,
> > > this would be the most accurate bandwidth, because the send rate would
> > > then be limited only by the bandwidth that is actually available.
> > 
> > The QEMU estimate for available bandwidth will definitely be wrong,
> > potentially by orders of magnitude, if QEMU has a max bandwidth limit
> > set, as in that case it is never trying to push the peak rates available
> > from the NICs/network fabric.
> > 
> > > The issue is that QEMU itself may not be able to avoid those
> > > uncertainties when measuring the real "available migration bandwidth".
> > > At least not in any way I can think of so far.
> > 
> > IIUC, you can query the NIC properties to find the hardware transfer
> > rate of the NICs. That doesn't imply apps can actually reach that
> > rate in practice - it has a decent chance of being an over-estimate
> > of bandwidth, possibly by a very large margin.
> > 
> > Is such an over-estimate better or worse than QEMU's current
> > under-estimate? It depends on the POV.
> > 
> > From the POV of QEMU, over-estimating means it'll not be throttling
> > as much as it should. That's not a downside for migration - it makes
> > it more likely for migration to complete :-)
> 
> Heh. :)
> 
> > 
> > From the POV of non-QEMU apps though, if QEMU over-estimates,
> > it'll mean other apps get starved of network bandwidth.
> > 
> > Overall I agree, there's no obvious way QEMU can ever come up
> > with a reliable estimate for bandwidth available.
> > 
> > > One way to fix this is that, when the user is fully aware of the
> > > available bandwidth, we can allow the user to provide an accurate value.
> > >
> > > For example, if the user has a dedicated channel of 10Gbps for migrating
> > > this specific VM, the user can specify this bandwidth so QEMU can always
> > > do the calculation based on this fact, trusting the user's value whenever
> > > it is specified.
> > 
> > I can see that in theory, but when considering non-trivial
> > deployments of QEMU, I wonder if the user can really have any
> > such certainty of what is truly available. It would need
> > global awareness of the whole network of hosts & workloads.
> 
> Indeed, it may not always be easy.
> 
> The good thing about this parameter is that we always fall back to the
> old estimation if the user can't specify anything valid, so it is always
> optional, not required.
> 
> It solves the cases where the user can still specify the bandwidth
> accurately - our QE team has already verified that it worked for us in
> GPU tests, where migration previously could not complete at all with any
> sane downtime specified.  I should have attached a Tested-by from Zhiyi,
> but since this is not exactly the patch he was testing, I didn't.
> 
> > 
> > > When the user wants migration to use only 5Gbps out of that 10Gbps, one
> > > can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps,
> > > so it'll never use more than 5Gbps either (and the user can keep the
> > > remaining 5Gbps for other things).  So it can be useful even if the
> > > network is not dedicated, as long as the user knows a solid value.
> > > 
> > > A new parameter "available-bandwidth" is introduced just for this.  When
> > > the user specifies this parameter, instead of trusting the estimated value
> > > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > > more.
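> > > 
> > > As a rough sketch of that idea (hypothetical names, not the patch
> > > code itself):
> > > 
> > >     #include <stdint.h>
> > > 
> > >     /* Illustrative only: prefer the user-supplied bandwidth when it
> > >      * is set (non-zero), otherwise keep QEMU's own estimate. */
> > >     static uint64_t switchover_threshold(double estimated_bw,
> > >                                          double available_bw,
> > >                                          double downtime_limit_secs)
> > >     {
> > >         double bw = available_bw > 0 ? available_bw : estimated_bw;
> > >         return (uint64_t)(bw * downtime_limit_secs);
> > >     }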
> > 
> > I feel like rather than "available-bandwidth", we should call
> > it "max-convergence-bandwidth".
> > 
> > To me that name would better reflect the fact that this isn't
> > really required to be a measure of how much NIC bandwidth is
> > available. It is merely an expression of a different bandwidth
> > limit to apply during switch over.
> > 
> > IOW
> > 
> > * max-bandwidth: limit during pre-copy main transfer
> > * max-convergence-bandwidth: limit during pre-copy switch-over
> > * max-postcopy-bandwidth: limit during post-copy phase
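> > 
> > As a hypothetical sketch of that split (names follow the list above;
> > this is not existing QEMU code):
> > 
> >     #include <stdint.h>
> > 
> >     typedef enum { MIG_PRECOPY, MIG_SWITCHOVER, MIG_POSTCOPY } MigPhase;
> > 
> >     /* Illustrative only: pick the rate limit applying to each phase. */
> >     static uint64_t rate_limit_for(MigPhase phase, uint64_t max_bw,
> >                                    uint64_t max_convergence_bw,
> >                                    uint64_t max_postcopy_bw)
> >     {
> >         switch (phase) {
> >         case MIG_PRECOPY:    return max_bw;
> >         case MIG_SWITCHOVER: return max_convergence_bw;
> >         case MIG_POSTCOPY:   return max_postcopy_bw;
> >         }
> >         return max_bw;
> >     }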
> 
> I worry the suggested name is not straightforward enough at first
> glance, even to me as a developer.
> 
> "available-bandwidth" doesn't bind that value to "convergence" at all,
> even though it was introduced to solve this specific problem.  I wanted
> this parameter to be solely for the admin to answer the question "how
> much bandwidth is available to QEMU migration in general?"  That's
> pretty straightforward IMHO.  With that, it's quite sane to consider
> using all we have during switchover (aka, unlimited bandwidth, as fast
> as possible).
> 
> Maybe at some point we can even leverage this information for purposes
> other than making the migration converge.

The flipside is that the semantics & limits we want for convergence
are already known to be different from what we want for pre-copy
and post-copy. Given that existing practice, it is probably more
likely that we would not want to re-use the same setting for different
cases, which makes me think a specifically targeted parameter is
better.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

