On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > Migration bandwidth is a very important value to live migration.  It's
> > because it's one of the major factors that we'll make decision on when to
> > switchover to destination in a precopy process.
> 
> To elaborate on this for those reading along...
> 
> QEMU takes the maximum downtime limit and multiplies it by its
> estimate of bandwidth. This gives a figure for the amount of data
> QEMU thinks it can transfer within the downtime period.
> 
> QEMU compares this figure to the amount of data that is still pending
> at the end of an iteration.
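To make that concrete for anyone reading along, the check is roughly the
sketch below (the names are made up for illustration, not the exact
migration.c identifiers):

    /* Illustrative only: the shape of the switchover decision above. */
    #include <stdbool.h>
    #include <stdint.h>

    static bool can_switchover(uint64_t pending_bytes,
                               double bandwidth_bytes_per_sec,
                               double downtime_limit_sec)
    {
        /* Bytes we believe we can transfer within the allowed downtime */
        uint64_t threshold =
            (uint64_t)(bandwidth_bytes_per_sec * downtime_limit_sec);

        return pending_bytes <= threshold;
    }

E.g. with a 1GB/s estimate and a 300ms downtime limit the threshold is
~300MB: switchover is allowed once pending data drops below that.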
> 
> > This value is currently estimated by QEMU during the whole live migration
> > process by monitoring how fast we were sending the data.  This would be
> > the most accurate bandwidth in an ideal world, where we were always
> > feeding unlimited data to the migration channel, so the send rate was
> > only limited by the bandwidth that is available.
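To be explicit, the current estimate is more or less "bytes written to the
channel over an interval, divided by the length of that interval"; a rough
sketch, not the exact code:

    /* Rough shape of the existing estimate: sample how much was sent
     * to the migration channel and over how long. */
    #include <stdint.h>

    static double estimate_bandwidth(uint64_t bytes_sent, double elapsed_sec)
    {
        return elapsed_sec > 0.0 ? (double)bytes_sent / elapsed_sec : 0.0;
    }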
> 
> The QEMU estimate for available bandwidth will definitely be wrong,
> potentially by orders of magnitude, if QEMU has a max bandwidth limit
> set, as in that case it is never trying to push the peak rates available
> from the NICs/network fabric.
> 
> > The issue is that QEMU itself may not be able to avoid those uncertainties
> > when measuring the real "available migration bandwidth".  At least I can't
> > think of a way so far.
> 
> IIUC, you can query the NIC properties to find the hardware transfer
> rate of the NICs. That doesn't imply apps can actually reach that
> rate in practice - it has a decent chance of being an over-estimate
> of bandwidth, possibly a very large one.
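For reference, one place the kernel exposes that hardware rate on Linux is
sysfs (/sys/class/net/<dev>/speed, in Mb/s).  A rough sketch of reading it -
the helper name is made up, and as noted above the number is only the
negotiated line rate, not what an application will actually get:

    /* Read the negotiated link speed (Mb/s) the kernel reports for a NIC,
     * e.g. /sys/class/net/eth0/speed.  Returns -1 if unknown (some virtual
     * devices don't report anything useful). */
    #include <stdio.h>

    static long nic_link_speed_mbps(const char *ifname)
    {
        char path[128];
        long mbps = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/sys/class/net/%s/speed", ifname);
        f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%ld", &mbps) != 1) {
                mbps = -1;
            }
            fclose(f);
        }
        return mbps;
    }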
> 
> Is such an over-estimate better or worse than QEMU's current
> under-estimate? It depends on the POV.
> 
> From the POV of QEMU, over-estimating means it won't be
> throttling as much as it should. That's not a downside for
> migration - it makes it more likely for migration to complete :-)

Heh. :)

> 
> From the POV of non-QEMU apps though, if QEMU over-estimates,
> it'll mean other apps get starved of network bandwidth.
> 
> Overall I agree, there's no obvious way QEMU can ever come up
> with a reliable estimate for bandwidth available.
> 
> > One way to fix this is that, when the user is fully aware of the available
> > bandwidth, we can allow the user to help by providing an accurate value.
> >
> > For example, if the user has a dedicated channel of 10Gbps for migration
> > for this specific VM, the user can specify this bandwidth so QEMU can
> > always do the calculation based on this fact, trusting the user's value
> > as long as it is specified.
> 
> I can see that in theory, but when considering non-trivial
> deployments of QEMU, I wonder if the user can really have any
> such certainty of what is truly available. It would need
> global awareness of the whole network of hosts & workloads.

Indeed, it may not always be easy.

The good thing about this parameter is that we always fall back to the old
estimation if the user can't specify anything valid, so it is always
optional, never required.
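When it is specified, it's just another migration parameter.  Assuming the
name stays as in this version and that it takes bytes per second like
max-bandwidth does, setting it to 5Gbps over QMP would look roughly like:

    { "execute": "migrate-set-parameters",
      "arguments": { "available-bandwidth": 625000000 } }

(625000000 bytes/s == 5Gbps.)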

It solves the cases where the user can specify the bandwidth accurately -
our QE team has already verified that it works for us in GPU tests, where
migration used to not complete at all with any sane downtime specified.
I would have attached a Tested-by from Zhiyi, but since this is not exactly
the patch he was using I didn't.

> 
> > When the user wants migration to use only 5Gbps out of that 10Gbps, one
> > can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps,
> > so it'll never use more than 5Gbps either (leaving the other 5Gbps for
> > other things).  So it can be useful even if the network is not dedicated,
> > as long as the user knows a solid value.
> > 
> > A new parameter "available-bandwidth" is introduced just for this. So when
> > the user specifies this parameter, instead of trusting the estimated value
> > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > more.
> 
> I feel like rather than "available-bandwidth", we should call
> it "max-convergence-bandwidth".
> 
> To me that name would better reflect the fact that this isn't
> really required to be a measure of how much NIC bandwidth is
> available. It is merely an expression of a different bandwidth
> limit to apply during switch over.
> 
> IOW
> 
> * max-bandwidth: limit during pre-copy main transfer
> * max-convergence-bandwidth: limit during pre-copy switch-over
> * max-postcopy-bandwidth: limit during post-copy phase

I worry the suggested new name is not straightforward enough at first
glance, even to me as a developer.

"available-bandwidth" doesn't even bind that value to "convergence" at all,
even though it was for solving this specific problem here. I wanted to make
this parameter sololy for the admin to answer the question "how much
bandwidth is available to QEMU migration in general?"  That's pretty much
straightforward IMHO.  With that, it's pretty sane to consider using all we
have during switchover (aka, unlimited bandwidth, as fast as possible).

Maybe at some point we can even leverage this information for purposes
other than making the migration converge.

Thanks,

-- 
Peter Xu

