On 25/06/2024 20:03, Peter Xu wrote:
> On Tue, Jun 25, 2024 at 05:31:19PM +0100, Joao Martins wrote:
>> The device-state multifd scaling is a take on improving switchover phase,
>> and we will keep improving it whenever we find things... but the
> 
> That'll be helpful, thanks.  Just a quick note that "reducing downtime" is
> a separate issue comparing to "make downtime_limit accurate".
> 
I see those two as separate too; it's just that right now the only way I know
of to make things better is to decrease/optimize the downtime itself (there are
other lines of work doing similar things inside vDPA too, by Si-Wei for
example). As for making downtime_limit accurate, I am not so sure what it
entails from your PoV. But it depends on what this question was really about;
see the end of this email in case I am understanding you correctly.

>> switchover itself can't be 'precomputed' into a downtime number equation
>> ahead of time to encompass all possible latencies/costs. Part of the
>> reason that at least we couldn't think of a way besides this proposal
>> here, which at the core it's meant to bounds check switchover. Even
>> without taking into account VFs/HW[0], it is simply not considered how
>> long it might take and giving some sort of downtime buffer coupled with
>> enforcement that can be enforced helps not violating migration SLAs.
> 
> I agree such enforcement alone can be useful in general to be able to
> fallback.  Said that, I think it would definitely be nice to attach more
> information on the downtime analysis when reposting this series, if there
> is any.
> 
> For example, irrelevant of whether QEMU can do proper predictions at all,
> there can be data / results to show what is the major parts that are
> missing besides the current calculations, aka an expectation on when the
> fallback can trigger, and some justification on why they can't be
> predicted.
> 

/me nods -- I think this might be a gap in the current cover letter.

I recall we have looked at quite a few downtime traces (thanks to the tracing
improvements made in the last dev cycle!), but it's also easy to reproduce these
downtime-limit problems without any past data, using relatively simple
configs.

> IMHO the enforcement won't make much sense if it keeps triggering, in that
> case people will simply not use it as it stops migrations from happening.

Right -- the enforcement *alone* does more damage than it fixes, meaning
enforcing without some way to reserve headroom within downtime-limit for the
switchover to be accounted for. That headroom is what allows the enforcement to
be put in place; otherwise we would just be failing migrations left and right.
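
As a rough illustration of how the two pieces fit together (a toy model in
Python, not QEMU code; every name and number below is made up for the example):

```python
# Hypothetical model only; none of these names exist in QEMU.

def can_switchover(expected_ram_downtime_ms, downtime_limit_ms, headroom_ms):
    """Start switchover only if the RAM estimate leaves the headroom free."""
    return expected_ram_downtime_ms <= downtime_limit_ms - headroom_ms


def must_fall_back(elapsed_switchover_ms, downtime_limit_ms):
    """Enforcement: give up on this switchover once the limit is exceeded."""
    return elapsed_switchover_ms > downtime_limit_ms
```

With a 300ms limit and 100ms of headroom, a 250ms RAM estimate would not yet
trigger switchover, while a 180ms one would. Without the first check, the
second one fires for any switchover whose non-RAM cost was not budgeted.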

> Ultimately the work will still be needed to make downtime_limit accurate.
> The fallback should only be an last fence to guard the promise which should
> be the "corner cases".
> 

Are you thinking of something specific?

Many "variables" affect this from the point we decide to switch over, and in
the worst (and likely) case it means having QEMU subsystems declare empirical
values for how long it takes them to suspend/resume/transfer state, to be fed
into the migration's expected-downtime prediction equation. That is part of the
reason why having headroom within downtime-limit was a simple 'catch-all' (from
our PoV) in terms of maintainability, while giving the user something to fall
back on for characterizing their SLA. Personally, I think there's a bit of a
disconnect between what the user desires when setting downtime-limit and what
it really does. downtime-limit right now is best viewed as
'precopy-ram-downtime-limit' :)
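
To sketch what such an equation could look like (again a toy model, not QEMU
code; the function, the per-subsystem table and its values are assumptions
made up for illustration, not measurements):

```python
# Hypothetical model only; names and numbers are illustrative.

def expected_downtime_ms(pending_bytes, bandwidth_bytes_per_ms,
                         subsystem_costs_ms=None):
    """RAM transfer time plus whatever each subsystem declares it needs."""
    ram_ms = pending_bytes / bandwidth_bytes_per_ms
    extra_ms = sum((subsystem_costs_ms or {}).values())
    return ram_ms + extra_ms


# RAM-only view, i.e. the 'precopy-ram-downtime-limit' reading.
ram_only = expected_downtime_ms(1000, 10)

# Same RAM state, with made-up declared costs for two device classes.
with_devices = expected_downtime_ms(1000, 10, {"vfio": 150, "vdpa": 80})
```

The gap between the two results is the part that either has to be declared per
subsystem or has to be covered by headroom.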

Unless the accuracy work you're thinking of is just having a better migration
algorithm for obtaining the best possible downtime for outstanding data/RAM
*even if* downtime-limit is set to a high value, such as 1) a grace period at
the beginning of migration, after the first dirty sync, or 2) a measured value
with a continually incrementing target downtime limit until the max
downtime-limit set by the user is hit ... before defaulting to the current
behaviour of migrating as soon as the expected downtime is within
downtime-limit. As discussed in the last response, this could create the
'downtime headroom' needed for the enforcement/SLA to be better honored. Is
this maybe your line of thinking?
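
For option 2), a minimal sketch of what I mean (toy Python model, not QEMU
code; the starting target, the step and the per-iteration estimates are all
hypothetical):

```python
# Hypothetical model only; none of these names exist in QEMU.

def ramped_switchover(expected_downtimes_ms, start_target_ms, step_ms,
                      downtime_limit_ms):
    """Return (iteration, target) at which switchover starts, or None.

    The target starts low and grows by step_ms per iteration, capped at the
    user's downtime-limit, instead of being the full limit from the start.
    """
    target = start_target_ms
    for i, expected in enumerate(expected_downtimes_ms):
        if expected <= target:
            return i, target
        target = min(target + step_ms, downtime_limit_ms)
    return None


# Expected downtime per precopy iteration (ms), with a 300ms user limit.
result = ramped_switchover([400, 250, 120, 90], 50, 50, 300)
```

In this example the current behaviour would switch over at the second
iteration (250ms expected, within the 300ms limit), whereas the ramped target
waits one more iteration and switches over at 120ms expected, leaving the
remainder of the limit as headroom.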
