Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded

Daniel P . Berrangé Wed, 26 Jun 2024 05:21:26 -0700

On Wed, Jun 26, 2024 at 01:12:15PM +0100, Joao Martins wrote:
> On 26/06/2024 12:34, Daniel P. Berrangé wrote:
> > On Wed, Jun 26, 2024 at 12:29:41PM +0100, Joao Martins wrote:
> >> On 25/06/2024 19:37, Daniel P. Berrangé wrote:
> >>> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
> >>>> Then the question is how should we suggest the user to specify these two
> >>>> parameters.
> >>>>
> >>>> The cover letter used:
> >>>>
> >>>>   migrate_set_parameter downtime-limit 300
> >>>>   migrate_set_parameter switchover-limit 10
> >>>
> >>> What this means is that in practice the total downtime limit
> >>> is 310 ms, however, expressing this as two parameters is
> >>> incredibly inflexible.
> >>>
> >>> If the actual RAM transfer downtime only took 50 ms, then why
> >>> should the switchover downtime still be limited to 10ms, when
> >>> we've still got a budget of 250 ms that was unused.
> >>>
> >>
> >> The downtime limit is 300, it's more than you are giving something *extra* 
> >> 10ms
> >> when you switchover regardless of where that's spent.
> >>
> >> If it makes it easier to understand you could see this parameter as:
> >>
> >> 'downtime-limit-max-error' = 10 ms
> >>
> >> The name as proposed by the RFC was meant to honor what the error margin 
> >> was
> >> meant for: to account for extra time during switchover. Adding this inside
> >> downtime-limit wouldn't work as it otherwise would be used solely for RAM
> >> transfer during precopy.
> >>
> >>> IOW, if my VM tolerates a downtime of 310ms, then I want that
> >>> 310ms spread across the RAM transfer downtime and switchover
> >>> downtime in *any* ratio. ALl that matters is the overall
> >>> completion time.
> >>>
> >> That still happens with this patches, no specific budget is given to each.
> > 
> > If no specific budget is given to each, then IMHO adding the second
> > parameter is pointless & misleading. 
> 
> That is contradictory with your earlier statement.


I don't think it is.

> You redacted the part where I describe how this works in *the worst case* if 
> the
> entire downtime-limit is used for RAM transfer then the switchover-limit might
> *implicitly* act as an budget:
> 
> | Though implicitly if downtime-limit captures only RAM transfer, then in 
> theory
> | if you're migrating a busy guest that happens to meet the SLA say
> | expected-downtime=290, then you have a total of 20 for switchover (thanks to
> | the extra 10 used in switchover-limit/downtime-limit-max-error 10).
> 
> I am confused with what to make here. If budget is bad because any ratio 
> should
> be used if available, but then the added parameter doesn't care about ratios
> specifically but *can* act as switchover ratio when RAM dominates
> downtime-limit. But now no budget is associated is also bad ... then what's 
> your
> middle ground from your point of view to tackle switchover downtime being
> somehow accounted?

The pre-existing 'downtime-limit' value should apply to anything that
happens between src CPUs stopping, and dst CPUs starting. If we were
not correctly accounting for some parts, we just need to fix that
accounting, without adding extra time parameters.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded

Reply via email to