Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded

Joao Martins Wed, 26 Jun 2024 05:12:58 -0700

On 26/06/2024 12:34, Daniel P. Berrangé wrote:
> On Wed, Jun 26, 2024 at 12:29:41PM +0100, Joao Martins wrote:
>> On 25/06/2024 19:37, Daniel P. Berrangé wrote:
>>> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
>>>> Then the question is how should we suggest the user to specify these two
>>>> parameters.
>>>>
>>>> The cover letter used:
>>>>
>>>>   migrate_set_parameter downtime-limit 300
>>>>   migrate_set_parameter switchover-limit 10
>>>
>>> What this means is that in practice the total downtime limit
>>> is 310 ms, however, expressing this as two parameters is
>>> incredibly inflexible.
>>>
>>> If the actual RAM transfer downtime only took 50 ms, then why
>>> should the switchover downtime still be limited to 10ms, when
>>> we've still got a budget of 250 ms that was unused.
>>>
>>
>> The downtime limit is 300; it's more that you are giving it an *extra* 10ms
>> when you switch over, regardless of where that's spent.
>>
>> If it makes it easier to understand you could see this parameter as:
>>
>> 'downtime-limit-max-error' = 10 ms
>>
>> The name as proposed by the RFC was meant to honor what the error margin was
>> meant for: to account for extra time during switchover. Adding this inside
>> downtime-limit wouldn't work as it otherwise would be used solely for RAM
>> transfer during precopy.
>>
>>> IOW, if my VM tolerates a downtime of 310ms, then I want that
>>> 310ms spread across the RAM transfer downtime and switchover
>>> downtime in *any* ratio. All that matters is the overall
>>> completion time.
>>>
>> That still happens with these patches; no specific budget is given to each.
> 
> If no specific budget is given to each, then IMHO adding the second
> parameter is pointless & misleading.


That contradicts your earlier statement.

You redacted the part where I describe how this works in *the worst case*: if
the entire downtime-limit is used for RAM transfer, then the switchover-limit
might *implicitly* act as a budget:

| Though implicitly if downtime-limit captures only RAM transfer, then in theory
| if you're migrating a busy guest that happens to meet the SLA say
| expected-downtime=290, then you have a total of 20 for switchover (thanks to
| the extra 10 used in switchover-limit/downtime-limit-max-error 10).
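
To make the arithmetic concrete, here is a minimal sketch of the numbers used in
this thread. It is not QEMU code: the function name is hypothetical, and it
assumes the abort check is made against the sum of downtime-limit and
switchover-limit, with no fixed split between RAM transfer and switchover.

```python
def switchover_headroom_ms(downtime_limit_ms, switchover_limit_ms, ram_downtime_ms):
    """Time left for switchover once the RAM transfer downtime is spent.

    Hypothetical helper for illustration only; assumes a single combined
    budget of downtime-limit + switchover-limit.
    """
    total_budget_ms = downtime_limit_ms + switchover_limit_ms
    # Never report negative headroom: the budget is simply exhausted.
    return max(0, total_budget_ms - ram_downtime_ms)


# Busy guest: RAM transfer takes 290 of the 300 ms downtime-limit.
busy = switchover_headroom_ms(300, 10, 290)

# Idle guest: RAM transfer takes only 50 ms, so the unused time carries over.
idle = switchover_headroom_ms(300, 10, 50)
```

Under that assumption the busy guest is left with 20 ms for switchover and the
idle guest with 260 ms, so the extra 10 only acts as a budget in the worst case.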

I am confused about what to make of this. A budget is bad because any ratio
should be usable if available; yet the added parameter doesn't care about ratios
specifically, and only *can* act as a switchover budget when RAM dominates
downtime-limit. But now having no budget associated is also bad ... so what,
from your point of view, is the middle ground for getting switchover downtime
accounted for somehow?
