On 26/06/2024 12:34, Daniel P. Berrangé wrote:
> On Wed, Jun 26, 2024 at 12:29:41PM +0100, Joao Martins wrote:
>> On 25/06/2024 19:37, Daniel P. Berrangé wrote:
>>> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
>>>> Then the question is how should we suggest the user to specify these two
>>>> parameters.
>>>>
>>>> The cover letter used:
>>>>
>>>>   migrate_set_parameter downtime-limit 300
>>>>   migrate_set_parameter switchover-limit 10
>>>
>>> What this means is that in practice the total downtime limit
>>> is 310 ms, however, expressing this as two parameters is
>>> incredibly inflexible.
>>>
>>> If the actual RAM transfer downtime only took 50 ms, then why
>>> should the switchover downtime still be limited to 10ms, when
>>> we've still got a budget of 250 ms that was unused?
>>>
>>
>> The downtime limit is 300; it's more that you are giving something *extra*
>> (10ms) when you switch over, regardless of where that's spent.
>>
>> If it makes it easier to understand you could see this parameter as:
>>
>> 'downtime-limit-max-error' = 10 ms
>>
>> The name as proposed by the RFC was meant to honor what the error margin was
>> meant for: to account for extra time during switchover. Adding this inside
>> downtime-limit wouldn't work as it otherwise would be used solely for RAM
>> transfer during precopy.
>>
>>> IOW, if my VM tolerates a downtime of 310ms, then I want that
>>> 310ms spread across the RAM transfer downtime and switchover
>>> downtime in *any* ratio. All that matters is the overall
>>> completion time.
>>>
>> That still happens with these patches; no specific budget is given to each.
> 
> If no specific budget is given to each, then IMHO adding the second
> parameter is pointless & misleading. 

That contradicts your earlier statement.

You cut the part where I describe how this works in *the worst case*: if the
entire downtime-limit is used for RAM transfer, then the switchover-limit might
*implicitly* act as a budget:

| Though implicitly if downtime-limit captures only RAM transfer, then in theory
| if you're migrating a busy guest that happens to meet the SLA say
| expected-downtime=290, then you have a total of 20 for switchover (thanks to
| the extra 10 used in switchover-limit/downtime-limit-max-error 10).
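
To spell out that arithmetic, here is a minimal sketch in Python. It assumes the
reading described above, where downtime-limit bounds only the RAM transfer and
switchover-limit is extra margin on top; the function name and signature are
made up for illustration and are not part of the patches or of QEMU.

```python
def switchover_headroom_ms(downtime_limit_ms: int,
                           switchover_limit_ms: int,
                           ram_downtime_ms: int) -> int:
    """Time left for switchover once the RAM transfer downtime is known.

    Hypothetical helper: it models the 'extra margin' reading only, in which
    the overall ceiling is downtime-limit + switchover-limit and no fixed
    share is reserved for either phase.
    """
    # Overall ceiling the guest may be paused for (assumption: limits add up).
    total_budget_ms = downtime_limit_ms + switchover_limit_ms
    # RAM transfer is assumed to never exceed downtime-limit itself.
    ram_ms = min(ram_downtime_ms, downtime_limit_ms)
    return total_budget_ms - ram_ms
```

With the numbers from the quote, switchover_headroom_ms(300, 10, 290) gives 20.
In the worst case, where RAM transfer uses the whole 300, only the extra 10 is
left, which is where the parameter implicitly behaves like a budget.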

I am confused about what to make of this. A budget is bad because any ratio
should be usable if available; yet the added parameter doesn't care about
ratios specifically, and only *can* act as a switchover budget when RAM
dominates downtime-limit. But now having no budget associated is also bad ...
so what, from your point of view, is the middle ground for getting switchover
downtime accounted for somehow?
