On 26/06/2024 12:34, Daniel P. Berrangé wrote:
> On Wed, Jun 26, 2024 at 12:29:41PM +0100, Joao Martins wrote:
>> On 25/06/2024 19:37, Daniel P. Berrangé wrote:
>>> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
>>>> Then the question is how should we suggest the user to specify these two
>>>> parameters.
>>>>
>>>> The cover letter used:
>>>>
>>>> migrate_set_parameter downtime-limit 300
>>>> migrate_set_parameter switchover-limit 10
>>>
>>> What this means is that in practice the total downtime limit
>>> is 310 ms, however, expressing this as two parameters is
>>> incredibly inflexible.
>>>
>>> If the actual RAM transfer downtime only took 50 ms, then why
>>> should the switchover downtime still be limited to 10ms, when
>>> we've still got a budget of 250 ms that was unused.
>>>
>>
>> The downtime limit is 300, it's more than you are giving something *extra* 10ms
>> when you switchover regardless of where that's spent.
>>
>> If it makes it easier to understand you could see this parameter as:
>>
>> 'downtime-limit-max-error' = 10 ms
>>
>> The name as proposed by the RFC was meant to honor what the error margin was
>> meant for: to account for extra time during switchover. Adding this inside
>> downtime-limit wouldn't work as it otherwise would be used solely for RAM
>> transfer during precopy.
>>
>>> IOW, if my VM tolerates a downtime of 310ms, then I want that
>>> 310ms spread across the RAM transfer downtime and switchover
>>> downtime in *any* ratio. ALl that matters is the overall
>>> completion time.
>>>
>> That still happens with this patches, no specific budget is given to each.
>
> If no specific budget is given to each, then IMHO adding the second
> parameter is pointless & misleading.

That contradicts your earlier statement. You trimmed the part where I
describe how this works in *the worst case*: if the entire downtime-limit is
used for RAM transfer, then the switchover-limit might *implicitly* act as a
budget:

| Though implicitly if downtime-limit captures only RAM transfer, then in theory
| if you're migrating a busy guest that happens to meet the SLA say
| expected-downtime=290, then you have a total of 20 for switchover (thanks to
| the extra 10 used in switchover-limit/downtime-limit-max-error 10).

I am confused about what to make of this. A fixed budget is bad because any
ratio should be usable when time is available, yet the added parameter doesn't
care about ratios specifically; it only *can* act as a switchover budget when
RAM transfer dominates downtime-limit. But now having no budget associated is
also bad ... so what is the middle ground, from your point of view, for getting
switchover downtime accounted for?
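
To make the arithmetic in the worst case quoted above concrete, here is a rough
sketch. It is not QEMU code: the function name is made up for illustration, and
it assumes the model described in this thread, where expected-downtime covers
RAM transfer only and switchover-limit is extra margin added on top of
downtime-limit.

```python
def switchover_headroom_ms(downtime_limit, switchover_limit, expected_downtime):
    """Time (ms) left for switchover under the model discussed in this thread.

    Hypothetical helper, not part of QEMU. Assumes expected_downtime is the
    RAM-transfer estimate only, and that switchover_limit is an extra margin
    on top of downtime_limit rather than a separate, fixed budget.
    """
    total_allowed = downtime_limit + switchover_limit
    # Never report negative headroom if RAM transfer alone overshoots.
    return max(total_allowed - expected_downtime, 0)


# Busy guest that barely meets the SLA (the worst case quoted above).
print(switchover_headroom_ms(300, 10, 290))  # 20

# RAM transfer finishes quickly; the unused time remains available.
print(switchover_headroom_ms(300, 10, 50))   # 260
```

Under these assumptions the extra 10 only becomes the dominant share of the
switchover time when RAM transfer consumes nearly all of downtime-limit;
otherwise whatever RAM transfer leaves unused is still there for switchover.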