Hello,

This RFC patchset introduces a strict downtime SLA for live migration by restricting how long the switchover phase can take, aborting the live migration if this limit is exceeded.
Various consumers of VFIO live migration are bound by limits on how long the switchover process may last. Some contributors to downtime are not accounted for and are unbounded, such as:
- Time to quiesce/resume the VF
- Time to save/resume all system state
- How fast we can save/restore VF state

These cases lead to the final downtime being larger than what was configured by setting a downtime limit. In some applications it is important to observe the requested downtime and retry live migration at some other time if the downtime requirements cannot be satisfied.

This patchset introduces a capability to abort live migration if the downtime exceeds a certain value specified by the switchover-limit migration parameter. When the guest stops on the source, the source measures the downtime, and if it exceeds the threshold, it cancels the migration and resumes the guest. The destination is notified of the source downtime and its threshold and starts measuring downtime as well. The destination will cancel live migration if the downtime exceeds the switchover limit.

The migration with this capability would be used this way, for example:

migrate_set_capability return-path on
migrate_set_capability switchover-abort on
migrate_set_parameter downtime-limit 300
migrate_set_parameter switchover-limit 10

The migration will be aborted if the downtime exceeds the downtime limit by more than 10 ms (switchover-limit), so the total downtime will not be more than 310 ms.

Please send your comments and recommendations.

The patchset idea originally comes from Joao Martins <joao.m.mart...@oracle.com>.
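The abort rule described above can be sketched in C roughly as follows. This is a minimal illustration, not code from the patches: the names `SwitchoverLimits`, `now_ms()`, `switchover_deadline_ms()`, `switchover_should_abort()` and `dest_should_abort()` are hypothetical, and the abort threshold of downtime-limit plus switchover-limit is an assumption inferred from the 300 ms + 10 ms = 310 ms example. How the destination combines the downtime reported by the source with its own measurement is likewise assumed.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical container for the two migration parameters, in milliseconds. */
typedef struct {
    uint64_t downtime_limit_ms;   /* "downtime-limit" parameter */
    uint64_t switchover_limit_ms; /* "switchover-limit" parameter */
} SwitchoverLimits;

/* Monotonic time in milliseconds; unaffected by wall-clock adjustments. */
static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/*
 * Worst-case downtime tolerated before aborting.
 * ASSUMPTION: threshold = downtime-limit + switchover-limit (300 + 10 = 310 ms).
 */
static uint64_t switchover_deadline_ms(const SwitchoverLimits *limits)
{
    assert(limits != NULL);
    return limits->downtime_limit_ms + limits->switchover_limit_ms;
}

/*
 * Source side: stop_ms is the monotonic time at which the guest was stopped.
 * Returns true if the downtime so far exceeds the threshold, i.e. the
 * migration should be cancelled and the guest resumed on the source.
 */
static bool switchover_should_abort(const SwitchoverLimits *limits,
                                    uint64_t stop_ms, uint64_t now)
{
    return now - stop_ms > switchover_deadline_ms(limits);
}

/*
 * Destination side (hypothetical): the source reports the downtime that had
 * already elapsed when it notified the destination, plus the threshold; the
 * destination adds the time it has measured locally since dest_start_ms.
 */
static bool dest_should_abort(uint64_t source_elapsed_ms, uint64_t threshold_ms,
                              uint64_t dest_start_ms, uint64_t now)
{
    return source_elapsed_ms + (now - dest_start_ms) > threshold_ms;
}
```

With downtime-limit 300 and switchover-limit 10, a downtime of exactly 310 ms is still accepted in this sketch, while 311 ms triggers the abort. Using `CLOCK_MONOTONIC` keeps clock adjustments from skewing the measured downtime; which clock the series itself uses is not stated in this cover letter.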
Elena Ufimtseva (2):
  migration: abort when switchover limit exceeded
  migration: abort on destination if switchover limit exceeded

 hw/core/machine.c                  |  1 +
 include/migration/client-options.h |  1 +
 migration/migration-hmp-cmds.c     | 10 ++++
 migration/migration.c              | 41 +++++++++++++++
 migration/migration.h              | 20 ++++++++
 migration/options.c                | 56 +++++++++++++++++++++
 migration/options.h                |  1 +
 migration/savevm.c                 | 81 ++++++++++++++++++++++++++++++
 migration/savevm.h                 |  2 +
 migration/trace-events             |  3 ++
 qapi/migration.json                | 27 ++++++++--
 11 files changed, 239 insertions(+), 4 deletions(-)

-- 
2.34.1