Hey,

The cost of switchover is usually not accounted for in the migration
algorithm: the algorithm reduces all of it to "pending bytes" fitting
under a "threshold" (which represents some available or
proactively-measured link bandwidth), using that as the rule of thumb
for estimating downtime.

External latencies (in the OS, or in QEMU itself), as well as the
presence of VFs, can affect how long the switchover takes. Given the
wide range of possible configurations, there is no deterministic or
generally predictable rule for calculating the cost of switchover.

This series aims to improve observability of what contributes to the
switchover/downtime in particular. The breakdown:

* The first 2 patches move storage of the downtime timestamps into a
dedicated data structure, and then add a couple of key places where
those timestamps are measured.

* The next 2 patches put those timestamps to use, calculating the
downtime breakdown when the data is asked for, as well as adding a
tracepoint.

* Finally, the last patch provides introspection into the calculated
expected-downtime (pending_bytes vs threshold_size), which is the point
at which we decide to switch over, and prints that data when available
to give some comparison.

For now this covers mainly precopy data; I added both tracepoints and
QMP stats via query-migrate. Postcopy is still missing.

Thoughts, comments appreciated as usual.

Thanks!
        Joao

Joao Martins (5):
  migration: Store downtime timestamps in an array
  migration: Collect more timestamps during switchover
  migration: Add a tracepoint for the downtime stats
  migration: Provide QMP access to downtime stats
  migration: Print expected-downtime on completion

 qapi/migration.json    | 50 +++++++++++++++++++++++++
 migration/migration.h  |  7 +++-
 migration/migration.c  | 85 ++++++++++++++++++++++++++++++++++++++++--
 migration/savevm.c     |  2 +
 migration/trace-events |  1 +
 5 files changed, 139 insertions(+), 6 deletions(-)

-- 
2.39.3
