[Qemu-devel] [PATCH v1 0/6] A migration performance testing framework

Daniel P. Berrange Thu, 05 May 2016 07:33:05 -0700

This series of patches provides a framework for testing migration performance
characteristics. The motivating factor for this is planning that is underway
in OpenStack wrt making use of QEMU migration features such as compression,
auto-converge and post-copy. The primary aim for OpenStack is to have Nova
autonomously manage migration features & tunables to maximise chances that
migration will complete. The problem faced is figuring out just which QEMU
migration features are "best" suited to our needs. This means we want data
on how well they are able to ensure completion of a migration, against the
host resources used and the impact on the guest workload performance.


The test framework produced here takes a pathological guest workload (every
CPU just burning 100% of time xor'ing every byte of guest memory with random
data). This is quite a pessimistic test because most guest workloads are not
going to be this heavy on memory writes, and their data won't be uniformly
random and so will be able to compress better than this test does.
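
For illustration, the shape of such a workload can be sketched in Python.
This is not the tests/migration/stress.c program from this series, only a
rough model of the idea: the 4 KiB page size and the function names are
assumptions made for the example.

```python
import time

PAGE_SIZE = 4096  # assumed page size, for illustration only


def xor_pass(ram: bytearray, pattern: bytes) -> float:
    """XOR every page of `ram` in place with `pattern`.

    `len(ram)` must be a multiple of PAGE_SIZE and `pattern` must be
    exactly PAGE_SIZE bytes. Returns the elapsed time in milliseconds.
    """
    pat = int.from_bytes(pattern, "little")
    start = time.monotonic()
    for off in range(0, len(ram), PAGE_SIZE):
        page = int.from_bytes(ram[off:off + PAGE_SIZE], "little")
        # Every byte of the page is rewritten, so every page is dirtied.
        ram[off:off + PAGE_SIZE] = (page ^ pat).to_bytes(PAGE_SIZE, "little")
    return (time.monotonic() - start) * 1000.0


def ms_per_gb(elapsed_ms: float, nbytes: int) -> float:
    """Scale a measured pass time to the time needed for 1 GB."""
    return elapsed_ms * (1 << 30) / nbytes
```

Calling xor_pass() in a loop with a pattern from os.urandom(PAGE_SIZE) keeps
every page dirty and incompressible, and ms_per_gb() turns each pass into a
ms/GB figure.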

With this worst case guest, I have produced a set of tests using UNIX socket,
TCP localhost, TCP remote and RDMA remote socket transports, with both a
1 GB RAM + 1 CPU guest and an 8 GB RAM + 4 CPU guest.

The TCP/RDMA remote host tests were run over a 10-GigE network interface.

I have put the results online to view here:

  https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/

The charts show two core sets of data:

 - The guest CPU performance. The left axis shows the time in milliseconds
   required to xor 1 GB of memory. This is shown per guest CPU and combined
   across all CPUs.

 - The host CPU utilization. The right axis shows the overall QEMU process
   CPU utilization, and the per-VCPU utilization.

Note that the charts are interactive - you can turn on/off each plot line and
zoom in by selecting regions on the chart.


Some interesting things that I have observed with this:

 - At the start of each iteration of migration there is a distinct drop in
   guest CPU performance as shown by a spike in the guest CPU time lines.
   Performance would drop from 200ms/GB to 400ms/GB. Presumably this is
   related to QEMU recalculating the dirty bitmap for the guest RAM. See
   the spikes in the green line in:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html

 - For the larger sized guests, the auto-converge code has to throttle the
   guest by 90% or more before it is able to meet the 500ms max downtime
   value.

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html

   Even then I often saw tests aborting as they hit the max number of
   iterations I permitted (30 iters max).

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html

 - MT compression is actively harmful to chances of successful migration when
   the guest RAM is not compression friendly. My workload is worst case since
   it is splattering RAM with totally random bytes. The MT compression is
   dramatically increasing the time for each iteration as we bottleneck on CPU
   compression speed, leaving the network largely idle. This causes migrations
   which would have completed without compression to fail. It also burns huge
   amounts of host CPU time.

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html

 - XBZRLE compression did not have as much of a CPU performance penalty on the
   host as MT compression, but also did not help migration to actually complete.
   Again this is largely due to the workload being the worst case scenario with
   random bytes. The downside is obviously the potentially significant memory
   overhead on the host due to the cache sizing.

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html


 - Post-copy, by its very nature, obviously ensured that the migration would
   complete. While post-copy was running in pre-copy mode there was a somewhat
   chaotic small impact on guest CPU performance, causing performance to
   periodically oscillate between 400ms/GB and 800ms/GB. This is less than
   the impact at the start of each migration iteration which was 1000ms/GB
   in this test. There was also a massive penalty at time of switchover from
   pre to post copy, as is to be expected. The migration completed in post-copy
   phase quite quickly though. For this workload, the number of iterations in
   pre-copy mode before switching to post-copy did not have much impact. I
   expect a less extreme workload would have shown more interesting results
   wrt number of iterations of pre-copy:

    https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html


Overall, if we're looking for a solution that can guarantee completion under
the most extreme guest workload, then only post-copy & auto-converge appear
up to the job.

The MT compression is seriously harmful to migration and has severe CPU
overhead. The XBZRLE compression is moderately harmful to migration and has
potentially severe memory overhead from the large cache sizes needed to make
it useful.

While auto-converge can ensure that guest migration completes, it has a
pretty significant long-term impact on guest CPU performance to achieve
this, i.e. the guest spends a long time in pre-copy mode with its CPUs very
dramatically throttled down. The level of throttling required makes one
wonder whether it is worth using, against simply pausing the guest workload.
The latter has a hard blackout period, but over a quite short time frame
if network speed is fast.
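
To put a rough number on that blackout period, here is a back-of-envelope
sketch. It assumes the paused guest's RAM is copied once at full line rate,
and ignores protocol overhead, device state and any zero-page shortcuts, so
treat the result as a lower bound rather than a measurement. The function
name is made up for the example.

```python
GIB = 1 << 30


def pause_blackout_seconds(ram_bytes: int, link_bits_per_sec: float) -> float:
    """Idealised time to copy all guest RAM while the guest is paused."""
    return ram_bytes * 8 / link_bits_per_sec


# The two guest sizes used in these tests, over a 10 Gbit/s link.
small = pause_blackout_seconds(1 * GIB, 10e9)
large = pause_blackout_seconds(8 * GIB, 10e9)
```

Under these assumptions the 1 GB guest would be paused for roughly 0.86
seconds and the 8 GB guest for roughly 6.9 seconds.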

The post-copy code does have an impact on guest performance while in
pre-copy mode, vs a plain migration. It also has a fairly high spike when in
post-copy mode, but this lasts for a pretty short time. As compared to
auto-converge, it is able to ensure the migration completes in a finite
time without having a prolonged impact on guest CPU performance. The
penalty during the post-copy phase is on a par with the penalty imposed
by auto-converge when it has to throttle to 90%+.


Overall, in the context of a worst case guest workload, it appears that
post-copy is the clear winning strategy, ensuring completion of migration
without imposing a long duration penalty on guest performance. If the
risk of failure from post-copy is unacceptable then auto-converge is a
good fallback option, if the long duration guest CPU penalty can be
accepted.

The compression options are only worth using if the host has free CPU
resources, and the guest RAM is believed to be compression friendly,
as they steal significant CPU time away from guests in order to run
compression, often with a negative impact on migration completion
chances.
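
The effect of "compression friendly" RAM is easy to see on a single page.
The sketch below uses Python's zlib module as a stand-in compressor; it is
not QEMU's migration code, and the 4096 byte page size and compression
level are assumptions for the example.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def compression_ratio(page: bytes, level: int = 1) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(page, level)) / len(page)


random_page = os.urandom(PAGE_SIZE)  # like the stress workload's RAM
zero_page = bytes(PAGE_SIZE)         # best case: a page of zeros

random_ratio = compression_ratio(random_page)
zero_ratio = compression_ratio(zero_page)
```

The random page does not shrink at all, so the CPU time spent compressing
it is wasted, while the zero page collapses to a handful of bytes. Real
guest workloads will sit somewhere between the two.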

Looking at migration in general, even with a 10-GigE NIC and RDMA
transport it is possible for a single guest to provide a workload that
will saturate the network during migration & thus prevent completion.
Based on this, there is little point in attempting to run migrations
in parallel on the same host, unless multiple NICs are available,
as parallel migrations would reduce the chances of either one ever
completing. Better reliability & faster overall completion would
likely be achieved by fully serializing migration operations per
host.
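
The reasoning can be sketched with a toy model of pre-copy migration. It
assumes a constant dirty rate and constant bandwidth, which real guests
and networks will not have, and the byte rates used in the usage note are
hypothetical. The 500ms downtime and 30 iteration limits match the values
used in these tests.

```python
from typing import Optional


def precopy_passes(ram_bytes: float, dirty_rate: float, bandwidth: float,
                   max_downtime: float = 0.5,
                   max_iters: int = 30) -> Optional[int]:
    """Count pre-copy passes needed before the final stop-and-copy.

    dirty_rate and bandwidth are in bytes/sec, max_downtime in seconds.
    Returns None if the remaining data still cannot be sent within
    max_downtime after max_iters passes.
    """
    remaining = float(ram_bytes)
    passes = 0
    while remaining / bandwidth > max_downtime:
        if passes == max_iters:
            return None
        # While this pass is on the wire the guest keeps dirtying
        # memory; it cannot dirty more than all of its RAM.
        transfer_time = remaining / bandwidth
        remaining = min(float(ram_bytes), dirty_rate * transfer_time)
        passes += 1
    return passes
```

For example, a 1 GiB guest dirtying 1e9 bytes/sec converges in this model
when it has a whole 1.25e9 bytes/sec link to itself. Give it half of that
link, as with two parallel migrations sharing one NIC, and the dirty rate
exceeds the available bandwidth, so precopy_passes() returns None.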

There is clearly scope for more investigation here, in particular:

 - Produce some alternative guest workloads that try to present
   a more "average" scenario workload, instead of the worst-case.
   These would likely allow compression to have some positive
   impact.

 - Try various combinations of strategies. For example, combining
   post-copy and auto-converge at the same time, or compression
   combined with either post-copy or auto-converge.

 - Investigate block migration performance too, with NBD migration
   server.

 - Investigate effect of dynamically changing max downtime value
   during migration, rather than using a fixed 500ms value.
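
As a starting point for the last item, one possible policy is to relax the
max downtime in steps the longer the migration runs. The sketch below is
only an illustration: the step size, growth factor and cap are made-up
values, and the helper names are hypothetical. The command built here is
the QMP "migrate_set_downtime" command, which takes a value in seconds;
check it against the QMP schema of the QEMU version in use before relying
on it.

```python
def downtime_limit_ms(iteration: int, base_ms: float = 500.0,
                      step: int = 5, factor: float = 2.0,
                      cap_ms: float = 4000.0) -> float:
    """Max downtime to allow at a given pre-copy iteration.

    Starts at base_ms and multiplies by `factor` every `step`
    iterations, never exceeding cap_ms. All defaults other than the
    500ms base are arbitrary example values.
    """
    return min(cap_ms, base_ms * factor ** (iteration // step))


def set_downtime_command(limit_ms: float) -> dict:
    """Build the QMP command dict for a downtime limit given in ms."""
    return {
        "execute": "migrate_set_downtime",
        "arguments": {"value": limit_ms / 1000.0},  # QMP wants seconds
    }
```

A test loop could call downtime_limit_ms() once per iteration and send the
resulting command over the monitor whenever the value changes.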


Daniel P. Berrange (6):
  scripts: add __init__.py file to scripts/qmp/
  scripts: add a 'debug' parameter to QEMUMonitorProtocol
  scripts: refactor the VM class in iotests for reuse
  scripts: set timeout when waiting for qemu monitor connection
  scripts: ensure monitor socket has SO_REUSEADDR set
  tests: introduce a framework for testing migration performance

 configure                               |   2 +
 scripts/qemu.py                         | 202 +++++++++++
 scripts/qmp/__init__.py                 |   0
 scripts/qmp/qmp.py                      |  15 +-
 scripts/qtest.py                        |  34 ++
 tests/Makefile                          |  12 +
 tests/migration/.gitignore              |   2 +
 tests/migration/guestperf-batch.py      |  26 ++
 tests/migration/guestperf-plot.py       |  26 ++
 tests/migration/guestperf.py            |  27 ++
 tests/migration/guestperf/__init__.py   |   0
 tests/migration/guestperf/comparison.py | 124 +++++++
 tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
 tests/migration/guestperf/hardware.py   |  62 ++++
 tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
 tests/migration/guestperf/progress.py   | 117 ++++++
 tests/migration/guestperf/report.py     |  98 +++++
 tests/migration/guestperf/scenario.py   |  95 +++++
 tests/migration/guestperf/shell.py      | 255 +++++++++++++
 tests/migration/guestperf/timings.py    |  55 +++
 tests/migration/stress.c                | 367 +++++++++++++++++++
 tests/qemu-iotests/iotests.py           | 135 +------
 22 files changed, 2583 insertions(+), 133 deletions(-)
 create mode 100644 scripts/qemu.py
 create mode 100644 scripts/qmp/__init__.py
 create mode 100644 tests/migration/.gitignore
 create mode 100755 tests/migration/guestperf-batch.py
 create mode 100755 tests/migration/guestperf-plot.py
 create mode 100755 tests/migration/guestperf.py
 create mode 100644 tests/migration/guestperf/__init__.py
 create mode 100644 tests/migration/guestperf/comparison.py
 create mode 100644 tests/migration/guestperf/engine.py
 create mode 100644 tests/migration/guestperf/hardware.py
 create mode 100644 tests/migration/guestperf/plot.py
 create mode 100644 tests/migration/guestperf/progress.py
 create mode 100644 tests/migration/guestperf/report.py
 create mode 100644 tests/migration/guestperf/scenario.py
 create mode 100644 tests/migration/guestperf/shell.py
 create mode 100644 tests/migration/guestperf/timings.py
 create mode 100644 tests/migration/stress.c

-- 
2.5.5
