This series of patches provides a framework for testing migration
performance characteristics. The motivating factor is planning that is
underway in OpenStack wrt making use of QEMU migration features such as
compression, auto-converge and post-copy. The primary aim for OpenStack
is to have Nova autonomously manage migration features & tunables to
maximise the chances that migration will complete. The problem faced is
figuring out just which QEMU migration features are "best" suited to
our needs. This means we want data on how well they are able to ensure
completion of a migration, weighed against the host resources used and
the impact on the guest workload performance.
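For readers less familiar with how these features are selected: as far
as I understand it, each one is switched on as a migration capability
over QMP before the migration is started. The following is only a
minimal sketch of building such a command, not code from this series.
The helper name and the feature-to-capability mapping are hypothetical,
and the capability names are assumptions that should be checked against
the QMP schema of the QEMU under test (post-copy additionally needs a
later 'migrate-start-postcopy' command to actually switch modes).

```python
import json

# Assumed mapping from the feature names used in this mail to QMP
# capability names; verify against the QMP schema of your QEMU build.
FEATURE_CAPS = {
    "mt-compression": "compress",
    "xbzrle": "xbzrle",
    "auto-converge": "auto-converge",
    "post-copy": "postcopy-ram",
}


def set_capabilities_cmd(features):
    """Build a QMP 'migrate-set-capabilities' command enabling `features`.

    Hypothetical helper for illustration; it is not part of this series.
    """
    unknown = sorted(set(features) - set(FEATURE_CAPS))
    if unknown:
        raise ValueError("unknown migration feature(s): " + ", ".join(unknown))
    caps = [{"capability": FEATURE_CAPS[f], "state": True} for f in features]
    return {"execute": "migrate-set-capabilities",
            "arguments": {"capabilities": caps}}


if __name__ == "__main__":
    # Print the JSON that would be sent on the monitor socket.
    cmd = set_capabilities_cmd(["auto-converge", "post-copy"])
    print(json.dumps(cmd, indent=2))
```

The resulting dict is what would be serialized as JSON and written to
the QMP monitor socket; keeping command construction separate from the
socket handling makes it easy to test combinations of features without
a running QEMU.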
The test framework produced here uses a pathological guest workload
(every CPU burning 100% of its time xor'ing every byte of guest memory
with random data). This is quite a pessimistic test because most guest
workloads are not going to be this heavy on memory writes, and their
data won't be uniformly random and so will compress better than this
test does.

With this worst case guest, I have produced a set of tests using UNIX
socket, TCP localhost, TCP remote and RDMA remote socket transports,
with both a 1 GB RAM + 1 CPU guest and an 8 GB RAM + 4 CPU guest. The
TCP/RDMA remote host tests were run over a 10-GigE network interface.

I have put the results online to view here:

  https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/

The charts here show two core sets of data:

 - The guest CPU performance. The left axis shows the time in
   milliseconds required to xor 1 GB of memory. This is shown per
   guest CPU and combined across all CPUs.

 - The host CPU utilization. The right axis shows the overall QEMU
   process CPU utilization, and the per-VCPU utilization.

Note that the charts are interactive: you can turn each plot line
on/off and zoom in by selecting regions on the chart.

Some interesting things that I have observed with this:

 - At the start of each iteration of migration there is a distinct
   drop in guest CPU performance, shown by a spike in the guest CPU
   time lines. Performance would drop from 200ms/GB to 400ms/GB.
   Presumably this is related to QEMU recalculating the dirty bitmap
   for the guest RAM.
   See the spikes in the green line in:

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html

 - For the larger sized guests, the auto-converge code has to throttle
   the guest to as much as 90% or more before it is able to meet the
   500ms max downtime value

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html

   Even then I often saw tests aborting as they hit the max number of
   iterations I permitted (30 iters max)

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html

 - MT compression is actively harmful to the chances of successful
   migration when the guest RAM is not compression friendly. My
   workload is the worst case since it is splattering RAM with totally
   random bytes. The MT compression dramatically increases the time
   for each iteration as we bottleneck on CPU compression speed,
   leaving the network largely idle. This causes migrations which
   would have completed without compression to fail. It also burns
   huge amounts of host CPU time

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html

 - XBZRLE compression did not have as much of a CPU performance
   penalty on the host as MT compression, but also did not help
   migration to actually complete. Again this is largely due to the
   workload being the worst case scenario with random bytes. The
   downside is obviously the potentially significant memory overhead
   on the host due to the cache sizing

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html

 - Post-copy, by its very nature, obviously ensured that the migration
   would complete.
   While post-copy was running in pre-copy mode there was a somewhat
   chaotic small impact on guest CPU performance, causing performance
   to periodically oscillate between 400ms/GB and 800ms/GB. This is
   less than the impact at the start of each migration iteration,
   which was 1000ms/GB in this test. There was also a massive penalty
   at the time of switchover from pre-copy to post-copy, as is to be
   expected. The migration completed in the post-copy phase quite
   quickly though. For this workload, the number of iterations in
   pre-copy mode before switching to post-copy did not have much
   impact. I expect a less extreme workload would have shown more
   interesting results wrt the number of iterations of pre-copy:

     https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html

Overall, if we're looking for a solution that can guarantee completion
under the most extreme guest workload, then only post-copy &
auto-converge appear up to the job.

The MT compression is seriously harmful to migration and has severe
CPU overhead. The XBZRLE compression is moderately harmful to
migration and has potentially severe memory overhead, since large
cache sizes are needed to make it useful.

While auto-converge can ensure that guest migration completes, it has
a pretty significant long-term impact on guest CPU performance to
achieve this, i.e. the guest spends a long time in pre-copy mode with
its CPUs very dramatically throttled down. The level of throttling
required makes one wonder whether it is worth using, compared with
simply pausing the guest workload. The latter has a hard blackout
period, but over quite a short time frame if the network speed is
fast.

The post-copy code does have an impact on guest performance while in
pre-copy mode, vs a plain migration. It also has a fairly high spike
when in post-copy mode, but this lasts for a pretty short time. As
compared to auto-converge, it is able to ensure the migration
completes in a finite time without having a prolonged impact on guest
CPU performance.
The penalty during the post-copy phase is on a par with the penalty
imposed by auto-converge when it has to throttle to 90%+.

Overall, in the context of a worst case guest workload, it appears
that post-copy is the clear winning strategy, ensuring completion of
migration without imposing a long duration penalty on guest
performance. If the risk of failure from post-copy is unacceptable
then auto-converge is a good fallback option, if the long duration
guest CPU penalty can be accepted.

The compression options are only worth using if the host has free CPU
resources and the guest RAM is believed to be compression friendly, as
they steal significant CPU time away from guests in order to run
compression, often with a negative impact on migration completion
chances.

Looking at migration in general, even with a 10-GigE NIC and RDMA
transport it is possible for a single guest to provide a workload that
will saturate the network during migration & thus prevent completion.
Based on this, there is little point in attempting to run migrations
in parallel on the same host, unless multiple NICs are available, as
parallel migrations would reduce the chances of either one ever
completing. Better reliability & faster overall completion would
likely be achieved by fully serializing migration operations per host.

There is clearly scope for more investigation here, in particular

 - Produce some alternative guest workloads that try to present a more
   "average" scenario workload, instead of the worst case. These would
   likely allow compression to have some positive impact.

 - Try various combinations of strategies. For example, combining
   post-copy and auto-converge at the same time, or compression
   combined with either post-copy or auto-converge.

 - Investigate block migration performance too, with the NBD migration
   server.

 - Investigate the effect of dynamically changing the max downtime
   value during migration, rather than using a fixed 500ms value.

Daniel P. Berrange (6):
  scripts: add __init__.py file to scripts/qmp/
  scripts: add a 'debug' parameter to QEMUMonitorProtocol
  scripts: refactor the VM class in iotests for reuse
  scripts: set timeout when waiting for qemu monitor connection
  scripts: ensure monitor socket has SO_REUSEADDR set
  tests: introduce a framework for testing migration performance

 configure                               |   2 +
 scripts/qemu.py                         | 202 +++++++++++
 scripts/qmp/__init__.py                 |   0
 scripts/qmp/qmp.py                      |  15 +-
 scripts/qtest.py                        |  34 ++
 tests/Makefile                          |  12 +
 tests/migration/.gitignore              |   2 +
 tests/migration/guestperf-batch.py      |  26 ++
 tests/migration/guestperf-plot.py       |  26 ++
 tests/migration/guestperf.py            |  27 ++
 tests/migration/guestperf/__init__.py   |   0
 tests/migration/guestperf/comparison.py | 124 +++++++
 tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
 tests/migration/guestperf/hardware.py   |  62 ++++
 tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
 tests/migration/guestperf/progress.py   | 117 ++++++
 tests/migration/guestperf/report.py     |  98 +++++
 tests/migration/guestperf/scenario.py   |  95 +++++
 tests/migration/guestperf/shell.py      | 255 +++++++++++++
 tests/migration/guestperf/timings.py    |  55 +++
 tests/migration/stress.c                | 367 +++++++++++++++++++
 tests/qemu-iotests/iotests.py           | 135 +------
 22 files changed, 2583 insertions(+), 133 deletions(-)
 create mode 100644 scripts/qemu.py
 create mode 100644 scripts/qmp/__init__.py
 create mode 100644 tests/migration/.gitignore
 create mode 100755 tests/migration/guestperf-batch.py
 create mode 100755 tests/migration/guestperf-plot.py
 create mode 100755 tests/migration/guestperf.py
 create mode 100644 tests/migration/guestperf/__init__.py
 create mode 100644 tests/migration/guestperf/comparison.py
 create mode 100644 tests/migration/guestperf/engine.py
 create mode 100644 tests/migration/guestperf/hardware.py
 create mode 100644 tests/migration/guestperf/plot.py
 create mode 100644 tests/migration/guestperf/progress.py
 create mode 100644 tests/migration/guestperf/report.py
 create mode 100644 tests/migration/guestperf/scenario.py
 create mode 100644 tests/migration/guestperf/shell.py
 create mode 100644 tests/migration/guestperf/timings.py
 create mode 100644 tests/migration/stress.c

-- 
2.5.5