> This series of patches provides a framework for testing migration
> performance characteristics. The motivating factor for this is planning
> that is underway in OpenStack wrt making use of QEMU migration features
> such as compression, auto-converge and post-copy. The primary aim for
> OpenStack is to have Nova autonomously manage migration features &
> tunables to maximise chances that migration will complete. The problem
> faced is figuring out just which QEMU migration features are "best"
> suited to our needs. This means we want data on how well they are able
> to ensure completion of a migration, against the host resources used and
> the impact on the guest workload performance.
>
> The test framework produced here takes a pathological guest workload
> (every CPU just burning 100% of time xor'ing every byte of guest memory
> with random data). This is quite a pessimistic test because most guest
> workloads are not going to be this heavy on memory writes, and their
> data won't be uniformly random and so will be able to compress better
> than this test does.
>
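For readers who want a feel for the workload described above, here is a minimal sketch in Python. It is not the stress program from the series (that is tests/migration/stress.c, which is not reproduced here); the function names, buffer size and pattern size are all my own assumptions. It XORs every byte of a buffer with random data and reports the cost in the same ms/GB unit the charts use.

```python
import os
import time


def xor_pass(buf: bytearray, pattern: bytes) -> None:
    """XOR every byte of buf, in place, with the repeating pattern."""
    n = len(buf)
    reps = -(-n // len(pattern))  # ceiling division
    key = (pattern * reps)[:n]
    # Big-integer XOR touches every byte without a Python-level loop.
    mixed = int.from_bytes(buf, "little") ^ int.from_bytes(key, "little")
    buf[:] = mixed.to_bytes(n, "little")


def time_ms_per_gb(size_bytes: int = 16 * 1024 * 1024, passes: int = 4) -> float:
    """Return the measured cost of the XOR workload in milliseconds per GB."""
    buf = bytearray(size_bytes)
    start = time.perf_counter()
    for _ in range(passes):
        # Fresh random data each pass, so the buffer never becomes compressible.
        xor_pass(buf, os.urandom(4096))
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 * (1 << 30) / (size_bytes * passes)


if __name__ == "__main__":
    print(f"{time_ms_per_gb():.0f} ms/GB")
```

The absolute numbers from an interpreter-level sketch like this will not match the C workload; only the shape of the measurement is the same.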
Wonderful test report!

> With this worst case guest, I have produced a set of tests using UNIX
> socket, TCP localhost, TCP remote and RDMA remote socket transports,
> with both a 1 GB RAM + 1 CPU guest and an 8 GB RAM + 4 CPU guest.
>
> The TCP/RDMA remote host tests were run over a 10-GiG-E network
> interface.
>
> I have put the results online to view here:
>
>   https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/
>
> The charts here are showing two core sets of data:
>
>  - The guest CPU performance. The left axis is showing the time in
>    milliseconds required to xor 1 GB of memory. This is shown per-guest
>    CPU and combined all CPUs.
>
>  - The host CPU utilization. The right axis is showing the overall QEMU
>    process CPU utilization, and the per-VCPU utilization.
>
> Note that the charts are interactive - you can turn on/off each plot
> line and zoom in by selecting regions on the chart.
>
>
> Some interesting things that I have observed with this:
>
>  - At the start of each iteration of migration there is a distinct drop
>    in guest CPU performance as shown by a spike in the guest CPU time
>    lines. Performance would drop from 200ms/GB to 400ms/GB. Presumably
>    this is related to QEMU recalculating the dirty bitmap for the guest
>    RAM.
>    See the spikes in the green line in:
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/post-copy-bandwidth/post-copy-bw-1gbs.html
>
>  - For the larger sized guests, the auto-converge code has to throttle
>    the guest to as much as 90% or more before it is able to meet the
>    500ms max downtime value
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/auto-converge-bandwidth/auto-converge-bw-1gbs.html
>
>    Even then I often saw tests aborting as they hit the max number of
>    iterations I permitted (30 iters max)
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/auto-converge-bandwidth/auto-converge-bw-10gbs.html
>
>  - MT compression is actively harmful to chances of successful
>    migration when the guest RAM is not compression friendly. My
>    workload is worst case since it is splattering RAM with totally
>    random bytes. The MT compression is dramatically increasing the time
>    for each iteration as we bottleneck on CPU compression speed, leaving
>    the network largely idle. This causes migration which would have
>    completed without compression, to fail. It also burns huge amounts
>    of host CPU time
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-mt/compr-mt-threads-4.html
>
>  - XBZRLE compression did not have as much of a CPU performance penalty
>    on the host as MT compression, but also did not help migration to
>    actually complete. Again this is largely due to the workload being
>    the worst case scenario with random bytes. The downside is obviously
>    the potentially significant memory overhead on the host due to the
>    cache sizing
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-1gb-1cpu/compr-xbzrle/compr-xbzrle-cache-50.html
>
>
>  - Post-copy, by its very nature, obviously ensured that the migration
>    would complete.
>    While post-copy was running in pre-copy mode there was a somewhat
>    chaotic small impact on guest CPU performance, causing performance
>    to periodically oscillate between 400ms/GB and 800ms/GB. This is
>    less than the impact at the start of each migration iteration which
>    was 1000ms/GB in this test. There was also a massive penalty at the
>    time of switchover from pre-copy to post-copy, as is to be expected.
>    The migration completed in post-copy phase quite quickly though. For
>    this workload, the number of iterations in pre-copy mode before
>    switching to post-copy did not have much impact. I expect a less
>    extreme workload would have shown more interesting results wrt
>    number of iterations of pre-copy:
>
>      https://berrange.fedorapeople.org/qemu-mig-test-2016-05-05/tcp-remote-8gb-4cpu/post-copy-iters.html
>
>
> Overall, if we're looking for a solution that can guarantee completion
> under the most extreme guest workload, then only post-copy &
> auto-converge appear up to the job.
>
> The MT compression is seriously harmful to migration and has severe CPU
> overhead. The XBZRLE compression is moderately harmful to migration and
> has potentially severe memory overhead for large cache sizes to make it
> useful.
>
> While auto-converge can ensure that guest migration completes, it has a
> pretty significant long-term impact on guest CPU performance to achieve
> this, i.e. the guest spends a long time in pre-copy mode with its CPUs
> very dramatically throttled down. The level of throttling required
> makes one wonder whether it is worth using, against simply pausing the
> guest workload. The latter has a hard blackout period, but over a quite
> short time frame if network speed is fast.
>
> The post-copy code does have an impact on guest performance while in
> pre-copy mode, vs a plain migration. It also has a fairly high spike
> when in post-copy mode, but this lasts for a pretty short time.
> As compared to auto-converge, it is able to ensure the migration
> completes in a finite time without having a prolonged impact on guest
> CPU performance. The penalty during the post-copy phase is on a par
> with the penalty imposed by auto-converge when it has to throttle to
> 90%+.
>
>
> Overall, in the context of a worst case guest workload, it appears that
> post-copy is the clear winning strategy, ensuring completion of
> migration without imposing a long-duration penalty on guest
> performance. If the risk of failure from post-copy is unacceptable then
> auto-converge is a good fallback option, if the long-duration guest CPU
> penalty can be accepted.
>
> The compression options are only worth using if the host has free CPU
> resources, and the guest RAM is believed to be compression friendly, as
> they steal significant CPU time away from guests in order to run
> compression, often with a negative impact on migration completion
> chances.
>

MT compression should only be used when the network bandwidth is the
bottleneck that affects live migration. Using another, faster
(de)compression algorithm could reduce the CPU overhead.

Liang

> Looking at migration in general, even with a 10-GiG-E NIC and RDMA
> transport it is possible for a single guest to provide a workload that
> will saturate the network during migration & thus prevent completion.
> Based on this, there is little point in attempting to run migrations in
> parallel on the same host, unless multiple NICs are available, as
> parallel migrations would reduce the chances of either one ever
> completing. Better reliability & faster overall completion would likely
> be achieved by fully serializing migration operations per host.
>
> There is clearly scope for more investigation here, in particular
>
>  - Produce some alternative guest workloads that try to present
>    a more "average" scenario workload, instead of the worst-case.
>    These would likely allow compression to have some positive
>    impact.
>
>  - Try various combinations of strategies. For example, combining
>    post-copy and auto-converge at the same time, or compression
>    combined with either post-copy or auto-converge.
>
>  - Investigate block migration performance too, with NBD migration
>    server.
>
>  - Investigate effect of dynamically changing max downtime value
>    during migration, rather than using a fixed 500ms value.
>
>
> Daniel P. Berrange (6):
>   scripts: add __init__.py file to scripts/qmp/
>   scripts: add a 'debug' parameter to QEMUMonitorProtocol
>   scripts: refactor the VM class in iotests for reuse
>   scripts: set timeout when waiting for qemu monitor connection
>   scripts: ensure monitor socket has SO_REUSEADDR set
>   tests: introduce a framework for testing migration performance
>
>  configure                               |   2 +
>  scripts/qemu.py                         | 202 +++++++++++
>  scripts/qmp/__init__.py                 |   0
>  scripts/qmp/qmp.py                      |  15 +-
>  scripts/qtest.py                        |  34 ++
>  tests/Makefile                          |  12 +
>  tests/migration/.gitignore              |   2 +
>  tests/migration/guestperf-batch.py      |  26 ++
>  tests/migration/guestperf-plot.py       |  26 ++
>  tests/migration/guestperf.py            |  27 ++
>  tests/migration/guestperf/__init__.py   |   0
>  tests/migration/guestperf/comparison.py | 124 +++++++
>  tests/migration/guestperf/engine.py     | 439 ++++++++++++++++++++++
>  tests/migration/guestperf/hardware.py   |  62 ++++
>  tests/migration/guestperf/plot.py       | 623 ++++++++++++++++++++++++++++++++
>  tests/migration/guestperf/progress.py   | 117 ++++++
>  tests/migration/guestperf/report.py     |  98 +++++
>  tests/migration/guestperf/scenario.py   |  95 +++++
>  tests/migration/guestperf/shell.py      | 255 +++++++++++++
>  tests/migration/guestperf/timings.py    |  55 +++
>  tests/migration/stress.c                | 367 +++++++++++++++++++
>  tests/qemu-iotests/iotests.py           | 135 +------
>  22 files changed, 2583 insertions(+), 133 deletions(-)
>  create mode 100644 scripts/qemu.py
>  create mode 100644 scripts/qmp/__init__.py
>  create mode 100644 tests/migration/.gitignore
>  create mode 100755 tests/migration/guestperf-batch.py
>  create mode 100755 tests/migration/guestperf-plot.py
>  create mode 100755 tests/migration/guestperf.py
>  create mode 100644 tests/migration/guestperf/__init__.py
>  create mode 100644 tests/migration/guestperf/comparison.py
>  create mode 100644 tests/migration/guestperf/engine.py
>  create mode 100644 tests/migration/guestperf/hardware.py
>  create mode 100644 tests/migration/guestperf/plot.py
>  create mode 100644 tests/migration/guestperf/progress.py
>  create mode 100644 tests/migration/guestperf/report.py
>  create mode 100644 tests/migration/guestperf/scenario.py
>  create mode 100644 tests/migration/guestperf/shell.py
>  create mode 100644 tests/migration/guestperf/timings.py
>  create mode 100644 tests/migration/stress.c
>
> --
> 2.5.5
>
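As a rough sanity check on the auto-converge throttling figures quoted above, here is a back-of-the-envelope sketch. The model is my own simplification and is not taken from the patches: it assumes each vCPU dirties memory at the unthrottled 200ms/GB rate from the report, that the 10-GiG-E link moves about 1.25 GB/s with no protocol overhead, and that pre-copy can only start to converge once the throttled dirty rate falls below the link rate. The function names are hypothetical.

```python
def dirty_rate_gb_per_s(ms_per_gb: float, vcpus: int) -> float:
    """Aggregate rate at which the guest rewrites memory, in GB/s."""
    return vcpus * 1000.0 / ms_per_gb


def min_throttle(dirty_rate: float, link_gb_per_s: float) -> float:
    """Smallest throttle fraction that brings the dirty rate down to the link rate."""
    if dirty_rate <= link_gb_per_s:
        return 0.0
    return 1.0 - link_gb_per_s / dirty_rate


LINK_10GBE = 10 / 8  # GB/s; assumption: ideal 10 Gbit/s, no overhead

if __name__ == "__main__":
    for vcpus in (1, 4):
        rate = dirty_rate_gb_per_s(200.0, vcpus)
        needed = min_throttle(rate, LINK_10GBE)
        print(f"{vcpus} vCPU: dirty {rate:.1f} GB/s, throttle >= {needed:.1%}")
```

Under these assumptions the model asks for at least 75% throttling with 1 vCPU and roughly 94% with 4 vCPUs, which is in the same ballpark as the 90%+ seen for the larger guest. It ignores the 500ms downtime target and the fact that a guest cannot dirty more than its own RAM per iteration, so treat it as an order-of-magnitude estimate only.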