From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>

Hi,
  This is a 2nd cut of my postcopy implementation; it fixes up some of
the comments from the 1st posting, but it also fixes a lot of bugs
(other comments from the 1st round will be fixed later).

The commands to start a postcopy migration have changed to:

  migrate_set_capability x-postcopy-ram on
  migrate -d tcp:whereever
  <some time later>
  migrate_start_postcopy

Starting the destination in paused mode (-S) now works.

This now survives in a test harness with a few hundred migrations
without problems (both on heavily loaded and idle VMs).

Other changes:
  Changed source rp handler to coroutine (from fd_set_handler)
     - cures occasional loss of requests (and hence huge latency spikes)
  Added 'SHUT' rp command as a handshake from the destination that
    it's finished
  Fixed race during migration of guest with very little page modification
  Shutdown the fault-thread cleanly
  Update 'received' map in ADVISE state
  Turned off nagling on rp - we want our requests to get there ASAP
  Fixed segfault when attempting to postcopy to a stream that couldn't
    get a return path
  Got rid of the move of QEMUFile into the .h
  Fixed up '-ve' and title of assert patch
  Removed a couple of return-path opening optimisations as per
    Paolo's comments

Note this version needs the QEMUSizedBuffer/QEMUFile patches that I've
posted separately.

Current TODO:
  1) It's not bisectable yet
  2) There are no testsuite additions (although I have a virt-test
     modification I've been using).
  3) End-of-migration cleanup is much better than it was but still
     needs some work.
  4) Not all the code is there for systems with
     hostpagesize!=qemupagesize
  5) xbzrle needs disabling once in postcopy
  6) RDMA needs some rework
  7) The latency measurements are now pretty consistent, with no very
     large spikes, but they're a bit higher than expected; I need to
     look at rate limiting just the background scan.
  8) Conversion of return-path to a process and blocking fd needs
     investigation (as per discussion with Paolo)
  9) Andrea has suggestions on ways to avoid some of the huge-page
     splitting that occurs during the discard phase after precopy.
 10) I'd like to format the data on the return path in a more
     structured way (i.e. maybe using stuff from my BER world).
 11) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger
     than the source feels like it needs looking at for postcopy.

Dave

Dr. David Alan Gilbert (43):
  qemu_ram_foreach_block: pass up error value, and down the ramblock
    name
  improve DPRINTF macros, add to savevm
  Add qemu_get_counted_string to read a string prefixed by a count byte
  Create MigrationIncomingState
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  qemu_loadvm errors and debug
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
  migrate_init: Call from savevm
  Allow savevm handlers to state whether they could go into postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
  qemu_savevm_state_complete: Postcopy changes
  Postcopy: Maintain sentmap during postcopy pre phase
  Postcopy page-map-incoming (PMI) structure
  postcopy: Add incoming_init/cleanup functions
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: postcopy_start
  Postcopy: Rework migration thread for postcopy mode
  mig fd_connect: open return path
  Postcopy: Create a fault handler thread before marking the ram as
    userfault
  Page request: Add MIG_RPCOMM_REQPAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  Add assertion to check migration_dirty_pages
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  qemu_ram_block_from_host
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
  End of migration for postcopy
  Start documenting how postcopy works.

 Makefile.objs                    |   2 +-
 arch_init.c                      | 490 +++++++++++++++++++--
 docs/migration.txt               | 150 +++++++
 exec.c                           |  66 ++-
 hmp-commands.hx                  |  15 +
 hmp.c                            |   7 +
 hmp.h                            |   1 +
 include/exec/cpu-common.h        |   8 +-
 include/migration/migration.h    | 128 ++++++
 include/migration/postcopy-ram.h |  89 ++++
 include/migration/qemu-file.h    |   9 +
 include/migration/vmstate.h      |   2 +-
 include/qemu/typedefs.h          |   7 +-
 include/sysemu/sysemu.h          |  41 +-
 migration-rdma.c                 |   4 +-
 migration.c                      | 667 ++++++++++++++++++++++++++--
 postcopy-ram.c                   | 915 +++++++++++++++++++++++++++++++++++++++
 qapi-schema.json                 |  14 +-
 qemu-file.c                      | 124 +++++-
 qmp-commands.hx                  |  19 +
 savevm.c                         | 854 +++++++++++++++++++++++++++++++++---
 21 files changed, 3473 insertions(+), 139 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 postcopy-ram.c

-- 
1.9.3