On 07 Oct 2014, at 17:12, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> * Cristian Klein (cristian.kl...@cs.umu.se) wrote:
>> On 04 Oct 2014, at 4:21, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>>
>>>
>>> I've updated our github at:
>>> https://github.com/orbitfp7/qemu/tree/wp3-postcopy
>>>
>>> to have this version.
>>>
>>> and it corresponds to the tag:
>>> https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4
>>
>> Hi Dave,
>>
>> I just tested this version of post-copy using the libvirt patches I recently
>> posted, and it works a lot better. The video streaming VM migrates with a
>> downtime of less than 1 second. Before post-copy finishes, the VM is a bit
>> slow but otherwise running well.
>>
>> I also tested the patches with a VM running "ping", and the downtime was
>> around 0.6 seconds. I suspect that this delay could be caused by libvirt and
>> not by qemu. Note that libvirt is a bit special, in the sense that the VM is
>> migrated in a suspended state and resumed only after the network has been
>> set up on the destination. I will investigate and let you know.
>
> That's great news, although I'm not quite sure what caused the improvement;
> there were quite a few minor bug fixes and things, but nothing that I can
> think of that would directly contribute (except the patches I'd sent you,
> which you'd already tried).

Unfortunately, I made an error in my experiments (post-copy started too late).
I re-ran the experiments a few times. A ping VM observes a downtime of about
2 seconds, whereas a video streaming VM observes a downtime of about 4 seconds.

Cristian

>>
>>> * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>>>>
>>>> Hi,
>>>> This is the 4th cut of my version of postcopy; it is designed for use with
>>>> the Linux kernel additions just posted by Andrea Arcangeli here:
>>>>
>>>> http://marc.info/?l=linux-kernel&m=141235633015100&w=2
>>>>
>>>> (Note: This is a new version compared to my previous postcopy patchset;
>>>> you'll need to update the kernel to the new version.)
>>>>
>>>> Other than the new kernel ABI (which is only a small change on the
>>>> userspace side), the major changes are:
>>>>
>>>> a) Code for host page size != target page size
>>>> b) Support for migration over fd
>>>>    From Cristian Klein; this is for libvirt support which Cristian recently
>>>>    posted to the libvirt list.
>>>> c) It's now build bisectable and builds on 32bit
>>>>
>>>> Testing-wise, I've now done many thousands of postcopy migrations without
>>>> failure (both of idle and busy guests), so it seems pretty solid.
>>>>
>>>> Must-TODOs:
>>>> 1) A partially repeatable migration_cancel failure
>>>> 2) virt_test's migrate.with_reboot test is failing
>>>> 3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than
>>>>    the source feels like it needs looking at for postcopy.
>>>> 4) Paolo's comments with respect to the wakeup_request/is_running code
>>>>    in the migration thread
>>>> 5) xbzrle needs disabling once in postcopy
>>>>
>>>> Later-TODOs:
>>>> 1) Control the rate of background page transfers during postcopy to
>>>>    reduce their impact on the latency of postcopy requests.
>>>> 2) Work with RDMA
>>>> 3) Could the destination RP be made blocking? (as per discussion with
>>>>    Paolo; I'm still worried that it changes too many assumptions)
>>>>
>>>> V4:
>>>>   Initial support for host page size != target page size
>>>>     - tested heavily on hps==tps
>>>>     - only partially tested on hps!=tps systems
>>>>     - This involved quite a bit of rework around the discard code
>>>>   Updated to new kernel userfault ABI
>>>>     - It won't work with the previous version
>>>>   Fix mis-optimisation of postcopy request for wrong RAMBlock
>>>>       request for block A offset n
>>>>       un-needed fault for block B/m (already received - no req sent)
>>>>       request for block B/l - wrongly sent as request for A/l
>>>>   Fix thinko in discard bitmap processing (missed last word of bitmap)
>>>>     Symptom: remap failures near the top of RAM if postcopy started late
>>>>   Fix bug that caused kernel page acknowledgments to be misaligned
>>>>     May have meant the guest was paused for longer than required
>>>>   Fix potential crash when cleaning up a failed RP
>>>>   Fixes in docs (from Yang)
>>>>   Handle migration by fd as sockets if they are sockets
>>>>   Build tested on 32bit
>>>>   Fully build bisectable (x86-64)
>>>>
>>>> Dave
>>>>
>>>> Cristian Klein (1):
>>>>   Handle bi-directional communication for fd migration
>>>>
>>>> Dr. David Alan Gilbert (46):
>>>>   QEMUSizedBuffer based QEMUFile
>>>>   Tests: QEMUSizedBuffer/QEMUBuffer
>>>>   Start documenting how postcopy works.
>>>>   qemu_ram_foreach_block: pass up error value, and down the ramblock
>>>>     name
>>>>   improve DPRINTF macros, add to savevm
>>>>   Add qemu_get_counted_string to read a string prefixed by a count byte
>>>>   Create MigrationIncomingState
>>>>   socket shutdown
>>>>   Provide runtime Target page information
>>>>   Return path: Open a return path on QEMUFile for sockets
>>>>   Return path: socket_writev_buffer: Block even on non-blocking fd's
>>>>   Migration commands
>>>>   Return path: Control commands
>>>>   Return path: Send responses from destination to source
>>>>   Return path: Source handling of return path
>>>>   qemu_loadvm errors and debug
>>>>   ram_debug_dump_bitmap: Dump a migration bitmap as text
>>>>   Rework loadvm path for subloops
>>>>   Add migration-capability boolean for postcopy-ram.
>>>>   Add wrappers and handlers for sending/receiving the postcopy-ram
>>>>     migration messages.
>>>>   QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream
>>>>   migrate_init: Call from savevm
>>>>   Allow savevm handlers to state whether they could go into postcopy
>>>>   postcopy: OS support test
>>>>   migrate_start_postcopy: Command to trigger transition to postcopy
>>>>   MIG_STATE_POSTCOPY_ACTIVE: Add new migration state
>>>>   qemu_savevm_state_complete: Postcopy changes
>>>>   Postcopy page-map-incoming (PMI) structure
>>>>   Postcopy: Maintain sentmap and calculate discard
>>>>   postcopy: Incoming initialisation
>>>>   postcopy: ram_enable_notify to switch on userfault
>>>>   Postcopy: Postcopy startup in migration thread
>>>>   Postcopy: Create a fault handler thread before marking the ram as
>>>>     userfault
>>>>   Page request: Add MIG_RPCOMM_REQPAGES reverse command
>>>>   Page request: Process incoming page request
>>>>   Page request: Consume pages off the post-copy queue
>>>>   Add assertion to check migration_dirty_pages
>>>>   postcopy_ram.c: place_page and helpers
>>>>   Postcopy: Use helpers to map pages during migration
>>>>   qemu_ram_block_from_host
>>>>   Don't sync dirty bitmaps in postcopy
>>>>   Host page!=target page: Cleanup bitmaps
>>>>   Postcopy; Handle userfault requests
>>>>   Start up a postcopy/listener thread ready for incoming page data
>>>>   postcopy: Wire up loadvm_postcopy_ram_handle_{run,end} commands
>>>>   End of migration for postcopy
>>>>
>>>>  Makefile.objs                    |    2 +-
>>>>  arch_init.c                      |  739 +++++++++++++++++++++++++--
>>>>  docs/migration.txt               |  189 +++++++
>>>>  exec.c                           |   76 ++-
>>>>  hmp-commands.hx                  |   15 +
>>>>  hmp.c                            |    7 +
>>>>  hmp.h                            |    1 +
>>>>  include/exec/cpu-common.h        |    8 +-
>>>>  include/migration/migration.h    |  130 +++++
>>>>  include/migration/postcopy-ram.h |  106 ++++
>>>>  include/migration/qemu-file.h    |   47 ++
>>>>  include/migration/vmstate.h      |    2 +-
>>>>  include/qemu/sockets.h           |    1 +
>>>>  include/qemu/typedefs.h          |    9 +-
>>>>  include/sysemu/sysemu.h          |   43 +-
>>>>  migration-fd.c                   |   24 +-
>>>>  migration-rdma.c                 |    4 +-
>>>>  migration.c                      |  693 +++++++++++++++++++++++++-
>>>>  postcopy-ram.c                   | 1016 ++++++++++++++++++++++++++++++++++++++
>>>>  qapi-schema.json                 |   14 +-
>>>>  qemu-file.c                      |  598 +++++++++++++++++++++-
>>>>  qmp-commands.hx                  |   19 +
>>>>  savevm.c                         |  881 +++++++++++++++++++++++++++++++--
>>>>  tests/Makefile                   |    2 +-
>>>>  tests/test-vmstate.c             |   74 +--
>>>>  util/qemu-sockets.c              |   28 ++
>>>>  26 files changed, 4550 insertions(+), 178 deletions(-)
>>>>  create mode 100644 include/migration/postcopy-ram.h
>>>>  create mode 100644 postcopy-ram.c
>>>>
>>>> -- 
>>>> 1.9.3
>>>>
>>> -- 
>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK