On Fri, Sep 18, 2015 at 11:53:05AM +0000, Murray, Paul (HP Cloud) wrote:
> Hi All,
>
> There are various efforts going on around live migration at the moment:
> fixing up CI, bug fixes, additions to cover more corner cases, proposals
> for new operations....
>
> Generally live migration could do with a little TLC (see: [1]), so I am
> going to suggest we give some of that care in the next cycle.
>
> Please respond to this post if you have an interest in this and what you
> would like to see done. Include anything you are already getting on with
> so we get a clear picture. If there is enough interest I'll put this
> together as a proposal for a work stream. Something along the lines of
> "robustify live migration".

We merged some robustness improvements for migration during Liberty.
Specifically, with KVM we now track the progress of data transfer and
if it is not making forward progress during a set window of time, we
will abort the migration. This ensures you don't get a migration that
never ends. We also now have code which dynamically increases the max
permitted downtime during switchover, to try and make it more likely
to succeed. We could do with getting feedback on how well the various
tunable settings work in practice for real world deployments, to see
if we need to change any defaults.

There was a proposal to nova to allow the 'pause' operation to be
invoked while migration was happening. This would turn a live
migration into a coma-migration, thereby ensuring it succeeds.
I can't remember if this merged or not, as I can't find the review
offhand, but it's important to have this ASAP IMHO, as when evacuating
VMs from a host admins need a knob to use to force successful
evacuation, even at the cost of pausing the guest temporarily.

In libvirt upstream we now have the ability to filter what disks are
migrated during block migration. We need to leverage that new feature
to fix the long-standing problems of block migration when non-local
images are attached - eg cinder volumes. We definitely want this
in Mitaka.

We should look at what we need to do to isolate the migration data
network from the main management network. Currently we live migrate
over whatever network is associated with the compute host's primary
hostname / IP address. This is not necessarily the fastest NIC on the
host. We ought to be able to record an alternative hostname / IP
address against each compute host to indicate the desired migration
interface.

Libvirt/KVM have the ability to turn on compression for migration,
which again improves the chances of convergence & thus success.
We should look at leveraging that.

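To make the progress tracking and downtime stepping mentioned at the
start of this mail a bit more concrete, here is a rough sketch of the
idea. It is not the code that merged in Nova: the class and function
names, the linear step schedule and the example numbers are all
assumptions made purely for illustration.

```python
import time


class ProgressWatchdog:
    """Flags a migration as stalled when the remaining data stops shrinking.

    Hypothetical helper, not Nova code: feed it the "data remaining"
    figure obtained from each poll of the migration job.
    """

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self.lowest_remaining = None
        self.last_progress_at = clock()

    def update(self, data_remaining):
        """Record one sample; return True if the migration should be aborted."""
        now = self.clock()
        if self.lowest_remaining is None or data_remaining < self.lowest_remaining:
            # New low-water mark, so forward progress was made.
            self.lowest_remaining = data_remaining
            self.last_progress_at = now
            return False
        # No new low-water mark: abort once the window has been exceeded.
        return (now - self.last_progress_at) > self.window_seconds


def downtime_steps(initial_ms, max_ms, steps):
    """Yield increasing max-downtime values, ending exactly at max_ms.

    A linear ramp is used here for simplicity; the real schedule may differ.
    """
    if steps < 2:
        yield max_ms
        return
    increment = (max_ms - initial_ms) / (steps - 1)
    for i in range(steps):
        yield int(round(initial_ms + i * increment))
```

The caller would poll the hypervisor for the remaining data, pass each
sample to update(), and abort the job when it returns True. Separately,
it would apply the next value from downtime_steps() at some interval
while the migration is still running. Tracking a low-water mark rather
than comparing consecutive samples means a guest that dirties memory as
fast as it is copied still counts as stalled.
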
QEMU has a crude "auto-converge" flag you can turn on, which limits
guest CPU execution time, in an attempt to slow down the data dirtying
rate to again improve the chance of successful convergence.

I'm working on enhancements to QEMU itself to support TLS encryption
for migration. This will enable OpenStack to have a secure migration
datastream, without having to tunnel via libvirtd. This is useful
as tunnelling via libvirtd doesn't work with block migration. It will
also be much faster than tunnelling. This might be merged in QEMU
before the Mitaka cycle ends, but more likely it is Nxxx cycle
material.

There is also work on post-copy migration in QEMU. Normally with
live migration, the guest doesn't start executing on the target
host until migration has transferred all data. There are many
workloads where that doesn't work, as the guest is dirtying data
too quickly. With post-copy you can start running the guest on the
target at any time, and when it faults on a missing page that will
be pulled from the source host. This is slightly more fragile as
you risk losing the guest entirely if the source host dies before
migration finally completes. It does guarantee that migration will
succeed no matter what workload is in the guest. This is probably
Nxxxx cycle material.

Testing. Testing. Testing.

Lots more I can't think of right now....

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|