On 12 March 2015 at 12:26, Luis Tomas <l...@cs.umu.se> wrote:
> On 03/12/2015 12:34 PM, John Garbutt wrote:
>>
>> On 12 March 2015 at 08:41, Luis Tomas <l...@cs.umu.se> wrote:
>>>
>>> Hi,
>>>
>>> As part of a European (FP7) project named ORBIT
>>> (http://www.orbitproject.eu/), I'm working on adding the possibility of
>>> live-migrating VMs in OpenStack in post-copy mode.
>>> This way of live-migrating VMs moves the computation to the destination
>>> right away, and the VM starts working from there while its memory is
>>> still being copied from the source to the new location. That way each
>>> memory page is copied only once: if the VM modifies it, it is already
>>> on the destination host. This ensures that migrations finish regardless
>>> of what the VM is doing, even for extremely memory-intensive VMs, which
>>> removes the problem of VMs hanging in the migrating state forever (as
>>> discussed in previous mails, e.g.,
>>> http://lists.openstack.org/pipermail/openstack-dev/2015-February/055725.html).
>>>
>>> So far, I have implemented and tested this new functionality on the JUNO
>>> version, and the code modifications can be found in the GitHub
>>> repository of the project (branch named "post-copy"):
>>> - https://github.com/orbitfp7/nova/tree/post-copy --> mainly enabling
>>> the possibility of using the libvirt post-copy flag (libvirt driver.py).
>>> Note that post-copy migration does not use "tunneling", as the LibVirt
>>> patch for that is not yet ready.
>>> - https://github.com/orbitfp7/python-novaclient/tree/post-copy -->
>>> adding the possibility of using the post-copy mode when triggering the
>>> migration: nova live-migration [--block-migrate] [--post-copy] VM_ID
>>> - https://github.com/orbitfp7/horizon/tree/post-copy --> including a
>>> checkbox in the live-migration panel to perform the migration in
>>> post-copy mode (like the one for enabling block-migration).
>>>
>>> To be able to live-migrate VMs in a post-copy way, I'm relying on some
>>> kernel+qemu+libvirt modifications, not yet merged upstream (but on
>>> their way to it), also available at the project GitHub:
>>> - Kernel: https://lkml.org/lkml/2015/3/5/576
>>> - Qemu: https://github.com/orbitfp7/qemu/tree/wp3-postcopy
>>> - LibVirt: https://github.com/orbitfp7/libvirt/tree/wp3-postcopy
>>
>> Before merging the code in Nova, we usually like the dependent
>> features to be released by the respective projects.
>>
>> Ideally we would like it to be easy to run that on some distro so
>> people could test/use the feature fairly easily.
>
> Yes, that's why I proposed to target the version after kilo (or even the
> one after that if need be)

Ah, cool. I just wanted to be explicit about that.

>>> If this is a nice feature to have in future versions of OpenStack, I'm
>>> happy to adapt the code for the next release (the one after KILO). Any
>>> comments are really welcome.
>>
>> It sounds like something that doesn't need an API call, as it's a
>> deployer choice if they have support for this new live-migrate mode.
>> Is that true?
>>
>> Although maybe it has a substantial runtime penalty as a page read
>> miss causes a fetch across the network, making it a user choice? Or do
>> you only start the fetch mode at the point you detect a failure to
>> "merge" using the regular live-migrate mode?
>
>
> I think it should be up to the user/admin which option to choose.
> Although post-copy ensures that the migration will finish, as you said, it
> could have some impact on the VM performance due to having to wait until a
> missing memory page is fetched. Anyway, I wouldn't say there is a
> substantial runtime penalty. In fact, the libvirt flag that we have included
> in OpenStack basically tries pre-copy first (normal live-migration), and
> after trying to copy all the memory once (first iteration), automatically
> changes to post-copy, meaning moving the VM cpu to the destination and only
> having to copy the remaining pages (the ones dirtied while doing the first
> copy iteration). This way the impact on the application performance is
> minimized.

Ah, that's what I was trying to describe and failed.
Sounds good.

> On the other hand, post-copy has a downside. If by any chance the migration
> crashes during the process, unlike pre-copy, you cannot recover the VM, as
> neither the source nor the destination has a fully working VM at that time
> (part of the memory is at the source, part of it at the destination).

Eek, good point.

> These are basically the reasons we considered for making it an optional
> choice.

Totally makes sense.

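To make the trade-off above concrete, here is a toy sketch of why plain
pre-copy may never converge while "one pre-copy pass, then switch to
post-copy" always sends a bounded number of pages. This is not Nova, libvirt
or QEMU code: the function names, the pages-per-second units and the very
simple dirtying model are all assumptions made purely for illustration.

```python
def pre_copy_rounds(total_pages, bandwidth, dirty_rate, stop_threshold,
                    max_rounds=50):
    """Count pre-copy iterations until few enough pages remain to pause the VM.

    bandwidth and dirty_rate are in pages per second (hypothetical units).
    Returns None if the remaining set never drops to stop_threshold.
    """
    remaining = total_pages
    for round_no in range(1, max_rounds + 1):
        # Sending `remaining` pages takes remaining / bandwidth seconds; the VM
        # keeps running on the source and dirties pages for that whole time.
        remaining = min(total_pages, remaining * dirty_rate // bandwidth)
        if remaining <= stop_threshold:
            return round_no
    return None


def hybrid_pages_sent(total_pages, bandwidth, dirty_rate):
    """Pages sent by one pre-copy pass followed by a switch to post-copy."""
    # Pages dirtied while the first full pass was in flight.
    dirtied = min(total_pages, total_pages * dirty_rate // bandwidth)
    # After the switch the VM runs on the destination, so source pages stop
    # changing and each leftover page needs to be sent exactly once.
    return total_pages + dirtied


if __name__ == "__main__":
    print(pre_copy_rounds(1000, bandwidth=100, dirty_rate=50, stop_threshold=10))
    print(pre_copy_rounds(1000, bandwidth=100, dirty_rate=120, stop_threshold=10))
    print(hybrid_pages_sent(1000, bandwidth=100, dirty_rate=120))
```

In this model, pre-copy only converges when the VM dirties pages more slowly
than the link can send them, whereas the hybrid mode never sends more than
twice the VM's memory. The model says nothing about the failure case
mentioned above, where a crash mid-migration leaves the memory split between
the two hosts.
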
Only tip is to include that sort of information when you submit your
nova-spec, once those features are merged and released.

Thanks,
John

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev