On 03/12/2015 12:34 PM, John Garbutt wrote:
On 12 March 2015 at 08:41, Luis Tomas <l...@cs.umu.se> wrote:
Hi,

As part of a European (FP7) project, named ORBIT
(http://www.orbitproject.eu/), I'm working on including the possibility of
live-migrating VMs in OpenStack in a post-copy mode.
This way of live-migrating VMs basically moves the computation right away to
the destination, and the VM starts working from there while the memory is
still being copied from the source to the new location of the VM. That way
each memory page is copied only once: if the VM modifies a page, that page is
already on the destination host. This basically ensures that migrations
finish regardless of what the VM is doing, i.e., even for extremely memory
intensive VMs. It therefore removes the problem of VMs hanging in the
migrating state forever (as discussed in previous mails, e.g.,
http://lists.openstack.org/pipermail/openstack-dev/2015-February/055725.html).
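
To illustrate the convergence argument above, here is a toy model (not code
from the ORBIT branches). It assumes a fixed transfer bandwidth and a fixed
page-dirtying rate, both in pages per second; the function names and the
stop threshold are made up for this sketch. Pre-copy keeps resending pages
dirtied during the previous round, so it may never finish, while post-copy
sends each page exactly once.

```python
def precopy_pages_sent(total_pages, bandwidth, dirty_rate,
                       stop_threshold, max_rounds=50):
    """Toy pre-copy model (hypothetical, for illustration only).

    Each round sends the currently dirty pages; while they are being
    sent, the VM dirties more pages in proportion to the time taken.
    Returns the total number of pages sent, or None if the dirty set
    never shrinks below stop_threshold within max_rounds.
    """
    sent = 0
    remaining = total_pages
    for _ in range(max_rounds):
        sent += remaining
        # Pages dirtied while this round was on the wire (integer math).
        remaining = min(total_pages, remaining * dirty_rate // bandwidth)
        if remaining <= stop_threshold:
            # Final stop-and-copy of the small remaining dirty set.
            return sent + remaining
    return None  # the migration did not converge


def postcopy_pages_sent(total_pages):
    """Toy post-copy model: the VM already runs on the destination, so
    later writes hit pages that are already there; each page crosses
    the network once."""
    return total_pages


if __name__ == "__main__":
    # VM that dirties memory slower than the link can copy it.
    print(precopy_pages_sent(1000, bandwidth=100, dirty_rate=50,
                             stop_threshold=10))
    # Memory-intensive VM: dirties pages faster than they can be sent.
    print(precopy_pages_sent(1000, bandwidth=100, dirty_rate=200,
                             stop_threshold=10))
    print(postcopy_pages_sent(1000))
```

In this model the second call returns None, which corresponds to a VM stuck
in the migrating state, whereas the post-copy total is bounded by the VM's
memory size.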

So far, I have included and tested this new functionality on the Juno
release, and the code modifications can be found in the GitHub repository of
the project (branch named "post-copy"):
     - https://github.com/orbitfp7/nova/tree/post-copy --> mainly enabling
the possibility of using the libvirt post-copy flag (libvirt driver.py).
Note that post-copy migration does not use "tunneling", as the libvirt patch
for that is not yet ready.
     - https://github.com/orbitfp7/python-novaclient/tree/post-copy -->
adding the possibility of using the post-copy mode when triggering the
migration: nova live-migration [--block-migrate] [--post-copy] VM_ID
     - https://github.com/orbitfp7/horizon/tree/post-copy --> including a
checkbox in the live-migration panel to perform the migration in post-copy
mode (like the one for enabling block-migration).

To be able to live-migrate VMs in a post-copy way, I'm relying on some
kernel+qemu+libvirt modifications, not yet merged upstream (but on their way
to it), also available on the project's GitHub:
     - Kernel: https://lkml.org/lkml/2015/3/5/576
     - Qemu: https://github.com/orbitfp7/qemu/tree/wp3-postcopy
     - LibVirt: https://github.com/orbitfp7/libvirt/tree/wp3-postcopy
Before merging the code in Nova, we usually like the dependent
features to be released by the respective projects.

Ideally we would like it to be easy to run that on some distro so
people could test/use the feature fairly easily.
Yes, that's why I proposed to target the version after Kilo (or even the one after that if need be).


If this is a nice feature to have in future versions of OpenStack, I'm happy
to adapt the code for the next release (the one after KILO). Any comments
are really welcome.
It sounds like something that doesn't need an API call, as it's a
deployer choice if they have support for this new live-migrate mode.
Is that true?

Although maybe it has a substantial runtime penalty as a page read
miss causes a fetch across the network, making it a user choice? Or do
you only start the fetch mode at the point you detect a failure to
"merge" using the regular live-migrate mode?

I think it should be up to the user/admin what option to choose.
Although post-copy ensures that the migration will finish, as you said, it could have some impact on the VM's performance, because the VM has to wait whenever a missing memory page is fetched. Anyway, I wouldn't say there is a substantial runtime penalty. In fact, the libvirt flag that we have included in OpenStack basically tries pre-copy first (normal live-migration) and, after trying to copy all the memory once (first iteration), automatically switches to post-copy. That means moving the VM's CPU to the destination and only having to copy the remaining pages (the ones dirtied during the first copy iteration). This way the impact on application performance is minimized.
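
As a rough illustration of the hybrid behaviour described above (one
pre-copy pass, then an automatic switch to post-copy), here is a small
sketch. It is a simplified model, not the libvirt or Nova implementation:
the function name, the constant bandwidth and dirty rate (pages per second),
and the returned dictionary keys are all assumptions made for the example.

```python
def hybrid_migration(total_pages, bandwidth, dirty_rate):
    """Toy model of 'pre-copy once, then post-copy' (hypothetical).

    Phase 1 copies every page once while the VM still runs on the
    source. Phase 2 moves execution to the destination; only pages
    dirtied during phase 1 are still missing there and may have to be
    fetched over the network on access.
    """
    precopy_sent = total_pages
    # Pages dirtied while the first pass was in flight (integer math),
    # capped at the VM's memory size.
    dirtied = min(total_pages, total_pages * dirty_rate // bandwidth)
    return {
        "precopy_sent": precopy_sent,
        "postcopy_remaining": dirtied,  # pages that can cause a remote fetch
        "total_sent": precopy_sent + dirtied,
    }


if __name__ == "__main__":
    print(hybrid_migration(1000, bandwidth=100, dirty_rate=50))
    print(hybrid_migration(1000, bandwidth=100, dirty_rate=200))
```

In this model the number of pages that can trigger a remote fetch after the
switch is limited to what was dirtied during the first pass, and the total
transfer never exceeds twice the VM's memory, whatever the dirty rate.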

On the other hand, post-copy has a downside. If by any chance the migration crashes during the process, unlike with pre-copy, you cannot recover the VM, as neither the source nor the destination has a fully working VM at that time (part of the memory is on the source, part of it on the destination).

These are basically the reasons we considered for making it optional.

Thank you for your comments!

Regards,
Luis

Thanks,
John

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--
-----------------------------------
Dr. Luis Tomás
Postdoctoral Researcher
Department of Computing Science
Umeå University
l...@cs.umu.se
www.cloudresearch.se
www8.cs.umu.se/~luis
------------------------------------

