The objective for the live migration priority is to improve the stability of migrations based on operator experience. The high-level approach is to do the following:
1. Improve CI
2. Improve documentation
3. Improve manageability of migrations
4. Fix bugs

In this cycle we targeted a few immediately implementable features that would help: giving operators commands to manage migrations (inspect progress, force completion, and cancel) and improving security (split networks, and removing ssh-based resize/migration, aka storage pools). Most of these are on track to be completed in this cycle, with the exception of the storage pools work, which is being deferred. Further details follow.

Expand CI coverage - in progress

There is a job in the experimental queue called gate-tempest-dsvm-multinode-live-migration. This will become the job that performs live migration tests; any live migration tests in other jobs will be removed. At present the job has been configured to cover different storage configurations, including cinder, NFS and ceph. Tests are now being added to the job. Patches are currently up for live migration of instances with swap and instances with ephemeral disks.

Please trigger the experimental queue if your patches touch migrations in some way so we can check the stability of the job. Once it is stable and has sufficient tests we will promote the job from the experimental queue so that it always runs.

See: https://review.openstack.org/#/q/topic:lm_test

Improve API docs - done

Some changes were made to the API guide for moving servers, including better descriptions for the server actions migrate, live migrate, shelve, resize and evacuate ( http://developer.openstack.org/api-guide/compute/server_concepts.html#server-actions ) and a section that describes reasons for moving VMs, with common use cases outlined ( http://developer.openstack.org/api-guide/compute/server_concepts.html#moving-servers ).

Block live migration with attached volumes - done

The selective block device migration API in libvirt 1.2.17 is used to allow block migration when volumes are attached.
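
To illustrate the idea of selective block device migration, here is a minimal sketch of how a caller might choose which disks to copy. It is not nova's code: the Disk type, its fields and both helper functions are hypothetical, and the "migrate_disks" key is assumed to be the string behind libvirt's VIR_MIGRATE_PARAM_MIGRATE_DISKS parameter. Skipping read-only disks is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Disk:
    """Hypothetical description of one guest disk (not nova's data model)."""
    target_dev: str         # guest device name, e.g. "vda"
    is_volume: bool         # True for attached volumes (e.g. cinder)
    readonly: bool = False  # True for read-only devices


def disks_to_copy(disks: List[Disk]) -> List[str]:
    """Return the device names of local, writable disks.

    Attached volumes are already reachable from the destination host,
    so they are left out of the copy. Read-only disks are left out as
    well, which is an assumption of this sketch.
    """
    return [d.target_dev for d in disks
            if not d.is_volume and not d.readonly]


def build_migration_params(disks: List[Disk]) -> Dict[str, object]:
    """Build a parameter dict naming the disks to copy.

    The "migrate_disks" key is assumed to match the value of
    libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS; the key is omitted when
    there is nothing to copy.
    """
    params: Dict[str, object] = {}
    selected = disks_to_copy(disks)
    if selected:
        params["migrate_disks"] = selected
    return params
```

With a root disk "vda", an attached volume "vdb" and a read-only device "hdd", only "vda" would be selected for copying.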
A follow-on patch to allow read-only drives to be copied in block migration has not been completed. This patch is required to allow iso9660 format config drives to be migrated. Without it only vfat config drives can be migrated. There is still some thought going into that - see: https://review.openstack.org/#/c/234659

Force complete - requires python-novaclient change

Force-complete forces a live migration to complete by pausing the VM and restarting it once the migration has completed. This is intended as a brute force way to make a VM complete its migration when it is taking too long. In the future auto-converge and post-copy will be looked at. These became available in qemu 2.5.

Force complete is done in nova but still requires a change to python-novaclient to implement the CLI.

Cancel - in progress

Cancel stops a live migration, leaving the VM on the source host with the migration status left as "cancelled". This is in progress and follows the pattern of force-complete. Unfortunately this needs to be bundled up into one patch to avoid multiple API bumps.

Patches for review: https://review.openstack.org/#/q/status:open+topic:bp/abort-live-migration

Progress reporting - in progress (no pun intended)

Progress reporting introduces migrations as a sub-resource of servers and adds progress data to the migration record. There was some debate at the mid-cycle and on the mailing list about how to record this transient data. It is a waste to keep writing it to the database, but because it is generated at the compute manager and examined at the API, it was felt that writing it to the database is necessary to fit the existing architecture. The conclusion was that writing to the database every 5 seconds would not cause a significant overhead. Alternatives could be pursued later if necessary.
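
As a rough illustration of the "write every 5 seconds" conclusion, the sketch below throttles how often progress data is persisted. It is not the nova implementation: the class name, the save callback and the progress field names are all hypothetical, and the injectable clock exists only to make the behaviour easy to check.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledProgressWriter:
    """Persist progress data at most once per interval (hypothetical helper)."""

    def __init__(self,
                 save: Callable[[Dict[str, int]], None],
                 interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._save = save          # e.g. a function that updates the migration record
        self._interval = interval  # minimum number of seconds between writes
        self._clock = clock        # injectable so the timing can be tested
        self._last_write: Optional[float] = None

    def update(self, progress: Dict[str, int]) -> bool:
        """Save progress if the interval has elapsed; return True if saved."""
        now = self._clock()
        due = (self._last_write is None
               or now - self._last_write >= self._interval)
        if due:
            self._save(progress)
            self._last_write = now
        return due
```

A monitoring loop could call update() on every poll of the hypervisor; only the calls that fall at least 5 seconds after the previous write would reach the database, and the samples in between are simply dropped.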
For discussion see this ML thread: http://lists.openstack.org/pipermail/openstack-dev/2016-February/085662.html
and the IRC meeting transcript here: http://eavesdrop.openstack.org/meetings/nova_live_migration/2016/nova_live_migration.2016-02-09-14.01.log.html

Patches for review: https://review.openstack.org/#/q/status:open+topic:bp/live-migration-progress-report

Split networking - done

Split networking adds a configuration parameter, live_migration_inbound_addr, to specify the IP address or host name to be used as the target for migration traffic. This allows migration traffic to be isolated on a separate network from other management traffic, providing an opportunity to isolate service levels for the two networks and to improve security by moving unencrypted migration traffic to an isolated network.

Resize/cold migrate using storage pools - deferred

The objective here was to change the libvirt implementation of migrate and resize to use libvirt storage pools instead of scp/rsync over ssh with passwordless keys. Storage pools are supported in all versions of libvirt supported by nova, so it was thought that by changing the implementation it would be possible to drop the ssh-based code. However, two flaws in this approach arose: the recently added ploop storage device does not work with storage pools in libvirt, and the libvirt data copy implementation is very inefficient and so slower than scp or rsync.

The guys at Parallels kindly agreed to implement storage pools support for ploop in libvirt and this work is already making progress. Work was also started in libvirt to improve the copy performance. These features will be available in a future release, so we will need to maintain the old ssh-based migration for libvirt as well as refactor and implement the storage pools based alternative.
Work has started on refactoring the libvirt driver code, but the following blueprints will be deferred beyond Mitaka:

http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/use-libvirt-storage-pools.html
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/migrate-libvirt-volumes.html

Deprecate migration flags - done

There are a lot of migration flags used with libvirt that are either redundant or can be inferred from the deployed configuration. These are being deprecated and will be removed in the next cycle.

See: https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:deprecate-migration-flags-config

Feel free to respond with corrections or additions.

Regards,
Paul

Paul Murray
Technical Lead, HPE Cloud
Hewlett Packard Enterprise
+44 117 316 2527