Hi,

Commit 8ecf93e [1] got me thinking: the live_migration_flag config option unnecessarily allows operators to choose arbitrary behavior of the migrateToURI() libvirt call, to the extent that we allow the operator to configure a behavior that can result in data loss [2].
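The concern is easiest to see in a minimal Python sketch. It shows how a comma-separated flag string, such as a live_migration_flag value, can be ORed into the bitmask handed to migrateToURI(). Nothing in that step stops an unsafe flag from getting through. This is not Nova's actual parsing code. parse_migration_flags() and unsafe_for_shared_storage() are hypothetical helpers. The constant values are copied from libvirt's virDomainMigrateFlags enum, so the sketch runs without the libvirt Python bindings installed.

```python
import enum
from functools import reduce


class MigrateFlag(enum.IntFlag):
    """Subset of libvirt's virDomainMigrateFlags (values as defined by libvirt)."""
    VIR_MIGRATE_LIVE = 1
    VIR_MIGRATE_PEER2PEER = 2
    VIR_MIGRATE_TUNNELLED = 4
    VIR_MIGRATE_PERSIST_DEST = 8
    VIR_MIGRATE_UNDEFINE_SOURCE = 16
    VIR_MIGRATE_NON_SHARED_DISK = 64
    VIR_MIGRATE_NON_SHARED_INC = 128


def parse_migration_flags(value: str) -> MigrateFlag:
    """Hypothetical helper: OR together every flag named in a comma-separated string.

    Any valid flag name is accepted, so the operator, not Nova, decides
    what the migration call is asked to do.
    """
    names = [name.strip() for name in value.split(",") if name.strip()]
    return reduce(lambda acc, name: acc | MigrateFlag[name], names, MigrateFlag(0))


def unsafe_for_shared_storage(flags: MigrateFlag) -> bool:
    """True if the flags would copy disk images, which is destructive on shared storage."""
    copy_disks = MigrateFlag.VIR_MIGRATE_NON_SHARED_DISK | MigrateFlag.VIR_MIGRATE_NON_SHARED_INC
    return bool(flags & copy_disks)


# A live_migration_flag value an operator could write today.
live_flags = parse_migration_flags(
    "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, "
    "VIR_MIGRATE_LIVE, VIR_MIGRATE_NON_SHARED_INC"
)
print(int(live_flags), unsafe_for_shared_storage(live_flags))  # prints: 147 True
```

The parsing step has no way of knowing that a non-block migration over shared storage must never carry the NON_SHARED_* flags. That knowledge has to live somewhere, and Nova is better placed to hold it than an operator's config file.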
I see that danpb recently said something similar in https://review.openstack.org/171098:

  "Honestly, I wish we'd just kill off 'live_migration_flag' and 'block_migration_flag' as config options. We really should not be exposing low level libvirt API flags as admin tunable settings. Nova should really be in charge of picking the correct set of flags for the current libvirt version, and the operation it needs to perform. We might need to add other more sensible config options in their place [..]"

I've just proposed a series of patches, which boils down to the following steps:

1) Modify the approach taken in commit 8ecf93e so that, instead of just warning about unsafe use of NON_SHARED_INC, we fix up the config option to a safe value.

   https://review.openstack.org/263431

2) Hard-code the P2P flag for live and block migrations as appropriate for the libvirt driver being used.

   For the qemu driver, we should always use VIR_MIGRATE_PEER2PEER for both live and block migrations. Without this flag, you get:

     Live Migration failure: Requested operation is not valid: direct migration is not supported by the connection driver

   OTOH, the Xen driver does not support P2P, and only supports "unmanaged direct connection".

   https://review.openstack.org/263432

3) Require the use of the UNDEFINE_SOURCE flag, and the non-use of the PERSIST_DEST flag.

   Nova itself persists the domain configuration on the destination host, but it assumes the libvirt migration call removes it from the source host. So it makes no sense to allow operators to configure these flags.

   https://review.openstack.org/263433

4) Add a new config option for tunneled versus native:

     [libvirt]
     live_migration_tunneled = true

   This enables the use of the VIR_MIGRATE_TUNNELLED flag. We have historically defaulted to tunneled mode because it requires the least configuration and is currently the only way to have a secure migration channel.
   danpb's quote above continues with:

     "perhaps a 'live_migration_secure_channel' to indicate that migration must use encryption, which would imply use of TUNNELLED flag"

   So we need to discuss whether the config option should express the choice of tunneled vs. native, or whether it should express another choice that implies tunneled vs. native.

   https://review.openstack.org/263434

5) Add a new config option for additional migration flags:

     [libvirt]
     live_migration_extra_flags = VIR_MIGRATE_COMPRESSED

   This allows operators to continue to experiment with libvirt behaviors in safe ways without each use case having to be accounted for.

   https://review.openstack.org/263435

   We would disallow setting the following flags via this option:

     VIR_MIGRATE_LIVE
     VIR_MIGRATE_PEER2PEER
     VIR_MIGRATE_TUNNELLED
     VIR_MIGRATE_PERSIST_DEST
     VIR_MIGRATE_UNDEFINE_SOURCE
     VIR_MIGRATE_NON_SHARED_INC
     VIR_MIGRATE_NON_SHARED_DISK

   which would allow the following currently available flags to be set:

     VIR_MIGRATE_PAUSED
     VIR_MIGRATE_CHANGE_PROTECTION
     VIR_MIGRATE_UNSAFE
     VIR_MIGRATE_OFFLINE
     VIR_MIGRATE_COMPRESSED
     VIR_MIGRATE_ABORT_ON_ERROR
     VIR_MIGRATE_AUTO_CONVERGE
     VIR_MIGRATE_RDMA_PIN_ALL

6) Deprecate the existing live_migration_flag and block_migration_flag config options. Operators would be expected to switch to the live_migration_tunneled or live_migration_extra_flags config options. During the deprecation period we would invite feedback as to whether additional config options are needed to cover unanticipated use cases.

   https://review.openstack.org/263436

Thanks in advance for any feedback.

I'm going to guess that one piece of feedback will be that some subset of this needs a blueprint (and maybe a spec), and that the blueprint freeze was a month ago, so that subset needs to be punted until after Mitaka? I'd love to be wrong about that, though :)

Thanks,
Mark.
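For reference, the flag selection described in steps 2 to 5 could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code under review:

- build_migration_flags(), parse_extra_flags() and the driver strings "qemu" and "xen" are hypothetical.
- The constant values are copied from libvirt's virDomainMigrateFlags enum.
- Using VIR_MIGRATE_NON_SHARED_INC rather than NON_SHARED_DISK for block migrations is an assumption.

The sketch also encodes a libvirt constraint not spelled out in the steps: tunnelled migration is only available in peer-to-peer mode, so the Xen path cannot be tunnelled.

```python
import enum


class MigrateFlag(enum.IntFlag):
    """libvirt's virDomainMigrateFlags (values as defined by libvirt)."""
    VIR_MIGRATE_LIVE = 1
    VIR_MIGRATE_PEER2PEER = 2
    VIR_MIGRATE_TUNNELLED = 4
    VIR_MIGRATE_PERSIST_DEST = 8
    VIR_MIGRATE_UNDEFINE_SOURCE = 16
    VIR_MIGRATE_PAUSED = 32
    VIR_MIGRATE_NON_SHARED_DISK = 64
    VIR_MIGRATE_NON_SHARED_INC = 128
    VIR_MIGRATE_CHANGE_PROTECTION = 256
    VIR_MIGRATE_UNSAFE = 512
    VIR_MIGRATE_OFFLINE = 1024
    VIR_MIGRATE_COMPRESSED = 2048
    VIR_MIGRATE_ABORT_ON_ERROR = 4096
    VIR_MIGRATE_AUTO_CONVERGE = 8192
    VIR_MIGRATE_RDMA_PIN_ALL = 16384


# Flags Nova chooses itself; operators may not set them as extra flags (step 5).
RESERVED = (
    MigrateFlag.VIR_MIGRATE_LIVE
    | MigrateFlag.VIR_MIGRATE_PEER2PEER
    | MigrateFlag.VIR_MIGRATE_TUNNELLED
    | MigrateFlag.VIR_MIGRATE_PERSIST_DEST
    | MigrateFlag.VIR_MIGRATE_UNDEFINE_SOURCE
    | MigrateFlag.VIR_MIGRATE_NON_SHARED_INC
    | MigrateFlag.VIR_MIGRATE_NON_SHARED_DISK
)


def parse_extra_flags(value: str) -> MigrateFlag:
    """Parse a live_migration_extra_flags-style value, rejecting reserved flags."""
    flags = MigrateFlag(0)
    for name in (n.strip() for n in value.split(",")):
        if not name:
            continue
        flag = MigrateFlag[name]  # raises KeyError for an unknown flag name
        if flag & RESERVED:
            raise ValueError(f"{name} is chosen by Nova and cannot be set here")
        flags |= flag
    return flags


def build_migration_flags(driver: str, block: bool, tunneled: bool,
                          extra_flags: str = "") -> MigrateFlag:
    """Hypothetical helper combining steps 2 to 5 into one flag set."""
    if driver not in ("qemu", "xen"):
        raise ValueError(f"unsupported libvirt driver: {driver}")
    # Step 3: Nova persists the domain on the destination itself, so always
    # undefine the source and never set PERSIST_DEST.
    flags = MigrateFlag.VIR_MIGRATE_LIVE | MigrateFlag.VIR_MIGRATE_UNDEFINE_SOURCE
    # Step 2: the QEMU driver needs P2P; Xen only does unmanaged direct migration.
    peer2peer = driver == "qemu"
    if peer2peer:
        flags |= MigrateFlag.VIR_MIGRATE_PEER2PEER
    # Step 4: tunnelling (live_migration_tunneled) requires peer-to-peer mode.
    if tunneled:
        if not peer2peer:
            raise ValueError("tunnelled migration requires peer-to-peer mode")
        flags |= MigrateFlag.VIR_MIGRATE_TUNNELLED
    # Assumption: block migrations copy disks incrementally with NON_SHARED_INC.
    if block:
        flags |= MigrateFlag.VIR_MIGRATE_NON_SHARED_INC
    # Step 5: operator-chosen extras, limited to the non-reserved flags.
    return flags | parse_extra_flags(extra_flags)


flags = build_migration_flags("qemu", block=True, tunneled=True,
                              extra_flags="VIR_MIGRATE_COMPRESSED")
print(int(flags))  # prints: 2199
```

Rejecting a reserved flag outright is one possible policy for the new option. Silently dropping it, as step 1 does for the old option by fixing it up to a safe value, would be the alternative.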
[1] - https://review.openstack.org/228853
[2] - Data loss can occur when you have disk images on shared storage and you specify VIR_MIGRATE_NON_SHARED_INC or VIR_MIGRATE_NON_SHARED_DISK, because during the block migration the disk is copied back over itself while it is in use from another node.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev