On Mon, Jun 6, 2016 at 11:26 PM, Matt Riedemann <mrie...@linux.vnet.ibm.com> wrote:
> On 6/6/2016 12:15 PM, Matt Riedemann wrote: > >> On 1/8/2016 12:28 PM, Mark McLoughlin wrote: >> >>> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote: >>> >>>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote: >>>> >>>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui >>>>> wrote: >>>>> >>>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin >>>>>> wrote: >>>>>> >>>>>>> Hi >>>>>>> >>>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag >>>>>>> config option unnecessarily allows operators choose arbitrary >>>>>>> behavior of the migrateToURI() libvirt call, to the extent >>>>>>> that we allow the operator to configure a behavior that can >>>>>>> result in data loss[1]. >>>>>>> >>>>>>> I see that danpb recently said something similar: >>>>>>> >>>>>>> https://review.openstack.org/171098 >>>>>>> >>>>>>> "Honestly, I wish we'd just kill off 'live_migration_flag' >>>>>>> and 'block_migration_flag' as config options. We really >>>>>>> should not be exposing low level libvirt API flags as admin >>>>>>> tunable settings. >>>>>>> >>>>>>> Nova should really be in charge of picking the correct set of >>>>>>> flags for the current libvirt version, and the operation it >>>>>>> needs to perform. We might need to add other more sensible >>>>>>> config options in their place [..]" >>>>>>> >>>>>> >>>>>> Nova should really handle internal flags and this serie is >>>>>> running in the right way. >>>>>> >>>>>> ... >>>>>>> >>>>>> >>>>>> 4) Add a new config option for tunneled versus native: >>>>>>> >>>>>>> [libvirt] live_migration_tunneled = true >>>>>>> >>>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We >>>>>>> have historically defaulted to tunneled mode because it >>>>>>> requires the least configuration and is currently the only >>>>>>> way to have a secure migration channel. >>>>>>> >>>>>>> danpb's quote above continues with: >>>>>>> >>>>>>> "perhaps a "live_migration_secure_channel" to indicate that >>>>>>> migration must use encryption, which would imply use of >>>>>>> TUNNELLED flag" >>>>>>> >>>>>>> So we need to discuss whether the config option should >>>>>>> express the choice of tunneled vs native, or whether it >>>>>>> should express another choice which implies tunneled vs >>>>>>> native. >>>>>>> >>>>>>> https://review.openstack.org/263434 >>>>>>> >>>>>> >>>>>> We probably have to consider that operator does not know much >>>>>> about internal libvirt flags, so options we are exposing for >>>>>> him should reflect benefice of using them. I commented on your >>>>>> review we should at least explain benefice of using this option >>>>>> whatever the name is. >>>>>> >>>>> >>>>> As predicted, plenty of discussion on this point in the review >>>>> :) >>>>> >>>>> You're right that we don't give the operator any guidance in the >>>>> help message about how to choose true or false for this: >>>>> >>>>> Whether to use tunneled migration, where migration data is >>>>> transported over the libvirtd connection. If True, we use the >>>>> VIR_MIGRATE_TUNNELLED migration flag >>>>> >>>>> libvirt's own docs on this are here: >>>>> >>>>> https://libvirt.org/migration.html#transport >>>>> >>>>> which emphasizes: >>>>> >>>>> - the data copies involved in tunneling - the extra configuration >>>>> steps required for native - the encryption support you get when >>>>> tunneling >>>>> >>>>> The discussions I've seen on this topic wrt Nova have revolved >>>>> around: >>>>> >>>>> - that tunneling allows for an encrypted transport[1] - that >>>>> qemu's NBD based drive-mirror block migration isn't supported >>>>> using tunneled mode, and that danpb is working on fixing this >>>>> limitation in libvirt - "selective" block migration[2] won't work >>>>> with the fallback qemu block migration support, and so won't >>>>> currently work in tunneled mode >>>>> >>>> >>>> I'm not working on fixing it, but IIRC some other dev had proposed >>>> patches. >>>> >>>> >>>>> So, the advise to operators would be: >>>>> >>>>> - You may want to choose tunneled=False for improved block >>>>> migration capabilities, but this limitation will go away in >>>>> future. - You may want to choose tunneled=False if you wish to >>>>> trade and encrypted transport for a (potentially negligible) >>>>> performance improvement. >>>>> >>>>> Does that make sense? >>>>> >>>>> As for how to name the option, and as I said in the review, I >>>>> think it makes sense to be straightforward here and make it >>>>> clearly about choosing to disable libvirt's tunneled transport. >>>>> >>>>> If we name it any other way, I think our explanation for >>>>> operators will immediately jump to explaining (a) that it >>>>> influences the TUNNELLED flag, and (b) the differences between >>>>> the tunneled and native transports. So, if we're going to have to >>>>> talk about tunneled versus native, why obscure that detail? >>>>> >>>> >>>> Ultimately we need to recognise that libvirt's tunnelled mode was >>>> added as a hack, to work around fact that QEMU lacked any kind of >>>> native security capabilities & didn't appear likely to ever get >>>> them at that time. As well as not working with modern NBD based >>>> block device encryption, it really sucks for performance because it >>>> introduces many extra data copies. So it is going to be quite poor >>>> for large VMs with heavy rate of data dirtying. >>>> >>>> The only long term relative "benefit" of tunnelled mode is that it >>>> avoids the need to open extra firewall ports. >>>> >>>> IMHO, the long term future is to *never* use tunnelled mode for >>>> QEMU. This will be viable when my support for native TLS support in >>>> QEMU migration + NBD protocols is merged. I'm hopeful this wil be >>>> for QEMU 2.6 >>>> >>>> But, Pawel strongly disagrees. >>>>> >>>>> One last point I'd make is this isn't about adding a *new* >>>>> configuration capability for operators. As we deprecate and >>>>> remove these configuration options, we need to be careful not to >>>>> remove a capability that operators are currently depending on for >>>>> arguably reasonable reasons. >>>>> >>>> >>>> My view is that "live_migration_tunneled" is a reasonable >>>> parameter to add, because there is a genuine need to let admins >>>> choose this behaviour. We should make sure it is correctly done as >>>> a tri-state flag though, when it is 'None', Nova should pick what >>>> it things is the best approach based on QEMU version. Probably to >>>> use QEMU native when it supports TLS, otherwise use tunnelled if >>>> possible to get security. >>>> >>> >>> Great feedback. I buy that. >>> >>> [1] - https://review.openstack.org/#/c/171098/ [2] - >>>>> https://review.openstack.org/#/c/227278 >>>>> >>>>> >>>>> 5) Add a new config option for additional migration flags: >>>>>>> >>>>>>> [libvirt] live_migration_extra_flags = >>>>>>> VIR_MIGRATE_COMPRESSED >>>>>>> >>>>>>> This allows operators to continue to experiment with libvirt >>>>>>> behaviors in safe ways without each use case having to be >>>>>>> accounted for. >>>>>>> >>>>>>> https://review.openstack.org/263435 >>>>>>> >>>>>>> We would disallow setting the following flags via this >>>>>>> option: >>>>>>> >>>>>>> VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED >>>>>>> VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE >>>>>>> VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK >>>>>>> >>>>>>> which would allow the following currently available flags to >>>>>>> be set: >>>>>>> >>>>>> >>>>>> VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION >>>>>>> VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE >>>>>>> VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR >>>>>>> VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL >>>>>>> >>>>>> >>>>>> We can probably consider to provide VIR_MIGRATE_PAUSED and >>>>>> VIR_MIGRATE_COMPRESSED as dedicated options too ? >>>>>> >>>>> >>>>> Yes, I think any options we see regularly added to extra_flags >>>>> by operators, and as we understand the use cases better, then we >>>>> can add dedicated options for them. >>>>> >>>> >>>> I really don't see a case for letting the admin set >>>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force >>>> a running VM to end up paused after migration, this is a feature to >>>> be added to the Nova migration API. >>>> >>>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a >>>> flag, there are other associated runtime tunables that need >>>> setting. There was a spec discussing this which was not approved as >>>> a suitable strategy for using it could not be agreed. >>>> >>>> In the review, Pawel is making a case for not allowing the >>>>> operator to enable COMPRESSED or AUTO_CONVERGE. >>>>> >>>> >>>> I agree really. As per my comments, I in fact struggle to see a >>>> credible case for allowing any of the remaining flags to be >>>> enabled. They are all cases that Nova should be made todo the right >>>> thing, possibly in relation to API parameters or other deployment >>>> choices. >>>> >>> >>> Fair enough. I figured it would be a necessary safety valve against >>> operators who value the flexibility of the current configuration >>> options, but you make a good case. >>> >>> I'll drop the extra_flags option and make tunneled a tri-state. >>> >>> Thanks, Mark. >>> >>> >>> __________________________________________________________________________ >>> >>> >>> >>> OpenStack Development Mailing List (not for usage questions) >> >>> Unsubscribe: >>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>> >>> >> I'm going to be a jackass and skip this entire thread with a question. >> >> The live migration CI job we have in the experimental queue is blocked >> on this failure right now [1]. >> >> "Live Migration failure: Operation not supported: Selecting disks to >> migrate is not implemented for tunnelled migration" >> >> That's running with ubuntu 16.04 and libvirt 1.2.17. >> >> I was looking at the deprecated live_migration_flag which tells you to >> use the live_migration_tunnelled flag, which defaults to None and is a >> no-op, even though the help text on the option says if it's omitted >> we'll do our best to figure out if you want to use tunneling based on >> the hypervisor's encryption support. That doesn't actually happen by >> default [2]. >> >> So what should happen? It seems some of this series was stalled or >> incomplete? >> >> We could change the configuration of our live migration CI job, but I'd >> like to see the job work with the defaults that we ship. >> >> I'm not entirely familiar with the thing that's causing the failure >> except I believe we're calling migrateToURI3 with the tunnelled flag >> (because it's in the default live_migration_flag option) and a list of >> devices [3][4]. >> >> Would it be better to use migrateToURI2 if migrate_disks is in params >> but the VIR_MIGRATE_TUNNELLED flag is set? >> >> [1] https://bugs.launchpad.net/nova/+bug/1589591 >> [2] >> >> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559 >> >> [3] >> >> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751 >> >> [4] >> >> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523 >> >> >> > Timofey is working on a fix here: > > https://review.openstack.org/#/c/326111/ > > I've left some comments inline, but I'm mostly wondering about this now: > > > https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360 > > The code there says that if there are BDMs and libvirt<1.2.17 and you're > doing block migration, it's unsupported and fails. The comment also says > that it can't do that in tunneled method either - which is the bug we're > hitting. > > But should we add the block_migration_flag check in > check_can_live_migrate_source? Or does it work to live migrate in tunneled > mode as long as we don't pass 'migrate_disks'? So we got 2 options: - raise exception in case of tunneled block migration with migrate_disks param - as migrate flags are supposed to be nova internals this is bad option - get rid of one of these option: - don't use VIR_MIGRATE_TUNNELLED - looks most preferable for now - don't pass migrate disks param (it will force block live migration to fail for instances with attached volumes, and volume-backed instances with config-drive(vfat)) > -- > > Thanks, > > Matt Riedemann > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev