[Openstack-operators] [nova] Queens PTG recap - everything else
There was a whole lot of other stuff discussed at the PTG. The details are in [1]. I won't go into everything here, so I'm just highlighting some of the more concrete items that had owners or TODOs.

Ironic
------

The Ironic team came over on Wednesday afternoon. We talked a bit, had some laughs, it was a good time. Since I don't speak fluent baremetal, Dmitry Tantsur is going to recap those discussions on the mailing list. Thanks again, Dmitry.

Privsep
-------

Michael Still has been going hog wild converting the nova libvirt driver code to use privsep instead of rootwrap. He has a series of changes tracked under this blueprint [2]. Most of the discussion was a refresher on privsep, a recap of what's already been merged, and some discussion of outstanding patches. The goal for Queens is to get the entire libvirt driver converted, and also to try to get all of nova-compute converted, but we want to get things merged early in the release to flush out bugs since a lot of these are weird, possibly untested code paths. There was also discussion of a kind of privsep heartbeat daemon to tell if it's running (even though it's not a separate service), but this is complicated and is not something we'll pursue for Queens.

Websockify security proxy framework
-----------------------------------

This is a long-standing security hardening feature [3] which has changed hands a few times and hasn't gotten much review. Sean Dague and Melanie Witt agreed to focus on reviewing this for Queens.

Certificate validation
----------------------

This is another item that's been discussed since at least the Ocata summit but hasn't made much progress. Sean Dague agreed to help review this, and Eric Fried said he knew someone who could help review the security aspects of this change. Sean also suggested scheduling a hangout so the Johns Hopkins University team working on this can give a primer on the feature and what to look out for during review.
We also suggested getting a scenario test written for this in the barbican tempest plugin, which runs as an experimental queue job for nova.

Notifications
-------------

Given the state of the Searchlight project, and since we don't plan on using Searchlight as a global proxy for the compute REST API, we are not going to work on parity with versioned notifications there. There are some cleanups we still need to do in Nova for versioned notifications from a performance perspective. We also agreed that we aren't going to consider deprecating the legacy unversioned notifications until we have parity with the versioned notifications, especially given that consumers of the legacy unversioned notifications have not yet moved to the versioned ones.

vGPU support
------------

This depends on nested resource providers (like lots of other things). It was not clear from the discussion whether this is static or dynamic support, e.g. can we hot plug vGPUs using Cyborg? I assume we will not support hot plugging at first. We also need improved functional testing of this space before we can make big changes.

Preemptible (spot) instances
----------------------------

This was continuing the discussion from the Boston forum session [5]. The major issue in Nova is that we don't want Nova to be in charge of orchestrating the preemption of instances when a request comes in for a "paid" instance. We agreed to start small, where you can't burst over quota. Blazar also delivered some reservation features in Pike [6] which sound like they could be built on here, along with expiration policies. Someone will have to prototype an external (to nova) "reaper" which will cull the preemptible instances based on some configurable policy. Honestly, the notes here are confusing, so we're going to need someone to drive this forward. That might mean picking up John Garbutt's draft spec for this (link not available right now).

Driver updates
--------------

Various teams from IBM gave updates on plans for their drivers in Queens.
- PowerVM (in tree): the team is proposing a few more capabilities for the driver in Queens. Details are in the spec [7].
- zDPM (out of tree): this out-of-tree driver has had two releases (ocata and pike) and is working on third-party CI. One issue they have with Tempest is that they can only boot from volume.
- zVM (out of tree): the team is working on refactoring some code into a library, similar to os-xenapi, os-powervm and oslo.vmware. They have CI running but are not yet reporting against nova changes.

Endpoint discovery
------------------

This is carry-over work from Ocata and Pike to standardize how Nova does endpoint discovery with other services, like keystone/placement/cinder/glance/neutron/ironic/barbican. The spec is here [8]. The dependent keystoneauth1 changes were released in Pike, so we should be able to make quick progress on this early in Queens to flush out bugs.

Documentation
-------------

We talked about the
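To make the preemptible-instance "reaper" idea from earlier a bit more concrete, here is a rough sketch of the shape such a prototype could take. Everything in it (the function name, the instance representation, the policy names) is hypothetical, not real Nova code:

```python
# Hypothetical sketch of an external "reaper" for preemptible
# instances: given the capacity needed for an incoming "paid" request,
# pick preemptible instances to delete according to a configurable
# policy. This only illustrates the idea discussed at the PTG.

def pick_victims(preemptibles, needed_vcpus, policy='oldest_first'):
    """Return the preemptible instances to cull to free `needed_vcpus`.

    `preemptibles` is a list of dicts with 'uuid', 'vcpus' and
    'created_at' keys (created_at sortable, e.g. an ISO timestamp).
    """
    if policy == 'oldest_first':
        candidates = sorted(preemptibles, key=lambda i: i['created_at'])
    elif policy == 'largest_first':
        candidates = sorted(preemptibles, key=lambda i: -i['vcpus'])
    else:
        raise ValueError('unknown policy: %s' % policy)

    victims, freed = [], 0
    for inst in candidates:
        if freed >= needed_vcpus:
            break
        victims.append(inst)
        freed += inst['vcpus']
    if freed < needed_vcpus:
        raise RuntimeError('not enough preemptible capacity')
    return victims
```

A real reaper would of course have to watch for capacity pressure and issue the actual delete calls; the open question from the session is where that watching and policy configuration should live, since we agreed it should not be Nova.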
Re: [Openstack-operators] [tripleo] Making containerized service deployment the default
On Mon, Sep 18, 2017 at 3:04 PM, Alex Schultz wrote:
> Hey ops & devs,
>
> We talked about containers extensively at the PTG and one of the items
> that needs to be addressed is that currently we still deploy the
> services as bare metal services via puppet. For Queens we would like
> to switch the default to be containerized services. With this switch
> we would also start the deprecation process for deploying services as
> bare metal services via puppet. We still execute the puppet
> configuration as part of the container configuration process so the
> code will continue to be leveraged but we would be investing more in
> the continual CI of the containerized deployments and reducing the
> traditional scenario coverage.
>
> As we switch over to containerized services by default, we would also
> begin to reduce installed software on the overcloud images that we
> currently use. We have an open item to better understand how we can
> switch away from the golden images to a traditional software install
> process during the deployment and make sure this is properly tested.
> In theory it should work today by switching the default for
> EnablePackageInstall[0] to true and configuring repositories, but this
> is something we need to verify.
>
> If anyone has any objections to this default switch, please let us know.

I think this is a great initiative. It would be nice to share some of the TripleO experience with containerized deployments so that Puppet can be used for them more broadly. Perhaps we can work together on adding some classes which can help deploy and configure containerized services with Puppet.
>
> Thanks,
> -Alex
>
> [0]
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L33-L36

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
[Openstack-operators] [publiccloud-wg] Extra meeting PublicCloudWorkingGroup
Hi everyone,

We will have an "extra" meeting on Wednesday at 1400 UTC in #openstack-publiccloud. The main purpose of this extra meeting will be to finalize the agenda for the meetup in London next week.

Agenda and etherpad: https://etherpad.openstack.org/p/publiccloud-wg
Meetup etherpad: https://etherpad.openstack.org/p/MEETUPS-2017-publiccloud-wg

Regards,
Tobias
Co-chair, PublicCloud WG
[Openstack-operators] [tripleo] Making containerized service deployment the default
Hey ops & devs,

We talked about containers extensively at the PTG and one of the items that needs to be addressed is that currently we still deploy the services as bare metal services via puppet. For Queens we would like to switch the default to be containerized services. With this switch we would also start the deprecation process for deploying services as bare metal services via puppet. We still execute the puppet configuration as part of the container configuration process, so the code will continue to be leveraged, but we would be investing more in the continual CI of the containerized deployments and reducing the traditional scenario coverage.

As we switch over to containerized services by default, we would also begin to reduce installed software on the overcloud images that we currently use. We have an open item to better understand how we can switch away from the golden images to a traditional software install process during the deployment and make sure this is properly tested. In theory it should work today by switching the default for EnablePackageInstall [0] to true and configuring repositories, but this is something we need to verify.

If anyone has any objections to this default switch, please let us know.

Thanks,
-Alex

[0] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L33-L36
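For anyone who wants to experiment with this before the default changes, flipping that parameter should in theory just be an extra environment file at deploy time. This is an unverified sketch (the parameter name is taken from [0]; check it against your tripleo-heat-templates version):

```yaml
# package-install.yaml - pass with:
#   openstack overcloud deploy ... -e package-install.yaml
# Enables traditional package install on the overcloud nodes instead
# of relying on software pre-baked into the golden images.
parameter_defaults:
  EnablePackageInstall: true
```

As noted above, you would also need to configure package repositories on the nodes for this to actually work.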
[Openstack-operators] [nova][neutron] Queens PTG recap - nova/neutron
There were a few nova/neutron interactions at the PTG, one on Tuesday [1] and one on Thursday [2].

Priorities
----------

1. Neutron port binding extension for live migration: This was discussed at the Ocata summit in Barcelona [3] and resulted in a Neutron spec [4] and API definition in Pike. The point of this is to shorten the amount of network downtime when switching ports between the source and destination hosts during a live migration. Neutron would provide a new port binding API extension and, if available, Nova would use it to bind ports on both the source and destination hosts during live migration and switch which one is active during post-migration. We discussed whether this should be dependent on os-vif object negotiation and agreed both efforts could be worked concurrently; then we'll see if we should merge them at the end, mostly to avoid having to redo a bunch of work if vif negotiation comes later. We also discussed whether we should make the port binding changes on the Nova side depend on moving port orchestration to conductor [5], and again agreed to work those separately and see how the port binding code looks if it's just started in the nova-compute service, mainly since we don't have an owner for [5]. Sean Mooney said he could work on the Nova changes for this. The nova spec [6], started by John Garbutt in Ocata, would need to be updated for Queens. Miguel Lavalle will drive the changes in Neutron.

2. Using os-vif for port binding negotiation: Sean Mooney and Rodolfo Alonso already have some proof-of-concept code for this. We will want to get the gate-tempest-dsvm-nova-os-vif-ubuntu-xenial-nv job voting with any of this code. We also said we could work this concurrently with the port binding for live migration work above.

3. Bandwidth-based scheduling: this has a spec already and some work was done in Neutron in Pike. There are multiple interested parties in this feature.
This will depend on getting nested resource providers done in Nova, really within the first milestone. Rodolfo owns this as well.

Other discussion
----------------

There were several other use cases discussed in both [1] and [2], but for the most part they have dependencies on other work, or they don't have specs/designs/PoC code, or they don't have owners. So we on the Nova side aren't going to be focusing on those other items.

[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg
[2] https://etherpad.openstack.org/p/nova-ptg-queens
[3] https://etherpad.openstack.org/p/ocata-nova-neutron-session
[4] https://specs.openstack.org/openstack/neutron-specs/specs/pike/portbinding_information_for_nova.html
[5] https://blueprints.launchpad.net/nova/+spec/prep-for-network-aware-scheduling-pike
[6] https://review.openstack.org/#/c/375580/

--
Thanks,
Matt
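To illustrate the live-migration port binding idea from priority 1 above: a port would carry one binding per host, with exactly one binding active at a time, and post-migration simply flips which one is active. The following is only a sketch of that bookkeeping (plain Python dicts stand in for Neutron's API; the helper name is made up):

```python
# Sketch of the two-bindings model for live migration: Nova creates an
# inactive binding on the destination host before the migration, then
# activates it in post-migration so network downtime is just the flip.

def activate_binding(bindings, host):
    """Mark `host`'s binding active and deactivate all others.

    `bindings` maps host name -> {'active': bool, ...}.
    """
    if host not in bindings:
        raise KeyError('no binding for host %s' % host)
    for h, b in bindings.items():
        b['active'] = (h == host)
    return bindings

# Before migration: source binding active, destination binding created
# but inactive. Post-migration: activate the destination.
bindings = {'src': {'active': True}, 'dest': {'active': False}}
activate_binding(bindings, 'dest')
```

The real work, per the session, is the new Neutron API extension plus the Nova-side orchestration of when to create, activate and delete each binding.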
[Openstack-operators] [nova][cinder] Queens PTG recap - nova/cinder
On Thursday morning at the PTG the Nova and Cinder teams got together to talk through some items. Details are in the etherpad [1].

Bug 1547142 [2]
---------------

This is a long-standing bug where Nova does not terminate connections when shelve offloading an instance. There was some confusion when this was originally reported about whether or not calling os-terminate_connection would fix the issue for all backends. The Cinder team said it should, and if not, it's a bug in the volume drivers in Cinder. So we went ahead and rebased the fix [3], which is merged and making its way through the stable backports now. This fixes old-style attachments. For the new-style attachments which get enabled in [4], we'll also have to make sure that we create a new volume attachment to keep the volume reserved but delete the old attachments for the old host connector.

New-style volume attach dev update
----------------------------------

The Cinder team gave an overview of the work completed in Pike and what is ongoing in Queens for enabling Nova to use new-style volume attachments in Cinder, which are based on the 3.27 and 3.44 Cinder API microversions. This was also a chance to merge some patches in the Queens series and give background to the review teams, mostly on the Nova side. There was general agreement to get the new-style attachment flows merged early in Queens so we can flush out bugs and start working on multi-attach support. We also said that we would not work on migrating old-style attachments to new-style in Queens. We don't plan on removing the old flows in Nova anytime soon, and once we do we can start talking about migrating data then.

Volume multi-attach
-------------------

Most of the discussion here was around shared volume connections and how to model those out of the Cinder API so that Nova can know when it should perform a final disconnect_volume call on the host when detaching a volume.
We agreed that Cinder needs a new API microversion to model this, and we will then update [4] to rely on that new microversion before enabling new-style attachments. We also talked about whether or not we should allow boot from volume with an existing multi-attach volume. We decided to allow this but disable it via default policy. So there will be a new policy rule in both Nova and Cinder:

1. Nova: add a policy rule, disabled by default, to allow boot from volume with a multi-attach volume.
2. Cinder: allow multi-attach volumes based on the storage backend support, allow multi-attach but only for read-only volumes, or disable creating multi-attach volumes altogether.

I'm a bit fuzzy on the details here, but looking at the existing Cinder API code I don't see any policy checks for creating a multiattach volume at all, so this is probably something good to add anyway since not all Nova compute drivers are going to support multiattach volumes right away. Ildiko Vancsa is updating the nova spec for multi-attach support for Queens with the new details.

Refreshing volume connection_info
---------------------------------

This was based on a mailing list discussion [5] and the PTG discussion was already summarized in that thread [6].

Cinder ephemeral storage
------------------------

This was a rehash of the Boston forum discussion [7]. We agreed to work on both the short-term and long-term options here. The short-term option is adding an "is_bfv" attribute on flavors in Nova, which defaults to False, but if True would perform a simple boot from volume using the specified image and flavor disk details. Think of this like get-me-a-network but for boot from volume. Anything more detailed, like volume type, guest_format, disk_bus, or ephemeral or swap disks, would have to be handled through the normal API usage we have today. Also, user-defined or image-defined block device mapping attributes in the request would supersede the flavor. The long-term option is Nova having a Cinder imagebackend driver for ephemeral storage.
Chet Burgess has started looking at this, and it was recommended to look at the ScaleIO imagebackend as a template since they both have to solve problems with non-local storage. The good news is a Cinder ephemeral imagebackend driver in Nova would not need to deal with image caching, since Cinder can do that for us.

--

All in all I felt we had a really productive set of topics and discussions between the teams, with everyone being on the same page and going in the same direction, which is nice to see. Boring is good.

[1] https://etherpad.openstack.org/p/cinder-ptg-queens
[2] https://bugs.launchpad.net/nova/+bug/1547142
[3] https://review.openstack.org/257275
[4] https://review.openstack.org/#/c/330285/
[5] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html
[6] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122170.html
[7]
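To make the shared-connection question from the multi-attach discussion concrete, here is a sketch of the kind of decision nova-compute would need to make on detach. The function and the attachment representation are hypothetical; the real answer depends on the new Cinder microversion discussed above:

```python
# Sketch: should nova-compute call disconnect_volume when detaching a
# multi-attach volume? Assumption (per the PTG discussion): Cinder
# would tell us whether attachments on one host share a backend
# connection. Names and data shapes here are made up for illustration.

def should_disconnect(remaining_attachments, host, connection_shared):
    """Return True if this detach is the last user of the host's connection.

    `remaining_attachments` is a list of dicts with a 'host' key for
    the volume's attachments left *after* the one being removed.
    """
    if not connection_shared:
        # Dedicated connection per attachment: always safe to disconnect.
        return True
    # Shared connection: only disconnect when no other attachment on
    # this host still uses it.
    return not any(a['host'] == host for a in remaining_attachments)
```

The point of modeling this in the Cinder API is precisely so that Nova does not have to guess the `connection_shared` part per backend.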
[Openstack-operators] [nova] Queens PTG recap - placement
Placement-related items came up a lot at the Queens PTG: some on Tuesday [1], some on Wednesday [2], some on Thursday [3] and some on Friday [4].

Priorities for Queens
---------------------

The priorities for placement/scheduler related items in Queens are:

1. Migration allocations [5] - we realized late in Pike that the way we were tracking allocations across source and dest nodes during a move operation (cold migrate, live migrate, resize, evacuate) was confusing and error-prone, and we had to "double up" allocations for the instance during the move. The idea here is to simplify the resource allocation modeling during a move operation by having the migration record be a consumer of resource allocations during the move, so we can keep the source/dest node allocations separate using the instance/migration records. This is mostly internal technical debt reduction to simplify our accounting, which should mean fewer bugs.

2. Alternate hosts - this is the work to have the scheduler determine a set of alternative hosts for reschedules. This is important for cells v2, where the cell conductor and nova-compute services can't reach the API database or scheduler, so reschedules need to happen within the cell given a list of pre-determined hosts chosen by the scheduler at the top. Ed Leafe has already started on some of this [6].

3. Nested resource providers [7] - this has been around for a while now but hasn't had the proper reviewer focus due to other priorities. We are making this a priority in Queens as it enables a lot of other use cases, like bandwidth-aware scheduling and being able to eventually remove major chunks of the claims code in the ResourceTracker in the compute service. We agreed that in Queens we want to try to keep the scope of this small and focus on being able to model a simple SR-IOV PF/VF relationship. Modeling NUMA use cases will be post-Queens.
We will need quite a bit of work on functional testing along with this, so that we have some fixtures and/or fake virt drivers in place to model things like CPU pinning, huge pages, NUMA, SR-IOV, etc., which also verify allocations in Placement so we know we are doing things correctly from the client perspective, similar to the functional tests added for verifying allocations during move operations in Pike.

General device management
-------------------------

This was a more forward-looking discussion and the notes are in the etherpad [3]. This is not really slated for Queens work, except to make sure that things we do in Queens don't limit what we can do for generically managing devices later, and it is tied heavily to the nested resource providers work.

Other discussion
----------------

Traits: supporting required traits in a flavor is ongoing and the spec is here [8].

Shared storage providers [9]: we have decided to defer working on this in Queens given other priorities. Modeling move allocations with migration records should help here though.

Modeling distance for (anti-)affinity use cases: this is being deferred from Queens. There are workarounds when running with multiple cells.

Limits and ordering in Placement: Chris Dent has proposed a spec [10] so that we can limit the size of a response when getting resource providers from Placement during scheduling, and also optionally configure how Placement orders the returned set, so you can pack or spread possible build candidates.

OSC plugin: I'm trying to push this work forward. We have the plugin installed with devstack now and a functional CI job for the repo, but we need to move forward the patches that add the CLI functionality.

There was lots of other random stuff in [2] and [4], but for the most part those items are not prioritized, spec'ed out, or owned, so they are not really getting attention for Queens.
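As an illustration of the migration-allocations idea in priority 1: instead of doubling up the instance's allocations across source and destination, the source-side allocations get re-keyed to the migration record's UUID. The following is a toy sketch of that bookkeeping; plain dicts stand in for Placement, and none of this is real Nova code:

```python
# Toy model of "migration record as allocation consumer". Allocations
# are keyed by consumer UUID, mapping resource provider -> resources.

def start_migration(allocations, instance_uuid, migration_uuid, dest_allocs):
    """Hand the instance's source allocations to the migration record
    and give the instance its destination-side allocations."""
    allocations[migration_uuid] = allocations.pop(instance_uuid)
    allocations[instance_uuid] = dest_allocs
    return allocations

def complete_migration(allocations, migration_uuid):
    # On success, drop the source-side allocations held by the
    # migration consumer; the instance keeps the destination ones.
    allocations.pop(migration_uuid, None)
    return allocations
```

A revert would do the opposite: move the migration's allocations back to the instance and drop the destination-side ones, which is exactly the separation that was hard to express when both sides were lumped under the instance.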
[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg
[2] https://etherpad.openstack.org/p/nova-ptg-queens-placement
[3] https://etherpad.openstack.org/p/nova-ptg-queens-generic-device-management
[4] https://etherpad.openstack.org/p/nova-ptg-queens
[5] https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/migration-allocations.html
[6] https://review.openstack.org/#/c/498830/
[7] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/nested-resource-providers.html
[8] https://review.openstack.org/#/c/468797/
[9] https://bugs.launchpad.net/nova/+bug/1707256
[10] https://review.openstack.org/#/c/504540/

--
Thanks,
Matt
Re: [Openstack-operators] [Openstack] MTU on Provider Networks
Great! Thank you both for the information.

John Petrini