[Openstack-operators] [nova] Queens PTG recap - everything else

2017-09-18 Thread Matt Riedemann
There was a whole lot of other stuff discussed at the PTG. The details 
are in [1]. I won't go into everything here, so I'm just highlighting 
some of the more concrete items that had owners or TODOs.


Ironic
------

The Ironic team came over on Wednesday afternoon. We talked a bit, had 
some laughs, it was a good time. Since I don't speak fluent baremetal, 
Dmitry Tantsur is going to recap those discussions in the mailing list. 
Thanks again, Dmitry.


Privsep
-------

Michael Still has been going hog wild converting the nova libvirt driver 
code to use privsep instead of rootwrap. He has a series of changes 
tracked under this blueprint [2]. Most of the discussion was a refresh 
on privsep and a recap of what's already been merged and some discussion 
on outstanding patches. The goal for Queens is to get the entire libvirt 
driver converted and also try to get all of nova-compute converted, but 
we want to limit that to getting things merged early in the release to 
flush out bugs since a lot of these are weird, possibly untested code 
paths. There was also discussion of a kind of privsep heartbeat daemon 
to tell if it's running (even though it's not a separate service) but 
this is complicated and is not something we'll pursue for Queens.
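
For anyone who hasn't looked at these patches yet, the shape of a 
rootwrap-to-privsep conversion is roughly the following. This is a 
hand-wavy sketch loosely modeled on the oslo.privsep usage in nova, not 
the actual nova code, and the function itself is hypothetical:

    import os

    from oslo_privsep import capabilities
    from oslo_privsep import priv_context

    # A privileged context; callers just call the decorated function and
    # privsep transparently runs it in the privileged helper process.
    sys_admin_pctxt = priv_context.PrivContext(
        'nova',
        cfg_section='nova_sys_admin',
        pypath=__name__ + '.sys_admin_pctxt',
        capabilities=[capabilities.CAP_CHOWN,
                      capabilities.CAP_SYS_ADMIN])


    @sys_admin_pctxt.entrypoint
    def chown_console_log(path, uid, gid):
        # Runs with elevated capabilities; previously this would have
        # been a "chown" CommandFilter entry executed via nova-rootwrap.
        os.chown(path, uid, gid)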


Websockify security proxy framework
-----------------------------------

This is a long-standing security hardening feature [3] which has changed 
hands a few times and hasn't gotten much review. Sean Dague and Melanie 
Witt agreed to focus on reviewing this for Queens.


Certificate validation
----------------------

This is another item that's been discussed since at least the Ocata 
summit but hasn't made much progress. Sean Dague agreed to help review 
this, and Eric Fried said he knew someone who could help review the 
security aspects of this change. Sean also suggested scheduling a 
hangout so the Johns Hopkins University team working on this can give a 
primer on the feature and what to look out for during review. We also 
suggested getting a scenario test written for this in the barbican 
tempest plugin, which runs as an experimental queue job for nova.


Notifications
-------------

Given the state of the Searchlight project, and the fact that we don't 
plan on using Searchlight as a global proxy for the compute REST API, 
we are not going to work on parity with versioned notifications there. 
There are some cleanups we still need to do in Nova for versioned 
notifications from a performance perspective. We also agreed that we 
aren't going to consider deprecating the legacy unversioned 
notifications until the versioned notifications have parity with them, 
especially given that consumers of the legacy unversioned notifications 
have not yet moved to the versioned ones.


vGPU support
------------

This depends on nested resource providers (like lots of other things). 
It was not clear from the discussion if this is static or dynamic 
support, e.g. can we hot plug vGPUs using Cyborg? I assume we will not 
support hot plugging at first. We also need improved functional testing 
of this space before we can make big changes.


Preemptible (spot) instances
----------------------------

This was continuing the discussion from the Boston forum session [5]. 
The major issue in Nova is that we don't want Nova to be in charge of 
orchestrating the preemption of instances when a request comes in for a 
"paid" instance. We agreed to start small, where you can't burst over 
quota. Blazar also delivered some reservation features in Pike [6] that 
sound like they could be built on here, and that also sound a lot like 
expiration policies. Someone will have to prototype an external (to 
nova) "reaper" 
which will cull the preemptible instances based on some configurable 
policy. Honestly the notes here are confusing so we're going to need 
someone to drive this forward. That might mean picking up John Garbutt's 
draft spec for this (link not available right now).
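
To make the "reaper" idea a bit more concrete, here is a very rough, 
untested sketch of what such an external process could look like using 
python-novaclient. Everything here is hypothetical (how preemptible 
instances are identified, the deletion policy, the credentials), since 
none of that was decided at the PTG:

    import time

    from keystoneauth1 import loading
    from keystoneauth1 import session
    from novaclient import client as nova_client


    def get_nova():
        # Credentials are placeholders; a real reaper would load these
        # from config and would need an admin-ish role.
        loader = loading.get_plugin_loader('password')
        auth = loader.load_from_options(
            auth_url='http://controller:5000/v3',
            username='reaper', password='secret', project_name='admin',
            user_domain_id='default', project_domain_id='default')
        return nova_client.Client('2.1', session=session.Session(auth=auth))


    def reap_once(nova, how_many=1):
        # Assumes preemptible instances are marked with instance metadata;
        # whether it ends up being a flavor attribute, a tag or metadata
        # is still an open question.
        servers = nova.servers.list(search_opts={'all_tenants': 1})
        preemptible = [s for s in servers
                       if s.metadata.get('preemptible') == 'true']
        # Trivial policy: cull the oldest preemptible instances first.
        preemptible.sort(key=lambda s: s.created)
        for victim in preemptible[:how_many]:
            nova.servers.delete(victim)


    if __name__ == '__main__':
        nova = get_nova()
        while True:
            reap_once(nova)
            time.sleep(60)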


Driver updates
--------------

Various teams from IBM gave updates on plans for their drivers in Queens.

PowerVM (in tree): the team is proposing to add a few more capabilities 
to the driver in Queens. Details are in the spec [7].


zDPM (out of tree): this out-of-tree driver has had two releases (Ocata 
and Pike) and is working on third-party CI. One issue they have with 
Tempest is that they can only boot from volume.


zVM (out of tree): the team is working on refactoring some code into a 
library, similar to os-xenapi, os-powervm and oslo.vmware. They have CI 
running but are not yet reporting against nova changes.


Endpoint discovery
------------------

This is carry-over work from Ocata and Pike to standardize how Nova does 
endpoint discovery with other services, like 
keystone/placement/cinder/glance/neutron/ironic/barbican. The spec is 
here [8]. The dependent keystoneauth1 changes were released in Pike so 
we should be able to make quick progress on this early in Queens to 
flush out bugs.
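
For anyone curious what "standardized discovery" looks like in practice, 
here is a minimal, illustrative sketch of the keystoneauth1 Adapter 
pattern involved. This is not nova's actual code; the credentials and 
service selection below are placeholders that would normally come from 
nova.conf:

    from keystoneauth1 import adapter
    from keystoneauth1 import loading
    from keystoneauth1 import session as ksa_session

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',
        username='nova', password='secret', project_name='service',
        user_domain_id='default', project_domain_id='default')
    sess = ksa_session.Session(auth=auth)

    # One Adapter per service; keystoneauth handles the catalog lookup
    # (and, with the Pike ksa releases, version discovery) so nova code
    # no longer needs per-service endpoint plumbing.
    placement = adapter.Adapter(
        session=sess, service_type='placement', interface='public',
        region_name='RegionOne')

    print(placement.get_endpoint())              # endpoint from the catalog
    resp = placement.get('/resource_providers')  # request relative to it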


Documentation
-------------

We talked about the 

Re: [Openstack-operators] [tripleo] Making containerized service deployment the default

2017-09-18 Thread Mohammed Naser
On Mon, Sep 18, 2017 at 3:04 PM, Alex Schultz  wrote:
> Hey ops & devs,
>
> We talked about containers extensively at the PTG and one of the items
> that needs to be addressed is that currently we still deploy the
> services as bare metal services via puppet. For Queens we would like
> to switch the default to be containerized services.  With this switch
> we would also start the deprecation process for deploying services as
> bare metal services via puppet.  We still execute the puppet
> configuration as part of the container configuration process so the
> code will continue to be leveraged but we would be investing more in
> the continual CI of the containerized deployments and reducing the
> traditional scenario coverage.
>
> As we switch over to containerized services by default, we would also
> begin to reduce installed software on the overcloud images that we
> currently use.  We have an open item to better understand how we can
> switch away from the golden images to a traditional software install
> process during the deployment and make sure this is properly tested.
> In theory it should work today by switching the default for
> EnablePackageInstall[0] to true and configuring repositories, but this
> is something we need to verify.
>
> If anyone has any objections to this default switch, please let us know.

I think this is a great initiative.  It would be nice to share some of
the TripleO experience in containerized deployments so that we can use
Puppet for containerized deployments.  Perhaps we can work together on
adding some classes which can help deploy and configure containerized
services with Puppet.

>
> Thanks,
> -Alex
>
> [0] 
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L33-L36
>



[Openstack-operators] [publiccloud-wg] Extra meeting PublicCloudWorkingGroup

2017-09-18 Thread Tobias Rydberg

Hi everyone,

We will have an "extra" meeting on Wednesday at 1400 UTC in 
#openstack-publiccloud


The main purpose of this extra meeting will be to finalize the agenda for 
the meetup in London next week.


Agenda and etherpad: https://etherpad.openstack.org/p/publiccloud-wg
Meetup etherpad: 
https://etherpad.openstack.org/p/MEETUPS-2017-publiccloud-wg


Regards,
Tobias
Co-chair PublicCloud WG





[Openstack-operators] [tripleo] Making containerized service deployment the default

2017-09-18 Thread Alex Schultz
Hey ops & devs,

We talked about containers extensively at the PTG and one of the items
that needs to be addressed is that currently we still deploy the
services as bare metal services via puppet. For Queens we would like
to switch the default to be containerized services.  With this switch
we would also start the deprecation process for deploying services as
bare metal services via puppet.  We still execute the puppet
configuration as part of the container configuration process so the
code will continue to be leveraged but we would be investing more in
the continual CI of the containerized deployments and reducing the
traditional scenario coverage.

As we switch over to containerized services by default, we would also
begin to reduce installed software on the overcloud images that we
currently use.  We have an open item to better understand how we can
switch away from the golden images to a traditional software install
process during the deployment and make sure this is properly tested.
In theory it should work today by switching the default for
EnablePackageInstall[0] to true and configuring repositories, but this
is something we need to verify.
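
For anyone who wants to experiment, the switch would presumably boil 
down to an extra environment file along these lines (untested, and you 
would still need to point the nodes at usable repositories):

    parameter_defaults:
      # Parameter defined in puppet/services/tripleo-packages.yaml [0];
      # it currently defaults to false.
      EnablePackageInstall: true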

If anyone has any objections to this default switch, please let us know.

Thanks,
-Alex

[0] 
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/tripleo-packages.yaml#L33-L36



[Openstack-operators] [nova][neutron] Queens PTG recap - nova/neutron

2017-09-18 Thread Matt Riedemann
There were a few nova/neutron interactions at the PTG, one on Tuesday 
[1] and one on Thursday [2].


Priorities
----------

1. Neutron port binding extension for live migration: This was discussed 
at the Ocata summit in Barcelona [3] and resulted in a Neutron spec [4] 
and API definition in Pike. The point of this is to shorten the amount 
of network downtime when switching ports between the source and 
destination hosts during a live migration. Neutron would provide a new 
port binding API extension and if available, Nova would use that to bind 
ports on both the source and destination hosts during live migration and 
switch which one is active during post-migration. We discussed if this 
should be dependent on os-vif object negotiation and agreed both efforts 
could be worked concurrently and then we'll see if we should merge them 
at the end, mostly to avoid having to redo a bunch of work if vif 
negotiation comes later. We also discussed if we should make the port 
binding changes on the Nova side depend on moving port orchestration to 
conductor [5] and again agreed to work those separately and see how the 
port binding code looks if it's just started in the nova-compute 
service, mainly since we don't have an owner for [5]. Sean Mooney said 
he could work on the Nova changes for this. The nova spec [6], started 
by John Garbutt in Ocata, would need to get updated for Queens. Miguel 
Lavalle will drive the changes in Neutron. (A rough sketch of the 
intended binding flow is included after this list.)


2. Using os-vif for port binding negotiation: Sean Mooney and Rodolfo 
Alonso already have some proof of concept code for this. We will want to 
get the gate-tempest-dsvm-nova-os-vif-ubuntu-xenial-nv job to be voting 
with any of this code. We also said we could work this concurrently with 
the port binding for live migration work above.


3. Bandwidth-based scheduling: this has a spec already and some work was 
done in Neutron in Pike. There are multiple interested parties in this 
feature. This will depend on getting nested resource providers done in 
Nova, really within the first milestone. Rodolfo owns this as well.
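
To illustrate the port binding flow from item 1 above, here is a rough 
paraphrase of the kind of sequence described in the spec [4]. The exact 
resource names and payloads were still under review at the time, so 
treat this purely as an illustration; "neutron" is assumed to be 
something like a keystoneauth1 Adapter for the network service:

    def live_migrate_port_bindings(neutron, port_id, source_host, dest_host):
        # 1. Before the migration starts: create an additional, inactive
        #    binding for the destination host, alongside the existing
        #    (active) binding on the source host.
        neutron.post('/v2.0/ports/%s/bindings' % port_id,
                     json={'binding': {'host': dest_host}})

        # 2. The guest is migrated while both bindings exist, so the
        #    destination host can be wired up before the switch-over,
        #    shortening the network downtime.

        # 3. In post-migration: activate the destination binding and
        #    drop the now-unused source binding.
        neutron.put('/v2.0/ports/%s/bindings/%s/activate'
                    % (port_id, dest_host))
        neutron.delete('/v2.0/ports/%s/bindings/%s' % (port_id, source_host))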


Other discussion
----------------

There were several other use cases discussed in both [1] and [2] but for 
the most part they have dependencies on other work, or they don't have 
specs/designs/PoC code, or they don't have owners. So we on the Nova 
side aren't going to be focusing on those other items.


[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg
[2] https://etherpad.openstack.org/p/nova-ptg-queens
[3] https://etherpad.openstack.org/p/ocata-nova-neutron-session
[4] 
https://specs.openstack.org/openstack/neutron-specs/specs/pike/portbinding_information_for_nova.html
[5] 
https://blueprints.launchpad.net/nova/+spec/prep-for-network-aware-scheduling-pike

[6] https://review.openstack.org/#/c/375580/

--

Thanks,

Matt



[Openstack-operators] [nova][cinder] Queens PTG recap - nova/cinder

2017-09-18 Thread Matt Riedemann
On Thursday morning at the PTG the Nova and Cinder teams got together to 
talk through some items. Details are in the etherpad [1].


Bug 1547142 [2]
---------------

This is a long-standing bug where Nova does not terminate connections 
when shelve offloading an instance. There was some confusion when this 
was originally reported about whether or not calling 
os-terminate_connection would fix the issue for all backends. The Cinder 
team said it should, and if not it's a bug in the volume drivers in 
Cinder. So we went ahead and rebased the fix [3] which is merged and 
making its way through the stable backports now. This fixes old-style 
attachments. For the new-style attachments, which get enabled in [4], 
we'll also have to make sure that we create a new volume attachment to 
keep the volume reserved, but delete the old attachments for the old 
host connector.


New style volume attach dev update
----------------------------------

The Cinder team gave an overview of the work completed in Pike and what 
is on-going in Queens for enabling Nova to use new-style volume 
attachments in Cinder, which are based on the 3.27 and 3.44 Cinder API 
microversions. This was also a chance to merge some patches in the 
Queens series and give background to the review teams, mostly on the 
Nova side.
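
As a refresher, the new-style flow is roughly the following, sketched 
here with python-cinderclient. Nova drives this through its own cinder 
client wrapper rather than literally like this, and the variables 
(volume_id, instance_uuid, host_connector, sess) are placeholders:

    from cinderclient import client as cinder_client

    # sess is assumed to be a keystoneauth1 session; 3.44 is the
    # microversion that adds attachment complete.
    cinder = cinder_client.Client('3.44', session=sess)

    # 1. Reserve: create an attachment record with no connector yet.
    attachment = cinder.attachments.create(volume_id, None, instance_uuid)

    # 2. On the compute host, build the host connector (os-brick) and
    #    update the attachment; the updated attachment carries the
    #    connection_info needed to attach the volume to the guest.
    attachment = cinder.attachments.update(attachment.id, host_connector)

    # 3. Once the hypervisor attach is done, complete the attachment so
    #    the volume moves to "in-use".
    cinder.attachments.complete(attachment.id)

    # Detach is the reverse: disconnect on the host, then delete the
    # attachment record.
    cinder.attachments.delete(attachment.id)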


There was general agreement to get the new-style attachment flows merged 
early in Queens so we can flush out bugs and start working on 
multi-attach support.


We also said that we would not work on migrating old style attachments 
to new style in Queens. We don't plan on removing the old flows in Nova 
anytime soon, and once we do we can start talking about migrating data then.


Volume multi-attach
-------------------

Most of the discussion here was around shared volume connections and how 
to model those out of the Cinder API so that Nova can know when it 
should perform a final disconnect_volume call on the host when detaching 
a volume. We agreed that Cinder needs a new API microversion to model 
this, and we will then update [4] to rely on that new microversion 
before enabling new-style attachments.


We also talked about whether or not we should allow boot from volume 
with an existing multi-attach volume. We decided to allow this but 
disable it via default policy. So there will be a new policy rule in 
both Nova and Cinder:


1. Nova: add a policy rule, disabled by default, to allow boot from 
volume with a multi-attach volume.


2. Cinder: allow multi-attach volumes based on the storage backend 
support, allow multi-attach but only for read-only volumes, or disable 
creating multi-attach volumes altogether. I'm a bit fuzzy on the details 
here, but looking at the existing Cinder API code I don't see any policy 
checks for creating a multiattach volume at all, so this is probably 
something good to add anyway since not all Nova compute drivers are 
going to support multiattach volumes right away.


Ildiko Vancsa is updating the nova spec for multi-attach support for 
Queens with the new details.
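
For context on the Nova side of item 1 above, the new rule would 
presumably be registered like any other oslo.policy rule, along these 
lines (the rule name and wording are purely hypothetical until the spec 
is updated):

    from oslo_policy import policy

    # Disabled by default: only deployments that explicitly override the
    # rule would allow boot from volume with a multi-attach volume.
    multiattach_bfv_rule = policy.DocumentedRuleDefault(
        name='os_compute_api:servers:create:boot_from_multiattach_volume',
        check_str='!',
        description='Allow boot from volume with a multi-attach volume.',
        operations=[{'path': '/servers', 'method': 'POST'}])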


Refreshing volume connection_info
---------------------------------

This was based on a mailing list discussion [5] and the PTG discussion 
was already summarized in that thread [6].


Cinder ephemeral storage
------------------------

This was a rehash of the Boston forum discussion [7]. We agreed to work 
on both the short term and long term options here.


The short-term option is adding an "is_bfv" attribute on flavors in 
Nova, which defaults to False, but if True would perform a simple boot 
from volume using the specified image and flavor disk details. Think of 
this like get-me-a-network but for boot from volume. Anything more 
detailed, like volume type, guest_format, disk_bus, ephemeral or swap 
disks, would have to be handled through the normal API usage we have 
today. Also, user-defined or image-defined block device mapping 
attributes in the request would supersede the flavor.
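
To put that in context, the "normal API usage we have today" for boot 
from volume means the caller builds the block device mapping 
explicitly, e.g. with python-novaclient ("nova" is a Client instance 
and the other values are placeholders):

    server = nova.servers.create(
        name='bfv-server',
        image=None,           # no local image; the BDM below is the boot disk
        flavor=flavor_id,
        block_device_mapping_v2=[{
            'uuid': image_id,            # image to copy into the new volume
            'source_type': 'image',
            'destination_type': 'volume',
            'volume_size': 20,           # GiB
            'boot_index': 0,
            'delete_on_termination': True,
        }])

The "is_bfv" flavor would let a plain create call (just image + flavor) 
do the equivalent of the above behind the scenes, using the flavor's 
disk size for the volume.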


The long-term option is Nova having a Cinder imagebackend driver 
for ephemeral storage. Chet Burgess has started looking at this, and it 
was recommended to look at the ScaleIO imagebackend as a template since 
they both have to solve problems with non-local storage. The good news 
is a Cinder ephemeral imagebackend driver in Nova would not need to deal 
with image caching, since Cinder can do that for us.


--

All in all I felt we had a really productive set of topics and 
discussions between the teams with everyone being on the same page and 
going the same direction, which is nice to see. Boring is good.


[1] https://etherpad.openstack.org/p/cinder-ptg-queens
[2] https://bugs.launchpad.net/nova/+bug/1547142
[3] https://review.openstack.org/257275
[4] https://review.openstack.org/#/c/330285/
[5] http://lists.openstack.org/pipermail/openstack-dev/2017-June/118040.html
[6] 
http://lists.openstack.org/pipermail/openstack-dev/2017-September/122170.html

[7] 

[Openstack-operators] [nova] Queens PTG recap - placement

2017-09-18 Thread Matt Riedemann
Placement related items came up a lot at the Queens PTG. Some on Tuesday 
[1], some on Wednesday [2], some on Thursday [3] and some on Friday [4].


Priorities for Queens
---------------------

The priorities for placement/scheduler related items in Queens are:

1. Migration allocations [5] - we realized late in Pike that the way we 
were tracking allocations across source and dest nodes during a move 
operation (cold migrate, live migrate, resize, evacuate) was confusing 
and error prone, and we had to "double up" allocations for the instance 
during the move. The idea here is to simplify the resource allocation 
modeling during a move operation by having the migration record be a 
consumer of resource allocations during the move, so we can keep the 
source/dest node allocations separate using the instance/migration 
records. This is mostly internal technical debt reduction and to 
simplify our accounting, which should mean fewer bugs. (A rough sketch 
of the intended end state is included after this list.)


2. Alternate hosts - this is the work to have the scheduler determine a 
set of alternative hosts for reschedules. This is important for cells v2 
where the cell conductor and nova-compute services can't reach the API 
database or scheduler, so reschedules need to happen within the cell 
given a list of pre-determined hosts chosen by the scheduler at the top. 
Ed Leafe has already started on some of this [6].


3. Nested resource providers [7] - this has been around for a while now 
but hasn't had the proper reviewer focus due to other priorities. We are 
making this a priority in Queens as it enables a lot of other use cases 
like bandwidth-aware scheduling and being able to eventually remove 
major chunks of the claims code in the ResourceTracker in the compute 
service. We agreed that in Queens we want to try and keep the scope of 
this small and focus on being able to model a simple SR-IOV PF/VF 
relationship. Modeling NUMA use cases will be post-Queens. We will also 
need quite a bit of functional testing work along with this, so that 
we have some fixtures and/or fake virt drivers in place to model things 
like CPU pinning, huge pages, NUMA, SR-IOV, etc which also verify 
allocations in Placement to know we are doing things correctly from the 
client perspective, similar to the functional tests added for verifying 
allocations during move operations in Pike.
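
To make the migration allocations item (#1 above) concrete, the intended 
end state looks roughly like the following in terms of placement API 
calls. This is illustrative only: the payload shape and the required 
microversion/project/user details are glossed over, and "placement" is 
assumed to be something like a keystoneauth1 Adapter:

    def swap_to_migration_consumer(placement, instance_uuid, migration_uuid,
                                   source_rp_uuid, dest_rp_uuid, resources):
        # During the move, the migration record becomes the consumer of
        # the source node's resources...
        placement.put('/allocations/%s' % migration_uuid,
                      json={'allocations': [
                          {'resource_provider': {'uuid': source_rp_uuid},
                           'resources': resources}]})

        # ...while the instance only consumes the destination node's
        # resources, instead of "doubled up" allocations against both.
        placement.put('/allocations/%s' % instance_uuid,
                      json={'allocations': [
                          {'resource_provider': {'uuid': dest_rp_uuid},
                           'resources': resources}]})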


General device management
-------------------------

This was a more forward looking discussion and the notes are in the 
etherpad [3]. This is not really slated for Queens work except to make 
sure that things we do in Queens don't limit what we can do for 
generically managing devices later, and is tied heavily to the nested 
resource providers work.


Other discussion
----------------

Traits - supporting required traits in a flavor is ongoing, and the spec 
is here [8].


Shared storage providers [9] - we have decided to defer working on this 
from Queens given other priorities. Modeling move allocations with 
migration records should help here though.


Modeling distance for (anti-)affinity use cases - this is being deferred 
from Queens. There are workarounds when running with multiple cells.


Limits and ordering in Placement - Chris Dent has proposed a spec [10] 
so that we can limit the size of a response when getting resource 
providers from Placement during scheduling and also optionally configure 
the behavior of how Placement orders the returned set, so you can pack 
or spread possible build candidates.


OSC plugin - I'm trying to push this work forward. We have the plugin 
installed with devstack now and a functional CI job for the repo but 
need to move some of the patches forward that add the CLI functionality.


There was lots of other random stuff in [2] and [4], but for the most 
part it is not prioritized, spec'ed out or owned, so those items are 
not really getting attention for Queens.


[1] https://etherpad.openstack.org/p/placement-nova-neutron-queens-ptg
[2] https://etherpad.openstack.org/p/nova-ptg-queens-placement
[3] 
https://etherpad.openstack.org/p/nova-ptg-queens-generic-device-management

[4] https://etherpad.openstack.org/p/nova-ptg-queens
[5] 
https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/migration-allocations.html

[6] https://review.openstack.org/#/c/498830/
[7] 
https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/nested-resource-providers.html

[8] https://review.openstack.org/#/c/468797/
[9] https://bugs.launchpad.net/nova/+bug/1707256
[10] https://review.openstack.org/#/c/504540/

--

Thanks,

Matt



Re: [Openstack-operators] [Openstack] MTU on Provider Networks

2017-09-18 Thread John Petrini
Great! Thank you both for the information.

John Petrini