[openstack-dev] [gate] [cinder] A current major cause for gate failure - cinder backups

2016-08-23 Thread Sean Dague
The gate is in a bad state, as people may have noticed. We're only at a 
50% characterization for integrated-gate right now - 
http://status.openstack.org/elastic-recheck/data/integrated_gate.html 
which means there are a lot of unknown bugs in there.


Spot checking one job - gate-tempest-dsvm-postgres-full-ubuntu-xenial - 
6 of the 7 fails were failures of cinder backup - 
http://logs.openstack.org/92/355392/4/gate/gate-tempest-dsvm-postgres-full-ubuntu-xenial/582fbd7/console.html#_2016-08-17_04_55_24_109972 
- though they were often different tests.


With the current state of privsep logging (hundreds of lines at warn 
level), it is difficult for me to narrow this down further. I do 
suspect this might be another concurrency shake-out from os-brick, so 
it probably needs folks familiar with it to go through the logs with a 
fine-toothed comb to get to root cause. If anyone can jump on that, it 
would be great.


This is probably not the only big new issue, but it seems like a pretty 
concrete one whose solution would help drop our merge window (which is 16 
hours).


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Hold off on pushing new patches for config option cleanup

2016-08-22 Thread Sean Dague

On 08/22/2016 12:10 AM, Michael Still wrote:

So, if this is about preserving CI time, then it's cool for me to merge
these on a US Sunday when the gate is otherwise idle, right?


yes.



Michael

On Fri, Aug 19, 2016 at 7:02 AM, Sean Dague <s...@dague.net> wrote:

On 08/18/2016 04:46 PM, Michael Still wrote:
> We're still ok with merging existing ones though?

Mostly we'd like to conserve the CI time now. It's about 14.5 node-hours
to run CI on these patches (probably only about 9 node-hours in the gate).
With ~800 nodes every patch represents 1.5% of our CI resources (per
hour) to pass through. There are a ton of these patches up there, so
even just landing gating ones consumes resources that could go towards
other more critical fixes / features.

I think the theory is these are fine to merge post freeze / milestone 3,
when the CI should have cooled down a bit and there is more head room.

    -Sean

--
Sean Dague
http://dague.net





--
Rackspace Australia






--
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Nova mascot - nominations and voting

2016-08-19 Thread Sean Dague
On 08/11/2016 10:14 AM, Sean Dague wrote:
> 
> So... I overstepped here and jumped to a conclusion based on an
> incorrect understanding of people's sentiments. And there has been some
> concern expressed that part of this conversation was private, which is a
> valid concern. I'm sorry about all of that.
> 
> Let's start afresh...
> 
> What's been publicly suggested so far (from all ML posts that seem to
> contain a suggestion):
> 
> ant - alexis (already chosen by infra)
> bee - alexis (already chosen by refstack)
> star - heidi
> supernova - markus, auggy, bob ball
> octopus - chris (already chosen by UX)
> 
> I'd suggest that we actually combine star/supernova into one item to
> give the graphic designers some flexibility and creativity. With less
> distinctive features than animals, the freedom is probably needed to
> make something cool.
> 
> Are there other suggestions? Those items already chosen by other teams
> are out of bounds per the FAQ
> (http://www.openstack.org/project-mascots). We can leave this open for
> the rest of the week, and if there are additional valid options do an
> ATC poll next week.

There were no additional suggestions over the last week, and the only
valid option (not taken by other teams) was the star/supernova that
was suggested.

So we've got our mascot, thanks folks.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Let's drop the postgresql gate job

2016-08-18 Thread Sean Dague
On 08/18/2016 02:22 PM, Sean Dague wrote:
> On 08/18/2016 11:00 AM, Matt Riedemann wrote:
>> It's that time of year again to talk about killing this job, at least
>> from the integrated gate (move it to experimental for people that care
>> about postgresql, or make it gating on a smaller subset of projects like
>> oslo.db).
>>
>> The postgresql job used to have three interesting things about it:
>>
>> 1. It ran keystone with eventlet (which is no longer a thing).
>> 2. It runs the n-api-meta service rather than using config drive.
>> 3. It uses postgresql for the database.
>>
>> So #1 is gone, and for #3, according to the April 2016 user survey (page
>> 40) [1], 4% of reporting deployments are using it in production.
>>
>> I don't think we're running n-api-meta in any other integrated gate
>> jobs, but I'm pretty sure there is at least one neutron job out there
>> that's running with it that way. We could also consider making the
>> nova-net dsvm full gate job run n-api-meta, or vice-versa with the
>> neutron dsvm full gate job.
>>
>> We also have to consider that with HP public cloud gone as a node
>> provider we've got fewer test nodes to run with, and we have to make
>> tough decisions about which jobs we're going to run in the integrated gate.
>>
>> I'm bringing this up again because Nova has a few more jobs it would
>> like to make voting on its repo (neutron LB and live migration, at
>> least in the check queue) but there are concerns about adding yet more
>> jobs that each change has to get through before it's merged, which means
>> if anything goes wrong in any of those we can have a 24 hour turnaround
>> on getting an approved change back through the gate.
>>
>> [1]
>> https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf
> 
> +1.
> 
> Postgresql in the gate hasn't provided any real value in a long time
> (tempest just really can't tickle the differences between the dbs,
> especially as projects put much better input validation in place).
> During icehouse the job was even accidentally running mysql for 6 weeks,
> and no one noticed.

Devstack Default change proposed - https://review.openstack.org/#/c/357446/
Devstack Gate default change proposed -
https://review.openstack.org/#/c/357443/
Project-config change proposed - https://review.openstack.org/#/c/357444

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Hold off on pushing new patches for config option cleanup

2016-08-18 Thread Sean Dague
On 08/18/2016 04:46 PM, Michael Still wrote:
> We're still ok with merging existing ones though?

Mostly we'd like to conserve the CI time now. It's about 14.5 node-hours
to run CI on these patches (probably only about 9 node-hours in the gate).
With ~800 nodes every patch represents 1.5% of our CI resources (per
hour) to pass through. There are a ton of these patches up there, so
even just landing gating ones consumes resources that could go towards
other more critical fixes / features.
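
(As a rough back-of-the-envelope check of those figures, with the numbers
rounded and treating the ~800-node pool as ~800 node-hours of capacity per
hour:)

    # One patch's CI cost as a share of the pool's hourly capacity.
    check_cost = 14.5   # node-hours per patch in the check queue
    gate_cost = 9.0     # roughly, per patch that goes through the gate
    pool = 800.0        # nodes, i.e. ~800 node-hours of capacity per hour

    print("check only: %.1f%%" % (100 * check_cost / pool))                # ~1.8%
    print("check+gate: %.1f%%" % (100 * (check_cost + gate_cost) / pool))  # ~2.9%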

I think the theory is these are fine to merge post freeze / milestone 3,
when the CI should have cooled down a bit and there is more head room.

    -Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Let's drop the postgresql gate job

2016-08-18 Thread Sean Dague
On 08/18/2016 03:31 PM, Matthew Thode wrote:
> On 08/18/2016 01:50 PM, Matt Riedemann wrote:
>> On 8/18/2016 1:18 PM, Matthew Thode wrote:
>>> Perhaps a better option would be to get oslo.db to run cross-project
>>> checks like we do in requirements.  That way the right team is covering
>>> the usage of postgres and we still have coverage while still lowering
>>> gate load for most projects.
>>>
>>
>> I don't see the value in this unless there are projects that have
>> pg-specific code in them. The reason we have cross-project unit test
>> jobs for reqs changes is requirements changes in upper-constraints can
>> break and wedge the gate for a project, or multiple project. E.g.
>> removing something in a backward incompatible way, or the project with
>> the unit test is mocking something out poorly (like we've seen lately
>> with nova and python-neutronclient releases).
>>
> 
> That makes sense, just improving the oslo.db test coverage for postgres
> (if that's even necessary) would be good.  The only other thing I'd like
> to see (and it may already be done) is to have pg upgrade test coverage,
> aka, I don't want to hit that keystone bug again :P  But that's a
> different conversation.

That's entirely doable inside the project. We do that in nova in our
unit tests. The important thing there is to not just run the schema
upgrades, but do them with some representative data in the tables.
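
(Not Nova's actual harness, just a minimal sqlite sketch of the idea, with
an invented schema: seed representative rows, run the upgrade, and verify
the data survived:)

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # old schema, seeded with representative data (not just empty tables)
    conn.execute("CREATE TABLE volumes (id INTEGER PRIMARY KEY, host TEXT)")
    conn.execute("INSERT INTO volumes (id, host) VALUES (1, 'node-1@lvm')")

    # the "migration": add a column and backfill it from the existing rows
    conn.execute("ALTER TABLE volumes ADD COLUMN backend TEXT")
    conn.execute("UPDATE volumes SET backend = substr(host, instr(host, '@') + 1)")

    # the part a run against empty tables would never catch: the backfill
    row = conn.execute("SELECT backend FROM volumes WHERE id = 1").fetchone()
    assert row[0] == "lvm"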

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Let's drop the postgresql gate job

2016-08-18 Thread Sean Dague
On 08/18/2016 11:00 AM, Matt Riedemann wrote:
> It's that time of year again to talk about killing this job, at least
> from the integrated gate (move it to experimental for people that care
> about postgresql, or make it gating on a smaller subset of projects like
> oslo.db).
> 
> The postgresql job used to have three interesting things about it:
> 
> 1. It ran keystone with eventlet (which is no longer a thing).
> 2. It runs the n-api-meta service rather than using config drive.
> 3. It uses postgresql for the database.
> 
> So #1 is gone, and for #3, according to the April 2016 user survey (page
> 40) [1], 4% of reporting deployments are using it in production.
> 
> I don't think we're running n-api-meta in any other integrated gate
> jobs, but I'm pretty sure there is at least one neutron job out there
> that's running with it that way. We could also consider making the
> nova-net dsvm full gate job run n-api-meta, or vice-versa with the
> neutron dsvm full gate job.
> 
> We also have to consider that with HP public cloud gone as a node
> provider we've got fewer test nodes to run with, and we have to make
> tough decisions about which jobs we're going to run in the integrated gate.
> 
> I'm bringing this up again because Nova has a few more jobs it would
> like to make voting on its repo (neutron LB and live migration, at
> least in the check queue) but there are concerns about adding yet more
> jobs that each change has to get through before it's merged, which means
> if anything goes wrong in any of those we can have a 24 hour turnaround
> on getting an approved change back through the gate.
> 
> [1]
> https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf

+1.

Postgresql in the gate hasn't provided any real value in a long time
(tempest just really can't tickle the differences between the dbs,
especially as projects put much better input validation in place).
During icehouse the job was even accidentally running mysql for 6 weeks,
and no one noticed.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] versioning the api-ref?

2016-08-18 Thread Sean Dague
On 08/18/2016 11:57 AM, Nikhil Komawar wrote:
> I guess the intent was to indicate the need for indicating the micro or,
> in the case of Glance, minor version bump when required.
> 
> The API isn't drastically different, there are new and old elements as
> shown in the Nova api ref linked.

Right, so the point is that it should all be describable in a single
document. It's like the fact that when you go to python API docs you get
things like - https://docs.python.org/2/library/wsgiref.html

"New in version 2.5."

Perhaps if there is a concrete example of the expected differences
between what would be in the mitaka tree vs. the newton tree, we can figure
out an appropriate way to express that in api-ref.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [neutron] Let's clean up API reference

2016-08-18 Thread Sean Dague
On 08/18/2016 08:16 AM, Akihiro Motoki wrote:
> Hi Neutron team,
> 
> As you may know, the OpenStack API references have been moved into
> individual project repositories, but it contains a lot of wrong
> information now :-(
> 
> Let's clean up API reference.
> It's now time to start the cleanup and finish it by Newton-1.
> 
> I prepared the etherpad page to share the guideline of the cleanup and
> useful information.
> This page shares my experience of 'router' resource cleanup.
> 
> https://etherpad.openstack.org/p/neutron-api-ref-sprint
> 
> I hope everyone works on at least one resource :)
> The etherpad page has the progress tracking section (bottom of the page)
> Make sure to add your name when you start to work.
> 
> Feel free to ask me if you have a question.
> 
> Thanks,
> Akihiro

Fwiw, I built a burndown dashboard for this with Nova -
http://burndown.dague.org/ (source -
https://github.com/sdague/rst-burndown). It should be reasonably
adaptable to other projects if you have a host to run it on.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] How do I get devstack with nova-network now?

2016-08-17 Thread Sean Dague
On 08/16/2016 10:56 PM, Matt Riedemann wrote:
> On 8/16/2016 9:52 PM, Matt Riedemann wrote:
>> My nova-net local.conf isn't working anymore apparently, neutron is
>> still getting installed and run rather than nova-network even though I
>> have this in my local.conf:
>>
>> stack@novanet:~$ cat devstack/local.conf  | grep enable_service
>> enable_service tempest
>> enable_service n-net
>> #enable_service q-svc
>> #enable_service q-agt
>> #enable_service q-dhcp
>> #enable_service q-l3
>> #enable_service q-meta
>> #enable_service q-lbaas
>> #enable_service q-lbaasv2
>>
>> This guide tells me about the default networking now:
>>
>> http://docs.openstack.org/developer/devstack/networking.html
>>
>> But doesn't tell me how to get nova-network running instead.
>>
>> It's also nearly my bedtime and I'm being lazy, so figured I'd post this
>> if for nothing else a heads up that people's local configs for
>> nova-network might no longer work with neutron being the default.
>>
> 
> Looks like I just have to be explicit about disabling the neutron
> services and enabling the n-net service:
> 
> enable_service n-net
> disable_service q-svc
> disable_service q-agt
> disable_service q-dhcp
> disable_service q-l3
> disable_service q-meta

Yes, a bunch of network manipulation happens behind the scenes if the
neutron services are running.

I *specifically* left out running nova-net from the networking section
for devstack. If you want to provide a patch that's cool, but figured in
its deprecated state it's better not to even tell people about it.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][tc][ptl] establishing project-wide goals

2016-08-16 Thread Sean Dague
On 08/16/2016 05:36 AM, Thierry Carrez wrote:
> John Dickinson wrote:
>>>> Excerpts from John Dickinson's message of 2016-08-12 16:04:42 -0700:
>>>>> [...]
>>>>> The proposed plan has a lot of good in it, and I'm really happy to see 
>>>>> the TC
>>>>> working to bring common goals and vision to the entirety of the OpenStack
>>>>> community. Drop the "project teams are expected to prioritize these goals 
>>>>> above
>>>>> all other work", and my concerns evaporate. I'd be happy to agree to that 
>>>>> proposal.
>>>>
>>>> Saying that the community has goals but that no one is expected to
>>>> act to bring them about would be a meaningless waste of time and
>>>> energy.
>>>
>>> I think we can find wording that doesn't use the word "priority" (which
>>> is, I think, what John objects to the most) while still conveying that
>>> project teams are expected to act to bring them about (once they said
>>> they agreed with the goal).
>>>
>>> How about "project teams are expected to do everything they can to
>>> complete those goals within the boundaries of the target development
>>> cycle" ? Would that sound better ?
>>
>> Any chance you'd go for something like "project teams are expected to
>> make progress on these goals and report that progress to the TC every
>> cycle"?
> 
> The issue with this variant is that it removes the direct link between
> the goal and the development cycle. One of the goals of these goals
> (arh) is that we should be able to collectively complete them in a given
> timeframe, so that there is focus at the same time and we have a good
> story to show at the end. Those goals are smallish development cycle
> goals. They are specifically chosen to be complete-able within a cycle
> and with a clear definition of "done". It's what differentiates them
> from more traditional cross-project specs or strategic initiatives which
> can be more long-term (and on which "reporting progress to the TC every
> cycle" could be an option).

So, I think that's ok. But it's going to drive goals toward the least
common denominator of what's doable. For instance, python 3.5 is probably
not doable in Nova in a cycle. And the biggest issue is really not python
3.5 per se, but our backlog of mox-based unit tests (over a thousand),
which we've found are unreliable in odd ways on python3. They also tend
to be the oldest unit tests (we stopped letting people add new ones 2
years ago), in areas of the code that have a lower rate of change and
that folks are less familiar with (like the xenserver driver, which is
used by very few folks).

So, those are getting tackled, but there is a lot there, and it will
take a while. (Note: this is one of the reasons I suggested the next
step with python3 be full stack testing, because I think we could
actually get Nova working there well in advance of the unit tests
ported, for the above issue. That however requires someone to take on
the work for full stack python3 setup and maintenance.)

Maybe this process also can expose "we're going to need help to get
there" for some of these goals.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [cinder] [nova] locking concern with os-brick

2016-08-15 Thread Sean Dague
On 08/14/2016 06:23 PM, Patrick East wrote:

> I like the sound of a more unified way to interact with compute node
> services. Having a standardized approach for inter-service
> synchronization for controlling system resources would be sweet (even if
> it is just a more sane way of using local file locks). Anyone know if
> there is existing work in this area we can build off of? Or is the path
> forward a new cross-project spec to try and lock down some requirements,
> use-cases, etc.?
> 
> As far as spending time to hack together solutions via the config
> settings for this... well, it's pretty minimal wrt size of effort compared
> to solving the large issue. Don't get me wrong though, I'm a fan of
> doing both in parallel. Even if we have resources jump on board
> immediately I'm not convinced we have a great chance to "fix" this for N
> in a more elegant fashion, much less any of the older releases affected
> by this. That leads me to believe we still need the shared config
> setting for at least a little while in Devstack, and documentation for
> existing deployments or ones going up with N.

We were talking through some of the implications of this change in
#openstack-nova, and the following further concerns came out.

1) Unix permissions for services in distros

Both Ubuntu and RHEL have a dedicated service user per service: Nova
services run as the nova user, cinder services as the cinder user. For
those services to share a lock path, you need to do more than share the path.

You must also put both services in a shared group, make the lock path
group-writable, and ensure all lock files get written with g+w permissions
(potentially overriding the default system umask to get there).
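
(A rough sketch of what that implies, run as root; the group name and lock
directory here are hypothetical, not something from the thread:)

    import grp
    import os
    import stat
    import subprocess

    LOCK_DIR = "/var/lib/openstack/locks"   # hypothetical shared lock path

    # put both service users into one shared group
    subprocess.check_call(["groupadd", "-f", "openstack-locks"])
    for user in ("nova", "cinder"):
        subprocess.check_call(["usermod", "-a", "-G", "openstack-locks", user])

    # group-writable lock dir, setgid so lock files inherit the group
    os.makedirs(LOCK_DIR, exist_ok=True)
    os.chown(LOCK_DIR, 0, grp.getgrnam("openstack-locks").gr_gid)
    os.chmod(LOCK_DIR, 0o775 | stat.S_ISGID)

    # ...and each service still has to create its lock files g+w, e.g. by
    # overriding the default umask, which is the part that's easy to miss
    os.umask(0o002)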

2) Services in containers

For people pushing towards putting services in containers, you'd need to
do all sorts of additional work to make this lock path actually a shared
construct between 2 containers.


These are both pretty problematic changes for the entire deploy space
without good answers.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [Openstack-operators] [cinder] [nova] locking concern with os-brick

2016-08-15 Thread Sean Dague
On 08/13/2016 06:07 PM, Matt Riedemann wrote:

> 
> I checked a tempest-dsvm CI run upstream and we don't follow this
> recommendation for our own CI on all changes in OpenStack, so before we
> make this note in the release notes, I'd like to see us use the same
> lock_path for c-vol and n-cpu in devstack for our CI runs.
> 
> Also, it should really be a note in the help text of the actual
> lock_path option IMO since it's a latent and persistent thing that
> people are going to need to remember after newton has long been released
> and people deploying OpenStack for the first time AFTER newton shouldn't
> have to know there was a release note telling them not to shoot
> themselves in the foot, it should be in the config option help text.

That patch to do this is where this all started, because I was not
comfortable landing that as a default change in master until all the
affected projects had landed release notes / docs around this. Otherwise
this could very well have snuck into devstack, done the right thing
there, and never been noticed by others.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [cinder] [nova] locking concern with os-brick

2016-08-15 Thread Sean Dague
On 08/14/2016 06:23 PM, Patrick East wrote:

> I like the sound of a more unified way to interact with compute node
> services. Having a standardized approach for inter-service
> synchronization for controlling system resources would be sweet (even if
> it is just a more sane way of using local file locks). Anyone know if
> there is existing work in this area we can build off of? Or is the path
> forward a new cross-project spec to try and lock down some requirements,
> use-cases, etc.?
> 
> As far as spending time to hack together solutions via the config
> settings for this... well, it's pretty minimal wrt size of effort compared
> to solving the large issue. Don't get me wrong though, I'm a fan of
> doing both in parallel. Even if we have resources jump on board
> immediately I'm not convinced we have a great chance to "fix" this for N
> in a more elegant fashion, much less any of the older releases affected
> by this. That leads me to believe we still need the shared config
> setting for at least a little while in Devstack, and documentation for
> existing deployments or ones going up with N.

So I think this breaks down into:

1) What are the exactly calls in os-brick that need this? What goes
wrong if they don't have it?

2) How do we communicate the need in a way that won't be missed by folks?

3) What is the least worst solution to this for Newton?

4) How do we make sure we don't do this again in future releases?

5) What is the more ideal long term solution here?

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [tc][cinder] tag:follows-standard-deprecation should be removed

2016-08-12 Thread Sean Dague
state, doesn't mean that the operator
> will be able to migrate if the driver is broken, but they'll have a
> chance depending on the state of the driver in question.  It could be
> horribly broken, but the breakage might be something fixable by someone
> that just knows Python.   If the driver is gone from tree entirely, then
> that's a lot more to overcome.
> 
> I don't think there is a way to make everyone happy all the time, but I
> think this buys operators a small window of opportunity to still manage
> their existing volumes before the driver is removed.  It also still
> allows the Cinder community to deal with unsupported drivers in a way
> that will motivate vendors to keep their stuff working.

This seems very reasonable. It allows the cinder team to mark stuff
unsupported at any point that vendors do not meet their upstream
commitments, but still provides some path forward for operators that
didn't realize their chosen vendor abandoned them and the community
until after they are in the midst of upgrade. It's very important that
the cinder team is able to keep a very visible hammer for vendors not
living up to their commitments.

Keeping some visible data around drivers that are flapping (going
unsupported, showing up with CI to get back out of the state,
disappearing again) would be great as well, to further give operators
data on what vendors are working in good faith and which aren't.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [tc][cinder] tag:follows-standard-deprecation should be removed

2016-08-12 Thread Sean Dague
On 08/12/2016 08:40 AM, Duncan Thomas wrote:
> On 12 Aug 2016 15:28, "Thierry Carrez" <thie...@openstack.org> wrote:
>>
>> Duncan Thomas wrote:
> 
>> I agree that leaving broken drivers in tree is not significantly better
>> from an operational perspective. But I think the best operational
>> experience would be to have an idea of how much risk you expose yourself
>> to when you pick a driver, and have a number of them that are actually
>> /covered/ by the standard deprecation policy.
>>
>> So ideally there would be a number of in-tree drivers (on which the
>> Cinder team would apply the standard deprecation policy), and a separate
>> repository for 3rd-party drivers that can be removed at any time (and
>> which would /not/ have the follows-standard-deprecation-policy tag).
> 
> So we'd certainly have to move out all of the backends requiring
> proprietary hardware, since we couldn't commit to keeping them working
> if their vendors turn off their CI. That leaves ceph, lvm, NFS, drbd, and
> sheepdog, I think. There is not enough broad knowledge in the core team
> currently to support sheepdog or drbd without 'vendor' help. That would
> leave us with three drivers in the tree, and not actually provide much
> useful risk information to deployers at all.

I 100% understand the cinder policy of kicking drivers out without CI.
And I think there is a lot of value in ensuring what's in tree is tested.

However, from a user perspective basically it means that if you deploy
Newton cinder and build a storage infrastructure around anything other
than ceph, lvm, or NFS, you have a very real chance of never being able
to upgrade to Ocata, because your driver was fully deleted, unless you
are willing to completely change up your storage architecture during the
upgrade.

That is the kind of reality that should be front and center to the
users. Because it's not just a drop of standard deprecation, it's also a
removal of 'supports upgrade', as Newton cinder config won't work with
Ocata.

Could there be more of an off-ramp / on-ramp here for the drivers? If a
driver's CI fails to meet the reporting window, mark it deprecated for the
next delete window. If a driver is in a deprecated state they need some
long window of continuous reporting to get out of that state (like 120
days or something). Bring in all new drivers in a
deprecated/experimental/untested state, which they only get to shrug off
after the onramp window?

It's definitely important that the project has the ability to clean out
the cruft, but it would be nice to not be overly brutal to our operators
at the same time.

And if not, I think that tags (or lack thereof) aren't fully
communicating the situation here. Cinder docs should basically say "only
use ceph / lvm / nfs, as those are the only drivers that we can
guarantee will be in the next release".

-Sean

-- 
Sean Dague
http://dague.net



[openstack-dev] [cinder] [nova] locking concern with os-brick

2016-08-12 Thread Sean Dague
A devstack patch was pushed earlier this cycle around os-brick -
https://review.openstack.org/341744

Apparently there are some os-brick operations that are only safe if the
nova and cinder lock paths are set to be the same thing, though that
hasn't hit release notes or other documentation yet that I can see.
Is this a thing that everyone is aware of at this point? Are project
teams ok with this new requirement? Given that lock_path has no default,
this means we're potentially shipping corruption by default to users.
The other way forward would be to revisit that lock_path by default
concern, and have a global default. Or have some way that users are
warned if we think they aren't in a compliant state.
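
(For context, a minimal sketch of the kind of critical section at stake,
assuming oslo.concurrency's file-based external locks; the lock name and
path are illustrative, not os-brick's actual code:)

    from oslo_concurrency import lockutils

    # An external lock is just a lock file under lock_path. If nova-compute
    # and cinder-volume are configured with *different* lock_path values,
    # each service flocks a different file, and the section below is no
    # longer mutually exclusive between them.
    @lockutils.synchronized('connect_volume', 'os-brick-', external=True,
                            lock_path='/var/lib/openstack/locks')
    def connect_volume():
        # scan / attach the block device here
        pass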

I've put the devstack patch on a -2 hold until we get ACK from both Nova
and Cinder teams that everyone's cool with this.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] versioning the api-ref?

2016-08-12 Thread Sean Dague
On 08/11/2016 06:02 PM, Brian Rosmaita wrote:
> I have a question about the api-ref. Right now, for example, the new
> images v1/v2 api-refs are accurate for Mitaka.  But DocImpact bugs are
> being generated as we speak for changes in master that won't be
> available to consumers until Newton is released (unless they build from
> source). If those bug fixes get merged, then the api-ref will no longer
> be accurate for Mitaka API consumers (since it's published upon update).

I'm confused about this statement.

Are you saying that the Glance v2 API in Mitaka and Newton is different
in some user-visible ways, but both are called the v2 API? How does an
end user know which to use?

The assumption with the api-ref work is that the API document should be
timeless (branchless), which is why building from master is always
appropriate. That information works for all time.

We do support microversion markup in the document, you can see some of
that in action here in the Nova API Ref -
http://developer.openstack.org/api-ref/compute/?expanded=list-servers-detail


-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [requirements] near term gate optimization opportunity

2016-08-11 Thread Sean Dague

On 08/11/2016 07:56 PM, Tony Breeds wrote:

On Thu, Aug 11, 2016 at 09:13:12AM +0200, Andreas Jaeger wrote:

On 2016-08-10 23:06, Sean Dague wrote:

Based on reading some logs, it looks like requirements updates are
getting regenerated every time a requirements change lands, for all open patches,
even if they aren't impacted by it -
https://review.openstack.org/#/c/351991/

Patch sets 10, 11, and 12 in that series are just rebases, all happening within a
couple of hours.

With the check queue at ... 505 changes as of this email, this is
definitely adding some extra load to the system.

It would be a great optimization for someone to look at the script -
https://github.com/openstack-infra/project-config/blob/ab89ab40ed74db306ce10e36341d39f23231f012/jenkins/scripts/propose_update.sh
and make it so that if the commit did not change, don't push a rebased
review.


Good idea!

One thing to keep in mind: You want to rebase if there's a merge conflict...


What about adding a check and rebasing if and only if the change in
question has a -1 from Jenkins?

That'd mean that in-flight reviews don't get rebased but we *do* rebase if
we're in merge conflict.

The downside to this is we'll be doing *more* gerrit queries but that's
probably ok.


That seems like a further optimization. Honestly, I would rarely expect 
these to be in merge conflict, and at worst, they would be so until the
next requirements push.


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Nova mascot - nominations and voting

2016-08-11 Thread Sean Dague
On 08/10/2016 11:46 AM, Sean Dague wrote:
> On 08/10/2016 11:36 AM, Sean Dague wrote:
>> On 08/09/2016 07:30 PM, Heidi Joy Tretheway wrote:
>>> TL;DR: If you don’t want a mascot, you don’t have to. But Nova, you’ll
>>> be missed. :-)
>>>
>>> A few notes following up on Matt Riedemann, Clint Byrum, Daniel
>>> Berrange’s conversation regarding the Nova mascot…
>>>
>>> Nova doesn’t have to have a mascot if the majority of the team doesn’t
>>> want one. I’m not sure if the Nova community took a vote or if it was
>>> more of an informal discussion. We have 53 projects with confirmed
>>> logos, and we’re planning some great swag associated with the new
>>> project mascots. (I’m surprised the Nova team didn’t immediately request
>>> a star nova as their mascot. I’ll give you three guesses what Swift
>>> picked...)
>>
>> Ok, we've been having a bit of a nova core private email thread just to
>> figure out where everyone stood.

So... I overstepped here and jumped to a conclusion based on an
incorrect understanding of people's sentiments. And there has been some
concern expressed that part of this conversation was private, which is a
valid concern. I'm sorry about all of that.

Let's start afresh...

What's been publicly suggested so far (from all ML posts that seem to
contain a suggestion):

ant - alexis (already chosen by infra)
bee - alexis (already chosen by refstack)
star - heidi
supernova - markus, auggy, bob ball
octopus - chris (already chosen by UX)

I'd suggest that we actually combine star/supernova into one item to
give the graphic designers some flexibility and creativity. With less
distinctive features than animals, the freedom is probably needed to
make something cool.

Are there other suggestions? Those items already chosen by other teams
are out of bounds per the FAQ
(http://www.openstack.org/project-mascots). We can leave this open for
the rest of the week, and if there are additional valid options do an
ATC poll next week.

-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] test strategy for the serial console feature

2016-08-11 Thread Sean Dague
On 08/11/2016 05:45 AM, Markus Zoeller wrote:
> On 26.07.2016 12:16, Jordan Pittier wrote:
>> Hi Markus
>> You don't really need a whole new job for this. Just turn that flag to True
>> on existing jobs.
>>
>> 30/40 seconds is acceptable. But I am surprised considering a VM usually
>> boots in 5 sec or so. Any idea of where that slowdown comes from?
>>
>> On Tue, Jul 26, 2016 at 11:50 AM, Markus Zoeller <
>> mzoel...@linux.vnet.ibm.com> wrote:

We just had a big chat about this in the #openstack-nova IRC channel. To
summarize:

The class of bugs that is really problematic is:

* https://bugs.launchpad.net/nova/+bug/1455252 - Launchpad bug 1455252
in OpenStack Compute (nova) "enabling serial console breaks live
migration" [High,In progress] - Assigned to sahid (sahid-ferdjaoui)

* https://bugs.launchpad.net/nova/+bug/1595962 - Launchpad bug 1595962
in OpenStack Compute (nova) "live migration with disabled vnc/spice not
possible" [Undecided,In progress] - Assigned to Markus Zoeller
(markus_z) (mzoeller)

Which are both in the category of serial console breaking live
migration. It's the serial device vs. live migration that's most
problematic. Serial consoles themselves haven't broken badly recently.
Given that we don't do live migration testing in most normal jobs, the
Tempest jobs aren't really going to help here.

The dedicated live-migration job is being targeted.

Serial console support is currently a function at the compute level,
which is actually a little odd, because it means that all guests on a
compute must be serial console, or must not. Imagine a compute running
Linux, Windows, FreeBSD guests. It's highly unlikely that you want to
force serial console one way or another on all of those the same way.
This is probably something that makes sense to add as an image
attribute, because images will need guest configuration to support
serial consoles. As an image attribute this would also help with testing
because we could mix / match in a single run.

-Sean

-- 
Sean Dague
http://dague.net



[openstack-dev] [requirements] near term gate optimization opportunity

2016-08-10 Thread Sean Dague
Based on reading some logs, it looks like requirements updates are
getting regenerated every time a requirements change lands, for all open patches,
even if they aren't impacted by it -
https://review.openstack.org/#/c/351991/

Patch sets 10, 11, and 12 in that series are just rebases, all happening within a
couple of hours.

With the check queue at ... 505 changes as of this email, this is
definitely adding some extra load to the system.

It would be a great optimization for someone to look at the script -
https://github.com/openstack-infra/project-config/blob/ab89ab40ed74db306ce10e36341d39f23231f012/jenkins/scripts/propose_update.sh
and make it so that if the commit did not change, don't push a rebased
review.
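
(One possible shape for that check, as a hypothetical sketch rather than
the actual script: fetch the patch set already on review, and skip the push
when the newly generated commit's diff against the merge base is identical,
i.e. the update would be a pure rebase:)

    import subprocess

    def run(*cmd):
        return subprocess.check_output(cmd)

    def is_pure_rebase(gerrit_ref, base="origin/master"):
        """True if HEAD (the freshly generated update) has the same diff,
        relative to the merge base, as the patch set already on review."""
        run("git", "fetch", "origin", gerrit_ref)   # e.g. refs/changes/NN/...
        old = run("git", "diff", base + "...FETCH_HEAD")
        new = run("git", "diff", base + "...HEAD")
        return old == new

    # in the proposal script: if is_pure_rebase(ref), skip the git-review push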

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Nova mascot

2016-08-10 Thread Sean Dague
On 08/10/2016 11:36 AM, Sean Dague wrote:
> On 08/09/2016 07:30 PM, Heidi Joy Tretheway wrote:
>> TL;DR: If you don’t want a mascot, you don’t have to. But Nova, you’ll
>> be missed. :-)
>>
>> A few notes following up on Matt Riedemann, Clint Byrum, Daniel
>> Berrange’s conversation regarding the Nova mascot…
>>
>> Nova doesn’t have to have a mascot if the majority of the team doesn’t
>> want one. I’m not sure if the Nova community took a vote or if it was
>> more of an informal discussion. We have 53 projects with confirmed
>> logos, and we’re planning some great swag associated with the new
>> project mascots. (I’m surprised the Nova team didn’t immediately request
>> a star nova as their mascot. I’ll give you three guesses what Swift
>> picked...)
> 
> Ok, we've been having a bit of a nova core private email thread just to
> figure out where everyone stood.
> 

"Milestone 3 is a really hard time to have new requests pop up, for
anything really, because right now the whole team is pretty heads down
trying to only *disappoint* no more than 50% of people that are trying
to land
features during the cycle that won't make it."

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Nova mascot

2016-08-10 Thread Sean Dague
On 08/09/2016 07:30 PM, Heidi Joy Tretheway wrote:
> TL;DR: If you don’t want a mascot, you don’t have to. But Nova, you’ll
> be missed. :-)
> 
> A few notes following up on Matt Riedemann, Clint Byrum, Daniel
> Berrange’s conversation regarding the Nova mascot…
> 
> Nova doesn’t have to have a mascot if the majority of the team doesn’t
> want one. I’m not sure if the Nova community took a vote or if it was
> more of an informal discussion. We have 53 projects with confirmed
> logos, and we’re planning some great swag associated with the new
> project mascots. (I’m surprised the Nova team didn’t immediately request
> a star nova as their mascot. I’ll give you three guesses what Swift
> picked...)

Ok, we've been having a bit of a nova core private email thread just to
figure out where everyone stood.

Milestone 3 is a really hard time to have new requests pop up, for
anything really, because right now the whole team is pretty heads down
trying to only no more than 50% of people that are trying to land
features during the cycle that won't make it.

To summarize the sentiment, I think there are many folks that aren't
enthusiastic about mascots. However there are also some that would like
to move forward with something, and not have Nova represented by a
broken image link.

I think given what's been on the public lists, and some of this back
channel, marking Nova down for "star / supernova" would be fine. I
imagine the creative folks working on logos can come up with something
cool around that.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] Nova mascot

2016-08-10 Thread Sean Dague
On 08/10/2016 11:19 AM, Szankin, Maciej wrote:
> While supernova looks super cool on photographs, I frankly have no idea
> how this could look like a logo. You know, the cartoonish style. Tried
> to google for examples, but they do look terrible…

That's why you have graphics designers try a thing.

    -Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [requirements] History lesson please

2016-08-09 Thread Sean Dague
On 08/09/2016 11:25 AM, Matthew Thode wrote:
> On 08/09/2016 10:22 AM, Ian Cordasco wrote:
>> -Original Message-
>> From: Matthew Thode
>> Reply: prometheanf...@gentoo.org, OpenStack Development Mailing List
>> (not for usage questions)
>> Date: August 9, 2016 at 09:53:53
>> To: openstack-dev@lists.openstack.org
>> Subject: Re: [openstack-dev] [requirements] History lesson please
>>
>>> One of the things on our todo list is to test the 'lower-constraints' to
>>> make sure they still work with the head of branch.
>>
>> That's not sufficient. You need to find versions in between the lowest 
>> tested version and the current version to also make sure you don't end up 
>> with breakage. You might have somepackage that has a lower version of 2.0.1 
>> and a current constraint of 2.12.3. You might even have a blacklist of 
>> versions in between those two versions, but you still need other versions to 
>> ensure that things in between those continue to work.
>>
>> The tiniest of accidental incompatibilities can cause some of the most 
>> bizarre bugs.
>>
>> --  
>> Ian Cordasco
>>
> 
> I'm aware of this, but this would be a good start.

And, more importantly, assuming that testing is only valid if it covers
every scenario sets the bar at entirely the wrong place.

A lower bound test would eliminate some of the worst fiction we've got.
Testing is never 100%. With a complex system like OpenStack, it's
probably not even 1% (of the config matrix, for sure). But picking some
interesting representative scenarios and seeing that they're not completely
busted is worthwhile.
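
(A naive sketch of what such a lower-bound job could pin; this is not the
requirements team's actual tooling. It turns each declared minimum into an
exact pin, which a job could then install and run the unit tests against:)

    import re

    def lower_constraints(requirement_lines):
        """Turn 'foo>=1.2.0,!=1.4.0' style lines into 'foo==1.2.0' pins."""
        pins = []
        for line in requirement_lines:
            m = re.match(r"([A-Za-z0-9._-]+).*?>=\s*([0-9][^,;\s]*)", line)
            if m:
                pins.append("%s==%s" % m.groups())
        return pins

    print("\n".join(lower_constraints(["oslo.config>=3.14.0", "six>=1.9.0"])))
    # -> oslo.config==3.14.0
    #    six==1.9.0
    # then: pip install -c <that file> -r requirements.txt and run the tests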

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] devstack changed to neutron by default - merged

2016-08-09 Thread Sean Dague
https://review.openstack.org/#/c/350750/ has merged, which moves us over
to neutron by default in devstack.

If you have manually set services in your local.conf, you won't see any
changes. If you don't regularly set those services, you'll be using
neutron on your next stack.sh after this change.

The *one* major difference in configuration is that PUBLIC_INTERFACE
means something different with neutron. This now means the interface
that you would give to neutron and let it completely take over. There
will be docs coming soon to explain this a bit better on the devstack
documentation site (http://devstack.org) once I'm a few more cups of
coffee into the morning. However, in the mean time, if you see weird
fails during stacking, try deleting PUBLIC_INTERFACE from your local.conf.

    -Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [requirements] History lesson please

2016-08-09 Thread Sean Dague
On 08/09/2016 02:38 AM, Tony Breeds wrote:
> Hi all,
> I guess this is aimed at the long term requirements team members.
> 
> The current policy for approving requirements[1] bumps contains the following 
> text:
> 
> Changes to update the minimum version of a library developed by the
> OpenStack community can be approved by one reviewer, as long as the
> constraints are correct and the tests pass.
> 
> Perhaps I'm a little risk averse but this seems a little strange to me.  Can
> folks that know more about how this came about help me understand why that is?
> 
> Yours Tony.
> 
> [1] https://github.com/openstack/requirements/blob/master/README.rst#for-upgrading-requirements-versions

With constraints, a requirements minimum bump is pretty low risk. Very
few of our jobs are impacted by it.

It's in many ways riskier to leave minimums where they are and bump
constraints, because the minimums could be lying that they still work at
the lower level.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] devstack changing to neutron by default RSN - current issues with OVH

2016-08-08 Thread Sean Dague
In summary, it turns out we learned a few things:

1) neutron guests in our gate runs don't have the ability to route
outwards. For instance, if they tried to do a package update, it would fail.

2) adding the ability for them to route outwards (as would be expected
for things like package updates) was deemed table stakes for the
devstack default.

3) doing so fails one tempest test on OVH, because they seem to be
reflecting network traffic? We see connectivity between guests when it's
not expected.


My proposed path forward:

1) merge https://review.openstack.org/#/c/350750/ - devstack default change
2) merge https://review.openstack.org/#/c/352463/ - skip of tempest test
that will fail on OVH (which turns into a 10% fail rate for neutron)
3) look at moving something like
https://review.openstack.org/#/c/351876/3 into devstack-gate to handle
OVH special casing. This is going to take time, especially given that we
get maybe 2 iterations a day due to the gate being overloaded.
4) revert https://review.openstack.org/#/c/352463/

If we don't have the devstack default change merged by the middle of the
week, we probably need to abandon merging it this cycle altogether, because
we need breathing space to address any possible fallout from the merge.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [api][os-api-ref] openstackdocstheme integration

2016-08-08 Thread Sean Dague
On 08/08/2016 08:44 AM, Doug Hellmann wrote:
> Excerpts from Hayes, Graham's message of 2016-08-08 11:28:35 +:
>> On 05/08/2016 19:15, Doug Hellmann wrote:
>>> Excerpts from Hayes, Graham's message of 2016-08-05 17:04:35 +:
>>>> Hey,
>>>>
>>>> We look like we are getting close to merging the os-api-ref integration
>>>> with openstackdocstheme.
>>>>
>>>> Unfortunately, there is no "phased" approach available - the version
>>>> released with compatibility for openstackdocstheme will not work
>>>> with oslo.sphinx.
>>>
>>> In what way doesn't it work? Is one of the themes missing something?
>>>
>>> Doug
>>
>> Both themes are laid out differently. One uses bootstrap and the other
>> doesn't, one has a different view on what should be hidden, and where
>> TOCs belong.
>>
>> The end result was that for the oslosphinx integration we included extra
>> CSS / JS, but that code can cause conflicts with openstackdocstheme.
> 
> Would putting that extra stuff into oslosphinx, as an optional part of
> the theme, make the transition any easier?

It's actually somewhat the inverse problem (IIRC).

oslosphinx is written as an appropriate sphinx extension / theme; it
plays nice with others. You can tell the author(s) were familiar with
sphinx.

openstackdocstheme was done as a Bootstrap UX, then grafted onto sphinx
builds in a way that just barely works, as long as you don't include any
other sphinx extensions. The moment you do, things get really funky.
For instance, it carries its own jQuery (needed by Bootstrap) instead of
using the standard scripts include in sphinx. This was clearly written
by folks who were familiar with Bootstrap, and not really with sphinx.

When we hacked together os-api-ref the incompatibilities with
openstackdocstheme were getting in the way, so it was done with
oslosphinx in mind. There were definitely styling elements we needed
differently, and instead of negotiating changing the style on everything
else, that styling was done in os-api-ref.

os-api-ref also needs some dynamic elements. For instance, section
expand / collapse, and sensible bookmarking. In a perfect world that
probably ends up in the theme layer, which means doing it in both
oslosphinx and openstackdocstheme, the extension only creating base markup.

However, the goal isn't to support both. It's to support
openstackdocstheme, which is the strategic UX for all openstack docs
(even though it's actually not very sphinxy, which is a whole other
issue that will probably hurt us in other places).

So, we could do a lot of work to smooth the transition, which would get
thrown away shortly after, or just create a flag day and have docs
broken for a bit until people get across it.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] devstack changing to neutron by default RSN

2016-08-05 Thread Sean Dague
On 08/05/2016 04:32 PM, Armando M. wrote:
> 
> 
> On 5 August 2016 at 13:05, Dan Smith <d...@danplanet.com> wrote:
> 
> > I haven't been able to reproduce it either, but it's unclear how packets
> > would get into a VM on an island since there is no router interface, and
> > the VM can't respond even if it did get it.
> >
> > I do see outbound pings from the connected VM get to eth0, hit the
> > masquerade rule, and continue on their way.  But those packets get
> > dropped at my ISP since they're in the 10/8 range, so perhaps something
> > in the datacenter where this is running is responding?  Grasping at
> > straws is right until we see the results of Armando's test patch.
> 
> Right, that's what I was thinking when I said "something with the
> provider" in my other reply. A provider could potentially always reflect
> 10/8 back at you to eliminate the possibility of ever escaping like
> that, which would presumably come back, hit the 10.1/20 route that we
> have and continue on in. I'm not entirely sure why that's not being hit
> right now (i.e. before this change), but I'm less familiar with the
> current state of the art than I am this patch.
> 
> 
> Still digging, but we have a clean pass in [0]. The multinode setup
> involves br-ex [1,2]; I am not quite sure how changing iptables rules
> fiddles with it, if at all.
> 
> [0] http://logs.openstack.org/76/351876/1/experimental/gate-tempest-dsvm-neutron-dvr-multinode-full/3a81575/logs/testr_results.html.gz
> [1] https://github.com/openstack-infra/devstack-gate/blob/master/functions.sh#L1108
> [2] https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh#L130

So... interesting relevant data which supports Dan and Brian's theory.

The test in question only runs on neutron configurations. Every failure
of the test is on OVH nodes. Every time that test has run not on OVH
nodes, it's passed. http://goo.gl/Sppc72 (logstash results). After the
last failure on the regular job that we had, Dan said we could add a
'-s' flag to be safe, and it looks like it *fixed* it. But the reality
is that it just ran on internap instead. And then when I updated the
commit message, that ran on rax.

OVH networking is kind of unique with the way they give us a /32
address; it's very possible other things in their infrastructure are
causing this reflection.

This would also speak to the fact that our gate tests probably never
produced guests which could actually talk to the outside world. We don't
ever test that they do. The masq rule opened this up for the first time
in our gate as well.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] devstack changing to neutron by default RSN

2016-08-05 Thread Sean Dague
On 08/05/2016 11:34 AM, Armando M. wrote:
> 
> 
> On 5 August 2016 at 05:59, Sean Dague <s...@dague.net> wrote:
> 
> On 08/04/2016 09:15 PM, Armando M. wrote:
> > So glad we are finally within the grasp of this!
> >
> > I posted [1], just to err on the side of caution and get the opportunity
> > to see how other gate jobs for Neutron might be affected by this change.
> >
> > Are there any devstack-gate changes lined up too that we should be 
> > aware of?
> >
> > Cheers,
> > Armando
> >
> > [1] https://review.openstack.org/#/c/351450/
> 
> Nothing at this point. devstack-gate bypasses the service defaults in
> devstack, so it doesn't impact that at all. Over time we'll want to make
> neutron the default choice for all devstack-gate setups, and nova-net to
> be the exception. But that actually can all be fully orthogonal to this
> change.
> 
> 
> Ack
>  
> 
> The experimental results aren't quite all in yet, but it looks like one test
> is failing on dvr (which is the one that tests for cross tenant
> connectivity) -
> 
> http://logs.openstack.org/50/350750/5/experimental/gate-tempest-dsvm-neutron-dvr/4958140/
> 
> That test has been pretty twitchy during this patch series, and it's
> quite complex, so figuring out exactly why it's impacted here is a bit
> beyond me atm. I think we need to decide if that is going to get deeper
> inspection, we live with the fails, or we disable the test for now so we
> can move forward and get this out to everyone.
> 
> 
> Looking at the health trend for DVR [1], the test hasn't failed in a
> while, so I wonder if this is induced by the proposed switch, even
> though I can't correlate it just yet (still waiting for caffeine to kick
> in). Perhaps we can give ourselves today to look into it and pull the
> trigger for 351450 <https://review.openstack.org/#/c/351450/> on Monday?
> 
> [1] 
> http://status.openstack.org/openstack-health/#/job/gate-tempest-dsvm-neutron-dvr

The only functional difference in the new code that happens in the gate
is the iptables rule:

local default_dev=""
default_dev=$(ip route | grep ^default | awk '{print $5}')
sudo iptables -t nat -A POSTROUTING -o $default_dev \
    -s $FLOATING_RANGE -j MASQUERADE

That's the thing to consider. It is the bit that's a little janky, but
it was the best idea we had for making things act like we expect
otherwise on the single node environment (especially guests being able
to egress). It's worth noting, we never seem to test guest egress in the
gate (at least not that I could find), so this is something that might
just never have been working the way we expected.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] devstack changing to neutron by default RSN

2016-08-05 Thread Sean Dague
On 08/04/2016 09:15 PM, Armando M. wrote:
> So glad we are finally within the grasp of this!
> 
> I posted [1], just to err on the side of caution and get the opportunity
> to see how other gate jobs for Neutron might be affected by this change.
> 
> Are there any devstack-gate changes lined up too that we should be aware of?
> 
> Cheers,
> Armando
> 
> [1] https://review.openstack.org/#/c/351450/

Nothing at this point. devstack-gate bypasses the service defaults in
devstack, so it doesn't impact that at all. Over time we'll want to make
neutron the default choice for all devstack-gate setups, and nova-net to
be the exception. But that actually can all be fully orthogonal to this
change.

The experimental results aren't quite all in yet, but it looks like one test
is failing on dvr (which is the one that tests for cross tenant
connectivity) -
http://logs.openstack.org/50/350750/5/experimental/gate-tempest-dsvm-neutron-dvr/4958140/

That test has been pretty twitchy during this patch series, and it's
quite complex, so figuring out exactly why it's impacted here is a bit
beyond me atm. I think we need to decide if that is going to get deeper
inspection, we live with the fails, or we disable the test for now so we
can move forward and get this out to everyone.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Some thoughts on API microversions

2016-08-04 Thread Sean Dague
On 08/04/2016 12:47 PM, John Garbutt wrote:
> On 4 August 2016 at 14:18, Andrew Laski  wrote:
>> On Thu, Aug 4, 2016, at 08:20 AM, Sean Dague wrote:
>>> On 08/03/2016 08:54 PM, Andrew Laski wrote:
>>>> I've brought some of these thoughts up a few times in conversations
>>>> where the Nova team is trying to decide if a particular change warrants
>>>> a microversion. I'm sure I've annoyed some people by this point because
>>>> it wasn't germane to those discussions. So I'll lay this out in its own
>>>> thread.
>>>>
>>>> I am a fan of microversions. I think they work wonderfully to express
>>>> when a resource representation changes, or when different data is
>>>> required in a request. This allows clients to make the same request
>>>> across multiple clouds and expect the exact same response format,
>>>> assuming those clouds support that particular microversion. I also think
>>>> they work well to express that a new resource is available. However I do
>>>> think think they have some shortcomings in expressing that a resource
>>>> has been removed. But in short I think microversions work great for
>>>> expressing that there have been changes to the structure and format of
>>>> the API.
>>>>
>>>> I think microversions are being overused as a signal for other types of
>>>> changes in the API because they are the only tool we have available. The
>>>> most recent example is a proposal to allow the revert_resize API call to
>>>> work when a resizing instance ends up in an error state. I consider
>>>> microversions to be problematic for changes like that because we end up
>>>> in one of two situations:
>>>>
>>>> 1. The microversion is a signal that the API now supports this action,
>>>> but users can perform the action at any microversion. What this really
>>>> indicates is that the deployment being queried has upgraded to a certain
>>>> point and has a new capability. The structure and format of the API have
>>>> not changed so an API microversion is the wrong tool here. And the
>>>> expected use of a microversion, in my opinion, is to demarcate that the
>>>> API is now different at this particular point.
>>>>
>>>> 2. The microversion is a signal that the API now supports this action,
>>>> and users are restricted to using it only on or after that microversion.
>>>> In many cases this is an artificial constraint placed just to satisfy
>>>> the expectation that the API does not change before the microversion.
>>>> But the reality is that if the API change was exposed to every
>>>> microversion it does not affect the ability I lauded above of a client
>>>> being able to send the same request and receive the same response from
>>>> disparate clouds. In other words exposing the new action for all
>>>> microversions does not affect the interoperability story of Nova which
>>>> is the real use case for microversions. I do recognize that the
>>>> situation may be more nuanced and constraining the action to specific
>>>> microversions may be necessary, but that's not always true.
>>>>
>>>> In case 1 above I think we could find a better way to do this. And I
>>>> don't think we should do case 2, though there may be special cases that
>>>> warrant it.
>>>>
>>>> As possible alternate signalling methods I would like to propose the
>>>> following for consideration:
>>>>
>>>> Exposing capabilities that a user is allowed to use. This has been
>>>> discussed before and there is general agreement that this is something
>>>> we would like in Nova. Capabilities will programatically inform users
>>>> that a new action has been added or an existing action can be performed
>>>> in more cases, like revert_resize. With that in place we can avoid the
>>>> ambiguous use of microversions to do that. In the meantime I would like
>>>> the team to consider not using microversions for this case. We have
>>>> enough of them being added that I think for now we could just wait for
>>>> the next microversion after a capability is added and document the new
>>>> capability there.
>>>
>>> The problem with this approach is that the capability add isn't on a
>>> microversion boundary, as long as we continue to believe that we wan

[openstack-dev] [all] devstack changing to neutron by default RSN

2016-08-04 Thread Sean Dague
One of the cycle goals in newton was to get devstack over to neutron by
default. Neutron is used by 90+% of our users, and nova network is
deprecated, and is not long for this world.

Because devstack is used by developers as well as by out test
infrastructure, the major stumbling block was coming up with a good
working default on a machine with only 1 interface, that doesn't leave
that interface in a totally broken state if you reboot the box (noting
that ovs changes are persistent by default, but brctl ones are not).

We think we've come up with a model that works. It's here -
https://review.openstack.org/#/c/350750/. And while it's surprisingly
short, it took a lot of thinking this one through to get there.

The crux of it is that we trust the value of PUBLIC_INTERFACE in a new
way on the neutron side. It is now unset by default (logic was changed
in n-net to keep things right there), and if set, then we assume you
really want neutron to manage this physical interface.

If not, that's cool. We're automatically creating a bridge interface
(with no physical interfaces in it) and managing that. For single node
testing this works fine. It passes all the tempest tests[1]. The only
thing that's really weird in this setup is that because there is no
physical interface in that bridge, there is no path to the outside world
from guests. That means no package updates on them.

We address that with an iptables masq rule. It's a little cheaty pants,
however, of the options we've got, it didn't seem so bad. (Note: if you
have a better option and are willing to get knee deep in solving it,
please do so. More contributors the better.)

It's going to take a bit for docs to all roll over here, but I think we
actually want this out sooner rather than later to find any other edge
cases that it introduces. There will be some bumpiness here. However,
being able to bring up a full neutron with only the 4 passwords
specified in config is quite nice.
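
For anyone who hasn't tried it, the minimal config in question is just
the stock local.conf credentials block (reproduced here for reference;
this is devstack's documented sample, not anything new):

    [[local|localrc]]
    ADMIN_PASSWORD=secret
    DATABASE_PASSWORD=$ADMIN_PASSWORD
    RABBIT_PASSWORD=$ADMIN_PASSWORD
    SERVICE_PASSWORD=$ADMIN_PASSWORD

With PUBLIC_INTERFACE left unset, that now gets you the standalone
bridge plus the masquerade rule described above.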

1. actually 5 tests fail for unrelated reasons, which is that tempest
isn't properly excluding tests for services that aren't running, because
it makes some assumptions about the gate config. That will be fixed soon.

-Sean

-- 
Sean Dague
http://dague.net


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Some thoughts on API microversions

2016-08-04 Thread Sean Dague
> Perhaps there could
> be a header that indicates the date of the last commit in that code.
> That's not an ideal way to implement it but hopefully it makes it clear
> what I'm suggesting. Some marker that a user can use to determine that a
> new behavior is to be expected, but not one that's more intended to
> signal structural API changes.
> 
> Thoughts?

I think we're going to get a ton of push back from people on this. When
we first rolled out microversions I got a number of people asking if
they could hide the supported microversions, because it gave some
indication of the code level on the server (like people hide apache
version in production). Which entirely missed the point of the
infrastructure. I can't see folks allowing this in. Plus this is git,
and people have local patches, so I'm not sure there is any meaningful
concept here to expose.

I'm on board with a future where we have the monotonically increasing
microversions, as well as a side channel of discoverable capabilities.
But I think the moment you try to introduce a 3rd communication channel
for behavior, which has something looking like a version number, it
actually becomes way too confusing from a consumption point of view. And
it also breaks one of the things we were trying to do, which is
guarantee old behavior (as much as possible) on old API versions.
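
To make the consumption point concrete: pinning today is a single
request header. A minimal sketch, with the endpoint and token as
placeholders:

    import requests

    # ask nova for exactly the 2.35 behavior, whatever the server's
    # maximum supported microversion happens to be
    resp = requests.get(
        'http://nova.example.com/v2.1/servers',
        headers={'X-Auth-Token': 'TOKEN',
                 'X-OpenStack-Nova-API-Version': '2.35'})

A third version-shaped channel would mean clients juggling yet another
header like this one, which is exactly the confusion being flagged.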

I think that if we put some code version into place, we'll just assume
we can use that to signal changes, and stop realizing how disruptive it
is to make those changes for existing users.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [requirements] race in keystone unit tests

2016-08-03 Thread Sean Dague
On 08/03/2016 12:26 PM, Lance Bragstad wrote:
> Sending a follow-up because I think we ended up finding something
> relevant to this discussion.
> 
> As keystone moves towards making fernet the default, one of our work
> items was to mock the system clock in tests. This allows us to advance
> the clock by one second where we need to avoid sub-second race
> conditions. To do this we used freezegun [0]. We recently landed a bunch
> of fixes to do this.
> 
> It turns out that there is a possible race between when freezegun
> patches its modules and when the test runs. This turned up in a patch I
> was working on locally and I noticed certain clock operations weren't
> using the fake time object from freezegun. As a work-around, we can
> leverage the set_time_override() method from oslo_utils.timeutils to
> make sure we are using the fake time from within the frozen time
> context. In my testing locally this worked.
> 
> If keystone requires a hybrid approach to patching
> (oslo_utils.timeutils.set_time_override() + freezegun), we should build
> it into a well documented hybrid context manager so that it's more
> apparent why we need it.
> 
> Sean, I can start working on this to see if it starts mitigating the
> races you're seeing.
> 
> [0] https://pypi.python.org/pypi/freezegun

Lance, thanks for digging into this! I think using the oslo
set_time_override sounds like the best approach; that's what I remember
other places doing to test time boundaries like this.
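
For reference, a minimal sketch of that pattern (the date and the
one-second step are arbitrary; this is not keystone's actual test code):

    import datetime

    from oslo_utils import timeutils

    # freeze "now", then step it past the sub-second race window
    timeutils.set_time_override(datetime.datetime(2016, 8, 3, 12, 0, 0))
    try:
        issued_at = timeutils.utcnow()
        timeutils.advance_time_seconds(1)
        assert timeutils.utcnow() > issued_at
    finally:
        timeutils.clear_time_override()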

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [requirements] race in keystone unit tests

2016-08-02 Thread Sean Dague
One of my concerns about stacking up project unit tests in the
requirements jobs, is the unit tests aren't as free of races as you
would imagine. Because they only previously impacted the one project
team, those teams are often just fast to recheck instead of get to the
bottom of it. Cross testing with them in a voting way changes their impact.

The keystone unit tests have an existing race condition in them, which
recently failed an unrelated requirements bump -
http://logs.openstack.org/50/348250/6/check/gate-cross-keystone-python27-db-ubuntu-xenial/962327d/console.html#_2016-08-02_03_52_14_537923

I'm not fully sure where to go from here. But wanted to make sure that
data is out there. Any keystone folks who can dive into and sort it out
would be highly appreciated.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] persistently single-vendor projects

2016-08-02 Thread Sean Dague
On 08/02/2016 06:16 AM, Chris Dent wrote:
> On Mon, 1 Aug 2016, James Bottomley wrote:
> 
>> Making no judgments about the particular exemplars here, I would just
>> like to point out that one reason why projects exist with very little
>> diversity is that they "just work".  Usually people get involved when
>> something doesn't work or they need something changed to work for them.
>> However, people do have a high tolerance for "works well enough"
>> meaning that a project can be functional, widely used and not
>> attracting diverse contributors.  A case in point for this type of
>> project in the non-openstack world would be openssl but there are many
>> others.
> 
> In a somewhat related point, the kinds of metrics we use in OpenStack to
> evaluate project health tend to have the unintended consequence of
> requiring the projects to always be growing and changing (i.e. churning)
> rather than trending towards stability and maturity.
> 
> I'd like to think that we can have some projects that can be called
> "done".
> 
> So we need to consider the side effects of the measurements we're
> taking and not let the letter of the new laws kill the spirit.
> 
> Yours in cliches,

I do understand that concern. Metrics games definitely can have
unintended consequences. Are there instances of software in our
ecosystem that you consider done, single vendor, and would be negatively
impacted by not being called OpenStack?

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] persistently single-vendor projects

2016-08-01 Thread Sean Dague
On 08/01/2016 12:24 PM, James Bottomley wrote:
> On Mon, 2016-08-01 at 11:38 -0400, Doug Hellmann wrote:
>> Excerpts from Adrian Otto's message of 2016-08-01 15:14:48 +:
>>> I am struggling to understand why we would want to remove projects
>>> from our big tent at all, as long as they are being actively
>>> developed under the principles of "four opens". It seems to me that
>>> working to disqualify such projects sends an alarming signal to our
>>> ecosystem. The reason we made the big tent to begin with was to set
>>> a tone of inclusion. This whole discussion seems like a step
>>> backward. What problem are we trying to solve, exactly?
>>>
>>> If we want to have tags to signal team diversity, that's fine. We
>>> do that now. But setting arbitrary requirements for big tent
>>> inclusion based on who participates definitely sounds like a
>>> mistake.
>>
>> Membership in the big tent comes with benefits that have a real
>> cost born by the rest of the community. Space at PTG and summit
>> forum events is probably the one that's easiest to quantify and to
>> point to as something limited that we want to use as productively
>> as possible. If 90% of the work of a project is being done by a
>> single company or organization (our current definition for
>> single-vendor), and that doesn't change after 18 months, then I
>> would take that as a signal that the community isn't interested
>> enough in the project to bear the associated costs.
>>
>> I'm interested in hearing other reasons that we should keep these
>> sorts of projects, though. I'm not yet ready to propose the change
>> to the policy myself.
> 
> Making no judgments about the particular exemplars here, I would just
> like to point out that one reason why projects exist with very little
> diversity is that they "just work".  Usually people get involved when
> something doesn't work or they need something changed to work for them.
>  However, people do have a high tolerance for "works well enough"
> meaning that a project can be functional, widely used and not
> attracting diverse contributors.  A case in point for this type of
> project in the non-openstack world would be openssl but there are many
> others.

I think openssl is a good example of what we are actually trying to
avoid. Over time that project boiled down to just a couple of people.
Which seemed ok, because everything seemed to be working fine, but only
because no one was pushing on it too hard. Then folks did, and we
realized that there was kind of a house of cards here, that's required
special intervention to address some of the issues found.

Keeping a diverse community up front helps mitigate some of this. It's
not a silver bullet by any means, but it does help ensure that the goals
of the project aren't only the goals of a single product team inside a
single entity.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] persistently single-vendor projects

2016-08-01 Thread Sean Dague
On 08/01/2016 10:28 AM, Davanum Srinivas wrote:
> Sean,
> 
> So we will programmatically test the metrics (if we are not doing that
> already) to apply/remove "team:single-vendor" tag:
> 
> https://governance.openstack.org/reference/tags/team_single-vendor.html
> 
> And trigger exit when the tag is present for more than 3 cycles in a
> row (say as of release date?)
> 
> Thanks,
> -- Dims

An approach like that would be fine with me. I'm not sure we have a
formal proposal yet, but 3 cycles seems like a reasonable time frame.
I'm happy to debate if people think there are better timeframes instead.
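
As a back-of-envelope version of the commit-share part of that check,
something like this against a project clone works (email domains are a
crude proxy for affiliation; stackalytics does this properly):

    import collections
    import subprocess

    # author email for every commit in roughly one cycle
    out = subprocess.check_output(
        ['git', 'log', '--since=2016-03-01', '--format=%ae']).decode()
    domains = collections.Counter(a.split('@')[-1] for a in out.split())
    total = sum(domains.values())
    top, count = domains.most_common(1)[0]
    print('top affiliation %s: %.0f%%' % (top, 100.0 * count / total))
    if count > 0.9 * total:
        print('would still carry team:single-vendor')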

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] persistently single-vendor projects

2016-08-01 Thread Sean Dague
On 08/01/2016 09:58 AM, Davanum Srinivas wrote:
> Thierry, Ben, Doug,
> 
> How can we distinguish between. "Project is doing the right thing, but
> others are not joining" vs "Project is actively trying to keep people
> out"?

I think at some level, it's not really that different. If we treat them
as different, everyone will always believe they did all the right
things, but got no results. 3 cycles should be plenty of time to drop
single entity contributions below 90%. That means prioritizing bugs /
patches from outside groups (to drop below 90% on code commits),
mentoring every outside member that provides feedback (to drop below 90%
on reviews), shifting development resources towards mentoring / docs /
on ramp exercises for others in the community (to drop below 90% on core
team).

Digging out of a single vendor status is hard, and requires making that
your top priority. If teams aren't interested in putting that ahead of
development work, that's fine, but that doesn't make it a sustainable
OpenStack project.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] establishing project-wide goals

2016-08-01 Thread Sean Dague
On 07/29/2016 04:55 PM, Doug Hellmann wrote:
> One of the outcomes of the discussion at the leadership training
> session earlier this year was the idea that the TC should set some
> community-wide goals for accomplishing specific technical tasks to
> get the projects synced up and moving in the same direction.
> 
> After several drafts via etherpad and input from other TC and SWG
> members, I've prepared the change for the governance repo [1] and
> am ready to open this discussion up to the broader community. Please
> read through the patch carefully, especially the "goals/index.rst"
> document which tries to lay out the expectations for what makes a
> good goal for this purpose and for how teams are meant to approach
> working on these goals.
> 
> I've also prepared two patches proposing specific goals for Ocata
> [2][3].  I've tried to keep these suggested goals for the first
> iteration limited to "finish what we've started" type items, so
> they are small and straightforward enough to be able to be completed.
> That will let us experiment with the process of managing goals this
> time around, and set us up for discussions that may need to happen
> at the Ocata summit about implementation.
> 
> For future cycles, we can iterate on making the goals "harder", and
> collecting suggestions for goals from the community during the forum
> discussions that will happen at summits starting in Boston.
> 
> Doug
> 
> [1] https://review.openstack.org/349068 describe a process for managing 
> community-wide goals
> [2] https://review.openstack.org/349069 add ocata goal "support python 3.5"
> [3] https://review.openstack.org/349070 add ocata goal "switch to oslo 
> libraries"

I like the direction this is headed. And I think for the test items, it
works pretty well.

I'm trying to think about how we'd use a model like this to support
something a little more abstract, such as making upgrades easier, where
we've got a few things that we know get in the way (policy in files,
rootwrap rules, paste ini changes), as well as validation and
configuration changes. And what it looks like for persistently important
items which are going to take more than a cycle to get through.

Definitely seems worth giving it a shot on the current set of items, and
see how it fleshes out.

My only concern at this point is it seems like we're building nested
data structures that people are going to want to parse into some kind of
visualization, and RST is a suboptimal parsing format. If we know
that people want to parse this in advance, yamling it up might be in
order. Because this mostly looks like it would reduce to one of those
green/yellow/red checker boards by project and task.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] persistently single-vendor projects

2016-08-01 Thread Sean Dague
On 07/31/2016 02:29 PM, Doug Hellmann wrote:
> Excerpts from Steven Dake (stdake)'s message of 2016-07-31 18:17:28 +:
>> Kevin,
>>
>> Just assessing your numbers, the team:diverse-affiliation tag covers what
>> is required to maintain that tag.  It covers more then core reviewers -
>> also covers commits and reviews.
>>
>> See:
>> https://github.com/openstack/governance/blob/master/reference/tags/team_diverse-affiliation.rst
>>
>>
>> I can tell you from founding 3 projects with the team:diverse-affiliation
>> tag (Heat, Magnum, Kolla) team:deverse-affiliation is a very high bar to
>> meet.  I don't think it's wise to have such strict requirements on single
>> vendor projects as those objectively defined in team:diverse-affiliation.
>>
>> But Doug's suggestion of timelines could make sense if the timelines gave
>> plenty of time to meet whatever requirements make sense and the
>> requirements led to some increase in diverse affiliation.
> 
> To be clear, I'm suggesting that projects with team:single-vendor be
> given enough time to lose that tag. That does not require them to grow
> diverse enough to get team:diverse-affiliation.

The idea of 3 cycles to lose the single-vendor tag sounds very
reasonable to me. This also is very much along the spirit of the tag in
that it should be one of the top priorities of the team to work on this.
I'd be in favor.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] [nova] [neutron] get_all_bw_counters in the Ironic virt driver

2016-07-29 Thread Sean Dague
On 07/29/2016 02:29 PM, Jay Pipes wrote:
> On 07/28/2016 09:02 PM, Devananda van der Veen wrote:
>> On 07/28/2016 05:40 PM, Brad Morgan wrote:
>>> I'd like to solicit some advice about potentially implementing
>>> get_all_bw_counters() in the Ironic virt driver.
>>>
>>> https://github.com/openstack/nova/blob/master/nova/virt/driver.py#L438
>>> Example Implementation:
>>> https://github.com/openstack/nova/blob/master/nova/virt/xenapi/driver.py#L320
>>>
>>>
>>> I'm ignoring the obvious question about how this data will actually be
>>> collected/fetched as that's probably it's own topic (involving
>>> neutron), but I
>>> have a few questions about the Nova -> Ironic interaction:
>>>
>>> Nova
>>> * Is get_all_bw_counters() going to stick around for the foreseeable
>>> future? If
>>> not, what (if anything) is the replacement?
> 
> I don't think Nova should be in the business of monitoring *any*
> transient metrics at all.
> 
> There are many tools out there -- Nagios, collectd, HEKA, Snap, gnocchi,
> monasca just to name a few -- that can do this work.
> 
> What action is taken if some threshold is reached is entirely
> deployment-dependent and not something that Nova should care about. Nova
> should just expose an API for other services to use to control the guest
> instances under its management, nothing more.

More importantly... *only* xenapi driver implements this, and it's not
exposed over the API. In reality that part of the virt driver layer
should probably be removed.

Like jay said, there are better tools for collecting this than Nova.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] os-virtual-interfaces isn't deprecated in 2.36

2016-07-29 Thread Sean Dague
On 07/28/2016 05:38 PM, Matt Riedemann wrote:
> On 7/28/2016 3:55 PM, Matt Riedemann wrote:
>> For os-attach-interfaces, we need that to attach/detach interfaces to a
>> server, so those actions don't go away with 2.36. We can also list and
>> show interfaces (ports) which is a proxy to neutron, but in this case it
>> seems a tad bit necessary, else to list ports for a given server you
>> have to know to list ports via neutron CLI and filter on
>> device_id=server.uuid.
> 
> On second thought, we could drop the proxy APIs to list/show ports for a
> given server. python-openstackclient could have a convenience CLI for
> listing ports for a server. And the show in os-attach-interfaces takes a
> server id but it's not used, so it's basically pointless and should just
> be replaced with neutron.
> 
> The question is, as these are proxies and the 2.36 microversion was for
> proxy API deprecation, can we still do those in 2.36 even though it's
> already merged? Or do they need to be 2.37? That seems like the more
> accurate thing to do, but then we really have some weird "which is the
> REAL proxy API microversion?" logic going on.
> 
> I think we could move forward with deprecation in novaclient either way.

We should definitely move forward with novaclient CLI deprecations.

We've said that microversions are idempotent, so fixing one in this case
isn't really what we want to do, it should just be another bump, with
things we apparently missed. I'm not sure it's super important that
there is a REAL proxy API microversion. We got most of it in one go, and
as long as we catch the stragglers in 2.39 (let's make that the last
merged one before the release so that we can figure out anything else we
missed, and keep get-me-a-network as 2.37).

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Next steps for proxy API deprecation

2016-07-26 Thread Sean Dague
On 07/26/2016 01:14 PM, Matt Riedemann wrote:
> On 7/26/2016 11:59 AM, Matt Riedemann wrote:
>> Now that the 2.36 microversion change has merged [1], we can work on the
>> python-novaclient changes for this microversion.
>>
>> At the midcycle we agreed [2] to also return a 404 for network APIs,
>> including nova-network (which isn't a proxy), for consistency and
>> further signaling that nova-network is going away.
>>
>> In the client, we agreed to soften the impact for network CLIs by
>> determining if the latest microversion supported will fail (so will we
>> send >=2.36) and rather than fail, send 2.35 instead (if the user didn't
>> specifically specify a different version). However, we'd emit a warning
>> saying this is deprecated and will go away in the first major client
>> release (in Ocata? after nova-network is removed? after Ocata is
>> released?).
>>
>> We should probably just deprecate any CLIs/APIs in python-novaclient
>> today that are part of this server side API change, including network
>> CLIs/APIs in novaclient. The baremetal and image proxies in the client
>> are already deprecated, and the volume proxies were already removed.
>> That leaves the network proxies in the client.
>>
>> From my notes, Dan Smith was going to work on the novaclient changes for
>> 2.36 to not fail and use 2.35 - unless anyone else wants to volunteer to
>> do that work (please speak up).
>>
>> We can probably do the network CLI/API deprecations in the client in
>> parallel to the 2.36 support, but need someone to step up for that. I'll
>> try to get it started this week if no one else does.
>>
>> [1] https://review.openstack.org/#/c/337005/
>> [2] https://etherpad.openstack.org/p/nova-newton-midcycle
>>
> 
> I forgot to mention Tempest. We're going to have to probably put a
> max_microversion cap in several tests in Tempest to cap at 2.35 (or
> change those to use Neutron?). There are also going to be some response
> schema changes like for quota usage/limits, I'm not sure if anyone is
> looking at this yet. We could also get it done after feature freeze on
> 9/2, but I still need to land the get-me-a-network API change which is
> microversion 2.37 and has it's own Tempest test, although that test
> relies on Neutron so I might be OK for the most part.

Is that strictly true? We could also just configure all the jobs for
Nova network to set max microversion at 2.35. That would probably be
a more straightforward way of approaching this, and make it a bit more
clear how serious we are here.
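
In tempest.conf terms that's roughly just the following (assuming the
existing [compute] microversion options; 2.35 being the last
pre-deprecation version), set in the nova-network job definitions rather
than capped test by test:

    [compute]
    max_microversion = 2.35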

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] Equal Chances for all projects (was Plugins for all)

2016-07-26 Thread Sean Dague
On 07/26/2016 08:51 AM, Doug Hellmann wrote:

> 
> Given the amount of in-progress work to address the issue you've
> raised, I'm not convinced we need a global rule or policy. All of
> the teams mentioned are working toward the goal of providing stable
> APIs already, and no one seems to be arguing against that goal. The
> teams doing the work are not dragging their feet, and a rule or
> policy wouldn't make the work go any faster.
> 
> The directions for cross-project teams, given when the big tent
> change went into effect, were to "support all official teams" and
> definitely not "do the work for all official teams." There was also
> no requirement that the support be exactly equal, and it would be
> a mistake to try to require that because the issue is almost always
> lack of resources and not lack of desire.  Volunteers to contribute
> to the work that's needed will do more to help than a one-size-fits-all
> policy.

Yes, exactly.

Like Ihar said earlier, we get all kinds of breaks across our system all
the time. It's a complex system. If you look hard what you'll notice is
that project interactions where there are team members in common break a
bit less. For instance Grenade breaks Nova less than other projects.
oslo.versionedobjects breaks Nova less than oslo.messaging does. Why?
Because even with stable interfaces and tests, a lot of behavior remains
in flux given the patch rate on all projects. And when people understand
both sides of a problem, they are more likely to anticipate a problem
during review that no tests caught and didn't seem to change an interface.

This isn't a thing that gets fixed with policy. It gets fixed with people.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Remove duplicate code using Data Driven Tests (DDT)

2016-07-25 Thread Sean Dague
On 07/25/2016 08:05 AM, Daniel P. Berrange wrote:
> On Mon, Jul 25, 2016 at 07:57:08AM -0400, Sean Dague wrote:
>> On 07/22/2016 11:30 AM, Daniel P. Berrange wrote:
>>> On Thu, Jul 21, 2016 at 07:03:53AM -0700, Matt Riedemann wrote:
>>>> On 7/21/2016 2:03 AM, Bhor, Dinesh wrote:
>>>>
>>>> I agree that it's not a bug. I also agree that it helps in some specific
>>>> types of tests which are doing some kind of input validation (like the 
>>>> patch
>>>> you've proposed) or are simply iterating over some list of values (status
>>>> values on a server instance for example).
>>>>
>>>> Using DDT in Nova has come up before and one of the concerns was hiding
>>>> details in how the tests are run with a library, and if there would be a
>>>> learning curve. Depending on the usage, I personally don't have a problem
>>>> with it. When I used it in manila it took a little getting used to but I 
>>>> was
>>>> basically just looking at existing tests and figuring out what they were
>>>> doing when adding new ones.
>>>
>>> I don't think there's significant learning curve there - the way it
>>> lets you annotate the test methods is pretty easy to understand and
>>> the ddt docs spell it out clearly for newbies. We've far worse things
>>> in our code that create a hard learning curve which people will hit
>>> first :-)
>>>
>>> People have essentially been re-inventing ddt in nova tests already
>>> by defining one helper method and then having multiple test methods
>>> all calling the same helper with a different dataset. So ddt is just
>>> formalizing what we're already doing in many places, with less code
>>> and greater clarity.
>>>
>>>> I definitely think DDT is easier to use/understand than something like
>>>> testscenarios, which we're already using in Nova.
>>>
>>> Yeah, testscenarios feels little over-engineered for what we want most
>>> of the time.
>>
>> Except, DDT is way less clear (and deterministic) about what's going on
>> with the test name munging. Which means failures are harder to track
>> back to individual tests and data load. So debugging the failures is harder.
> 
> I'm not sure what you think is unclear - given an annotated test:
> 
>@ddt.data({"foo": "test", "availability_zone": "nova1"},
>   {"name": "  test  ", "availability_zone": "nova1"},
>   {"name": "", "availability_zone": "nova1"},
>   {"name": "x" * 256, "availability_zone": "nova1"},
>   {"name": "test", "availability_zone": "x" * 256},
>   {"name": "test", "availability_zone": "  nova1  "},
>   {"name": "test", "availability_zone": ""},
>   {"name": "test", "availability_zone": "nova1", "foo": "bar"})
> def test_create_invalid_create_aggregate_data(self, value):
> 
> It is generated one test for each data item:
> 
>  test_create_invalid_create_aggregate_data_1
>  test_create_invalid_create_aggregate_data_2
>  test_create_invalid_create_aggregate_data_3
>  test_create_invalid_create_aggregate_data_4
>  test_create_invalid_create_aggregate_data_5
>  test_create_invalid_create_aggregate_data_6
>  test_create_invalid_create_aggregate_data_7
>  test_create_invalid_create_aggregate_data_8
> 
> This seems about as obvious as you can possibly get

At least when this was attempted in Tempest, the naming was a lot less
clear; maybe it has gotten better. But I still think milestone 3 isn't
the time to start a thing like this.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [nova] os-brick privsep failures and an upgrade strategy?

2016-07-25 Thread Sean Dague
On 07/22/2016 09:20 AM, Angus Lees wrote:
> On Thu, 21 Jul 2016 at 09:27 Sean Dague <s...@dague.net> wrote:
> 
> On 07/12/2016 06:25 AM, Matt Riedemann wrote:
> 
> > We probably aren't doing anything while Sean Dague is on vacation.
> He's
> > back next week and we have the nova/cinder meetups, so I'm planning on
> > talking about the grenade issue in person and hopefully we'll have a
> > plan by the end of next week to move forward.
> 
> After some discussions at the Nova midcycle we threw together an
> approach where we just always allow privsep-helper from oslo.rootwrap.
> 
> https://review.openstack.org/344450
> 
> 
> Were these discussions captured anywhere?  I thought we'd discussed
> alternatives on os-dev, reached a conclusion, implemented the
> changes(*), and verified the results all a month ago - and that we were
> just awaiting nova approval.  So I'm surprised to see this sudden change
> in direction...
> 
> (*) Changes:
> https://review.openstack.org/#/c/329769/
> https://review.openstack.org/#/c/332610/
> mriedem's verification: https://review.openstack.org/#/c/331885/

By agreed we said that - https://review.openstack.org/#/c/332610/ was
the option of last resort if no better option could be figured out. But
then we ran into having to do this again for os-vif. And given the
rollout of privsep it looks like we'll basically have this same
exception / manual update in another place in the base IaaS layer for
multiple cycles as this rolls out.

Which is exactly the opposite of our upgrade vision, in which upgrades
should be seamless as code rolls forward.

If we only had to do this once, maybe we mea culpa and do it. But we
know we at least have to do this twice, and coordinating nova and
neutron couples the releases. This gets exponentially worse.

After we brought that up in the room, we started going through other
options. Someone brought up "what about making rootwrap always do this
for privsep, instead of manually doing this for every project", and I
volunteered to look at the code to figure out how hard it would be. That
patch is up at https://review.openstack.org/344450.

I think the path forward here is about the following questions:

1) how important are seamless upgrades in our vision?
2) are rootwrap rules supposed to be config (which is manually audited
by installers)?
3) is the software supposed to take into account and adapt to the rules
not being there (or disabled by an auditor)?
4) does always letting rootwrap call privsep regress our near term
security in any real way (given the flaws in existing rules)?
5) what will most quickly allow us to transition into a non-rootwrap
world, with a privsep architecture that will give us a better security
model?

Making oslo.rootwrap trust privsep seems like the least worst option in
front of us, especially to actually get os-vif out there and deployed
this cycle as well.
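
For anyone not following the privsep work closely, the escalation being
argued about happens once per service: the first call into a privileged
entrypoint spawns the helper daemon. A minimal sketch (the context name
and capability here are illustrative, not from any particular project):

    from oslo_privsep import capabilities
    from oslo_privsep import priv_context

    ctx = priv_context.PrivContext(
        'example',
        cfg_section='example_privsep',
        pypath=__name__ + '.ctx',
        capabilities=[capabilities.CAP_SYS_ADMIN],
    )

    @ctx.entrypoint
    def read_protected_file(path):
        # runs inside the privsep daemon as root; the first call is
        # what execs "sudo privsep-helper ..." (or rootwrap), which is
        # the escalation path in question
        with open(path) as f:
            return f.read()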

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Remove duplicate code using Data Driven Tests (DDT)

2016-07-25 Thread Sean Dague
On 07/22/2016 11:30 AM, Daniel P. Berrange wrote:
> On Thu, Jul 21, 2016 at 07:03:53AM -0700, Matt Riedemann wrote:
>> On 7/21/2016 2:03 AM, Bhor, Dinesh wrote:
>>
>> I agree that it's not a bug. I also agree that it helps in some specific
>> types of tests which are doing some kind of input validation (like the patch
>> you've proposed) or are simply iterating over some list of values (status
>> values on a server instance for example).
>>
>> Using DDT in Nova has come up before and one of the concerns was hiding
>> details in how the tests are run with a library, and if there would be a
>> learning curve. Depending on the usage, I personally don't have a problem
>> with it. When I used it in manila it took a little getting used to but I was
>> basically just looking at existing tests and figuring out what they were
>> doing when adding new ones.
> 
> I don't think there's significant learning curve there - the way it
> lets you annotate the test methods is pretty easy to understand and
> the ddt docs spell it out clearly for newbies. We've far worse things
> in our code that create a hard learning curve which people will hit
> first :-)
> 
> People have essentially been re-inventing ddt in nova tests already
> by defining one helper method and then having multiple test methods
> all calling the same helper with a different dataset. So ddt is just
> formalizing what we're already doing in many places, with less code
> and greater clarity.
> 
>> I definitely think DDT is easier to use/understand than something like
>> testscenarios, which we're already using in Nova.
> 
> Yeah, testscenarios feels little over-engineered for what we want most
> of the time.

Except, DDT is way less clear (and deterministic) about what's going on
with the test name munging. Which means failures are harder to track
back to individual tests and data load. So debugging the failures is harder.

I agree we have a lot of bad patterns in the tests. But I also don't
think milestone 3 is the right time to embed another pattern. At least
let's hold until next cycle opens up, when there is more time to
actually look at the trade-offs here.
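
For contrast, the hand-rolled pattern Daniel describes looks roughly
like this (a self-contained toy; the validator and names are stand-ins,
not nova code):

    import unittest

    class AggregateValidationTest(unittest.TestCase):
        def _create(self, body):
            # stand-in for the real API-layer validation
            if not body.get('name', '').strip():
                raise ValueError('invalid name')

        def _check_invalid(self, body):
            # the shared helper that ddt's @data decorator replaces
            self.assertRaises(ValueError, self._create, body)

        def test_no_name(self):
            self._check_invalid({'availability_zone': 'nova1'})

        def test_blank_name(self):
            self._check_invalid({'name': '   ',
                                 'availability_zone': 'nova1'})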

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][all] Plugins for all

2016-07-20 Thread Sean Dague

On 07/18/2016 06:49 AM, Dmitry Tantsur wrote:

On 07/17/2016 11:04 PM, Jay Pipes wrote:

On 07/14/2016 12:21 PM, Hayes, Graham wrote:

A lot of the effects are hard to see, and are not insurmountable, but
do cause projects to re-invent the wheel.

For example, quotas - there is no way for a project that is not nova,
neutron, cinder to hook into the standard CLI, or UI for setting
quotas.


There *is* no standard CLI or UI for setting quotas.


They can be done as either extra commands
(openstack dns quota set --foo bar) or as custom panels, but not
the way other quotas get set.


This has nothing to do with the big tent and everything to do with the
fact that the community at large has conflated quotas -- i.e. the limit
of a particular class of resource that a user or tenant can consume --
with the usage of a particular class of resource. The two things are not
the same nor do they need to be handled by the same service, IMHO.

I've proposed before that quotas -- i.e. the *limits* for different
resource classes that a consumer of those resources has -- be handled by
the Keystone API. This is the endpoint that I personally think makes the
most sense to house this information.

Usage information is necessarily the domain of the individual service
projects who must control allocation/consumption of resources under
their control. It would be *helpful* to have a set of best-practice
guidelines for projects to follow in safely accounting for consumption
of resources, but "the big tent" has nothing to do with our community
failing to do this. We've failed to do this from the beginning of
OpenStack, well before the big tent was just a spark in the minds of the
TC.


Tempest plugins are another example. Approximately 30 of the 36
current plugins are using resources that are not supposed to be
used, and are an unstable interface.


What "resources" are you referring to above? What is the "unstable
interface" you are referring to? Tempest should only ever be validating
public REST APIs.


Projects in tree in tempest
are in a much better position, as any change to the internal
API will have to be fixed before the gate merges, but other
out of tree plugins are in a place where they can be broken at any
point.


An example here would be super-useful, since as mentioned above, Tempest
should only be validating public HTTP/REST APIs.


Not entirely related example, but still in support of the original point
(IMO): grenade currently does not catch smoke tests coming from tempest
plugins when running after upgrade. It's just one missing call [1], and
it probably would not go unnoticed if Nova tests did not run ;)

[1] https://review.openstack.org/337372


The patch is merged.

Also... tests have gone missing for long stretches on all kinds of
projects, including nova / neutron. Missing tests require people keeping
an eye on things, because they don't fail when they disappear.


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [nova] os-brick privsep failures and an upgrade strategy?

2016-07-20 Thread Sean Dague

On 07/12/2016 06:25 AM, Matt Riedemann wrote:


We probably aren't doing anything while Sean Dague is on vacation. He's
back next week and we have the nova/cinder meetups, so I'm planning on
talking about the grenade issue in person and hopefully we'll have a
plan by the end of next week to move forward.


After some discussions at the Nova midcycle we threw together an 
approach where we just always allow privsep-helper from oslo.rootwrap.


https://review.openstack.org/344450

We did a sniff test of this, and it worked to roll over the upgrade
boundary, without an /etc change, and to work with os-brick 1.4.0
(currently blacklisted because of the upgrade issue). While I realize it
wasn't the favorite approach of many, it works. It's 3 lines of
functional change.
If we land this, release, and bump the minimum, we've got the upgrade 
issue solved in this cycle.


Please take a look and see if we can agree to this path forward.

-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-07-18 Thread Sean Dague

On 07/04/2016 05:36 AM, Sean McGinnis wrote:

On Mon, Jul 04, 2016 at 01:59:09PM +0200, Thierry Carrez wrote:
[...]

The issue here is that oslo.rootwrap uses config files to determine
what to allow, but those are not really configuration files as far
as the application using them is concerned. Those must match the
code being executed.

So from Grenade perspective, those should really not be considered
configuration files, but application files.

[...]

+1

I have to agree with this perspective. They are config files, but they
are a special type of config file that is closely tied in to the code. I
think we should treat them as application files.

I say we allow these changes for grenade and move forward on this. I
think we all agree we want to move to privsep. As long as we document
this very clearly that these changes need to be made for upgrades, I'm
OK with that.

I would really like to be able to decided on this and move forward. I'm
afraid sticking with rootwrap for another cycle with just confuse things
and compound our issues.


So, can we just put them inline in python code then and abandon /etc?

Special config files that we don't want anyone to touch, but we put in 
/etc, aren't really a thing. You really can't have it both ways. Either 
these are in the part of the filesystem where ops expect to change them, 
or they are not.
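
To make that concrete: the filter definitions already exist as python
objects under the covers, so "inline" could look roughly like this
(entirely hypothetical packaging, just to illustrate the suggestion):

    from oslo_rootwrap import filters

    # rules shipped as code in the package, not as files under /etc
    FILTERS = [
        filters.CommandFilter('privsep-helper', 'root'),
        filters.CommandFilter('mount', 'root'),
    ]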


    -Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][nova-docker] Retiring nova-docker project

2016-07-18 Thread Sean Dague

On 07/07/2016 01:12 PM, Davanum Srinivas wrote:

Folks,

The nova-docker project[1] is barely alive[2]. So i'll kick off the
process of retiring the project [3]

Thanks,
Dims

[1] http://git.openstack.org/cgit/openstack/nova-docker/
[2] http://stackalytics.com/report/contribution/nova-docker/30
[3] http://docs.openstack.org/infra/manual/drivers.html#retiring-a-project


It seems prudent.

Thanks Dims for keeping this effort alive so long. It's unfortunate that 
no one else was up for maintaining this driver.


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Let's kill quota classes (again)

2016-07-18 Thread Sean Dague

On 07/14/2016 08:07 AM, Kevin L. Mitchell wrote:

The original concept of quota classes was to allow the default quotas
applied to a tenant to be a function of the type of tenant.  That is,
say you have a tiered setup, where you have gold-, silver-, and
bronze-level customers, with gold having lots of free quota and bronze
having a small amount of quota.  Rather than having to set the quotas
individually for each tenant you created, the idea is that you set the
_class_ of the tenant, and have quotas associated with the classes.
This also has the advantage that, if someone levels up (or down) to
another class of service, all you do is change the tenant's class, and
the new quotas apply immediately.

(By the way, the turnstile integration was not part of turnstile itself;
there's a turnstile plugin to allow it to integrate with nova that's
quota_class-aware, so you could also apply rate limits this way.)

Personally, it wouldn't break my heart if quota classes went away; I
think this level of functionality, if it seems reasonable to include,
should become part of a more unified quota API (which we're still
struggling to come up with anyway) so that everyone gets the benefit…or
perhaps shares the pain? ;)  Anyway, I'm not aware of anyone using this
functionality, though it might be worth asking about on the operators
list—for curiosity's sake, if nothing else.  It would be interesting to
see if anyone would be interested in the original idea, even if the
current implementation doesn't make sense :)


We've already dropped the hook turnstile was using, so I don't see any 
reason not to drop this bit as well. I don't think it will work for 
anyone with the current code.


I agree that this probably makes way more sense in common quota code 
then buried inside of Nova.


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] About deleting keypairs

2016-07-18 Thread Sean Dague

On 07/18/2016 08:14 AM, Matt Riedemann wrote:


Nova doesn't actually validate the user_id passed into the keypairs API
is valid, does it? Like flavor access and quotas, Nova is given an ID
but doesn't validate it with Keystone. So we don't actually need
Keystone to find these do we?

I'm not saying that's great, we already had a spec approved for Newton
to check the provided user/project ID with keystone for the flavor
access and quotas APIs, we could do the same for keypairs.

You could, however, write a script that deletes keypairs for user_ids
that don't exist in Keystone...


A user can be in more than one project, so automatically deleting 
keypairs when a user is removed from a project has enough edge 
conditions that I'm not sure we'd ever want it to happen automatically.


My suggestion would be a periodic purge of your local records by looking 
up the userids in keystone. The dead keys are doing very little other 
than taking up space, so it's mostly just about compaction, which could 
be run on a weekly basis.
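
A rough sketch of such a purge, assuming admin credentials and direct
access to the Nova database (the connection strings, and the key_pairs
table / soft-delete convention, are assumptions to adapt to your
deployment):

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from keystoneclient.v3 import client as keystone_client
    import sqlalchemy as sa

    auth = v3.Password(auth_url='http://keystone:5000/v3',  # assumption
                       username='admin', password='secret',
                       project_name='admin', user_domain_id='default',
                       project_domain_id='default')
    keystone = keystone_client.Client(session=session.Session(auth=auth))
    valid_users = {user.id for user in keystone.users.list()}

    engine = sa.create_engine('mysql+pymysql://nova:secret@dbhost/nova')
    with engine.begin() as conn:
        rows = conn.execute(sa.text(
            "SELECT DISTINCT user_id FROM key_pairs "
            "WHERE deleted = 0")).fetchall()
        for (user_id,) in rows:
            if user_id not in valid_users:
                # nova soft-deletes rows by setting deleted = id
                conn.execute(
                    sa.text("UPDATE key_pairs SET deleted = id, "
                            "deleted_at = NOW() "
                            "WHERE user_id = :uid AND deleted = 0"),
                    {'uid': user_id})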


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Questions about instance actions' update and finish

2016-07-01 Thread Sean Dague
On 06/30/2016 08:31 AM, Andrew Laski wrote:
> 
> 
> On Wed, Jun 29, 2016, at 11:11 PM, Matt Riedemann wrote:
>> On 6/29/2016 10:10 PM, Matt Riedemann wrote:
>>> On 6/29/2016 6:40 AM, Andrew Laski wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Jun 28, 2016, at 09:27 PM, Zhenyu Zheng wrote:
>>>>> How about I sync updated_at and created_at in my patch, and leave the
>>>>> finish to the other BP, by this way, I can use updated_at for the
>>>>> timestamp filter I added and it don't need to change again once the
>>>>> finish BP is complete.
>>>>
>>>> Sounds good to me.
>>>>
>>>
>>> It's been a long day so my memory might be fried, but the options we
>>> talked about in the API meeting were:
>>>
>>> 1. Setting updated_at = created_at when the instance action record is
>>> created. Laski likes this, I'm not crazy about it, especially since we
>>> don't do that for anything else.
> 
> I would actually like for us to do this generally. I have the same
> thinking as Ed does elsewhere in this thread, the creation of a record
> is an update of that record. So take my comments as applying to Nova
> overall and not just this issue.

Agree. Also it just simplifies a number of things. We should just start
doing this going forward, and probably put some online data migrations
in place next cycle to update all the old records. Once updated_at can't
be null, we can handle things like this a bit better.
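
As a sketch, the per-table backfill is a single column-to-column update
(generic table/column names; the real thing would live in nova's
online_data_migrations machinery and batch its work):

    import sqlalchemy as sa

    def backfill_updated_at(engine, table_name):
        # Set updated_at = created_at wherever updated_at is still NULL.
        table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
        with engine.begin() as conn:
            result = conn.execute(
                table.update()
                     .where(table.c.updated_at.is_(None))
                     .values(updated_at=table.c.created_at))
            return result.rowcount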

>>> 2. Update the instance action's updated_at when instance action events
>>> are created. I like this since the instance action is like a parent
>>> resource and the event is the child, so when we create/modify an event
>>> we can consider it an update to the parent. Laski thought this might be
>>> weird UX given we don't expose instance action events in the REST API
>>> unless you're an admin. This is also probably not something we'd do for
>>> other related resources like server groups and server group members (but
>>> we don't page on those either right now).
> 
> Right. My concern is just that the ordering of actions can change based
> on events happening which are not visible to the user. However thinking
> about it further we don't really allow multiple actions at once, except
> for a few special cases like delete, so this may not end up affecting
> any ordering as actions are mostly serial. I think this is a fine
> solution for the issue at hand. I just think #1 is a more general
> solution.
> 
>>>
>>> 3. Order the results by updated_at,created_at so that if updated_at
>>> isn't set for older records, created_at will be used. I think we all
>>> agreed in the meeting to do this regardless of #1 or #2 above.

I kind of hate that as the order, because then the marker is going to
have to be a really funny double timestamp, right?

I guess the one thing I don't see in this patch is a functional
test that actually loads up instance actions and iterates through them,
demonstrating the pagination.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Questions about instance actions' update and finish

2016-07-01 Thread Sean Dague
On 07/01/2016 10:37 AM, Matt Riedemann wrote:
> On 6/30/2016 11:10 AM, Chris Friesen wrote:
>>
>> For what it's worth, this is how the timestamps work for POSIX
>> filesystems. When you create a file it sets the access/modify/change
>> timestamps to the file creation time.
>>
>> Chris
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> That's a good point.

I would be +2 on setting updated == created on initial create across the
board in the system. I think people actually expect this because they
assume it's like unix timestamps, and then get confused when they get
None back.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Deprecated Configuration Option in Nova Mitaka Release

2016-07-01 Thread Sean Dague
On 06/30/2016 01:55 PM, HU, BIN wrote:
> I see, and thank you very much Dan. Also thank you Markus for unreleased 
> release notes.
> 
> Now I understand that it is not a plugin and unstable interface. And there is 
> a new "use_neutron" option for configuring Nova to use Neutron as its network 
> backend.
> 
> When we use Neutron, there are ML2 and ML3 plugins so that we can choose to 
> use different backend providers to actually perform those network functions. 
> For example, integration with ODL.
> 
> Shall we foresee a situation, where user can choose another network backend 
> directly, e.g. ODL, ONOS? Under this circumstance, a stable plugin interface 
> seems needed which can provide end users with more options and flexibility in 
> deployment.
> 
> What do you think?

Neutron is the network API that we've agreed to in OpenStack, and have
worked towards for years. Network backends should exist behind the
OpenStack Network API (Neutron).

If there are challenges in that software stack, then there is an
upstream community to engage there to work with to get what you need out
of that stack. I'd highly recommend that you do so sooner rather than
later on that front.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-06-28 Thread Sean Dague

On 06/28/2016 01:46 AM, Angus Lees wrote:

Ok, thanks for the in-depth explanation.

My take away is that we need to file any rootwrap updates as exceptions
for now (so releasenotes and grenade scripts).


That is definitely the fallback if there is no better idea. However, we 
should try really hard to figure out if there is a non-manual way 
through this. Even if that means some compat code that we keep for a 
release to just bridge the gap.
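
For concreteness, the manual exception usually ends up as a few lines in
a grenade from-* script; something like this sketch (the function name
and $TARGET_RELEASE_DIR are illustrative):

    # Put the new release's rootwrap filters in place before the
    # upgraded services are started.
    function upgrade_nova_rootwrap_filters {
        sudo cp -f "$TARGET_RELEASE_DIR/nova/etc/nova/rootwrap.d/"*.filters \
            /etc/nova/rootwrap.d/
    }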


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-06-27 Thread Sean Dague
On 06/26/2016 10:02 PM, Angus Lees wrote:
> On Fri, 24 Jun 2016 at 20:48 Sean Dague <s...@dague.net> wrote:
> 
> On 06/24/2016 05:12 AM, Thierry Carrez wrote:
> > I'm adding Possibility (0): change Grenade so that rootwrap
> filters from
> > N+1 are put in place before you upgrade.
> 
> If you do that as general course what you are saying is that every
> installer and install process includes overwriting all of rootwrap
> before every upgrade. Keep in mind we do upstream upgrade as offline,
> which means that we've fully shut down the cloud. This would remove the
> testing requirement that rootwrap configs were even compatible between N
> and N+1. And you think this is theoretical, you should see the patches
> I've gotten over the years to grenade because people didn't see an issue
> with that at all. :)
> 
> I do get that people don't like the constraints we've self imposed, but
> we've done that for very good reasons. The #1 complaint from operators,
> for ever, has been the pain and danger of upgrading. That's why we are
> still trademarking new Juno clouds. When you upgrade Apache, you don't
> have to change your config files.
> 
> 
> In case it got lost, I'm 100% on board with making upgrades safe and
> straightforward, and I understand that grenade is merely a tool to help
> us test ourselves against our process and not an enemy to be worked
> around.  I'm an ops guy proud and true and hate you all for making
> openstack hard to upgrade in the first place :P
> 
> Rootwrap configs need to be updated in line with new rootwrap-using code
> - that's just the way the rootwrap security mechanism works, since the
> security "trust" flows from the root-installed rootwrap config files.
> 
> I would like to clarify what our self-imposed upgrade rules are so that
> I can design code within those constraints, and no-one is answering my
> question so I'm just getting more confused as this thread progresses...
> 
> ***
> What are we trying to impose on ourselves for upgrades for the present
> and near future (ie: while rootwrap is still a thing)?
> ***
> 
> A. Sean says above that we do "offline" upgrades, by which I _think_ he
> means a host-by-host (or even global?) "turn everything (on the same
> host/container) off, upgrade all files on disk for that host/container,
> turn it all back on again".  If this is the model, then we can trivially
> update rootwrap files during the "upgrade" step, and I don't see any
> reason why we need to discuss anything further - except how we implement
> this in grenade.
> 
> B. We need to support a mix of old + new code running on the same
> host/container, running against the same config files (presumably
> because we're updating service-by-service, or want to minimise the
> service-unavailability during upgrades to literally just a process
> restart).  So we need to think about how and when we stage config vs
> code updates, and make sure that any overlap is appropriately allowed
> for (expand-contract, etc).
> 
> C. We would like to just never upgrade rootwrap (or other config) files
> ever again (implying a freeze in as_root command lines, effective ~a
> year ago).  Any config update is an exception dealt with through
> case-by-case process and release notes.
> 
> 
> I feel like the grenade check currently implements (B) with a 6 month
> lead time on config changes, but the "theory of upgrade" doc and our
> verbal policy might actually be (C) (see this thread, eg), and Sean
> above introduced the phrase "offline" which threw me completely into
> thinking maybe we're aiming for (A).  You can see why I'm looking for
> clarification  ;)

Ok, there is theory of what we are striving for, and there is what is
viable to test consistently.

The thing we are shooting for is making the code Continuously
Deployable. Which means the upgrade process should be "pip install -U
$foo && $foo-manage db-sync" on the API surfaces and "pip install -U
$foo; service restart" on everything else.

Logic we can put into the python install process is common logic shared
by all deployment tools, and we can encode it in there. So all
installers just get it.

The challenge is there is no facility for config file management in
python native packaging. Which means that software which *depends* on
config files for new or even working features now moves from the camp of
CDable to manual upgrade needed. What you need to do is in releasenotes,
not in code that's shipped with your software. Release notes are not
scriptable.

So, we've

Re: [openstack-dev] [all] Status of the OpenStack port to Python 3

2016-06-24 Thread Sean Dague
On 06/24/2016 11:48 AM, Doug Hellmann wrote:
> Excerpts from Dmitry Tantsur's message of 2016-06-24 10:59:14 +0200:
>> On 06/23/2016 11:21 PM, Clark Boylan wrote:
>>> On Thu, Jun 23, 2016, at 02:15 PM, Doug Hellmann wrote:
>>>> Excerpts from Thomas Goirand's message of 2016-06-23 23:04:28 +0200:
>>>>> On 06/23/2016 06:11 PM, Doug Hellmann wrote:
>>>>>> I'd like for the community to set a goal for Ocata to have Python
>>>>>> 3 functional tests running for all projects.
>>>>>>
>>>>>> As Tony points out, it's a bit late to have this as a priority for
>>>>>> Newton, though work can and should continue. But given how close
>>>>>> we are to having the initial phase of the port done (thanks Victor!),
>>>>>> and how far we are from discussions of priorities for Ocata, it
>>>>>> seems very reasonable to set a community-wide goal for our next
>>>>>> release cycle.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Doug
>>>>>
>>>>> +1
>>>>>
>>>>> Just think about it for a while. If we get Nova to work with Py3, and
>>>>> everything else is working, including all functional tests in Tempest,
>>>>> then after Ocata, we could even start to *REMOVE* Py2 support after
>>>>> Ocata+1. That would be really awesome to stop all the compat layer
>>>>> madness and use the new features available in Py3.
>>>>
>>>> We'll need to get some input from other distros and from deployers
>>>> before we decide on a timeline for dropping Python 2. For now, let's
>>>> focus on making Python 3 work. Then we can all rejoice while having the
>>>> discussion of how much longer to support Python 2. :-)
>>>>
>>>>>
>>>>> I really would love to ship a full stack running Py3 for Debian Stretch.
>>>>> However, for this, it'd be super helpful to have as much visibility as
>>>>> possible. Are we setting a hard deadline for the Ocata release? Or is
>>>>> this just a goal we only "would like" to reach, but it's not really a
>>>>> big deal if we don't reach it?
>>>>
>>>> Let's see what PTLs have to say about planning, but I think if not
>>>> Ocata then we'd want to set one for the P release. We're running
>>>> out of supported lifetime for Python 2.7.
>>>
>>> Keep in mind that there is interest in running OpenStack on PyPy which
>>> is python 2.7. We don't have to continue supporting CPython 2.7
>>> necessarily but we may want to support python 2.7 by way of PyPy.
>>
>> PyPy folks have been working on python 3 support for some time already: 
>> http://doc.pypy.org/en/latest/release-pypy3.3-v5.2-alpha1.html
>> It's an alpha, but by the time we consider dropping Python 2 it will 
>> probably be released :)
> 
> We're targeting Python >=3.4, right now.  We'll have to decide as
> a community whether PyPy support is a sufficient reason to keep
> support for "older" versions (either 2.x or earlier versions of 3).
> Before we can have that discussion, though, we need to actually run on
> Python 3, so let's focus on that and evaluate the landscape of other
> interpreters when the porting work is done.

+1, please don't get ahead of things until there is real full stack
testing running on python3.

It would also be good if some of our operators were running on python 3
and providing feedback that it works in the real world before we even
talk about dropping. Because our upstream testing (even the full stack
testing) only can catch so much.

So next steps:

1) full stack testing of everything we've got on python3 - (are there
volunteers to get that going?)
2) complete Nova port to enable full stack testing on python3 for iaas base
3) encourage operators to deploy with python3 in production
4) gather real world feedback, develop rest of plan

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-06-24 Thread Sean Dague
On 06/24/2016 05:19 AM, Daniel P. Berrange wrote:
> On Fri, Jun 24, 2016 at 11:12:27AM +0200, Thierry Carrez wrote:
>> No perfect answer here... I'm hesitating between (0), (1) and (4). (4) is
>> IMHO the right solution, but it's a larger change for downstream. (1) is a
>> bit of a hack, where we basically hardcode in rootwrap that it's being
>> transitioned to privsep. That's fine, but only if we get rid of rootwrap
>> soon. So only if we have a plan for (4) anyway. Option (0) is a bit of a
>> hard sell for upgrade procedures -- if we need to take a hit in that area,
>> let's do (4) directly...
>>
>> In summary, I think the choice is between (1)+(4) and doing (4) directly.
>> How doable is (4) in the timeframe we have ? Do we all agree that (4) is the
>> endgame ?
> 
> We've already merged change to privsep to allow nova/cinder/etc to
> initialize the default helper command to use rootwrap:
> 
>   
> https://github.com/openstack/oslo.privsep/commit/9bf606327d156de52c9418d5784cd7f29e243487
> 
> So we just need new release of privsep & add code to nova to initialize
> it and we're sorted.

Actually, I don't think so. Matt ran that test scenario, and we're
missing the rootwrap rule that lets privsep-helper run as root. So we
fail to start the daemon from the unpriv nova compute process post upgrade.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-06-24 Thread Sean Dague
 about the same kind of approach with
paste, he's going to sniff that one out. Once we get to privsep, I think
we have it solved for rootwrap. But that transition is hard. Because the
existing system was designed without thinking about the upgrade
implications.

> No perfect answer here... I'm hesitating between (0), (1) and (4). (4)
> is IMHO the right solution, but it's a larger change for downstream. (1)
> is a bit of a hack, where we basically hardcode in rootwrap that it's
> being transitioned to privsep. That's fine, but only if we get rid of
> rootwrap soon. So only if we have a plan for (4) anyway. Option (0) is a
> bit of a hard sell for upgrade procedures -- if we need to take a hit in
> that area, let's do (4) directly...
> 
> In summary, I think the choice is between (1)+(4) and doing (4)
> directly. How doable is (4) in the timeframe we have ? Do we all agree
> that (4) is the endgame ?

4 seems like the right endgame to me. Honestly, with the shipped nova
compute policy we could probably rewrite the filesystem during nova
compute start and do this anyway.


-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Status of the OpenStack port to Python 3

2016-06-23 Thread Sean Dague
So, given that everything in the base IaaS layer besides Nova is ported,
and there is some python 3 support in Devstack, before Newton is over one
could get a python 3 (except Nova) job running, and start seeing the
fallout of full stack testing. We could even prioritize functional
changes in Nova to get full stack python 3 working (a lot of what is
holding Nova back is actually unit tests that aren't python 3 clean).

That seems like the next logical step, and I think it would help build
momentum for full stack testing by showing this actually working outside
of just isolated test suites.

On 06/23/2016 12:58 PM, Davanum Srinivas wrote:
> +1 from me as well Doug! ("community to set a goal for Ocata to have Python
> 3 functional tests running for all projects.")
> 
> -- Dims
> 
> On Thu, Jun 23, 2016 at 12:11 PM, Doug Hellmann  wrote:
>> Excerpts from Thomas Goirand's message of 2016-06-22 10:49:01 +0200:
>>> On 06/22/2016 09:18 AM, Victor Stinner wrote:
>>>> Hi,
>>>>
>>>> Current status: only 3 projects are not ported yet to Python 3:
>>>>
>>>> * nova (76% done)
>>>> * trove (42%)
>>>> * swift (0%)
>>>>
>>>>https://wiki.openstack.org/wiki/Python3
>>>>
>>>> Number of projects already ported:
>>>>
>>>> * 19 Oslo Libraries
>>>> * 4 Development Tools
>>>> * 22 OpenStack Clients
>>>> * 6 OpenStack Libraries (os-brick, taskflow, glance_store, ...)
>>>> * 12 OpenStack services approved by the TC
>>>> * 17 OpenStack services (not approved by the TC)
>>>>
>>>> Raw total: 80 projects. In fact, 3 remaining projects on 83 is only 4%,
>>>> we are so close! ;-)
>>>>
>>>> The next steps are to port the 3 remaining projects and work on
>>>> functional and integration tests on Python 3.
>>>>
>>>> Victor
>>>
>>> Hi Victor,
>>>
>>> Thanks a lot for your efforts on Py3.
>>>
>>> Do you think it looks like possible to have Nova ported to Py3 during
>>> the Newton cycle?
>>>
>>> Cheers,
>>>
>>> Thomas Goirand (zigo)
>>>
>>
>> I'd like for the community to set a goal for Ocata to have Python
>> 3 functional tests running for all projects.
>>
>> As Tony points out, it's a bit late to have this as a priority for
>> Newton, though work can and should continue. But given how close
>> we are to having the initial phase of the port done (thanks Victor!),
>> and how far we are from discussions of priorities for Ocata, it
>> seems very reasonable to set a community-wide goal for our next
>> release cycle.
>>
>> Thoughts?
>>
>> Doug
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 


-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [grenade] upgrades vs rootwrap

2016-06-23 Thread Sean Dague
On 06/23/2016 10:07 AM, Sean McGinnis wrote:
> On Thu, Jun 23, 2016 at 12:19:34AM +, Angus Lees wrote:
>> So how does rootwrap fit into the "theory of upgrade"?
>>
>> The doc talks about deprecating config, but is silent on when new required
>> config (rootwrap filters) should be installed.  By virtue of the way the
>> grenade code works (install N from scratch, attempt to run code from N+1),
>> we effectively have a policy that any new configs are installed at some
>> nebulous time *after* the N+1 code is deployed.  In particular, this means
>> a new rootwrap filter needs to be merged a whole release before that
>> rootwrap entry can be used - and anything else is treated as an "exception"
>> (see for example the nova from-* scripts which have basically updated
>> rootwrap for each release).
>>
>> --
>>
>> Stepping back, I feel like an "expand-contract" style upgrade process
>> involving rootwrap should look something like
>> 0. Update rootwrap to be the union of N and N+1 rootwrap filters,
>> 1. Rolling update from N to N+1 code,
>> 2. Remove N-only rootwrap entries.
>>
>> We could make that a bit easier for deployers by having a sensible
>> deprecation policy that requires our own rootwrap filters for each release
>> are already the union of this release and the last (which indeed is already
>> our policy aiui), and then it would just look like:
>> 0. Install rootwrap filters from N+1
>> 1. Rolling update to N+1 code
> 
> I think effectively this is what we've ended up with in the past.
> 
> We've had this issue for some time. There have been several releases
> where either Cinder drivers or os-brick changes have needed to add
> rootwrap changes. Theoretically we _should_ have hit these problems long
> ago.
> 
> I think the only reason it hasn't come up before is that these changes
> are usually for vendor storage backends. So they never got hit in
> grenade tests since those use LVM. We have third party CI, but grenade
> tests are not a part of that.
> 
> The switch to privsep now has really highlighted this gap. I think we
> need to make this implied constraint clear and have it documented. To
> upgrade we will need to make sure the rootwrap filters are in place
> _before_ we perform any upgrades.

Are we going to have to do this for every service individually as it
moves to privsep? Or is there a way we can do it once in common, take the
hit, and have everyone move forward?

For instance, can we get oslo.rootwrap to make an exception, in code,
for privsep-helper? Thereby not having to touch a config file in etc to
roll forward.
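
Something like this sketch is what I mean. CommandFilter is the real
oslo.rootwrap filter class, but the loader hook shown (and the
_read_filter_files helper) are invented for illustration:

    from oslo_rootwrap import filters

    # Filters every deployment gets without any /etc change: allow the
    # privsep daemon helper to be launched as root.
    _IMPLICIT_FILTERS = [
        filters.CommandFilter('privsep-helper', 'root'),
    ]

    def load_filters(filters_path):
        # _read_filter_files is a stand-in for the existing parsing of
        # the rootwrap.d config files.
        matchers = list(_IMPLICIT_FILTERS)
        matchers.extend(_read_filter_files(filters_path))
        return matchers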

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-22 Thread Sean Dague
On 06/22/2016 03:13 PM, Jay Faulkner wrote:
> 
> 
> On 6/22/16 12:01 PM, Sean Dague wrote:
>> On 06/22/2016 02:37 PM, Chris Hoge wrote:
>>>> On Jun 22, 2016, at 11:24 AM, Sean Dague  wrote:
>>>>
>>>> On 06/22/2016 01:59 PM, Chris Hoge wrote:
>>>>>> On Jun 20, 2016, at 5:10 AM, Sean Dague <s...@dague.net> wrote:
>>>>>>
>>>>>> On 06/14/2016 07:19 PM, Chris Hoge wrote:
>>>>>>>> On Jun 14, 2016, at 3:59 PM, Edward Leafe <e...@leafe.com> wrote:
>>>>>>>>
>>>>>>>> On Jun 14, 2016, at 5:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
>>>>>>>>
>>>>>>>>> But, if we add another possible state on the defcore side like
>>>>>>>>> conditional pass,
>>>>>>>>> warning, yellow, etc. (the name doesn't matter) which is used to
>>>>>>>>> indicate that
>>>>>>>>> things on product X could only pass when strict validation was
>>>>>>>>> disabled (and
>>>>>>>>> be clear about where and why) then my concerns would be alleviated.
>>>>>>>>> I just do
>>>>>>>>> not want this to end up not being visible to end users trying to
>>>>>>>>> evaluate
>>>>>>>>> interoperability of different clouds using the test results.
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Don't fail them, but don't cover up their incompatibility, either.
>>>>>>>> -- Ed Leafe
>>>>>>> That’s not my proposal. My requirement is that vendors who want to do
>>>>>>> this
>>>>>>> state exactly which APIs are sending back additional data, and that this
>>>>>>> information be published.
>>>>>>>
>>>>>>> There are different levels of incompatibility. A response with
>>>>>>> additional data
>>>>>>> that can be safely ignored is different from a changed response that
>>>>>>> would
>>>>>>> cause a client to fail.
>>>>>> It's actually not different. It's really not.
>>>>>>
>>>>>> This idea that it's safe to add response data is based on an assumption
>>>>>> that software versions only move forward. If you have a single deploy of
>>>>>> software, that's fine.
>>>>>>
>>>>>> However as noted, we've got production clouds on Juno <-> Mitaka in the
>>>>>> wild. Which means if we want to support horizontal transfer between
>>>>>> clouds, the user experienced timeline might be start on a Mitaka cloud,
>>>>>> then try to move to Juno. So anything added from Juno -> Mitaka without
>>>>>> signaling has exactly the same client breaking behavior as removing
>>>>>> attributes.
>>>>>>
>>>>>> Which is why microversions are needed for attribute adds.
>>>>> I’d like to note that Nova v2.0 is still a supported API, which
>>>>> as far as I understand allows for additional attributes and
>>>>> extensions. That Tempest doesn’t allow for disabling strict
>>>>> checking when using a v2.0 endpoint is a problem.
>>>>>
>>>>> The reporting of v2.0 in the Marketplace (which is what we do
>>>>> right now) is also a signal to a user that there may be vendor
>>>>> additions to the API.
>>>>>
>>>>> DefCore doesn’t disallow the use of a 2.0 endpoint as part
>>>>> of the interoperability standard.
>>>> This is a point of confusion.
>>>>
>>>> The API definition did not allow that. The implementation of the API
>>>> stack did.
>>> And downstream vendors took advantage of that. We may
>>> not like it, but it’s a reality in the current ecosystem.
>> And we started saying "stop it" 2 years ago. And we've consistently been
>> saying stop it all along. And now it's gone.
>>
>> And yes, for people that did not get ahead of this issue and engage the
>> community, it now hurts. But this has been quite a long process.
> I don't wanna wade

Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-22 Thread Sean Dague
On 06/22/2016 02:37 PM, Chris Hoge wrote:
> 
>> On Jun 22, 2016, at 11:24 AM, Sean Dague  wrote:
>>
>> On 06/22/2016 01:59 PM, Chris Hoge wrote:
>>>
>>>> On Jun 20, 2016, at 5:10 AM, Sean Dague <s...@dague.net> wrote:
>>>>
>>>> On 06/14/2016 07:19 PM, Chris Hoge wrote:
>>>>>
>>>>>> On Jun 14, 2016, at 3:59 PM, Edward Leafe <e...@leafe.com> wrote:
>>>>>>
>>>>>> On Jun 14, 2016, at 5:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
>>>>>>
>>>>>>> But, if we add another possible state on the defcore side like
>>>>>>> conditional pass,
>>>>>>> warning, yellow, etc. (the name doesn't matter) which is used to
>>>>>>> indicate that
>>>>>>> things on product X could only pass when strict validation was
>>>>>>> disabled (and
>>>>>>> be clear about where and why) then my concerns would be alleviated.
>>>>>>> I just do
>>>>>>> not want this to end up not being visible to end users trying to
>>>>>>> evaluate
>>>>>>> interoperability of different clouds using the test results.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Don't fail them, but don't cover up their incompatibility, either.
>>>>>> -- Ed Leafe
>>>>>
>>>>> That’s not my proposal. My requirement is that vendors who want to do
>>>>> this
>>>>> state exactly which APIs are sending back additional data, and that this
>>>>> information be published.
>>>>>
>>>>> There are different levels of incompatibility. A response with
>>>>> additional data
>>>>> that can be safely ignored is different from a changed response that
>>>>> would
>>>>> cause a client to fail.
>>>>
>>>> It's actually not different. It's really not.
>>>>
>>>> This idea that it's safe to add response data is based on an assumption
>>>> that software versions only move forward. If you have a single deploy of
>>>> software, that's fine.
>>>>
>>>> However as noted, we've got production clouds on Juno <-> Mitaka in the
>>>> wild. Which means if we want to support horizontal transfer between
>>>> clouds, the user experienced timeline might be start on a Mitaka cloud,
>>>> then try to move to Juno. So anything added from Juno -> Mitaka without
>>>> signaling has exactly the same client breaking behavior as removing
>>>> attributes.
>>>>
>>>> Which is why microversions are needed for attribute adds.
>>>
>>> I’d like to note that Nova v2.0 is still a supported API, which
>>> as far as I understand allows for additional attributes and
>>> extensions. That Tempest doesn’t allow for disabling strict
>>> checking when using a v2.0 endpoint is a problem.
>>>
>>> The reporting of v2.0 in the Marketplace (which is what we do
>>> right now) is also a signal to a user that there may be vendor
>>> additions to the API.
>>>
>>> DefCore doesn’t disallow the use of a 2.0 endpoint as part
>>> of the interoperability standard.
>>
>> This is a point of confusion.
>>
>> The API definition did not allow that. The implementation of the API
>> stack did.
> 
> And downstream vendors took advantage of that. We may
> not like it, but it’s a reality in the current ecosystem.

And we started saying "stop it" 2 years ago. And we've consistently been
saying stop it all along. And now it's gone.

And yes, for people that did not get ahead of this issue and engage the
community, it now hurts. But this has been quite a long process.

>> In Liberty the v2.0 API is optionally provided by a different backend
>> stack that doesn't support extensions.
>> In Mitaka it is default v2.0 API on a non extensions backend
>> In Newton the old backend is deleted.
>>
>> From Newton forward there is still a v2.0 API, but all the code hooks
>> that provided facilities for extensions are gone.
> 
> It’s really important that the current documentation reflect the
> code and intent of the dev team. As of writing this e-mail, 
> 
> "• v2 (SUPPORTED) and v2 extensions (SUPPORTED) (Will
> be deprecated in the near future.)”[1]
> 
> Even with this being removed in Newton, DefCore still has
> to allow for it in every supported version.

The v2 extensions link there, you will notice, is upstream extensions.
All of which default on for the new code stack.

Everything documented there still works on the new code stack. The v2 +
v2 extensions linked there remain supported in Newton.

The wording on this page should be updated; it is in the Nova developer
docs, intended for people working on Nova upstream. They lag a bit behind
where reality is, as does documentation everywhere.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-22 Thread Sean Dague
On 06/22/2016 01:59 PM, Chris Hoge wrote:
> 
>> On Jun 20, 2016, at 5:10 AM, Sean Dague <s...@dague.net> wrote:
>>
>> On 06/14/2016 07:19 PM, Chris Hoge wrote:
>>>
>>>> On Jun 14, 2016, at 3:59 PM, Edward Leafe <e...@leafe.com> wrote:
>>>>
>>>> On Jun 14, 2016, at 5:50 PM, Matthew Treinish <mtrein...@kortar.org> wrote:
>>>>
>>>>> But, if we add another possible state on the defcore side like
>>>>> conditional pass,
>>>>> warning, yellow, etc. (the name doesn't matter) which is used to
>>>>> indicate that
>>>>> things on product X could only pass when strict validation was
>>>>> disabled (and
>>>>> be clear about where and why) then my concerns would be alleviated.
>>>>> I just do
>>>>> not want this to end up not being visible to end users trying to
>>>>> evaluate
>>>>> interoperability of different clouds using the test results.
>>>>
>>>> +1
>>>>
>>>> Don't fail them, but don't cover up their incompatibility, either.
>>>> -- Ed Leafe
>>>
>>> That’s not my proposal. My requirement is that vendors who want to do
>>> this
>>> state exactly which APIs are sending back additional data, and that this
>>> information be published.
>>>
>>> There are different levels of incompatibility. A response with
>>> additional data
>>> that can be safely ignored is different from a changed response that
>>> would
>>> cause a client to fail.
>>
>> It's actually not different. It's really not.
>>
>> This idea that it's safe to add response data is based on an assumption
>> that software versions only move forward. If you have a single deploy of
>> software, that's fine.
>>
>> However as noted, we've got production clouds on Juno <-> Mitaka in the
>> wild. Which means if we want to support horizontal transfer between
>> clouds, the user experienced timeline might be start on a Mitaka cloud,
>> then try to move to Juno. So anything added from Juno -> Mitaka without
>> signaling has exactly the same client breaking behavior as removing
>> attributes.
>>
>> Which is why microversions are needed for attribute adds.
> 
> I’d like to note that Nova v2.0 is still a supported API, which
> as far as I understand allows for additional attributes and
> extensions. That Tempest doesn’t allow for disabling strict
> checking when using a v2.0 endpoint is a problem.
> 
> The reporting of v2.0 in the Marketplace (which is what we do
> right now) is also a signal to a user that there may be vendor
> additions to the API.
> 
> DefCore doesn’t disallow the use of a 2.0 endpoint as part
> of the interoperability standard.

This is a point of confusion.

The API definition did not allow that. The implementation of the API
stack did.

In Liberty the v2.0 API is optionally provided by a different backend
stack that doesn't support extensions.
In Mitaka it is default v2.0 API on a non extensions backend
In Newton the old backend is deleted.

From Newton forward there is still a v2.0 API, but all the code hooks
that provided facilities for extensions are gone.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Vendordata improvements

2016-06-22 Thread Sean Dague
On 06/22/2016 09:03 AM, Matt Riedemann wrote:
> On 6/21/2016 12:53 AM, Michael Still wrote:
>> So, https://review.openstack.org/#/c/317739 is basically done I think.
>> I'm after people's thoughts on:
>>
>>  - I need to do some more things, as described in the commit message.
>> Are we ok with them being in later patches to get reviews moving on this?
> 
> I'd be OK with caching/performance improvements in subsequent changes.
> Docs on this are going to be important to land, so they could be
> separate but if this is going to get into Newton I'd want the docs to be
> in Newton also.
> 
>>
>>  - I'm unsure what level of tempest testing makes sense here. How much
>> would you like to see? Do we need to add a vendordata REST service to
>> devstack? That might be complicated in the amount of time available...
> 
> I don't think Tempest tests anything from the metadata API service. We
> only have that running in a handful of jobs (the postgres job is the
> main one). We could probably write a test though that ssh's into a guest
> and then pulls the data from the metadata service. I'm not sure what
> you'd populate into the dynamic vendor data endpoint though, maybe just
> test data in devstack?
> 
> I think we should have functional tests in the Nova tree for this at
> least - how feasible would that be? In other words, would we have to
> stub out stuff to the point that it would be a useless test in nova's
> functional test tree?

I'm pretty sure this could be tested reasonably well in Nova's
functional test tree. You could have a real md server, and data behind
it. The only stubbing would be the access path because you'll be hitting
it from localhost instead of a server ip. But that shouldn't invalidate
it too substantially.
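
To make that concrete, the "real md server" side could be tiny; a
stand-in dynamic vendordata endpoint the functional test harness might
run on localhost (the port and payload keys are illustrative, not a
spec):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class FakeVendordata(BaseHTTPRequestHandler):
        def do_POST(self):
            # The dynamic vendordata driver POSTs instance context as
            # JSON; echo back a static payload plus one request field.
            length = int(self.headers.get('Content-Length', 0))
            raw = self.rfile.read(length) or b'{}'
            context = json.loads(raw.decode('utf-8'))
            body = json.dumps({'test-key': 'test-value',
                               'echoed-project': context.get('project-id')})
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write(body.encode('utf-8'))

    if __name__ == '__main__':
        HTTPServer(('127.0.0.1', 8999), FakeVendordata).serve_forever()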

-Sean

> 
>>
>> Michael
>>
>> -- 
>> Rackspace Australia
>>
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> 


-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [requirements][packaging] Normalizing requirements file

2016-06-22 Thread Sean Dague
On 06/22/2016 01:36 AM, Haïkel wrote:
> 2016-06-22 7:23 GMT+02:00 Tony Breeds :
>>
>> I'm fine with doing something like this.  I wrote [1] some time ago but
>> didn't push on it as I needed to verify that this wouldn't create a
>> "storm" of pointless updates that just reorder things in every
>> project's *requirements.txt.
>>
>> I think the first step is to get the 'tool' added to the requirements repo to
>> make it easy to run again when things get out of wack.
>>
>> perhaps openstack_requirements/cmds/normalize ?
>>
> 
> Thanks Swapnil and Tony for your positive comments.
> 
> I didn't submit the script as I wanted to see in real conditions, how
> well it fare and
> get feedback from my peers, first. I'll submit the script in a separate 
> review.
> 
>> we can bikeshed on the output format / incrementally improve things if we 
>> have
>> a common base.
>>
> 
> makes sense, I tried to stay as close to the main current style
> 
> Regards,
> H.
> 
>> So I think that's a -1 on your review as it stands until we have the tool 
>> merged.
>>
>> Yours Tony.
>>
>> [1] https://gist.github.com/tbreeds/f250b964383922bdea4645740ae4b195

The reason the requirements lines are in a completely odd order is
because that's the string representation from pip/pkg_resources. It
might be good to get that fixed at the same time, because comparing
results with the resolver work in pip gets harder if our order and their
order for string representation differ.

And I agree, the upstream string order is kind of madness. :)
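
To see the oddity in isolation (the printed order is roughly what
pip/pkg_resources produce today; it has shifted between releases):

    import pkg_resources

    req = pkg_resources.Requirement.parse(
        'oslo.config>=3.10.0,!=3.11.0,<4.0.0')
    # The specifiers come back sorted by their string form, not as
    # written, e.g.: oslo.config!=3.11.0,<4.0.0,>=3.10.0
    print(str(req))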

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Placement API WSGI code -- let's just use flask

2016-06-21 Thread Sean Dague
On 06/21/2016 10:10 AM, Clint Byrum wrote:
> Excerpts from Sean Dague's message of 2016-06-21 08:00:50 -0400:
>> On 06/21/2016 07:39 AM, Jay Pipes wrote:
>>> On 06/21/2016 05:43 AM, Sylvain Bauza wrote:
>>>> Le 21/06/2016 10:04, Chris Dent a écrit :
>>>>> On Mon, 20 Jun 2016, Jay Pipes wrote:
>>>>>
>>>>>> Flask seems to be the most widely used and known WSGI framework so
>>>>>> for consistency's sake, I'm recommending we just use it and not rock
>>>>>> this boat. There are more important things to get hung up on than
>>>>>> this battle right now.
>>>>>
>>>>> That seems perfectly reasonable. My main goal in starting the
>>>>> discussion was to ensure that we reach some kind of consensus,
>>>>> whatever it might be[1]. It won't be too much of an ordeal to
>>>>> turn the existing pure WSGI stuff into Flask stuff.
>>>>>
>>>>> From my standpoint doing the initial development in straight WSGI
>>>>> was a win as it allowed for a lot of clarity from the inside out.
>>>>> Now that that development has shown the shape of the API we can
>>>>> do what we need to do to make it clear from outside in.
>>>>>
>>>>> Next question: There's some support for not using Paste and
>>>>> paste.ini. Is anyone opposed to that?
>>>>>
>>>>
>>>> Given Flask is not something we support yet in Nova, could we discuss on
>>>> that during either a Nova meeting, or maybe wait for the midcycle ?
>>>
>>> I really don't want to wait for the mid-cycle. Happy to discuss in the
>>> Nova meeting, but my preference is to have Chris just modify his patch
>>> series to use Flask now and review it.
>>>
>>>> To be honest, Chris and you were saying that you don't like Flask, and
>>>> I'm a bit agreeing with you. Why now it's a good possibility ?
>>>
>>> Because Doug persuaded me that the benefits of being consistent with
>>> what the community is using outweigh my (and Chris') personal misgivings
>>> about the particular framework.
>>
>> Just to be clear
>>
>> http://codesearch.openstack.org/?q=Flask%3E%3D0.10&i=nope&files=&repos=
>>
>> Flask is used by 2 (relatively new) projects in OpenStack
>>
>> If we look at the iaas base layer:
>>
>> Keystone - custom WSGI with Routes / Paste
>> Glance - WSME + Routes / Paste
>> Cinder - custom WSGI with Routes / Paste
>> Neutron - pecan + Routes / Paste
>> Nova - custom WSGI with Routes / Paste
>>
> 
> When I see "custom WSGI" I have a few thoughts:
> 
> * custom == special snowflake. But REST API's aren't exactly novel.
> 
> * If using a framework means not writing or cargo culting any custom
> WSGI code, that seems like a win for maintainability from the get go.
> 
> * If using a framework means handling errors more consistently, that
> seems like a win for operators.
> 
> * I don't have a grasp on how much custom WSGI code is actually
> involved. That would help us all evaluate the meaning of the statements
> above (both yours, and mine).

And my point is, it's not actually much. Because you still have to do
paste, and you still have to do request validation, and you still have
to actually write controllers and views. Routes has a restful resource
model -
https://routes.readthedocs.io/en/latest/restful.html#restful-services -
which is really not more complicated than the route decorators you get
with other frameworks.
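
For anyone who hasn't used it, the Routes version of resource routing is
a couple of lines; a minimal sketch:

    from routes import Mapper

    mapper = Mapper()
    # One call wires up the standard index/create/new/show/update/delete
    # routes for /inventories and /inventories/{id}.
    mapper.resource('inventory', 'inventories')

    # e.g. GET /inventories/42 maps to the "show" action:
    print(mapper.match('/inventories/42',
                       environ={'REQUEST_METHOD': 'GET'}))
    # -> {'controller': 'inventories', 'action': 'show', 'id': '42'}
    #    (roughly)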

The places that we have large complicated wsgi flows is because we
supported extensions that can modify requests/responses, or the whole
XML/JSON content switching (which was a nightmare). All of these things
are being deleted.

If you actually look at the Nova patches that Chris is building, the
logic that's the wsgi "framework" is quite small and the application
logic is pretty much what it is going to be in any framework -
https://review.openstack.org/#/c/329152/11/nova/api/openstack/placement/handlers/inventory.py

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Version header for OpenStack microversion support

2016-06-21 Thread Sean Dague
On 06/21/2016 10:47 AM, Monty Taylor wrote:

> 
> I'll agree with Clint here, and give an example.
> 
> When I talk to Nova and get a detail record for a server, Nova talks to
> Neutron and puts data that it receives into the addresses dict on the
> server record. This is not the neutron data structure. In fact, it has
> some information from the Network and some from the Port (it would be
> helpful if it had _more_ info, but that's not the point here)
> 
> In any case, the data structure returned by Nova is not related in any
> way to the version of Neutron that nova is talking to - nor should it be.
> 
> Here's an example (in yaml not json)
> 
>   addresses:
> GATEWAY_NET:
> - OS-EXT-IPS-MAC:mac_addr: fa:16:3e:ea:d8:0d
>   OS-EXT-IPS:type: fixed
>   addr: 172.99.106.178
>   version: 4
> 
> If you want a neutron record, you'll talk to neutron.

That's all well and good today, with all the things that we know about
today. And it says nothing about what the Neutron API looks like in 6
years' time. Let's say that Neutron decides in 2020 that "fixed" is not a
meaningful name, and stops using it.

We just did a transition in Nova interacting with Glance where we *could
not* guarantee the semantics of the interaction from before. We decided
to *shrug* and just break it, because the only other option was to pin
to Glance v1 API for eternity.

So just because you can't imagine a situation right now where this is a
problem, doesn't mean that it's not going to hit you. And the API is a
place where we don't really get do-overs without hurting users.

...

And getting back to the point of the argument it's all about:

OpenStack-API-Version: compute 2.11

vs.

OpenStack-API-Version: 2.11

8 bytes to be more explicit on our ACK, and to allow flexibility for
composite actions in the future (which may never be used, so 8 bytes is
our cost).
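
In request form the difference is one token of service name; a
hypothetical request (the URL and token are placeholders):

    curl http://compute.example.com/v2.1/servers \
        -H "X-Auth-Token: $TOKEN" \
        -H "OpenStack-API-Version: compute 2.11"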

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Placement API WSGI code -- let's just use flask

2016-06-21 Thread Sean Dague
On 06/21/2016 10:11 AM, Clint Byrum wrote:
> Excerpts from Sean Dague's message of 2016-06-21 09:10:00 -0400:
>> The amount of wsgi glue above Routes / Paste is pretty minimal (after
>> you get rid of all the extensions facilities).
>>
>> Templating and Session handling are things we don't need. We're not a
>> webapp, we're a REST service. Saying that using a web app framework is
>> better than a little bit of wsgi glue seems weird to me.
>>
> 
> Actually we do have sessions. We just call them "tokens".

But those aren't traditional sessions that use cookies and keep some
persistent state over the course of the session (beyond auth), which is
the kind of session support that these frameworks tend to provide.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Placement API WSGI code -- let's just use flask

2016-06-21 Thread Sean Dague
On 06/21/2016 08:42 AM, Doug Hellmann wrote:
> Excerpts from Sean Dague's message of 2016-06-21 08:00:50 -0400:
>> On 06/21/2016 07:39 AM, Jay Pipes wrote:
>>> On 06/21/2016 05:43 AM, Sylvain Bauza wrote:
>>>> Le 21/06/2016 10:04, Chris Dent a écrit :
>>>>> On Mon, 20 Jun 2016, Jay Pipes wrote:
>>>>>
>>>>>> Flask seems to be the most widely used and known WSGI framework so
>>>>>> for consistency's sake, I'm recommending we just use it and not rock
>>>>>> this boat. There are more important things to get hung up on than
>>>>>> this battle right now.
>>>>>
>>>>> That seems perfectly reasonable. My main goal in starting the
>>>>> discussion was to ensure that we reach some kind of consensus,
>>>>> whatever it might be[1]. It won't be too much of an ordeal to
>>>>> turn the existing pure WSGI stuff into Flask stuff.
>>>>>
>>>>> From my standpoint doing the initial development in straight WSGI
>>>>> was a win as it allowed for a lot of clarity from the inside out.
>>>>> Now that that development has shown the shape of the API we can
>>>>> do what we need to do to make it clear from outside in.
>>>>>
>>>>> Next question: There's some support for not using Paste and
>>>>> paste.ini. Is anyone opposed to that?
>>>>>
>>>>
>>>> Given Flask is not something we support yet in Nova, could we discuss on
>>>> that during either a Nova meeting, or maybe wait for the midcycle ?
>>>
>>> I really don't want to wait for the mid-cycle. Happy to discuss in the
>>> Nova meeting, but my preference is to have Chris just modify his patch
>>> series to use Flask now and review it.
>>>
>>>> To be honest, Chris and you were saying that you don't like Flask, and
>>>> I'm a bit agreeing with you. Why now it's a good possibility ?
>>>
>>> Because Doug persuaded me that the benefits of being consistent with
>>> what the community is using outweigh my (and Chris') personal misgivings
>>> about the particular framework.
>>
>> Just to be clear
>>
>> http://codesearch.openstack.org/?q=Flask%3E%3D0.10&i=nope&files=&repos=
>>
>> Flask is used by 2 (relatively new) projects in OpenStack
>>
>> If we look at the iaas base layer:
>>
>> Keystone - custom WSGI with Routes / Paste
>> Glance - WSME + Routes / Paste
>> Cinder - custom WSGI with Routes / Paste
>> Neutron - pecan + Routes / Paste
>> Nova - custom WSGI with Routes / Paste
>>
>> I honestly don't think raw WSGI is a bad choice here. People are going
>> to be pretty familiar with it in related projects at this level.
>>
>> Using selector instead of Routes makes things different for unclear
>> gain. Sticking with Routes seems more prudent.
>>
>> Doing Flask is fine, but do it because we think that's the way things
>> should be done, not because it's common in our community, which it
>> clearly is not. The common pattern is custom WSGI + Routes / Paste (at
>> least at this layer in the stack).
>>
>> -Sean
>>
> 
> As I told Jay, I don't care which specific framework is used. I
> care about the fact that while we're trying to get other projects
> to standardize on frameworks supported upstream so we have tools
> with good documentation and we carry less code directly in this
> community, we have consistently had a hard time convincing the nova
> team to choose one instead of building one.
> 
> Jay didn't like the object-dispatch model used in Pecan, so I pointed
> out that Flask is also in use elsewhere. The fact that Flask is not yet
> widespread indicates that project teams are not needlessly rewriting
> existing API services, rather than lack of acceptance. If you don't like
> either Flaks or Pecan, look at Pyramid or Pylons or one of the others.
> But please stop building new frameworks that make your project so
> completely different from everything else in the Python ecosystem.

The amount of wsgi glue above Routes / Paste is pretty minimal (after
you get rid of all the extensions facilities).

Templating and Session handling are things we don't need. We're not a
webapp, we're a REST service. Saying that using a web app framework is
better than a little bit of wsgi glue seems weird to me.

Falcon looks like the only thing out there which is really stripped down
to this little bit of glue layer. So if the answer is "must use
framework" that seems like the right answer. However, Routes + Paste is
really the framework we are using broadly in OpenStack. And a lot of the
common middleware assumes that.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Placement API WSGI code -- let's just use flask

2016-06-21 Thread Sean Dague
On 06/21/2016 07:39 AM, Jay Pipes wrote:
> On 06/21/2016 05:43 AM, Sylvain Bauza wrote:
>> Le 21/06/2016 10:04, Chris Dent a écrit :
>>> On Mon, 20 Jun 2016, Jay Pipes wrote:
>>>
>>>> Flask seems to be the most widely used and known WSGI framework so
>>>> for consistency's sake, I'm recommending we just use it and not rock
>>>> this boat. There are more important things to get hung up on than
>>>> this battle right now.
>>>
>>> That seems perfectly reasonable. My main goal in starting the
>>> discussion was to ensure that we reach some kind of consensus,
>>> whatever it might be[1]. It won't be too much of an ordeal to
>>> turn the existing pure WSGI stuff into Flask stuff.
>>>
>>> From my standpoint doing the initial development in straight WSGI
>>> was a win as it allowed for a lot of clarity from the inside out.
>>> Now that that development has shown the shape of the API we can
>>> do what we need to do to make it clear from outside in.
>>>
>>> Next question: There's some support for not using Paste and
>>> paste.ini. Is anyone opposed to that?
>>>
>>
>> Given Flask is not something we support yet in Nova, could we discuss on
>> that during either a Nova meeting, or maybe wait for the midcycle ?
> 
> I really don't want to wait for the mid-cycle. Happy to discuss in the
> Nova meeting, but my preference is to have Chris just modify his patch
> series to use Flask now and review it.
> 
>> To be honest, Chris and you were saying that you don't like Flask, and
>> I'm a bit agreeing with you. Why now it's a good possibility ?
> 
> Because Doug persuaded me that the benefits of being consistent with
> what the community is using outweigh my (and Chris') personal misgivings
> about the particular framework.

Just to be clear

http://codesearch.openstack.org/?q=Flask%3E%3D0.10&i=nope&files=&repos=

Flask is used by 2 (relatively new) projects in OpenStack

If we look at the iaas base layer:

Keystone - custom WSGI with Routes / Paste
Glance - WSME + Routes / Paste
Cinder - custom WSGI with Routes / Paste
Neutron - pecan + Routes / Paste
Nova - custom WSGI with Routes / Paste

I honestly don't think raw WSGI is a bad choice here. People are going
to be pretty familiar with it in related projects at this level.

Using selector instead of Routes makes things different for unclear
gain. Sticking with Routes seems more prudent.

Doing Flask is fine, but do it because we think that's the way things
should be done, not because it's common in our community, which it
clearly is not. The common pattern is custom WSGI + Routes / Paste (at
least at this layer in the stack).

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Version header for OpenStack microversion support

2016-06-21 Thread Sean Dague
On 06/21/2016 01:59 AM, Clint Byrum wrote:
> Excerpts from Edward Leafe's message of 2016-06-20 20:41:56 -0500:
>> On Jun 18, 2016, at 9:03 AM, Clint Byrum  wrote:
>>
>>> Whatever API version is used behind the compute API is none of the user's
>>> business.
>>
>> Actually, yeah, it is.
>>
>> If I write an app or a tool that expects to send information in a certain 
>> format, and receive responses in a certain format, I don't want that to 
>> change when the cloud operator upgrades their system. I only want things to 
>> change when I specifically request that they change by specifying a new 
>> microversion.
>>
> 
> The things I get back in the compute API are the purview of the compute
> API, and nothing else.
> 
> Before we go too far down this road, is there actually an example of
> one API providing a proxy to another directly? If so, is it something
> we think is actually a good idea?

There are a ton of pure proxies in Nova, which are now getting
deprecated. We do have a semantic break on the images proxy in Newton,
because Glance v2 has different data restrictions than Glance v1
(metadata keys are now case-sensitive, and certain properties are now
reserved words).

> Because otherwise, the API I'm talking to needs to be clear about what
> it does and does not emit and/or accept. That contract would just be
> the microversion of the API I'm talking to.

Which is fine and good in theory, and it's the theory that we're working
toward. But some resources, like servers, are pretty useless without
network information, which isn't owned by Nova any more. While I don't
currently anticipate a case where we couldn't mash whatever we get from
Neutron into the current format, I have also been surprised enough by
software evolution to feel more comfortable having a backup plan that
includes a signaling mechanism, should we need it.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [api] [placement] strategy for placement api structure

2016-06-20 Thread Sean Dague
On 06/17/2016 12:14 PM, Chris Dent wrote:
> On Fri, 17 Jun 2016, Sylvain Bauza wrote:
> 
>> In the review, you explain why you don't trust Routes and I respect
>> that. That said, are those issues logged as real problems for our API
>> consumers, which are mostly client libraries that we own and other
>> projects we know, like Horizon ?
> 
> The implication of your question here is that it is okay to do HTTP
> incorrectly if people don't report problems with that lack of
> correctness?
>
>> If that is a problem for those, is there something we could improve,
>> instead of just getting rid of it ?
> 
> When I found the initial problem with Routes, it was because I was
> doing some intial nova testing (with gabbi-tempest[1]) and
> discovered it wasn't returning a 405 when it should. I made a bug
> 
> https://bugs.launchpad.net/nova/+bug/1567970
> 
> and tried to fix it but Routes fought me. If someone else can figure
> it out more power to them.
> 
> In any case selector's behavior in this case is just better. Better
> is better, right?

That's a Nova bug; is there an upstream Routes bug for that? I didn't
see one in looking around. While Routes isn't a super quick upstream,
they have merged our fixes in the past.
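
For context, the check in question is the sort of gabbi test that looks
roughly like this (the URL and test name are illustrative):

    # gabbi YAML: expect a 405, not a 404, for a wrong method
    tests:
      - name: wrong method on a resource returns 405
        PUT: /servers/detail
        status: 405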

Better on what axis? This adds yet another way of doing a thing that
people have all been doing one way, and which they would need to learn,
largely to address a set of issues that are theoretical to our
consumers. http://mcfunley.com/choose-boring-technology

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [requirements][all] VOTE to expand the Requirements Team

2016-06-20 Thread Sean Dague
On 06/16/2016 10:44 AM, Davanum Srinivas wrote:
> Folks,
> 
> At Austin the Release Management team reached a consensus to spin off
> with some new volunteers to take care of the requirements process and
> repository [1]. The following folks showed up and worked with me on
> getting familiar with the issues/problems/tasks (see [1] and [2]) and
> help with the day to day work.
> 
> Matthew Thode (prometheanfire)
> Dirk Mueller (dirk)
> Swapnil Kulkarni (coolsvap)
> Tony Breeds (tonyb)
> Thomas Bechtold (tbechtold)
> 
> So, please cast your VOTE to grant them +2/core rights on the
> requirements repository and keep up the good work w.r.t speeding up
> reviews, making sure new requirements don't break etc.
> 
> Also, please note that Thierry has been happy enough with our work to
> step down from core responsibilities :) Many thanks Thierry for
> helping with this effort and guidance. I'll make all the add/remove to
> the requirements-core team when this VOTE passes.

+1



-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Version header for OpenStack microversion support

2016-06-20 Thread Sean Dague
On 06/18/2016 06:32 AM, Jamie Lennox wrote:
> Quick question: why do we need the service type or name in there? You
> really should know what API you're talking to already and it's just
> something that makes it more difficult to handle all the different APIs
> in a common way.

It is also extremely useful in wire interactions to be explicit so that
you know for sure you are interacting with the thing you think you are.
There was also the potential question of compound API operations (a Nova
call that calls other microversioned services that may impact
representation) and whether that may need to be surfaced to the user.
For instance, the network portions of the 'servers' object may get
impacted by Neutron.

With all those possibilities, putting in the extra ~8 bytes to handle
contingencies seemed prudent.
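
Concretely, that means both the service type and the version ride on
the wire, roughly like this (the "compute " prefix being the ~8 bytes
in question):

    GET /servers HTTP/1.1
    OpenStack-API-Version: compute 2.27

    HTTP/1.1 200 OK
    OpenStack-API-Version: compute 2.27
    Vary: OpenStack-API-Version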

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Question about redundant API samples tests for microversions

2016-06-20 Thread Sean Dague
On 06/17/2016 04:16 PM, Matt Riedemann wrote:
> I was reviewing this today:
> 
> https://review.openstack.org/#/c/326940/
> 
> And I said to myself, 'self, do we really need to subclass the API
> samples functional tests for this microversion given this change doesn't
> modify the request/response body, it's only adding paging support?'.
> 
> https://review.openstack.org/#/c/326940/6/nova/tests/functional/api_sample_tests/test_hypervisors.py
> 
> 
> The only change here is listing hypervisors, and being able to page on
> those if the microversion is high enough. So the API samples don't
> change at all, they are just running against a different microversion.

Agree. If the samples are the same, I think we shouldn't have that extra
set of tests, and should just test the interesting surface. I think part
of the confusion in the code is also that the subclass-to-run-tests-
under-different-scenarios pattern exists in a lot of places, while we
use testscenarios explicitly in other places.

> 
> The same goes for the REST API unit tests really:
> 
> https://review.openstack.org/#/c/326940/6/nova/tests/unit/api/openstack/compute/test_hypervisors.py
> 
> 
> I'm not sure if the test subclassing is just done like this for new
> microversions because it's convenient or if it's because of regression
> testing - knowing that we aren't changing a bunch of other REST methods
> in the process, so the subclassed tests aren't testing anything
> different from the microversion that came before them.
> 
> The thing I don't like about the test subclassing is all of the
> redundant testing that goes on, and people might add tests to the parent
> class not realizing it's subclassed and thus duplicating test cases with
> no functional change.
> 
> Am I just having some Friday crazies? Ultimately this doesn't hurt
> anything really but thought I'd ask.

Honestly, I feel like subclassing tests is almost always an
anti-pattern. It looks like you are saving code up front, but it
massively couples things in ways that become super hard to deal with in
the future.

Test code doesn't need to be normalized to within an inch of its life.
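
To make the testscenarios route concrete, a rough sketch (the request
helper and attribute names here are made up for illustration):

    import testscenarios
    import testtools


    class HypervisorsSampleTest(testscenarios.WithScenarios,
                                testtools.TestCase):

        # each scenario runs every test method once, with the listed
        # attributes set on the test instance
        scenarios = [
            ('v2_1', {'microversion': '2.1'}),
            ('v2_33', {'microversion': '2.33'}),
        ]

        def test_list_hypervisors(self):
            resp = self.api_get('/os-hypervisors',
                                microversion=self.microversion)
            self.assertEqual(200, resp.status_code)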

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-20 Thread Sean Dague
On 06/14/2016 07:19 PM, Chris Hoge wrote:
> 
>> On Jun 14, 2016, at 3:59 PM, Edward Leafe  wrote:
>>
>> On Jun 14, 2016, at 5:50 PM, Matthew Treinish  wrote:
>>
>>> But, if we add another possible state on the defcore side like conditional 
>>> pass,
>>> warning, yellow, etc. (the name doesn't matter) which is used to indicate 
>>> that
>>> things on product X could only pass when strict validation was disabled (and
>>> be clear about where and why) then my concerns would be alleviated. I just 
>>> do
>>> not want this to end up not being visible to end users trying to evaluate
>>> interoperability of different clouds using the test results.
>>
>> +1
>>
>> Don't fail them, but don't cover up their incompatibility, either.
>> -- Ed Leafe
> 
> That’s not my proposal. My requirement is that vendors who want to do this
> state exactly which APIs are sending back additional data, and that this
> information be published.
> 
> There are different levels of incompatibility. A response with additional data
> that can be safely ignored is different from a changed response that would
> cause a client to fail.

It's actually not different. It's really not.

This idea that it's safe to add response data is based on an assumption
that software versions only move forward. If you have a single deploy of
software, that's fine.

However, as noted, we've got production clouds on Juno <-> Mitaka in the
wild. Which means that if we want to support horizontal transfer between
clouds, the user-experienced timeline might be: start on a Mitaka cloud,
then try to move to Juno. So anything added from Juno -> Mitaka without
signaling has exactly the same client-breaking behavior as removing
attributes.

Which is why microversions are needed for attribute adds.
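
For reference, this is roughly how an attribute add gets gated in a
Nova view builder (a sketch: the 2.26 / tags pairing is real, the
surrounding names are trimmed for illustration):

    from nova.api.openstack import api_version_request


    class ServersViewBuilder(object):
        def show(self, request, instance):
            server = {
                'id': instance.uuid,
                'name': instance.display_name,
            }
            # new attributes only appear when the client asks for a
            # new enough microversion, so old clients keep the old
            # shape of the response
            if api_version_request.is_supported(request,
                                                min_version='2.26'):
                server['tags'] = [t.tag for t in instance.tags]
            return {'server': server}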

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-15 Thread Sean Dague
On 06/15/2016 12:14 AM, Mark Voelker wrote:

> 
> It is perhaps important to note here that the DefCore seems to have two 
> meanings to a lot of people I talk to today: it’s a mark of interoperability 
> (the OpenStack Powered badge that says certain capabilities of this cloud 
> behave like other clouds bearing the mark) and it gives a cloud the ability 
> to call itself OpenStack (e.g. you can get a trademark/logo license agreement 
> from the Foundation).  
> 
> The OpenStack Powered program currently covers Icehouse through Mitaka.  
> Right now, that includes releases that were still on the Nova 2.0 API.  API 
> extensions were a supported thing [1] back in 2.0 and it was even explicitly 
> documented that they allowed for additional attributes in the responses and 
> “vendor specific niche functionality [1]”.  The change to the Tempest tests 
> [2] applied to the 2.0 API as well as 2.1 with the intent of preventing 
> further changes from getting into the 2.0 API at the gate, which totally 
> makes sense as a gate test.  If those same tests are used for DefCore 
> purposes, it does change what vendors need to do to be compliant with the 
> Guidelines rather immediately--even on older releases of OpenStack using 2.0, 
> which could be problematic (as noted elsewhere already [3]).

Right, that's fair, and it's part of why I think the pass* makes sense.
Liberty is where microversions became on-by-default for clouds running
upstream Nova configs.

> So, through the interoperability lens: I think many folks acknowledge that 
> supporting extensions lead to a lot of variance between clouds, and that was 
> Not So Awesome for interoperability.  IIRC part of the rationale for 
> switching to microversions with a single monotonic counter and deprecating 
> extensions [4] was to set a course for eliminating a lot of that behavioral 
> variance.
> 
> From the “ability to call yourself OpenStack” lens: it feels sort of wrong to 
> tell a cloud that it can’t claim to be OpenStack because it’s running a 
> version that falls within the bounds of the Powered program with the 2.0 API 
> (when extensions weren't deprecated) and using the extension mechanism that 
> 2.0 supported for years.

To be clear, extensions weren't a part of the 2.0 API, they were a part
of the infrastructure. It's a subtle but important distinction. Nova
still supports the 2.0 API, but on different infrastructure, which
doesn't/won't support extensions.

Are people registering new Kilo (or earlier) clouds in the system today?
By the time folks get to Newton, none of that is going to work anyway in
code.

In an ideal world, product teams would be close enough to upstream code
and changes to see all this coming and be on top of it. In the real
world, a lot of these teams are a year (or more) behind, which actually
makes DefCore with a pass* an ideal alternative communication channel to
express that products are coming up on a cliff, and should start working
on plans now.

> I think that’s part of what makes this issue tricky for a lot of folks.
> 
> [1] http://docs.openstack.org/developer/nova/v2/extensions.html

It's an unfortunate accident of bugs in our publishing system that that
URL still exists. That was deleted in Oct 2015. I'll look at getting it
properly cleaned up.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [nova] os-brick privsep failures and an upgrade strategy?

2016-06-14 Thread Sean Dague

On 06/14/2016 06:11 PM, Angus Lees wrote:

Yep (3) is quite possible, and the only reason it doesn't just do this
already is because there's no way to find the name of the rootwrap
command to use (from any library, privsep or os-brick) - and I was never
very happy with the current need to specify a command line in
oslo.config purely for this lame reason.

As Sean points out, all the others involve some sort of configuration
change preceding the code.  I had imagined rollouts would work by
pushing out the harmless conf or sudoers change first, but hadn't
appreciated the strict change phases imposed by grenade (and ourselves).

If all "end-application" devs are happy calling something like (3)
before the first privileged operation occurs, then we should be good.  I
might even take the opportunity to phrase it as a general privsep.init()
function, and then we can use it for any other top-of-main()
privilege-setup steps that need to be taken in the future.


That sounds promising. It would be fine to emit a warning if it was only 
using the default, asking people to make a configuration change to make 
it go away. We're totally good with things functioning with warnings 
after transitions, which ops can then address on their own timetable.
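
For concreteness, what I'd expect an application to do at startup is 
something like this (sketch only: privsep.init() and its signature are 
the API being proposed in this thread, nothing that exists in 
oslo.privsep yet):

    import sys

    from oslo_config import cfg
    from oslo_privsep import privsep  # hypothetical home for init()

    CONF = cfg.CONF


    def main():
        # after normal oslo.config argument parsing...
        CONF(sys.argv[1:], project='nova')
        # ...do the one-time privsep setup before the first
        # privileged operation, defaulting the helper so no new
        # config file change is required
        privsep.init(default_helper='sudo nova-rootwrap %s'
                     % CONF.rootwrap_config)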


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-14 Thread Sean Dague

On 06/14/2016 07:28 PM, Monty Taylor wrote:

On 06/14/2016 05:42 PM, Doug Hellmann wrote:



I think this is the most important thing to me as it relates to this.
I'm obviously a huge proponent of clouds behaving more samely. But I
also think that, as Doug nicely describes above, we've sort of backed
into removing something without a deprecation window ... largely because
of the complexities involved with the system here - and I'd like to make
sure that when we make behavior changes, we are clear about them and
give the warning period so that people can adapt.


I also think that "pass" to "pass *"  is useful social incentive. While 
I think communication of this new direction has happened pretty broadly, 
organizations are complex places, and it didn't filter everywhere it 
needed to with the urgency that was probably needed.


"pass *"  * - with a greylist which goes away in 6 months

Will hopefully be a reasonable enough push to get the behavior we want, 
which is everyone publishing the same interface.


-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

2016-06-14 Thread Sean Dague
On 06/14/2016 01:57 PM, Chris Hoge wrote:

> 
> My proposal for addressing this problem approaches it at two levels:
> 
> * For the short term, I will submit a blueprint and patch to tempest that
>   allows configuration of a grey-list of Nova APIs where strict response
>   checking on additional properties will be disabled. So, for example,
>   if the 'create  servers' API call returned extra properties on that call,
>   the strict checking on this line[8] would be disabled at runtime.
>   Use of this code path will emit a deprecation warning, and the
>   code will be scheduled for removal in 2017 directly after the release
>   of the 2017.01 guideline. Vendors would be required to submit the
>   grey-list of APIs with additional response data that would be
>   published to their marketplace entry.

To understand more: will there be a visible asterisk on their
registration that says they require a grey-list?

> * Longer term, vendors will be expected to work with upstream to update
>   the API for returning additional data that is compatible with
>   API micro-versioning as defined by the Nova team, and the waiver would
>   no longer be allowed after the release of the 2017.01 guideline.
> 
> For the next half-year, I feel that this approach strengthens interoperability
> by accurately capturing the current state of OpenStack deployments and
> client tools. Before this change, additional properties on responses
> weren't explicitly disallowed, and vendors and deployers took advantage
> of this in production. While this is behavior that the Nova and QA teams
> want to stop, it will take a bit more time to reach downstream. Also, as
> of right now, as far as I know the only client that does strict response
> checking for Nova responses is the Tempest client. Currently, additional
> properties in responses are ignored and do not break existing client
> functionality. There is currently little to no harm done to downstream
> users by temporarily allowing additional data to be returned in responses.

In general I'm ok with this, as long as three things are true:

1) registrations that need the grey list are visually indicated quite
clearly and publicly that they needed it to pass.

2) 2017.01 is a firm cutoff.

3) We have evidence that folks that are having challenges with the
strict enforcement have made getting compliant a top priority.


3 is the one where I don't have any data either way. But I didn't see
any spec submissions (which are required for API changes in Nova) for
Newton that would indicate anyone is working on this. For 2017 to be a
hard stop, that means folks are either deleting this from their
interface, or proposing it in Ocata. Which is a really short runway if
this stuff isn't super straightforward and already agreed upstream.

So I'm provisionally ok with this, if folks in the know feel like 3 is
covered.

-Sean

P.S. The Tempest changes pretty much just anticipate the Nova changes
which are deleting all these facilities in Newton -
https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/api-no-more-extensions.html
- so in some ways we aren't doing folks a ton of favors letting them
delay too far because they are about to hit a brick wall on the code side.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [nova] os-brick privsep failures and an upgrade strategy?

2016-06-14 Thread Sean Dague
On 06/14/2016 09:02 AM, Daniel P. Berrange wrote:
> On Tue, Jun 14, 2016 at 07:49:54AM -0400, Sean Dague wrote:
> 
> [snip]
> 
>> The crux of the problem is that os-brick 1.4 and privsep can't be used
>> without a config file change during the upgrade. Which violates our
>> policy, because it breaks rolling upgrades.
> 
> os-vif support is going to face exactly the same problem. We just followed
> os-brick's lead by adding a change to devstack to explicitly set the
> required config options in nova.conf to change privsep to use rootwrap
> instead of plain sudo.
> 
> Basically every single user of privsep is likely to face the same
> problem.
> 
>> So... we have a few options:
>>
>> 1) make an exception here with release notes, because it's the only way
>> to move forward.
> 
> That's quite user hostile I think.
> 
>> 2) have some way for os-brick to use either mode for a transition period
>> (depending on whether privsep is configured to work)
> 
> I'm not sure that's viable - at least for os-vif we started from
> a clean slate to assume use of privsep, so we won't be able to have
> any optional fallback to non-privsep mode.
> 
>> 3) Something else ?
> 
> 3) Add an API to oslo.privsep that lets us configure the default
>command to launch the helper. Nova would invoke this on startup
> 
>   privsep.set_default_helper("sudo nova-rootwrap ")
> 
> 4) Have oslo.privsep install a sudo rule that grants permission
>to run privsep-helper, without needing rootwrap.
> 
> 5) Have each user of privsep install a sudo rule to grants
>permission to run privsep-helper with just their specific
>entry point context, without needing rootwrap

4 & 5 are the same as 1, because python packages don't have standardized
management of /etc in their infrastructure. The code can't roll forward
without a config change.

Option #3 is a new one; I wonder if that would get us past this better.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [cinder] [nova] os-brick privsep failures and an upgrade strategy?

2016-06-14 Thread Sean Dague
os-brick 1.4 was released over the weekend, and was the first os-brick
release to include privsep. We got a really odd failure rate in the
grenade-multinode jobs (1/3 - 1/2) afterwards, and it was super
non-obvious why. Hemma looks to have figured it out (this is a summary
of what I've seen on IRC, pulled together in one place).

Remembering the following -
https://github.com/openstack-dev/grenade#theory-of-upgrade and
https://governance.openstack.org/reference/tags/assert_supports-upgrade.html#requirements
- New code must work with N-1 configs. So this is `master` running with
`mitaka` configuration.

privsep requires a sudo rule or rootwrap rule (to get to sudo) to allow
the privsep daemon to be spawned for volume actions.
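
The rule in question is roughly of this shape (illustrative, not the
exact filter os-brick ships):

    [Filters]
    # allow rootwrap to launch the privsep daemon
    privsep-helper: CommandFilter, privsep-helper, root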

During gate testing we have a blanket sudoers rule for the stack user
during the run of grenade.sh, which has to make broad system-level
modifications to perform the upgrade. This sudoers rule is deleted at
the end of the grenade.sh run, before the Tempest tests are run, so that
Tempest tests don't accidentally require root privs on their target
environment.

Grenade *also* makes sure that some resources live across the upgrade
boundary. This includes a boot from volume guest, which is torn down
before testing starts. And this is where things get interesting.

This means there is a volume teardown needed before grenade ends. But
there is only one. In single-node grenade this happens about 30 seconds
before the end of the script, triggers the privsep daemon start, and
then we're done. And the 50_stack_sh sudoers file is removed. In
multinode,
*if* the boot from volume server is on the upgrade node, then the same
thing happens. *However*, if it instead ended up on the subnode, which
is not upgraded, then the volume tear down in on the old node. No
os-brick calls are made on the upgraded node before grenade finishes.
The 50_stack_sh sudoers file is removed, as expected.

And now all volume tests on those nodes fail.

Which is what should happen. The point is that in production no one is
going to put a blanket sudoers rule like that in place. It's just that
we needed it for this activity, and the userid on the services being the
same as the shell user (which is not root) let this fallback rule be
used.

The crux of the problem is that os-brick 1.4 and privsep can't be used
without a config file change during the upgrade. Which violates our
policy, because it breaks rolling upgrades.

So... we have a few options:

1) make an exception here with release notes, because it's the only way
to move forward.

2) have some way for os-brick to use either mode for a transition period
(depending on whether privsep is configured to work)

3) Something else ?

https://bugs.launchpad.net/os-brick/+bug/1592043 is the bug we've got on
this. We should probably sort out the path forward here on the ML as
there are a bunch of folks in a bunch of different time zones that have
important perspectives here.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Nova API extensions removal plan

2016-06-14 Thread Sean Dague
Nova is getting towards the final phases of the long-term arc to really
standardize the API, which includes removing the API extensions
facility. This arc started in Atlanta and has been talked about in a lot
of channels, but some interactions this past week made us realize that
some folks might not have realized it is happening.

So we've now got an over arching spec about how and why we're removing
the API extensions facility from Nova, and alternatives that exist for
folks -
https://specs.openstack.org/openstack/nova-specs/specs/newton/approved/api-no-more-extensions.html

This is informative for folks, please take a look if you think this will
impact you.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic][infra][qa] Ironic grenade work nearly complete

2016-06-10 Thread Sean Dague
On 06/10/2016 08:41 AM, Jeremy Stanley wrote:
> On 2016-06-10 11:49:12 +0100 (+0100), Miles Gould wrote:
>> On 09/06/16 23:21, Jay Faulkner wrote:
>>> There was some discussion about whether or not the Ironic grenade job
>>> should be in the check pipeline (even as -nv) for grenade,
>>
>> Not having this would mean that changes to grenade could silently break
>> Ironic's CI, right? That sounds really bad.
> 
> That's like saying it's really bad that changes to devstack could
> silently break devstack-based jobs for random projects, and so they
> should be tested against every one of those jobs. At some point you
> have to draw a line between running a reasonably representative
> sample and running so many jobs that you'll never be able to merge
> another change again (because even very small nondeterministic
> failure rates compound to make that impossible at a certain scale).

Nothing that requires a plugin should be voting in grenade's check
queue.

I'm fine with a few things in check as non-voting if they are doing
something out of the ordinary that we think needs to be kept on top of.
I also expect that Ironic folks are going to watch for those failures
and say, with -1/+1 CR, when they are legit and when the job is off the
rails. A non-voting job that doesn't have domain experts validating the
content regularly with CR just gets ignored if it fails a bunch.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][glance][qa] - Nova glance v2 work complete

2016-06-10 Thread Sean Dague
On 06/07/2016 04:55 PM, Matt Riedemann wrote:
> I tested the glance v2 stack (glance v1 disabled) using a devstack
> change here:
> 
> https://review.openstack.org/#/c/325322/
> 
> Now that the changes are merged up through the base nova image proxy and
> the libvirt driver, and we just have hyper-v/xen driver changes for that
> series, we should look at gating on this configuration.
> 
> I was originally thinking about adding a new job for this, but it's
> probably better if we just change one of the existing integrated gate
> jobs, like gate-tempest-dsvm-full or gate-tempest-dsvm-neutron-full.
> 
> Does anyone have an issue with that? Glance v1 is deprecated and the
> configuration option added to nova (use_glance_v1) defaults to True for
> compat but is deprecated, and the Nova team plans to drop its v1 proxy
> code in Ocata. So it seems like changing config to use v2 in the gate
> jobs should be a non-issue. We'd want to keep at least one integrated
> gate job using glance v1 to make sure we don't regress anything there in
> Newton.

use_glance_v1=False has now been merged as the default, so all jobs are
now using glance v2 for the Nova <=> Glance communication -
https://review.openstack.org/#/c/321551/

Thanks to Mike and Sudipta for pushing this to completion.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Update on resource providers work

2016-06-08 Thread Sean Dague
On 06/08/2016 03:31 PM, Matt Riedemann wrote:
> On 6/6/2016 7:26 AM, Jay Pipes wrote:
>> Once the InventoryList and AllocationList objects are merged, then we
>> will focus on reviews of the placement REST API patches [3]. Again, we
>> are planning on having the nova-compute resource tracker call these REST
>> API calls directly (while continuing to use the Nova ComputeNode object
>> for saving legacy inventory information). Clearly, before the resource
>> tracker can call this placement REST API, we need the placement REST API
>> service to be created and a client for it added to OSC. Once this client
>> exists, we can add code to the resource tracker which uses it.
> 
> Wait, we're going to require python-openstackclient in Nova to call the
> placement REST API? That seems bad given the dependencies that OSC pulls
> in. Why not just create the REST API wrapper that we need within Nova
> and then split that out later to whichever client it's going to live in?

Yes, that ^^^

Just use keystoneauth1 and hand-rolled JSON. We shouldn't be talking
about a ton of code.

Pulling python-openstackclient back into Nova as a dependency is really
a hard NACK for a bunch of reasons, including the way dependencies work.
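
The whole client ends up in the neighborhood of this (a sketch; the
'placement' config group and the resource path are assumptions about
the new service, not settled API):

    from keystoneauth1 import loading as ks_loading
    from oslo_config import cfg

    CONF = cfg.CONF

    # build a session from a [placement] section of nova.conf
    auth = ks_loading.load_auth_from_conf_options(CONF, 'placement')
    sess = ks_loading.load_session_from_conf_options(
        CONF, 'placement', auth=auth)

    # GET against the placement endpoint from the service catalog
    resp = sess.get('/resource_providers',
                    endpoint_filter={'service_type': 'placement'},
                    raise_exc=False)
    providers = resp.json()['resource_providers']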

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][glance][qa] Test plans for glance v2 stack

2016-06-08 Thread Sean Dague
On 06/07/2016 04:55 PM, Matt Riedemann wrote:
> I tested the glance v2 stack (glance v1 disabled) using a devstack
> change here:
> 
> https://review.openstack.org/#/c/325322/
> 
> Now that the changes are merged up through the base nova image proxy and
> the libvirt driver, and we just have hyper-v/xen driver changes for that
> series, we should look at gating on this configuration.
> 
> I was originally thinking about adding a new job for this, but it's
> probably better if we just change one of the existing integrated gate
> jobs, like gate-tempest-dsvm-full or gate-tempest-dsvm-neutron-full.
> 
> Does anyone have an issue with that? Glance v1 is deprecated and the
> configuration option added to nova (use_glance_v1) defaults to True for
> compat but is deprecated, and the Nova team plans to drop its v1 proxy
> code in Ocata. So it seems like changing config to use v2 in the gate
> jobs should be a non-issue. We'd want to keep at least one integrated
> gate job using glance v1 to make sure we don't regress anything there in
> Newton.

Honestly, I think we should take the Nova defaults (which will flip to
v2 shortly) and move forward. v1 usage in Nova will be deprecated in a
week. It will default to v2 for people in Newton, and they will have to
manually change it to go back. And because we did the copy/paste
approach instead of common dynamic code, the chances of a v1 regression
that is not caught by our unit tests are very, very small. It's
basically frozen code.

And we're going to delete the v1 code paths entirely in September. By
the time anyone deploys the Newton code, the master v1 code will be
deleted, and our answer is going to be to move to v2 for everyone.

It doesn't make sense to me to drive up the complexity by testing both
paths. We'll have a new tested opinionated default that we lead with.
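
For anyone tracking the knob, the Newton state in nova.conf is (sketch;
option group from memory):

    [glance]
    # Newton default; flipping back to True is only a temporary
    # escape hatch, the v1 proxy code is deleted in Ocata
    use_glance_v1 = False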

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] API changes on limit / marker / sort in Newton

2016-06-02 Thread Sean Dague
On 06/02/2016 12:53 PM, Everett Toews wrote:
> 
>> On Jun 1, 2016, at 2:01 PM, Matt Riedemann  
>> wrote:
>>
>> Agree with Sean, I'd prefer separate microversions since it makes getting 
>> these in easier since they are easier to review (and remember we make 
>> changes to python-novaclient for each of these also).
>>
>> Also agree with using a single spec in the future, like Sean did with the 
>> API deprecation spec - deprecating multiple APIs but a single spec since the 
>> changes are the same.
> 
> I appreciate that Nova has a long and storied history around its API. 
> Nonetheless, since it seems you're considering moving to a new microversion, 
> we'd appreciate it if you would consider adhering to the Sorting guideline 
> [1] and helping drive consensus into the Pagination guideline [2].

Everett,

Could you be more specific as to what your complaints are? This response
is extremely vague, and mildly passive-aggressive, so I don't even know
where to start on a reply.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-06-01 Thread Sean Dague
On 06/01/2016 01:33 PM, Matt Riedemann wrote:

> 
> Sounds like there was a bad check in nova which is fixed here:
> 
> https://review.openstack.org/#/c/323467/
> 
> And a d-g change depends on that here:
> 
> https://review.openstack.org/#/c/320925/
> 
> Is there anything more to do for this? I'm assuming we should backport
> the nova change to the stable branches because the d-g change is going
> to break those multinode jobs on stable, although they are already
> non-voting jobs so it doesn't really matter. But if we knowingly break
> those jobs on stable branches, we should fix them to work or exclude
> them from running on stable branch changes since it'd be a waste of test
> resources.

The intent is to backport them. We probably can land the d-g change
without waiting for the backports, but they are super straightforward,
so they should be easy to land quickly.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] API changes on limit / marker / sort in Newton

2016-05-31 Thread Sean Dague
On 05/30/2016 10:05 PM, Zhenyu Zheng wrote:
> I think it is good to share code, and a single microversion can make
> life easier during coding.
> Can we approve those specs first and then decide on the details in IRC
> and patch review? The non-priority spec deadline is so close.
> 
> Thanks
> 
> On Tue, May 31, 2016 at 1:09 AM, Ken'ichi Ohmichi <ken1ohmi...@gmail.com> wrote:
> 
> > 2016-05-29 19:25 GMT-07:00 Alex Xu <sou...@gmail.com>:
> >
> >
> > 2016-05-20 20:05 GMT+08:00 Sean Dague <s...@dague.net>:
> >>
> >> There are a number of changes up for spec reviews that add parameters to
> >> LIST interfaces in Newton:
> >>
> >> * keypairs-pagination (MERGED) -
> >> https://github.com/openstack/nova-specs/blob/8d16fc11ee6d01b5a9fe1b8b7ab7fa6dff460e2a/specs/newton/approved/keypairs-pagination.rst#L2
> >> * os-instances-actions - https://review.openstack.org/#/c/240401/
> >> * hypervisors - https://review.openstack.org/#/c/240401/
> >> * os-migrations - https://review.openstack.org/#/c/239869/
> >>
> >> I think that limit / marker is always a legit thing to add, and I almost
> >> wish we just had a single spec which is "add limit / marker to the
> >> following APIs in Newton"
> >>
> >
> > Are you looking for code sharing or one microversion? For code sharing, it
> > sounds ok if people do some co-work. Probably we need a common
> > pagination-supporting model_query function for all of those. For one
> > microversion, I'm a little hesitant: we should keep each change small, or
> > enable all in one microversion. But if we have some base code for
> > pagination support, we probably can make pagination a default thing
> > supported for all list methods?
> 
> It is nice to share some common code for this; that would also be nice
> for writing the api docs, to know which APIs support them.
> And it is also nice to do it with a single microversion for the above
> resources, because we can avoid microversion bumping conflicts, and
> none of them seem like a big change.

There is already common code for limit / marker.

I don't think these all need to be one microversion; they are honestly
easier to review if they are not.

However, in future we should probably make one spec for all limit /
marker adds during a cycle, just because the answer will be *yes* and it
seems like more work to have everything be a dedicated spec.
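
The wire pattern is the same everywhere, e.g. (sketch):

    GET /os-hypervisors?limit=100
    GET /os-hypervisors?limit=100&marker=<id of last item on prior page>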

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] I'm going to expire open bug reports older than 18 months.

2016-05-31 Thread Sean Dague
On 05/30/2016 04:02 PM, Shoham Peller wrote:
> I support Clint's comment, and as an example, only today I was able to
> search for a bug and see it was reported 2 years ago and hasn't been
> solved since. I've commented on the bug saying it happened to me on an
> up-to-date nova.
> I'm talking about a bug which is on your list -
> https://bugs.launchpad.net/nova/+bug/1298075
> 
> I guess I wouldn't have been able to do so if the bug was closed.

A closed bug still shows up in search, and when you try to report a new
bug, so you'd still see it in reporting.

That bug is actually a classic instance of something which shouldn't be
in the bug tracker. It's a known issue across all of OpenStack with
Keystone's token architecture, and it requires a bunch of Keystone
feature work to be addressed.

Having a more public "Known Issues in OpenStack" googlable page might be
way more appropriate for this so we don't spend a ton of time
duplicating issues into these buckets.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] I'm going to expire open bug reports older than 18 months.

2016-05-31 Thread Sean Dague
On 05/30/2016 02:37 PM, Clint Byrum wrote:
> (Top posting as a general reply to the thread)
> 
> Bugs are precious data. As much as it feels like the bug list is full of
> cruft that won't ever get touched, one thing that we might be missing in
> doing this is that the user who encounters the bug and takes the time
> to actually find the bug tracker and report a bug, may be best served
> by finding that somebody else has experienced something similar. If you
> close this bug, that user is now going to be presented with the "I may
> be the first person to report this" flow instead of "yeah I've seen that
> error too!". The former can be a daunting task, but the latter provides
> extra incentive to press forward, since clearly there are others who
> need this, and more data is helpful to triagers and fixers.

I strongly disagree with this sentiment. Bugs are only useful if
actionable. Given the rate of change of the code base, an 18-month-old
bug without a reasonable reproduce case (which in almost all cases is
not there) is just debt. And more importantly, such bugs are sink holes
where well-intentioned developers go off and burn 3 days realizing an
issue is completely irrelevant to the current project. Energy that could
be spent on relevant work.

> I 100% support those who are managing bugs doing whatever they need
> to do to make sure users' issues are being addressed as well as can be
> done with the resources available. However, I would also urge everyone
> to remember that the bug tracker is not only a way for developers to
> manage the bugs, it is also a way for the community of dedicated users
> to interact with the project as a whole.

Dedicated users reporting bugs that are actionable tend not to exist
longer than the supported window of the project.

I do also suggest that if people feel strongly that bugs shouldn't be
expired like this, they put their money where their mouth is and help
with bug triage and with addressing bugs through the system. Because the
alternative to expiring old bugs isn't old bugs getting more eyes, it's
all bugs getting less developer time, because the pile is so
insurmountable that no one ever wants to look at it.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

2016-05-31 Thread Sean Dague
On 05/31/2016 05:39 AM, Daniel P. Berrange wrote:
> On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
>> The team working on live migration testing started with an experimental
>> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
>> qemu under the assumption that a set of issues we were seeing are
>> solved. The short answer is, it doesn't look like this is going to work.
>>
>> We run tests on a bunch of different clouds. Those clouds expose
>> different cpu flags to us. These are not standard things that map to
>> "Haswell". It means live migration in the multinode cases can hit cpus
>> with different flags. So we found the requirement was to come up with a
>> least common denominator of cpu flags, which we call gate64, and push
>> that into the libvirt cpu_map.xml in devstack, and set whenever we are
>> in a multinode scenario.
>> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
>>  Not ideal, but with libvirt 1.2.2 it works fine.
>>
>> It turns out it works fine because libvirt *actually* seems to take the
>> data from cpu_map.xml and do a translation to what it believes qemu will
>> understand. On these systems apparently this turns into "-cpu
>> Opteron_G1,-pse36"
>> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-000b.txt.gz)
>>
>> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
>> seems to be passing our cpu_model directly to qemu, and assumes that as
>> a user you will be responsible for writing all the <feature> stanzas to
>> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
>> as qemu has no idea what we are talking about.
>> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
>>
>> Unlike libvirt, which has a text file (xml) that configures the cpus
>> that could exist in the world, qemu builds this in statically at compile
>> time:
>> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
>>
>>
>> So, the existing cpu_map.xml workaround for our testing situation will
>> no longer work.
>>
>> So, we have a number of open questions:
>>
>> * Have our cloud providers standardized enough that we might get away
>> without this custom cpu model? (Have some of them done it and only use
>> those for multinode?)
>> * Is there any way to get this feature back in libvirt to do the cpu
>> computation?
>> * Would we have to build a whole nova feature around setting libvirt xml
>>  to be able to test live migration in our clouds?
>> * Other options?
>> * Do we give up and go herd goats?
> 
> Rather than try to define our own custom CPU models, we can probably
> just use one of the standard CPU models and then explicitly tell
> libvirt which flags to turn off in order to get compatibility with
> our cloud environments.
> 
> This is not currently possible with Nova, since our nova.conf option
> only allow us to specify a bare CPU model. We would have to extend
> nova.conf to allow us to specify a list of CPU features to add or
> remove. Libvirt should then correctly pass these changes through
> to QEMU.

Yes, that's an option. Given that the libvirt team seemed to acknowledge
this as a regression, I'd rather not build a user-exposed feature for
all of that just as a workaround.
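
For reference, what Dan describes would come out in the guest XML as
something like this (a sketch, mirroring the "Opteron_G1,-pse36" example
from earlier in the thread):

    <cpu match='exact'>
      <model>Opteron_G1</model>
      <feature policy='disable' name='pse36'/>
    </cpu>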

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

