[Openstack-operators] [upgrades][skip-level][leapfrog] - RFC - Skipping releases when upgrading

2017-05-25 Thread Carter, Kevin
Hello Stackers,

As I'm sure many of you know there was a talk about doing "skip-level"[0]
upgrades at the OpenStack Summit which quite a few folks were interested
in. Today many of the interested parties got together and talked about
doing more of this in a formalized capacity. Essentially we're looking for
cloud upgrades with the possibility of skipping releases, ideally enabling
an N+3 upgrade. In our opinion it would go a very long way to solving cloud
consumer and deployer problems if folks didn't have to deal with an upgrade
every six months. While we talked about various issues and some of the
current approaches being kicked around, we wanted to open our general chat
up to the rest of the community and request input from folks who may have
already fought such a beast. If you've taken on an adventure like this how
did you approach it? Did it work? Any known issues, gotchas, or things
folks should be generally aware of?


During our chat today we generally landed on an in-place upgrade with known
API service downtime and as little data plane downtime as possible. The
process discussed was basically (a rough code sketch follows the list):
a1. Create utility "thing-a-me" (container, venv, etc) which contains the
required code to run a service through all of the required upgrades.
a2. Stop service(s).
a3. Run migration(s)/upgrade(s) for all releases using the utility
"thing-a-me".
a4. Repeat for all services.

b1. Once all required migrations are complete run a deployment using the
target release.
b2. Ensure all services are restarted.
b3. Ensure cloud is functional.
b4. profit!
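
To make steps a1-a3 a little less hand-wavy, here's a rough sketch of what
that loop could look like, assuming a nova-like service whose migrations run
via a "db sync" style command. The release names, version pins, paths, and
service names are illustrative assumptions only, not the actual OSA tooling:

    import subprocess

    # Illustrative pins for the releases being stepped through.
    PINS = {'kilo': '2015.1.4', 'liberty': '12.0.6',
            'mitaka': '13.1.4', 'newton': '14.0.10'}

    def run(cmd):
        print('+ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    run(['systemctl', 'stop', 'openstack-nova-api'])    # a2: stop service(s)

    for release in ['kilo', 'liberty', 'mitaka', 'newton']:
        venv = '/opt/leap/nova-' + release
        run(['virtualenv', venv])                       # a1: the utility "thing-a-me"
        run([venv + '/bin/pip', 'install', 'nova==' + PINS[release]])
        run([venv + '/bin/nova-manage', 'db', 'sync'])  # a3: that release's migrations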

Obviously, there's a lot of hand waving here but such a process is being
developed by the OpenStack-Ansible project[1]. Currently, the OSA tooling
will allow deployers to upgrade from Juno/Kilo to Newton using Ubuntu
14.04. While this has worked in the lab, it's early in development (YMMV).
Also, the tooling is not very general purpose or portable outside of OSA
but it could serve as a guide or just a general talking point. Are there
other tools out there that solve for the multi-release upgrade? Are there
any folks that might want to share their expertise? Maybe a process outline
that worked? Best practices? Do folks believe tools are the right way to
solve this or would comprehensive upgrade documentation be better for the
general community?

As most of the upgrade issues center around database migrations, we
discussed some of the potential pitfalls at length. One approach was to
roll-up all DB migrations into a single repository and run all upgrades for
a given project in one step. Another was to simply have multiple python
virtual environments and just run in-line migrations from a version-specific
venv (this is what the OSA tooling does). Does one way work better
than the other? Any thoughts on how this could be better? Would having
N+2/3 migrations addressable within the projects, even if they're not
tested any longer, be helpful?
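
For comparison, the "roll-up" approach could be sketched roughly as follows,
assuming all of a project's migration scripts were consolidated into a single
sqlalchemy-migrate repository (the DB URL and repo path here are made up):

    from migrate.versioning.api import upgrade

    # Replays every versioned script from the DB's current version to the
    # latest in one pass; the consolidated repo spans Juno through Newton.
    upgrade('mysql+pymysql://nova:secret@db.example/nova',
            '/opt/leap/rollup/nova-migrate-repo')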

It was our general thought that folks would be interested in having the
ability to skip releases so we'd like to hear from the community to
validate our thinking. Additionally, we'd like to get more minds together
and see if folks are wanting to work on such an initiative, even if this
turns into nothing more than a co-op/channel where we can "phone a friend".
Would it be good to try and secure some PTG space to work on this? Should
we try and get a working group going?

If you've made it this far, please forgive my stream of consciousness. I'm
trying to ask a lot of questions and distill long form conversation(s) into
as little text as possible, all without writing a novel. With that said, I
hope this finds you well, and I look forward to hearing from (and working
with) you soon.

[0] https://etherpad.openstack.org/p/BOS-forum-skip-level-upgrading
[1] https://github.com/openstack/openstack-ansible-ops/tree/master/leap-upgrades


--

Kevin Carter
IRC: Cloudnull
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-25 Thread joehuang
I think option 2 is better.

Best Regards
Chaoyi Huang (joehuang)

From: Lance Bragstad [lbrags...@gmail.com]
Sent: 25 May 2017 3:47
To: OpenStack Development Mailing List (not for usage questions); 
openstack-operators@lists.openstack.org
Subject: Re: [openstack-dev] 
[keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

I'd like to fill in a little more context here. I see three options with the 
current two proposals.

Option 1

Use a special admin project to denote elevated privileges. For those unfamiliar 
with the approach, it would rely on every deployment having an "admin" project 
defined in configuration [0].

How it works:

Role assignments on this project represent global scope, which is denoted by a 
boolean attribute in the token response. A user with an 'admin' role assignment 
on this project is equivalent to the global or cloud administrator. Ideally, if 
a user has a 'reader' role assignment on the admin project, they could have 
access to list everything within the deployment, provided all the proper changes 
are made across the various services. The workflow requires a special project 
for any sort of elevated privilege.

Pros:
- Almost all the work is done to make keystone understand the admin project, 
and there are already several patches in review to other projects to consume this
- Operators can create roles and assign them to the admin_project as needed 
after the upgrade to give proper global scope to their users

Cons:
- All global assignments are linked back to a single project
- Describing the flow is confusing because in order to give someone global 
access you have to give them a role assignment on a very specific project, 
which seems like an anti-pattern
- We currently don't allow some things to exist in the global sense (e.g., I 
can't launch instances without tenancy), so the admin project could end up owning resources
- What happens if the admin project disappears?
- Tooling or scripts will be written around the admin project, instead of 
treating all projects equally
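
For what it's worth, a hypothetical sketch of how a service's policy could key
off the admin-project flag under this option, using oslo.policy rule syntax
(the rule names are assumptions, not an agreed convention):

    from oslo_policy import policy

    rules = policy.Rules.from_dict({
        # "cloud admin": the admin role held on the designated admin project
        'cloud_admin': 'role:admin and is_admin_project:True',
        # "cloud reader": deployment-wide read-only listing
        'cloud_reader': 'role:reader and is_admin_project:True',
    })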

Option 2

Implement global role assignments in keystone.

How it works:

Role assignments in keystone can be scoped to global context. Users can then 
ask for a globally scoped token.

Pros:
- This approach represents a more accurate long term vision for role 
assignments (at least how we understand it today)
- Operators can create global roles and assign them as needed after the upgrade 
to give proper global scope to their users
- It's easier to explain global scope using global role assignments instead of 
a special project
- token.is_global = True and token.role = 'reader' is easier to understand than 
token.is_admin_project = True and token.role = 'reader'
- A global token can't be associated to a project, making it harder for 
operations that require a project to consume a global token (i.e. I shouldn't 
be able to launch an instance with a globally scoped token)

Cons:
- We need to start from scratch implementing global scope in keystone, steps 
for this are detailed in the spec
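
If this option lands, a service-side check could hypothetically look something
like the sketch below; token.is_global is the attribute proposed in the spec
and does not exist yet:

    def can_list_globally(token):
        # Allow deployment-wide listing only for a globally scoped reader.
        return getattr(token, 'is_global', False) and 'reader' in token.roles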

Option 3

We do option one and then follow it up with option two.

How it works:

We implement option one and continue solving the admin-ness issues in Pike by 
helping projects consume and enforce it. We then target the implementation of 
global roles for Queens.

Pros:
- If we make the interface in oslo.context for global roles consistent, then 
consuming projects shouldn't know the difference between using the 
admin_project and a global role assignment (a sketch follows after the cons below)

Cons:
- It's more work and we're already strapped for resources
- We've told operators that the admin_project is a thing but after Queens they 
will be able to do real global role assignments, so they should now migrate 
*again*
- We have to support two paths for solving the same problem in keystone, more 
maintenance and more testing to ensure they both behave exactly the same way
  - This can get more complicated for projects dedicated to testing policy and 
RBAC, like Patrole
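
To make that consistent-interface idea concrete, here is a hypothetical sketch
of what oslo.context could expose; the attribute names are assumptions:

    class RequestContext(object):
        def __init__(self, roles, is_admin_project=False, global_assignment=False):
            self.roles = roles
            self._is_admin_project = is_admin_project    # Pike mechanism
            self._global_assignment = global_assignment  # Queens mechanism

        @property
        def is_global(self):
            # Consuming services check one property; the mechanism that
            # granted global scope can change underneath without them knowing.
            return self._global_assignment or self._is_admin_project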


Looking for feedback here as to which one is preferred given timing and payoff, 
specifically from operators who would be doing the migrations to implement and 
maintain proper scope in their deployments.

Thanks for reading!


[0] 
https://github.com/openstack/keystone/blob/3d033df1c0fdc6cc9d2b02a702efca286371f2bd/etc/keystone.conf.sample#L2334-L2342

On Wed, May 24, 2017 at 10:35 AM, Lance Bragstad wrote:
Hey all,

To date we have two proposed solutions for tackling the admin-ness issue we 
have across the services. One builds on the existing scope concepts by scoping 
to an admin project [0]. The other introduces global role assignments [1] as a 
way to denote elevated privileges.

I'd like to get some feedback from operators, as well as developers from other 
projects, on each approach. Since work is required in keystone, it would be 
good to get consensus before spec freeze (June 9th). If you have specific 
questions on either approach, feel free to ping me or drop by the weekly 
policy meeting [2].

[Openstack-operators] [glance] comments requested on spec to fix security issue

2017-05-25 Thread Brian Rosmaita
Hello Operators,

There's a Glance spec up for fixing OSSN-0075.  It would be really
helpful to know how operators feel about the impact of the proposal
and the alternatives described in the spec:

https://review.openstack.org/#/c/468179/

(Something you may not know is that if you click on the
'gate-glance-specs-docs-ubuntu-xenial' link in the "Jenkins check"
table near the top of the page, you can see the HTML rendered version
of the spec, which may be easier to read than the raw .rst file.)

The Glance Project Team would appreciate your reviews on the spec,
because otherwise we're just guessing about the likely impact on
operators and end users.

thanks,
brian



Re: [Openstack-operators] preferred option to fix long-standing user-visible bug in nova?

2017-05-25 Thread Chris Friesen

On 05/25/2017 01:53 PM, Marc Heckmann wrote:

On Mon, 2017-05-15 at 11:46 -0600, Chris Friesen wrote:



What do operators think we should do?  I see two options, neither of which
is really ideal:

1) Decide that the "new" behaviour has been out in the wild long enough to
become the defacto standard and update the docs to reflect this.  This
breaks the "None and 'prefer' are equivalent" model that was originally
intended.

2) Fix the bug to revert back to the original behaviour and backport the
fix to Ocata.  Backporting to Newton might not happen since it's in phase
II maintenance.  This could potentially break anyone that has come to rely
on the "new" behaviour.


Whatever will or has been chosen should match the documentation.
Personally, we would never do anything other than specifying the policy
in the flavor, as our flavors are associated w/ HW profiles, but I could
see how other operators might manage things differently. That being
said, that sort of thing should not necessarily be user controlled, and
I haven't really explored Glance property protections...

So from my point of view "cpu_thread_policy" set in the flavor should
take precedence over anything else.


So a vote to keep the status quo and change the documentation to match?  (Since 
the current behaviour doesn't match the original documentation.)


Incidentally, it's allowed to be specified in an image because whether or not
HT is desirable depends entirely on the application code.  It may be faster
with "isolate", or it may be faster with "require" and double the vCPUs in
the guest.  If the software in the guest is licensed per vCPU then "isolate"
might make sense to maximize performance per licensing dollar.
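
For reference, a hypothetical sketch of setting the policy at both levels with
python-novaclient and python-glanceclient (the endpoint, credentials, flavor
name, and image UUID are all made up):

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from novaclient import client as nova_client
    from glanceclient import client as glance_client

    auth = v3.Password(auth_url='http://keystone.example:5000/v3',
                       username='admin', password='secret',
                       project_name='admin', user_domain_id='default',
                       project_domain_id='default')
    sess = session.Session(auth=auth)
    nova = nova_client.Client('2.1', session=sess)
    glance = glance_client.Client('2', session=sess)

    # Flavor extra-spec (operator-controlled).
    flavor = nova.flavors.find(name='m1.large.ht')
    flavor.set_keys({'hw:cpu_thread_policy': 'isolate'})

    # Image property: lets the application owner pick what suits their code.
    glance.images.update('IMAGE_UUID', hw_cpu_thread_policy='require')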


"prefer" is almost never a sensible choice for anything that cares about 
performance--it was always intended to be a way to represent "the behaviour that 
you get if you don't specify a cpu thread policy".


Oh, and I'd assume that a customer would be billed for the number of host cores 
actually used...with "isolate" each of the N vCPUs takes a whole core (the 
thread sibling sits idle), while with "require" the 2*N vCPUs pack onto thread 
siblings, so both consume N host cores and would end up costing the same.


Chris






Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-25 Thread Marc Heckmann
See below.

On Thu, 2017-05-25 at 15:49 -0500, Lance Bragstad wrote:


On Thu, May 25, 2017 at 2:36 PM, Marc Heckmann wrote:
First of all @Lance, thanks for taking the time to write and summarize this for 
us. It's much appreciated.


Absolutely! It helps me think about it, too.


While I'm not aware of all the nuances, based on my own testing, I feel that we 
are really close with option 1.

That being said, as you already stated, option 2 is clearly more in line with 
the idea of having a "global" Cloud Admin role. So long term, #2 is more 
desirable.

Given the two sentences above, I certainly would prefer option 3 so that we can 
have a usable solution quickly. I certainly will continue to test and provide 
feedback for the option 1 part.



It sounds like eventually migrating everything from the is_admin_project to 
true global roles is a migration you're willing to make. This might be a loaded 
question and it will vary across deployments, but how long would you expect 
that migration to take for your specific deployment(s)?


Maybe I'm over-simplifying, but if properly documented I would expect there to 
be a cut-over release at some point where we would need to switch over and 
create the proper globally scoped role(s). I guess we could live with 
is_admin_project for 2-3 releases in the interim.

-m






On Thu, 2017-05-25 at 10:42 +1200, Adrian Turjak wrote:


On 25/05/17 07:47, Lance Bragstad wrote:

Option 2

Implement global role assignments in keystone.

How it works:

Role assignments in keystone can be scoped to global context. Users can then 
ask for a globally scoped token

Pros:
- This approach represents a more accurate long term vision for role 
assignments (at least how we understand it today)
- Operators can create global roles and assign them as needed after the upgrade 
to give proper global scope to their users
- It's easier to explain global scope using global role assignments instead of 
a special project
- token.is_global = True and token.role = 'reader' is easier to understand than 
token.is_admin_project = True and token.role = 'reader'
- A global token can't be associated to a project, making it harder for 
operations that require a project to consume a global token (i.e. I shouldn't 
be able to launch an instance with a globally scoped token)

Cons:
- We need to start from scratch implementing global scope in keystone, steps 
for this are detailed in the spec



On Wed, May 24, 2017 at 10:35 AM, Lance Bragstad wrote:
Hey all,

To date we have two proposed solutions for tackling the admin-ness issue we 
have across the services. One builds on the existing scope concepts by scoping 
to an admin project [0]. The other introduces global role assignments [1] as a 
way to denote elevated privileges.

I'd like to get some feedback from operators, as well as developers from other 
projects, on each approach. Since work is required in keystone, it would be 
good to get consensus before spec freeze (June 9th). If you have specific 
questions on either approach, feel free to ping me or drop by the weekly policy 
meeting [2].

Thanks!


Please option 2. The concept of being an "admin" while you are only scoped to a 
project is stupid when that admin role gives you super user power yet you only 
have it when scoped to just that project. That concept never really made sense. 
Global scope makes so much more sense when that is the power the role gives.

At the same time, it kind of would be nice to make scope actually matter. As admin 
you have a role on Project X, yet you can now (while scoped to this project) 
pretty much do anything anywhere! I think global roles are a great step in the 
right direction, but beyond and after that we need to seriously start looking 
at making scope itself matter, so that giving someone 'admin' or some such on a 
project actually only gives them something akin to project_admin or some sort 
of admin-lite powers scoped to that project and sub-projects. That though falls 
into the policy work being done, but should be noted, as it is related.

Still, at least global scope for roles makes the superuser case make some actual 
sense, because (and I can't speak for other deployers), we have one project 
pretty much dedicated as an "admin_project" and it's just odd to actually need 
to give our service users roles in a project when that project is empty and a 
pointless construct for their purpose.

Also thanks for pushing this! I've been watching your global roles spec review 
in hopes we'd go down that path. :)

-Adrian





Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-25 Thread Lance Bragstad
On Thu, May 25, 2017 at 2:36 PM, Marc Heckmann wrote:

> First of all @Lance, thanks for taking the time to write and summarize
> this for us. It's much appreciated.
>

Absolutely! It helps me think about it, too.


>
> While I'm not aware of all the nuances, based on my own testing, I feel
> that we are really close with option 1.
>
> That being said, as you already stated, option 2 is clearly more in line
> with the idea of having a "global" Cloud Admin role. So long term, #2 is
> more desirable.
>
> Given the two sentences above, I certainly would prefer option 3 so that
> we can have a usable solution quickly. I certainly will continue to test
> and provide feedback for the option 1 part.
>
>
It sounds like eventually migrating everything from the is_admin_project to
true global roles is a migration you're willing to make. This might be a
loaded question and it will vary across deployments, but how long would you
> expect that migration to take for your specific deployment(s)?


-m
>
>
>
>
> On Thu, 2017-05-25 at 10:42 +1200, Adrian Turjak wrote:
>
>
>
> On 25/05/17 07:47, Lance Bragstad wrote:
> 
>
> *Option 2*
>
> Implement global role assignments in keystone.
>
> *How it works:*
>
> Role assignments in keystone can be scoped to global context. Users can
> then ask for a globally scoped token
>
> Pros:
> - This approach represents a more accurate long term vision for role
> assignments (at least how we understand it today)
> - Operators can create global roles and assign them as needed after the
> upgrade to give proper global scope to their users
> - It's easier to explain global scope using global role assignments
> instead of a special project
> - token.is_global = True and token.role = 'reader' is easier to understand
> than token.is_admin_project = True and token.role = 'reader'
> - A global token can't be associated to a project, making it harder for
> operations that require a project to consume a global token (i.e. I
> shouldn't be able to launch an instance with a globally scoped token)
>
> Cons:
> - We need to start from scratch implementing global scope in keystone,
> steps for this are detailed in the spec
>
> 
>
>
> On Wed, May 24, 2017 at 10:35 AM, Lance Bragstad wrote:
>
> Hey all,
>
> To date we have two proposed solutions for tackling the admin-ness issue
> we have across the services. One builds on the existing scope concepts by
> scoping to an admin project [0]. The other introduces global role
> assignments [1] as a way to denote elevated privileges.
>
> I'd like to get some feedback from operators, as well as developers from
> other projects, on each approach. Since work is required in keystone, it
> would be good to get consensus before spec freeze (June 9th). If you have
> specific questions on either approach, feel free to ping me or drop by the
> weekly policy meeting [2].
>
> Thanks!
>
>
> Please option 2. The concept of being an "admin" while you are only scoped
> to a project is stupid when that admin role gives you super user power yet
> you only have it when scoped to just that project. That concept never
> really made sense. Global scope makes so much more sense when that is the
> power the role gives.
>
> At the same time, it kind of would be nice to make scope actually matter. As
> admin you have a role on Project X, yet you can now (while scoped to this
> project) pretty much do anything anywhere! I think global roles are a great
> step in the right direction, but beyond and after that we need to seriously
> start looking at making scope itself matter, so that giving someone 'admin'
> or some such on a project actually only gives them something akin to
> project_admin or some sort of admin-lite powers scoped to that project and
> sub-projects. That though falls into the policy work being done, but should
> be noted, as it is related.
>
> Still, at least global scope for roles makes the superuser case make some
> actual sense, because (and I can't speak for other deployers), we have one
> project pretty much dedicated as an "admin_project" and it's just odd to
> actually need to give our service users roles in a project when that
> project is empty and a pointless construct for their purpose.
>
> Also thanks for pushing this! I've been watching your global roles spec
> review in hopes we'd go down that path. :)
>
> -Adrian
>


Re: [Openstack-operators] preferred option to fix long-standing user-visible bug in nova?

2017-05-25 Thread Marc Heckmann
Sorry for the late reply, but see below.

On Mon, 2017-05-15 at 11:46 -0600, Chris Friesen wrote:
> Hi,
> 
> In Mitaka nova introduced the "cpu_thread_policy" which can be
> specified in 
> flavor extra-specs.  In the original spec, and in the original
> implementation, 
> not specifying the thread policy in the flavor was supposed to be
> equivalent to 
> specifying a policy of "prefer", and in both cases if the image set a
> policy 
> then nova would use the image policy.
> 
> In Newton, the code was changed to fix a bug but there was an
> unforeseen side 
> effect.  Now the behaviour is different depending on whether the
> flavor 
> specifies no policy at all or specifies a policy of
> "prefer".   Specifically, if 
> the flavor doesn't specify a policy at all and the image does then
> we'll use the 
> flavor policy.  However, if the flavor specifies a policy of "prefer"
> and the 
> image specifies a different policy then we'll use the flavor policy.
> 
> This is clearly a bug (tracked as part of bug #1687077), but it's now
> been out 
> in the wild for two releases (Newton and Ocata).
> 
> What do operators think we should do?  I see two options, neither of
> which is 
> really ideal:
> 
> 1) Decide that the "new" behaviour has been out in the wild long
> enough to 
> become the defacto standard and update the docs to reflect
> this.  This breaks 
> the "None and 'prefer' are equivalent" model that was originally
> intended.
> 
> 2) Fix the bug to revert back to the original behaviour and backport
> the fix to 
> Ocata.  Backporting to Newton might not happen since it's in phase
> II 
> maintenance.  This could potentially break anyone that has come to
> rely on the 
> "new" behaviour.

Whatever will or has been chosen should match the documentation.
Personally, we would never do anything other than specifying the policy
in the flavor, as our flavors are associated w/ HW profiles, but I could
see how other operators might manage things differently. That being
said, that sort of thing should not necessarily be user controlled, and
I haven't really explored Glance property protections...

So from my point of view "cpu_thread_policy" set in the flavor should
take precedence over anything else.

-m

> 
> Either change is trivial from a dev standpoint, so it's really an
> operator 
> issue--what makes the most sense for operators/users?
> 
> Chris
> 