Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-06-06 Thread Marc Heckmann
On Tue, 2017-06-06 at 17:01 -0400, Erik McCormick wrote:
> On Tue, Jun 6, 2017 at 4:44 PM, Lance Bragstad wrote:
> > 
> > 
> > On Tue, Jun 6, 2017 at 3:06 PM, Marc Heckmann wrote:
> > > 
> > > Hi,
> > > 
> > > On Tue, 2017-06-06 at 10:09 -0500, Lance Bragstad wrote:
> > > 
> > > Also, with all the people involved with this thread, I'm curious
> > > what the
> > > best way is to get consensus. If I've tallied the responses
> > > properly, we
> > > have 5 in favor of option #2 and 1 in favor of option #3. This
> > > week is spec
> > > freeze for keystone, so I see a slim chance of this getting
> > > committed to
> > > Pike [0]. If we do have spare cycles across the team we could
> > > start working
> > > on an early version and get eyes on it. If we straighten out
> > > everyone's
> > > concerns early we could land option #2 early in Queens.
> > > 
> > > 
> > > I was the only one in favour of option 3 only because I've spent
> > > a bunch
> > > of time playing with option #1 in the past. As I mentioned
> > > previously in the
> > > thread, if #2 is more in line with where the project is going,
> > > then I'm all
> > > for it. At this point, the admin scope issue has been around long
> > > enough
> > > that Queens doesn't seem that far off.
> > 
> > 
> > From an administrative point-of-view, would you consider option #1
> > or option
> > #2 to be better long term?

#2

> > 
> 
> Count me as another +1 for option 2. It's the right way to go long
> term, and we've lived with how it is now long enough that I'm OK
> waiting a release or even 2 more for it with things as is. I think
> option 3 would just muddy the waters.
> 
> -Erik
> 
> > > 
> > > 
> > > -m
> > > 
> > > 
> > > I guess it comes down to how fast folks want it.
> > > 
> > > [0] https://review.openstack.org/#/c/464763/
> > > 
> > > On Tue, Jun 6, 2017 at 10:01 AM, Lance Bragstad wrote:
> > > 
> > > I replied to John, but directly. I'm sending the responses I sent
> > > to him
> > > but with the intended audience on the thread. Sorry for not
> > > catching that
> > > earlier.
> > > 
> > > 
> > > On Fri, May 26, 2017 at 2:44 AM, John Garbutt wrote:
> > > 
> > > +1 on not forcing Operators to transition to something new twice,
> > > even if
> > > we did go for option 3.
> > > 
> > > 
> > > The more I think about this, the more it worries me from a
> > > developer
> > > perspective. If we ended up going with option 3, then we'd be
> > > supporting
> > > both methods of elevating privileges. That means two paths for
> > > doing the
> > > same thing in keystone. It also means oslo.context,
> > > keystonemiddleware, or
> > > any other library consuming tokens that needs to understand
> > > elevated
> > > privileges needs to understand both approaches.
> > > 
> > > 
> > > 
> > > Do we have an agreed non-disruptive upgrade path mapped out yet?
> > > (For any
> > > of the options) We spoke about fallback rules you pass but with a
> > > warning to
> > > give us a smoother transition. I think that's my main objection
> > > with the
> > > existing patches, having to tell all admins to get their token
> > > for a
> > > different project, and give them roles in that project, all
> > > before being
> > > able to upgrade.
> > > 
> > > 
> > > Thanks for bringing up the upgrade case! You've kinda described
> > > an upgrade
> > > for option 1. This is what I was thinking for option 2:
> > > 
> > > - deployment upgrades to a release that supports global role
> > > assignments
> > > - operator creates a set of global roles (i.e. global_admin)
> > > - operator grants global roles to various people that need it
> > > (i.e. all
> > > admins)
> > > - operator informs admins to create globally scoped tokens
> > > - operator rolls out necessary policy changes
> > > 
> > > If I'm thinking about this properly, nothing would change at the
> > > project-scope level for existing users (who don't need a global role
> > > assignment).

Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-06-06 Thread Marc Heckmann
Hi,

On Tue, 2017-06-06 at 10:09 -0500, Lance Bragstad wrote:
Also, with all the people involved with this thread, I'm curious what the best 
way is to get consensus. If I've tallied the responses properly, we have 5 in 
favor of option #2 and 1 in favor of option #3. This week is spec freeze for 
keystone, so I see a slim chance of this getting committed to Pike [0]. If we 
do have spare cycles across the team we could start working on an early version 
and get eyes on it. If we straighten out everyone's concerns early we could land 
option #2 early in Queens.

I was the only one in favour of option 3 only because I've spent a bunch of 
time playing with option #1 in the past. As I mentioned previously in the 
thread, if #2 is more in line with where the project is going, then I'm all for 
it. At this point, the admin scope issue has been around long enough that 
Queens doesn't seem that far off.

-m


I guess it comes down to how fast folks want it.

[0] https://review.openstack.org/#/c/464763/

On Tue, Jun 6, 2017 at 10:01 AM, Lance Bragstad <lbrags...@gmail.com> wrote:
I replied to John, but directly. I'm sending the responses I sent to him but 
with the intended audience on the thread. Sorry for not catching that earlier.


On Fri, May 26, 2017 at 2:44 AM, John Garbutt <j...@johngarbutt.com> wrote:
+1 on not forcing Operators to transition to something new twice, even if we 
did go for option 3.


The more I think about this, the more it worries me from a developer 
perspective. If we ended up going with option 3, then we'd be supporting both 
methods of elevating privileges. That means two paths for doing the same thing 
in keystone. It also means oslo.context, keystonemiddleware, or any other 
library consuming tokens that needs to understand elevated privileges needs to 
understand both approaches.


Do we have an agreed non-disruptive upgrade path mapped out yet? (For any of 
the options) We spoke about fallback rules you pass but with a warning to give 
us a smoother transition. I think that's my main objection with the existing 
patches, having to tell all admins to get their token for a different project, 
and give them roles in that project, all before being able to upgrade.


Thanks for bringing up the upgrade case! You've kinda described an upgrade for 
option 1. This is what I was thinking for option 2:

- deployment upgrades to a release that supports global role assignments
- operator creates a set of global roles (i.e. global_admin)
- operator grants global roles to various people that need it (i.e. all admins)
- operator informs admins to create globally scoped tokens
- operator rolls out necessary policy changes

If I'm thinking about this properly, nothing would change at the project-scope 
level for existing users (who don't need a global role assignment). I'm hoping 
someone can help firm ^ that up or improve it if needed.
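As a rough sketch of those operator steps (assuming the global role-assignment
support the spec proposes; the exact client flags below are illustrative, since
nothing like them exists yet at the time of writing):

  # create a role intended for cloud-wide administration
  openstack role create global_admin

  # grant it outside of any single project (flag is hypothetical, pending the spec)
  openstack role add --system all --user alice global_admin

  # ask for a token that is not bound to a project (option also hypothetical)
  openstack --os-system-scope all token issue

Existing project-scoped assignments and tokens would keep working unchanged.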


Thanks,
johnthetubaguy

On Fri, 26 May 2017 at 08:09, Belmiro Moreira <moreira.belmiro.email.li...@gmail.com> wrote:
Hi,
thanks for bringing this into discussion in the Operators list.

Options 1 and 2 are not complementary but completely different.
So, considering "Option 2" and the goal to target it for Queens, I would prefer
not going into a migration path in Pike and then again in Queens.

Belmiro

On Fri, May 26, 2017 at 2:52 AM, joehuang <joehu...@huawei.com> wrote:
I think option 2 is better.

Best Regards
Chaoyi Huang (joehuang)

From: Lance Bragstad [lbrags...@gmail.com]
Sent: 25 May 2017 3:47
To: OpenStack Development Mailing List (not for usage questions); 
openstack-operators@lists.openstack.org
Subject: Re: [openstack-dev] 
[keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

I'd like to fill in a little more context here. I see three options with the 
current two proposals.

Option 1

Use a special admin project to denote elevated privileges. For those unfamiliar 
with the approach, it would rely on every deployment having an "admin" project 
defined in configuration [0].

How it works:

Role assignments on this project represent global scope which is denoted by a 
boolean attribute in the token response. A user with an 'admin' role assignment 
on this project is equivalent to the global or cloud administrator. Ideally, if 
a user has a 'reader' role assignment on the admin project, they could have 
access to list everything within the deployment, pending all the proper changes 
are made across the various services. The workflow requires a special project 
for any sort of elevated privilege.
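For reference, the "admin" project in option 1 is the one named in keystone's
configuration; a minimal sketch of that configuration (the option names are the
existing [resource] settings, the values are illustrative):

  [resource]
  admin_project_domain_name = Default
  admin_project_name = admin

Tokens scoped to that project then carry the is_admin_project flag that
services can check in policy.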

Pros:
- Almost all the work is done to make keystone understand the admin project; 
there are already several patches in review to other projects to consume this
- Operators can create roles and assign them to the admin_project as needed 
after the upgrade to give proper global scope to their users

Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-25 Thread Marc Heckmann
See below.

On Thu, 2017-05-25 at 15:49 -0500, Lance Bragstad wrote:


On Thu, May 25, 2017 at 2:36 PM, Marc Heckmann <marc.heckm...@ubisoft.com> wrote:
First of all @Lance, thanks for taking the time to write and summarize this for 
us. It's much appreciated.


Absolutely! It helps me think about it, too.


While I'm not aware of all the nuances, based on my own testing, I feel that we 
are really close with option 1.

That being said, as you already stated, option 2 is clearly more in line with 
the idea of having a "global" Cloud Admin role. So long term, #2 is more 
desirable.

Given the two sentences above, I certainly would prefer option 3 so that we can 
have a usable solution quickly. I certainly will continue to test and provide 
feedback for the option 1 part.



It sounds like eventually migrating everything from the is_admin_project to 
true global roles is a migration you're willing to make. This might be a loaded 
question and it will vary across deployments, but how long would you expect 
that migration to take for your specific deployment(s)?


Maybe I'm over-simplifying, but if properly documented I would expect there to 
be a cut-over release at some point where we would need to switch over and 
create the proper globally scoped role(s). I guess we could live with 
is_admin_project for 2-3 releases in the interim.

-m


-m




On Thu, 2017-05-25 at 10:42 +1200, Adrian Turjak wrote:


On 25/05/17 07:47, Lance Bragstad wrote:

Option 2

Implement global role assignments in keystone.

How it works:

Role assignments in keystone can be scoped to global context. Users can then 
ask for a globally scoped token

Pros:
- This approach represents a more accurate long term vision for role 
assignments (at least how we understand it today)
- Operators can create global roles and assign them as needed after the upgrade 
to give proper global scope to their users
- It's easier to explain global scope using global role assignments instead of 
a special project
- token.is_global = True and token.role = 'reader' is easier to understand than 
token.is_admin_project = True and token.role = 'reader'
- A global token can't be associated to a project, making it harder for 
operations that require a project to consume a global token (i.e. I shouldn't 
be able to launch an instance with a globally scoped token)

Cons:
- We need to start from scratch implementing global scope in keystone, steps 
for this are detailed in the spec
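To make the readability point above concrete, a hedged sketch of how a policy
rule might look under each model (the rule name and the is_global attribute are
illustrative; is_admin_project is what the option 1 patches expose today):

  # option 1: special admin project flag carried in the token
  "context_is_admin": "role:admin and is_admin_project:True"

  # option 2: a global role assignment carried in the token
  "context_is_admin": "role:admin and is_global:True"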



On Wed, May 24, 2017 at 10:35 AM, Lance Bragstad <lbrags...@gmail.com> wrote:
Hey all,

To date we have two proposed solutions for tackling the admin-ness issue we 
have across the services. One builds on the existing scope concepts by scoping 
to an admin project [0]. The other introduces global role assignments [1] as a 
way to denote elevated privileges.

I'd like to get some feedback from operators, as well as developers from other 
projects, on each approach. Since work is required in keystone, it would be 
good to get consensus before spec freeze (June 9th). If you have specific 
questions on either approach, feel free to ping me or drop by the weekly policy 
meeting [2].

Thanks!


Please option 2. The concept of being an "admin" while you are only scoped to a 
project is stupid when that admin role gives you super user power yet you only 
have it when scoped to just that project. That concept never really made sense. 
Global scope makes so much more sense when that is the power the role gives.

At the same time, it kind of would be nice to make scope actually matter. As admin 
you have a role on Project X, yet you can now (while scoped to this project) 
pretty much do anything anywhere! I think global roles is a great step in the 
right direction, but beyond and after that we need to seriously start looking 
at making scope itself matter, so that giving someone 'admin' or some such on a 
project actually only gives them something akin to project_admin or some sort 
of admin-lite powers scoped to that project and sub-projects. That though falls 
into the policy work being done, but should be noted, as it is related.

Still, at least global scope for roles makes the superuser case make some actual 
sense, because (and I can't speak for other deployers), we have one project 
pretty much dedicated as an "admin_project" and it's just odd to actually need 
to give our service users roles in a project when that project is empty and a 
pointless construct for their purpose.

Also thanks for pushing this! I've been watching your global roles spec review 
in hopes we'd go down that path. :)

-Adrian

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] preferred option to fix long-standing user-visible bug in nova?

2017-05-25 Thread Marc Heckmann
Sorry for the late reply, but see below.

On Mon, 2017-05-15 at 11:46 -0600, Chris Friesen wrote:
> Hi,
> 
> In Mitaka nova introduced the "cpu_thread_policy" which can be
> specified in 
> flavor extra-specs.  In the original spec, and in the original
> implementation, 
> not specifying the thread policy in the flavor was supposed to be
> equivalent to 
> specifying a policy of "prefer", and in both cases if the image set a
> policy 
> then nova would use the image policy.
> 
> In Newton, the code was changed to fix a bug but there was an
> unforeseen side 
> effect.  Now the behaviour is different depending on whether the
> flavor 
> specifies no policy at all or specifies a policy of
> "prefer".   Specifically, if 
> the flavor doesn't specify a policy at all and the image does then
> we'll use the 
> image policy.  However, if the flavor specifies a policy of "prefer"
> and the 
> image specifies a different policy then we'll use the flavor policy.
> 
> This is clearly a bug (tracked as part of bug #1687077), but it's now
> been out 
> in the wild for two releases (Newton and Ocata).
> 
> What do operators think we should do?  I see two options, neither of
> which is 
> really ideal:
> 
> 1) Decide that the "new" behaviour has been out in the wild long
> enough to 
> become the de facto standard and update the docs to reflect
> this.  This breaks 
> the "None and 'prefer' are equivalent" model that was originally
> intended.
> 
> 2) Fix the bug to revert back to the original behaviour and backport
> the fix to 
> Ocata.  Backporting to Newton might not happen since it's in phase
> II 
> maintenance.  This could potentially break anyone that has come to
> rely on the 
> "new" behaviour.

Whatever will or has been chosen should match the documentation.
Personally, we would never do anything other than specifying the policy
in the flavor as our flavors are associated w/ HW profiles, but I could
see how other operators might manage things differently. That being
said, that sort of thing should not necessarily be user controlled and
I haven't really explored Glance property protections.

So from my point of view "cpu_thread_policy" set in the flavor should
take precedence over anything else.

-m
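For context, the two knobs being compared are set roughly like this (a sketch;
the flavor and image names are placeholders):

  # flavor-level policy, which in my view should win
  openstack flavor set hw.large --property hw:cpu_thread_policy=prefer

  # image-level policy, which a user could set on their own image
  openstack image set my-image --property hw_cpu_thread_policy=isolate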

> 
> Either change is trivial from a dev standpoint, so it's really an
> operator 
> issue--what makes the most sense for operators/users?
> 
> Chris
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato
> rs
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [keystone][nova][cinder][glance][neutron][horizon][policy] defining admin-ness

2017-05-25 Thread Marc Heckmann
First of all @Lance, thanks for taking the time to write and summarize this for 
us. It's much appreciated.

While I'm not aware of all the nuances, based on my own testing, I feel that we 
are really close with option 1.

That being said, as you already stated, option 2 is clearly more in line with 
the idea of having a "global" Cloud Admin role. So long term, #2 is more 
desirable.

Given the two sentences above, I certainly would prefer option 3 so that we can 
have a usable solution quickly. I certainly will continue to test and provide 
feedback for the option 1 part.

-m




On Thu, 2017-05-25 at 10:42 +1200, Adrian Turjak wrote:


On 25/05/17 07:47, Lance Bragstad wrote:

Option 2

Implement global role assignments in keystone.

How it works:

Role assignments in keystone can be scoped to global context. Users can then 
ask for a globally scoped token

Pros:
- This approach represents a more accurate long term vision for role 
assignments (at least how we understand it today)
- Operators can create global roles and assign them as needed after the upgrade 
to give proper global scope to their users
- It's easier to explain global scope using global role assignments instead of 
a special project
- token.is_global = True and token.role = 'reader' is easier to understand than 
token.is_admin_project = True and token.role = 'reader'
- A global token can't be associated to a project, making it harder for 
operations that require a project to consume a global token (i.e. I shouldn't 
be able to launch an instance with a globally scoped token)

Cons:
- We need to start from scratch implementing global scope in keystone, steps 
for this are detailed in the spec



On Wed, May 24, 2017 at 10:35 AM, Lance Bragstad <lbrags...@gmail.com> wrote:
Hey all,

To date we have two proposed solutions for tackling the admin-ness issue we 
have across the services. One builds on the existing scope concepts by scoping 
to an admin project [0]. The other introduces global role assignments [1] as a 
way to denote elevated privileges.

I'd like to get some feedback from operators, as well as developers from other 
projects, on each approach. Since work is required in keystone, it would be 
good to get consensus before spec freeze (June 9th). If you have specific 
questions on either approach, feel free to ping me or drop by the weekly policy 
meeting [2].

Thanks!


Please option 2. The concept of being an "admin" while you are only scoped to a 
project is stupid when that admin role gives you super user power yet you only 
have it when scoped to just that project. That concept never really made sense. 
Global scope makes so much more sense when that is the power the role gives.

At the same time, it kind of would be nice to make scope actually matter. As admin 
you have a role on Project X, yet you can now (while scoped to this project) 
pretty much do anything anywhere! I think global roles is a great step in the 
right direction, but beyond and after that we need to seriously start looking 
at making scope itself matter, so that giving someone 'admin' or some such on a 
project actually only gives them something akin to project_admin or some sort 
of admin-lite powers scoped to that project and sub-projects. That though falls 
into the policy work being done, but should be noted, as it is related.

Still, at least global scope for roles makes the superuser case make some actual 
sense, because (and I can't speak for other deployers), we have one project 
pretty much dedicated as an "admin_project" and it's just odd to actually need 
to give our service users roles in a project when that project is empty and a 
pointless construct for their purpose.

Also thanks for pushing this! I've been watching your global roles spec review 
in hopes we'd go down that path. :)

-Adrian

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Marc Heckmann
On Tue, 2017-05-23 at 11:44 -0400, Jay Pipes wrote:
> On 05/23/2017 09:48 AM, Marc Heckmann wrote:
> > For the anti-affinity use case, it's really useful for smaller or
> > medium 
> > size operators who want to provide some form of failure domains to
> > users 
> > but do not have the resources to create AZ's at DC or even at rack
> > or 
> > row scale. Don't forget that as soon as you introduce AZs, you need
> > to 
> > grow those AZs at the same rate and have the same flavor offerings 
> > across those AZs.
> > 
> > For the retry thing, I think enough people have chimed in to echo
> > the 
> > general sentiment.
> 
> The purpose of my ML post was around getting rid of retries, not the 
> usefulness of affinity groups. That seems to have been missed,
> however.
> 
> Do you or David have any data on how often you've actually seen
> retries 
> due to the last-minute affinity constraint violation in real world 
> production?

No, I don't have any data unfortunately, mostly because we haven't
advertised the feature to end users yet. We are only now in a position
to do so because previously there was a bug causing nova-scheduler to
grow in RAM usage if the required config flag to enable the feature was
on.

I have, however, seen retries triggered on hypervisors for other reasons.
I can try to dig up why specifically if that would be useful. I will
add that we do not use Ironic at all.

-m



> 
> Thanks,
> -jay
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato
> rs
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

2017-05-23 Thread Marc Heckmann
For the anti-affinity use case, it's really useful for smaller or medium size 
operators who want to provide some form of failure domains to users but do not 
have the resources to create AZ's at DC or even at rack or row scale. Don't 
forget that as soon as you introduce AZs, you need to grow those AZs at the 
same rate and have the same flavor offerings across those AZs.

For the retry thing, I think enough people have chimed in to echo the general 
sentiment.

-m


On Mon, 2017-05-22 at 16:30 -0600, David Medberry wrote:
I have to agree with James

My affinity and anti-affinity rules have nothing to do with NFV. a-a is almost 
always a failure domain solution. I'm not sure we have users actually choosing 
affinity (though it would likely be for network speed issues and/or some sort 
of badly architected need or perceived need for coupling.)

On Mon, May 22, 2017 at 12:45 PM, James Penick <jpen...@gmail.com> wrote:


On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <jaypi...@gmail.com> wrote:
Hi Ops,

Hi!


For class b) causes, we should be able to solve this issue when the placement 
service understands affinity/anti-affinity (maybe Queens/Rocky). Until then, we 
propose that instead of raising a Reschedule when an affinity constraint was 
last-minute violated due to a racing scheduler decision, that we simply set the 
instance to an ERROR state.

Personally, I have only ever seen anti-affinity/affinity use cases in relation 
to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or 
MANO solution that is responsible for the orchestration of instances belonging 
to various service function chains. I think it is reasonable to expect the MANO 
system to be responsible for attempting a re-launch of an instance that was set 
to ERROR due to a last-minute affinity violation.

**Operators, do you agree with the above?**

I do not. My affinity and anti-affinity use cases reflect the need to build 
large applications across failure domains in a datacenter.

Anti-affinity: Most anti-affinity use cases relate to the ability to guarantee 
that instances are scheduled across failure domains, others relate to security 
compliance.

Affinity: Hadoop/Big data deployments have affinity use cases, where nodes 
processing data need to be in the same rack as the nodes which house the data. 
This is a common setup for large hadoop deployers.

I recognize that large Ironic users expressed their concerns about IPMI/BMC 
communication being unreliable and not wanting to have users manually retry a 
baremetal instance launch. But, on this particular point, I'm of the opinion 
that Nova just do one thing and do it well. Nova isn't an orchestrator, nor is 
it intending to be a "just continually try to get me to this eventual state" 
system like Kubernetes.

Kubernetes is a larger orchestration platform that provides autoscale. I don't 
expect Nova to provide autoscale, but

I agree that Nova should do one thing and do it really well, and in my mind 
that thing is reliable provisioning of compute resources. Kubernetes does 
autoscale among other things. I'm not asking for Nova to provide Autoscale, I 
-AM- asking OpenStack's compute platform to provision a discrete compute 
resource reliably. This means overcoming common and simple error cases. As a 
deployer of OpenStack I'm trying to build a cloud that wraps the chaos of 
infrastructure, and present a reliable facade. When my users issue a boot 
request, I want to see it fulfilled. I don't expect it to be a 100% guarantee 
across any possible failure, but I expect (and my users demand) that my 
"Infrastructure as a service" API make reasonable accommodation to overcome 
common failures.


If we removed Reschedule for class c) failures entirely, large Ironic deployers 
would have to train users to manually retry a failed launch or would need to 
write a simple retry mechanism into whatever client/UI that they expose to 
their users.

**Ironic operators, would the above decision force you to abandon Nova as the 
multi-tenant BMaaS facility?**


 I just glanced at one of my production clusters and found there are around 7K 
users defined, many of whom use OpenStack on a daily basis. When they issue a 
boot call, they expect that request to be honored. From their perspective, if 
they call AWS, they get what they ask for. If you remove reschedules you're not 
just breaking the expectation of a single deployer, but for my thousands of 
engineers who, every day, rely on OpenStack to manage their stack.

I don't have a "i'll take my football and go home" mentality. But if you remove 
the ability for the compute provisioning API to present a reliable facade over 
infrastructure, I have to go write something else, or patch it back in. Now 
it's even harder for me to get and stay current with OpenStack.

During the summit the agreement was, if I recall, that reschedules would happen 
within a cell, and not between the parent and cell. That was complet

Re: [Openstack-operators] Update on Nova scheduler poor performance with Ironic

2016-08-31 Thread Marc Heckmann
Hi,

On Wed, 2016-08-31 at 13:46 -0400, Mathieu Gagné wrote:
> On Wed, Aug 31, 2016 at 1:33 AM, Joshua Harlow wrote:
> > 
> > > 
> > > 
> > > Enabling this option will make it so Nova scheduler loads
> > > instance
> > > info asynchronously at start up. Depending on the number of
> > > hypervisors and instances, it can take several minutes. (we are
> > > talking about 10-15 minutes with 600+ Ironic nodes, or ~1s per
> > > node in
> > > our case)
> > 
> > This feels like a classic thing that could just be made better by a
> > scatter/gather (in threads or other?) to the database or other
> > service. 1s
> > per node seems ummm, sorta bad and/or non-optimal (I wonder if this
> > is low
> > hanging fruit to improve this). I can travel around the world 7.5
> > times in
> > that amount of time (if I was a light beam, haha).
> This behavior was only triggered under the following conditions:
> - Nova Kilo
> - scheduler_tracks_instance_changes=False
> 
> So someone installing the latest Nova version won't have this issue.
> Furthermore, if you enable scheduler_tracks_instance_changes,
> instances will be loaded asynchronously by chunk when nova-scheduler
> starts. (10 compute nodes at a time) But Jim found that enabling this
> config causes OOM errors.

Somewhat of a thread hijack, but it's funny that this comes up now. We've
been getting OOMs on some of our Liberty controllers in the past couple of
weeks, in part because of nova-scheduler memory usage (10 GiB+ right at
startup).

We just now disabled "scheduler_tracks_instance_changes" and I confirm
that mem usage has become reasonable again.

I admit that we're having a hard time figuring out exactly which
scheduler filters rely on the option, though.
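For anyone else chasing the same memory growth, the knob in question is a
nova.conf scheduler option (a minimal sketch; in Liberty it lives in [DEFAULT],
and as far as I can tell the same-host/different-host style filters are the
main consumers of the tracked data):

  [DEFAULT]
  # stop the scheduler from caching per-host instance info; avoids the
  # RAM growth described above at the cost of the filters that need it
  scheduler_tracks_instance_changes = False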


 
> 
> So I investigated and found a very interesting bug presents if you
> run
> Nova in the Ironic context or anything where a single nova-compute
> process manages multiple or LOT of hypervisors. As explained
> previously, Nova loads the list of instances per compute node to help
> with placement decisions:
> https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_m
> anager.py#L590
> 
> Again, in Ironic context, a single nova-compute host manages ALL
> instances. This means this specific line found in _add_instance_info
> will load ALL instances managed by that single nova-compute host.
> What's even funnier is that _add_instance_info is called from
> get_all_host_states for every compute nodes (hypervisors), NOT
> nova-compute host. This means if you have 2000 hypervisors (Ironic
> nodes), this function will load 2000 instances per hypervisor found
> in
> get_all_host_states, ending with an overall process loading 2000^2
> rows from the database. Now I know why Jim Roll complained about OOM
> error. objects.InstanceList.get_by_host_and_node should be used
> instead, NOT objects.InstanceList.get_by_host. Will report this bug
> soon.
> 
> 
> > 
> > > 
> > > 
> > > There is a lot of side-effects to using it though. For example:
> > > - you can only run ONE nova-scheduler process since cache state
> > > won't
> > > be shared between processes and you don't want instances to be
> > > scheduled twice to the same node/hypervisor.
> > 
> > Out of curiosity, do you have only one scheduler process active and
> > passive
> > scheduler process(es) idle waiting to become active if the other
> > schedule
> > dies? (pretty simply done via something like
> > https://kazoo.readthedocs.io/en/latest/api/recipe/election.html) Or
> > do you
> > have some manual/other process that kicks off a new scheduler if
> > the 'main'
> > one dies?
> We use the HA feature of our virtualization infrastructure to handle
> failover. This is a compromise we are willing to accept for now. I
> agree that now everybody has access to this kind of feature in their
> infra.
> 
> 
> > 
> > > 
> > > 2) Run a single nova-compute service
> > > 
> > > I strongly suggest you DO NOT run multiple nova-compute services.
> > > If
> > > you do, you will have duplicated hypervisors loaded by the
> > > scheduler
> > > and you could end up with conflicting scheduling. You will also
> > > have
> > > twice as much hypervisors to load in the scheduler.
> > 
> > This seems scary (whenever I hear run a single of anything in a
> > *cloud*
> > platform, that makes me shiver). It'd be nice if we at least
> > recommended
> > people run https://kazoo.readthedocs.io/en/latest/api/recipe/electi
> > on.html
> > or have some active/passive automatic election process to handle
> > that single
> > thing dying (which they usually do, at odd times of the night).
> > Honestly I'd
> > (personally) really like to get to the bottom of how we as a group
> > of
> > developers ever got to the place where software was released
> > (and/or even
> > recommended to be used) in a *cloud* platform that ever required
> > only one of
> > anything to be ran (that's crazy bonkers, and yes there is history
> > here, but
> > damn, it just feels rotten as all hell, for lack of better words)

Re: [Openstack-operators] [keystone] Federation, domain mappings and v3 policy.json

2016-06-14 Thread Marc Heckmann
See below.

On Mon, 2016-06-13 at 22:12 -0400, Adam Young wrote:
> On 06/13/2016 07:08 PM, Marc Heckmann wrote:
> > 
> > Hi,
> > 
> > I currently have a lab setup using SAML2 federation with Microsoft
> > ADFS.
> > 
> > The federation part itself works wonderfully. However, I'm also
> > trying
> > to use the new project as domains feature along with the Keystone
> > v3
> > sample policy.json file for Keystone:
> > 
> > The idea is that I should be able to map users who are in a
> > specific
> > group in Active Directory to the admin role in a specific domain.
> > This
> > should work for Keystone with the sample v3 policy (let's ignore
> > problems with the admin role in other projects such as Nova). In
> > this
> > case I'm using the new project as domains feature, but I suspect
> > that
> > the problem would apply to regular domains as well.
> > 
> > The mapping works properly with the important caveat that the user
> > domain does not match the domain of the project(s) that I'm
> > assigning
> > the admin role to. Users who come in from Federation always belong
> > to
> > the "Federated" domain. This is the case even if I pre-create the
> > users
> > locally in a specific domain. This breaks sample v3 policy.json
> > because
> > the rules expect the user's domain to match the project's domain.
> > 
> > Does anyone know if there is any way to achieve what I'm trying to
> > do
> > when using Federation?
> Can you post your mapping file?  Might be easier to tell from that
> what 
> you are trying to do?

Here is the simple mapping that I started with. The "upn" and "groups"
types are defined from the SAML claims using a mod_auth_mellon config
(see below). The mapping between ADFS groups and local Keystone groups
works great. 

  [
{
  "local": [
{
  "user": {
"name": "{0}"
  }
},
{
  "groups": "{1}",
  "domain": {
"id": "default"
  }
}
  ],
  "remote": [
{
  "type": "upn"
},
{
  "type": "groups"
}
  ]
}
  ]

Here is the group role assignment command that I'm using. The Active
Directory user is a member of the "Beta" AD group. 

"openstack role add --group-domain default --project-domain betaproj --
project adminproj --group Beta admin"

The role assignment works fine.

I then tried to use the following mapping to force the user into a
specific domain, but it didn't change anything:

  [
{
  "local": [
{
  "user": {
"name": "{0}",
"domain": {
  "name": "betaproj"
}
  }
},
{
  "groups": "{1}",
  "domain": {
"name": "betaproj"
  }
}
  ],
  "remote": [
{
  "type": "upn"
},
{
  "type": "groups"
}
  ]
}
  ]

For completeness, The aforementioned Mellon config:

MellonSetEnvNoPrefix upn http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn
MellonSetEnvNoPrefix groups http://schemas.xmlsoap.org/claims/Group
MellonMergeEnvVars On

Thanks again.

-m

> 
> > 
> > 
> > Thanks in advance.
> > 
> > -m
> > 
> > ___
> > OpenStack-operators mailing list
> > OpenStack-operators@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-opera
> > tors
> 
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operato
> rs
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [keystone] Federation, domain mappings and v3 policy.json

2016-06-13 Thread Marc Heckmann

Hi,

I currently have a lab setup using SAML2 federation with Microsoft
ADFS. 

The federation part itself works wonderfully. However, I'm also trying
to use the new project as domains feature along with the Keystone v3
sample policy.json file for Keystone:

The idea is that I should be able to map users who are in a specific
group in Active Directory to the admin role in a specific domain. This
should work for Keystone with the sample v3 policy (let's ignore
problems with the admin role in other projects such as Nova). In this
case I'm using the new project as domains feature, but I suspect that
the problem would apply to regular domains as well.

The mapping works properly with the important caveat that the user
domain does not match the domain of the project(s) that I'm assigning
the admin role to. Users who come in from Federation always belong to
the "Federated" domain. This is the case even if I pre-create the users
locally in a specific domain. This breaks sample v3 policy.json because
the rules expect the user's domain to match the project's domain. 

Does anyone know if there is any way to achieve what I'm trying to do
when using Federation?

Thanks in advance.

-m

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] CentOS 7 KVM and QEMU 2.+

2015-11-12 Thread Marc Heckmann
Hello,

On Thu, 2015-11-12 at 16:54 +, Leslie-Alexandre DENIS wrote:
> Hello guys,
> 
> I'm struggling at finding a qemu(-kvm) version up-to-date for CentOS 7 with 
> official repositories
> and additional EPEL.
> 
> Currently the only package named qemu-kvm in these repositories is 
> *qemu-kvm-1.5.3-86.el7_1.8.x86_64*, which is a bit outdated.
> 
> As what I understand QEMU merged the forked qemu-kvm into the base code since 
> 1.3 and the Kernel is shipped with KVM module. Theoretically we can just 
> install qemu 2.+ and load KVM in order to use nova-compute with KVM 
> acceleration, right ?
> 
> The problem is that the packages openstack-nova{-compute} have a dependencies 
> with qemu-kvm. For example Fedora ships qemu-kvm as a subpackage of qemu and 
> it appears to be the same in fact, not the forked project [1].
> 
> 
> 
> In a word, guys how do you manage to have a QEMU v2.+ with latest libvirt on 
> your CentOS computes nodes ?
> Is somebody using the qemu packages from oVirt ? [2]

Both the RHEV and RHEL Openstack distributions are using the qemu 2.x
packages. These are based on packages from the oVirt project. 

The RDO rpms work fine with the oVirt qemu packages. This is what we are
using.

You can find more information about which RedHat distro spins get which
version of qemu here:

https://videos.cdn.redhat.com/summit2015/presentations/12752_red-hat-enterprise-virtualization-hypervisor-kvm-now-in-the-future.pdf

-m




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] raw ephemeral disks and qcow2 images

2015-10-23 Thread Marc Heckmann
Hi Daniel,

On Fri, 2015-10-23 at 12:05 +0100, Daniel P. Berrange wrote:
> On Thu, Oct 22, 2015 at 06:01:42PM +0000, Marc Heckmann wrote:
> > Hi,
> > 
> > On Thu, 2015-10-22 at 08:17 -0700, Abel Lopez wrote:
> > > I've actually looked for this for our RBD backed ephemeral instances,
> > > but found the options lacking. I last looked in Juno. 
> > > 
> > > On Thursday, October 22, 2015, Tim Bell wrote:
> > >  
> > > 
> > > Has anyone had experience with setting up Nova with KVM so it
> > > has raw ephemeral disks but qcow2 images for the VMs ? We’ve
> > > got very large ephemeral disks and could benefit from the
> > > performance of raw volumes for this.
> > 
> > We looked into this for the very same reasons and it doesn't seem to be
> > supported.
> > 
> > That being said, I'm fearful of the boot time performance impact of
> > using RAW for ephemeral.
> 
> There should be no performance impact of using fully pre-allocated
> raw images. Any decent modern filesystem (ext4, xfs) supports fallocate
> which allows you to pre-allocate an arbitrary sizes plain file in
> constant time.

I was actually more afraid of the time it would take mkfs to
create the filesystem. I was thinking that it might be non-negligible on
larger ephemeral sizes (approaching 1 TB), going up to a minute or more.

I just tested this and it doesn't seem to be an issue at all (obviously
--fast is used for NTFS).
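A rough version of that test, assuming an ext4 ephemeral disk backed by a fully
preallocated file (the size is illustrative):

  # preallocating is effectively constant time on ext4/xfs
  fallocate -l 800G ephemeral.img
  # creating the filesystem is also quick; ext4 lazily initializes inode
  # tables, so most of the work is deferred to first mount
  time mkfs.ext4 -F ephemeral.img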

This would seem to make a setting to use raw only for ephemeral slightly
more attractive.

That being said, if Nova gets fixed to pre-initialize qcow2 internally,
then the raison-d'être for raw might become less important. I'm assuming
the Nova fix for qcow2 would be easier to implement? Any downsides to
pre-initialized qcow2?

In any case, thanks for the useful info!

> 
> By default Nova does *not* preallocate images - so both raw & qcow2
> will grow-on-demand as guest writes to sectors.
> 
> If the "preallocate_images" nova.conf option is set to "space", then
> Nova will call fallocate for both raw & qcow2 images to fully allocate
> the maximum space they require. There's no appreciable time overhead
> for this - it just prevents you overcommitting disk obviously.
> 
> If you fully preallocate a qcow2 image its performance should pretty
> much match raw images (modulo the l2-cache-size item mentioned
> below), unfortunately, the way Nova is preallocating qcow2 images is
> wrong - it preallocates the space on disk, but does not pre-initialize
> the internal qcow2 data structures to match :-( So we need to fix
> that for qcow2 in Nova.
> 
> > I suggest you check out the following presentation about qcow2
> > performance if you haven't already done so:
> > 
> > http://events.linuxfoundation.org/sites/events/files/slides/p0.pp_.pdf
> > 
> > I think it would be worthwhile for Openstack (and libvirt if required)
> > to support the "l2-cache-size" option for qcow2.
> 
> Yep, we should look at supporting that.
> 
> 
> Regards,
> Daniel
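For reference, the preallocation behaviour Daniel describes above is driven by
a single nova.conf option (a minimal sketch):

  [DEFAULT]
  # "none" (the default) lets raw and qcow2 images grow on demand;
  # "space" makes nova fallocate the full size up front
  preallocate_images = space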


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] raw ephemeral disks and qcow2 images

2015-10-22 Thread Marc Heckmann
Hi,

On Thu, 2015-10-22 at 08:17 -0700, Abel Lopez wrote:
> I've actually looked for this for our RBD backed ephemeral instances,
> but found the options lacking. I last looked in Juno. 
> 
> On Thursday, October 22, 2015, Tim Bell wrote:
>  
> 
> Has anyone had experience with setting up Nova with KVM so it
> has raw ephemeral disks but qcow2 images for the VMs ? We’ve
> got very large ephemeral disks and could benefit from the
> performance of raw volumes for this.

We looked into this for the very same reasons and it doesn't seem to be
supported.

That being said, I'm fearful of the boot time performance impact of
using RAW for ephemeral. 

I suggest you check out the following presentation about qcow2
performance if you haven't already done so:

http://events.linuxfoundation.org/sites/events/files/slides/p0.pp_.pdf

I think it would be worthwhile for Openstack (and libvirt if required)
to support the "l2-cache-size" option for qcow2. 

-m
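As a rough illustration of that option (this is the QEMU command-line form,
with an illustrative value; neither nova nor libvirt expose it today):

  # 16 MiB of L2 table cache; with the default 64k cluster size each byte of
  # L2 cache covers roughly 8 KiB of virtual disk, so this maps about 128 GiB
  qemu-system-x86_64 -drive file=ephemeral.qcow2,format=qcow2,l2-cache-size=16777216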

> 
>  
> 
> Tim
> 
>  
> 
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Security around enterprise credentials and OpenStack API

2015-08-29 Thread Marc Heckmann
Sorry for the repost, it seems this mail was in the outbox of another machine 
that I hadn't turned on in a while.

Please ignore. 

> On Aug 29, 2015, at 11:56, Marc Heckmann wrote:
> 
> Hi all,
> 
> I was going to post a similar question this evening, so I decided to just 
> bounce on Mathieu’s question. See below inline.
> 
>> On Mar 31, 2015, at 8:35 PM, Matt Fischer wrote:
>> 
>> Mathieu,
>> 
>> We LDAP (AD) with a fallback to MySQL. This allows us to store service 
>> accounts (like nova) and "team accounts" for use in Jenkins/scripts etc in 
>> MySQL. We only do Identity via LDAP and we have a forked copy of this driver 
>> (https://github.com/SUSE-Cloud/keystone-hybrid-backend) to do this. We don't 
>> have any permissions to write into LDAP or move people into groups, so we 
>> keep a copy of users locally for purposes of user-list operations. The only 
>> interaction between OpenStack and LDAP for us is when that driver tries a 
>> bind.
>> 
>> 
>> 
>>> On Tue, Mar 31, 2015 at 6:06 PM, Mathieu Gagné wrote:
>>> Hi,
>>> 
>>> Lets say I wish to use an existing enterprise LDAP service to manage my
>>> OpenStack users so I only have one place to manage users.
>>> 
>>> How would you manage authentication and credentials from a security
>>> point of view? Do you tell your users to use their enterprise
>>> credentials or do you use an other method/credentials?
> 
> We too have integration with enterprise credentials through LDAP, but as you 
> suggest, we certainly don’t want users to use those credentials in scripts or 
> store them on instances. Instead we have a custom Web portal where they can 
> create separate Keystone credentials for their project/tenant which are 
> stored in Keystone’s MySQL database. Our LDAP integration actually happens at 
> a level above Keystone. We don’t actually let users acquire Keystone tokens 
> using their LDAP accounts.
> 
> We’re not really happy with this solution, it’s a hack and we are looking to 
> revamp it entirely. The problem is that I never have been able to find a 
> clear answer on how to do this with Keystone. 
> 
> I'm actually quite partial to the way AWS IAM works, especially the instance 
> "role" feature. Roles in AWS IAM are similar to trusts in Keystone, except 
> that they are integrated into the instance metadata. It's pretty cool.
> 
> Other than that, RBAC policies in Openstack get us a good way towards IAM 
> like functionality. We just need a policy editor in Horizon.
> 
> Anyway, the problem is around delegation of credentials which are used 
> non-interactively. We need to limit what those users can do (through RBAC 
> policy) but also somehow make the credentials ephemeral.
> 
> If someone (Keystone developer?) could point us in the right direction, that 
> would be great.
> 
> Thanks in advance.
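For what it's worth, the closest thing keystone offers today for that kind of
scoped, non-interactive delegation is a trust; a minimal sketch (the user,
project and role names are placeholders):

  # delegate only the Member role on one project from a real user to an
  # automation account; the trustee can later consume the trust to get
  # tokens limited to that role and project
  openstack trust create --project myproject --role Member alice automation-bot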
> 
>>> 
>>> The reason is that (usually) enterprise credentials also give access to
>>> a whole lot of systems other than OpenStack itself. And it goes without
>>> saying that I'm not fond of the idea of storing my password in plain
>>> text to be used by some scripts I created.
>>> 
>>> What's your opinion/suggestion? Do you guys have a second credential
>>> system solely used for OpenStack?
>>> 
>>> --
>>> Mathieu
>>> 
>>> ___
>>> OpenStack-operators mailing list
>>> OpenStack-operators@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>> ___
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Dynamic Policy

2015-08-05 Thread Marc Heckmann
Echoing what others have said, we too have an abstraction layer in the
form of a custom UI to allow project "owners" to create/delete users.

As for your questions Adam, having policy in the Keystone database as
data seems like a no brainer. As you suggest it enables us to do so much
more.

For problem #2, that's already a problem today, so I don't see how it
has an impact (other than the problem of giving the keys to end-users).
In fact, I'd argue that it's an even bigger problem today as an admin
(i.e admin everywhere) can delete a project with running resources. A
"project_admin" role limited in scope could be delegated the rights to
create/delete users but not projects.

-m

On Wed, 2015-08-05 at 18:15 +, Kris G. Lindgren wrote:
> See inline.
> 
>  
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy, LLC.
> 
> 
> 
>On 8/5/15, 11:19 AM, "Adam Young" wrote:
> 
> >On 08/05/2015 12:01 PM, Kris G. Lindgren wrote:
> >> We ran into this as well.
> >>
> >> What we did is create an external to keystone api, that we expose to our
> >> end users via a UI.  The api will let user create projects (with a
> >> specific defined quota) and also add users with the "project admins"
> >>role
> >> to the project.  Those admins can add/remove users from the project and
> >> also delete the project.  You can also be a "member", where you have the
> >> ability to spin up vm's under the project but not add/remove users or
> >> remove the project.  We also do some other stuff to clean up items in a
> >> project before its deleted.  We are working to move this functionality
> >>out
> >> of the current external API and into keystone.  I believe we were going
> >>to
> >> look at waffle-haus to add a paste filter to intercept the project
> >>create
> >> calls and do the needful.
> >>
> >> We also modified the policy.json files for the services that we care
> >>about
> >> to add the new roles that we created.
> >
> >Were you working around limitation by building an external system to
> >Keystone to provide a means of delegating the ability to assign roles?
> 
> Yes. Basically we wrapped a function that required admin permissions in a
> keystone API - so that non-admin users could do some admin level tasks.
> Also, we have run into the admin on a project in keystone == admin
> everywhere problem.  We were using a created "project_admin" role to get
> around that limitation.
> 
> >
> >Would you have wanted to synchronize the roles you defined in your
> >Keytone instance with the policy files directly?  Did you have to modify
> >policy files beyond the Keystone policy file?
> 
> Yes. And Yes, we did modify other services policy files as well to handle
> the newly created project_admin role.
> 
> 
> >
> >> 
> >>   
> >> Kris Lindgren
> >> Senior Linux Systems Engineer
> >> GoDaddy, LLC.
> >>
> >>
> >>
> >>
> >> On 8/5/15, 9:39 AM, "Fox, Kevin M" wrote:
> >>
> >>> As an Op, I've run into this problem and keep running into it. I would
> >>> very much like a solution.
> >>>
> >>> Its also quite related to the nova instance user issue I've been
> >>>working
> >>> on, that's needed by the App Catalog project.
> >>>
> >>> So, yes, please keep fighting the good fight.
> >>>
> >>> Thanks,
> >>> Kevin
> >>> 
> >>> From: Adam Young [ayo...@redhat.com]
> >>> Sent: Wednesday, August 05, 2015 7:50 AM
> >>> To: openstack-operators@lists.openstack.org
> >>> Subject: [Openstack-operators] Dynamic Policy
> >>>
> >>> How do you delegate the ability to delegate?
> >>>
> >>> Lets say you are running a large cloud (purely hypothetical here) and
> >>> you want to let a user manage their own project.  They are "admin" but
> >>> they should be able to invite or eject people.
> >>>
> >>> In order to do this, an ordinary user needs to be able to make a role
> >>> assignment.  However, Keystone does not support this today:  if you are
> >>> admin somewhere, you are admin everywhere:
> >>>
> >>> https://bugs.launchpad.net/keystone/+bug/968696
> >>>
> >>> Access control in OpenStack is controlled by Policy.  An informal
> >>>survey
> >>> of operators shows that most people run with the stock policies such as
> >>> the Nova policy:
> >>>
> >>> http://git.openstack.org/cgit/openstack/nova/tree/etc/nova/policy.json
> >>>
> >>> In order to scope admin to the project, we would need to have rules that
> >>> enforce this scoping: every rule should check that the project_id in the
> >>> token provided matches the project_id of the resource of the API.
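A hedged sketch of what such a rule looks like in a service's policy.json (the
rule name is illustrative):

  "admin_or_owner": "role:admin and project_id:%(project_id)s"

Every API rule would then reference a check of this form instead of a bare
role:admin.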
> >>>
> >>> If we manage to get the policy files rewritten this way, We then need a
> >>> way to limit what roles a user can assign.The default mechanism
> >>> would say that a user needs to have an administrative role on the
> >>> project (or domain) that they want to assign the role on.
> >>>
> >>> I don't think anything I've written thus far is controversial. Then,
> 

Re: [Openstack-operators] Sharing resources across OpenStack instances

2015-04-22 Thread Marc Heckmann
[top posting on this one]

Hi,

When you write "Openstack instances", I'm assuming that you're referring
to Openstack deployments right? 

We have different deployments based on geographic regions for
performance concerns but certainly not by department. Each Openstack
project is tied to a department/project budget code and re-billed
accordingly based on Ceilometer data. No need to have separate
deployments for that. Central IT owns all the Cloud infra.

In the separate deployments the only thing that we aim to have shared is
Swift and Keystone (it's not the case for us right now).

Glance images need to be identical between deployments but that's easily
achievable through automation both for the operator and the end user.

We make sure that the users understand that these are separate
regions/Clouds akin to AWS regions.

-m

On Wed, 2015-04-22 at 13:50 +, Fox, Kevin M wrote:
> This is a case for a cross project cloud (institutional?). It costs
> more to run two little clouds then one bigger one. Both in terms of
> man power, and in cases like these. under utilized resources.
> 
> #3 is interesting though. If there is to be an openstack app catalog,
> it would be inportant to be able to pull the needed images from
> outside the cloud easily.
> 
> Thanks,
> Kevin 
>  
> 
> __
> From: Adam Young
> Sent: Wednesday, April 22, 2015 6:32:17 AM
> To: openstack-operators@lists.openstack.org
> Subject: [Openstack-operators] Sharing resources across OpenStack
> instances
> 
> 
> Its been my understanding that many people are deploying small
> OpenStack 
> instances as a way to share the Hardware owned by their particular
> team, 
> group, or department.  The Keystone instance represents ownership,
> and 
> the identity of the users comes from a corporate LDAP server.
> 
> Is there much demand for the following scenarios?
> 
> 1.  A project team crosses organizational boundaries and has to work 
> with VMs in two separate OpenStack instances.  They need to set up a 
> network that requires talking to two neutron instances.
> 
> 2.  One group manages a powerful storage array.  Several OpenStack 
> instances need to be able to mount volumes from this array.
> Sometimes, 
> those volumes have to be transferred from VMs running in one instance
> to 
> another.
> 
> 3.  A group is producing nightly builds.  Part of this is an image 
> building system that posts to glance.  Ideally, multiple OpenStack 
> instances would be able to pull their images from the same glance.
> 
> 4. Hadoop ( or some other orchestrated task) requires more resources 
> than are in any single OpenStack instance, and needs to allocate 
> resources across two or more instances for a single job.
> 
> 
> I suspect that these kinds of architectures are becoming more
> common.  
> Can some of the operators validate these assumptions?  Are there
> other, 
> more common cases where Operations need to span multiple clouds which 
> would require integration of one Nova server with multiple Cinder, 
> Glance, or Neutron  servers managed in other OpenStack instances?
> 


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Dynamic Policy for Access Control

2015-04-07 Thread Marc Heckmann
My apologies for not seeing this sooner, as the topic is of great
interest. My comments are below, inline.

On Mon, 2015-02-23 at 16:41 +, Tim Bell wrote:
> > -Original Message-
> > From: Adam Young [mailto:ayo...@redhat.com]
> > Sent: 23 February 2015 16:45
> > To: openstack-operators@lists.openstack.org
> > Subject: [Openstack-operators] Dynamic Policy for Access Control
> > 
> > "Admin can do everything!"  has been a common lament, heard for multiple
> > summits.  It's more than just a development issue.  I'd like to fix that.  I 
> > think we
> > all would.
> > 
> > 
> > I'm looking to get some Operator input on the Dynamic Policy issue. I wrote 
> > up a
> > general overview last fall, after the Kilo summit:
> > 
> > https://adam.younglogic.com/2014/11/dynamic-policy-in-keystone/

I agree with everything in that post.

I would add the following comments:

1. I doubt this will change, but to be clear, we cannot lose the ability
to create custom roles and to limit the capabilities of the standard
roles. For example, I might want to limit the ability to make images
public, or the ability to associate a floating IP (see the small sketch
after this list).

2. This work should not be done in a vacuum. Ideally, Horizon support for
assigning roles to users and editing policy should be released at the
same time or not long after. I realize that this is easier said than
done, but it will be important in order for the feature to get used.
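
To make point 1 concrete, here is roughly the kind of policy override I
have in mind (an illustration only: the custom role names are invented;
"publicize_image" is the Glance policy target for making an image
public, and "update_floatingip" is the Neutron target that covers
associating or disassociating a floating IP):

    import json

    # Invented role names, shown only to illustrate restricting two
    # specific capabilities to custom roles.
    glance_overrides = {
        "publicize_image": "role:image_publisher",
    }
    neutron_overrides = {
        "update_floatingip": "rule:admin_or_owner and role:network_operator",
    }

    # Each dict would be merged into the corresponding service's
    # policy.json (Glance and Neutron respectively).
    print(json.dumps(glance_overrides, indent=2))
    print(json.dumps(neutron_overrides, indent=2))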

> > 
> > 
> > Some of what I am looking at is:  what are the general roles that Operators
> > would like to have by default when deploying OpenStack?
> > 
> 
> As I described in 
> http://openstack-in-production.blogspot.ch/2015/02/delegation-of-roles.html, 
> we've got (mapped  per-project to an AD group)
> 
> - operator (start/stop/reboot/console)
> - accounting (read ceilometer data for reporting)
> 
> > I've submitted a talk about policy for the Summit:
> > https://www.openstack.org/vote-vancouver/presentation/dynamic-policy-for-
> > access-control
> > 
> > If you want, please vote for it, but even if it does not get selected, I'd 
> > like to
> > discuss Policy with the operators at the summit, as input to  the Keystone
> > development effort.
> > 
> 
> Sounds like a good topic for the ops meetup track.
> 
> > Feedback greatly welcome.
> > 


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Security around enterprise credentials and OpenStack API

2015-03-31 Thread Marc Heckmann
Hi all,

I was going to post a similar question this evening, so I decided to just 
bounce on Mathieu’s question. See below inline.

> On Mar 31, 2015, at 8:35 PM, Matt Fischer  wrote:
> 
> Mathieu,
> 
> We use LDAP (AD) with a fallback to MySQL. This allows us to store service 
> accounts (like nova) and "team accounts" for use in Jenkins/scripts etc. in 
> MySQL. We only do Identity via LDAP and we have a forked copy of this driver 
> (https://github.com/SUSE-Cloud/keystone-hybrid-backend) to do this. We don't 
> have any permissions to write into LDAP or move people into groups, so we 
> keep a copy of users locally for purposes of user-list operations. The only 
> interaction between OpenStack and LDAP for us is when that driver tries a 
> bind.
> 
> 
> 
>> On Tue, Mar 31, 2015 at 6:06 PM, Mathieu Gagné  wrote:
>> Hi,
>> 
>> Lets say I wish to use an existing enterprise LDAP service to manage my
>> OpenStack users so I only have one place to manage users.
>> 
>> How would you manage authentication and credentials from a security
>> point of view? Do you tell your users to use their enterprise
>> credentials or do you use an other method/credentials?

We too have integration with enterprise credentials through LDAP, but as you 
suggest, we certainly don’t want users to use those credentials in scripts or 
store them on instances. Instead we have a custom Web portal where they can 
create separate Keystone credentials for their project/tenant, which are
stored in Keystone’s MySQL database. Our LDAP integration actually happens
at a level above Keystone; we don’t let users acquire Keystone tokens with
their LDAP accounts.
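
Behind the scenes, the portal essentially does the following (a rough
sketch with python-keystoneclient v3, not our actual portal code; the
names, password and URL are placeholders):

    from keystoneclient.auth.identity import v3
    from keystoneclient import session
    from keystoneclient.v3 import client

    # Placeholder admin credentials and auth URL.
    auth = v3.Password(auth_url="https://keystone.example.com:5000/v3",
                       username="portal-admin", password="...",
                       project_name="admin",
                       user_domain_name="Default",
                       project_domain_name="Default")
    keystone = client.Client(session=session.Session(auth=auth))

    # Create a project-scoped automation account and give it a role on
    # that project only (names are illustrative).
    project = keystone.projects.find(name="team-a")
    user = keystone.users.create(name="team-a-automation",
                                 password="generated-secret",
                                 default_project=project)
    member = keystone.roles.find(name="_member_")
    keystone.roles.grant(member, user=user, project=project)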

We’re not really happy with this solution; it’s a hack, and we are looking
to revamp it entirely. The problem is that I have never been able to find a
clear answer on how to do this with Keystone.

I’m actually quite partial to the way AWS IAM works, especially the
instance “role” feature. Roles in AWS IAM are similar to trusts in
Keystone, except that they are integrated into the instance metadata.
It’s pretty cool.
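
For comparison, here is roughly what a trust looks like with
python-keystoneclient (again just a sketch, with placeholder IDs and
credentials); the missing piece is that nothing wires the resulting
token into the instance metadata the way IAM instance roles do:

    import datetime

    from keystoneclient.auth.identity import v3
    from keystoneclient import session
    from keystoneclient.v3 import client

    # In practice the delegating user creates the trust; placeholders here.
    auth = v3.Password(auth_url="https://keystone.example.com:5000/v3",
                       username="alice", password="...",
                       project_name="team-a",
                       user_domain_name="Default",
                       project_domain_name="Default")
    keystone = client.Client(session=session.Session(auth=auth))

    expires = datetime.datetime.utcnow() + datetime.timedelta(hours=12)
    trust = keystone.trusts.create(
        trustor_user="<delegating-user-id>",    # placeholder
        trustee_user="<automation-user-id>",    # placeholder
        project="<project-id>",                 # placeholder
        role_names=["_member_"],                # delegate only this role
        impersonation=True,
        expires_at=expires)                     # makes the delegation ephemeral

    # The automation account can later authenticate with trust_id=trust.id
    # and gets a token limited to that project/role until the trust expires.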

Other than that, RBAC policies in OpenStack get us a good way towards
IAM-like functionality. We just need a policy editor in Horizon.

Anyway, the problem is around delegation of credentials which are used 
non-interactively. We need to limit what those users can do (through RBAC 
policy) but also somehow make the credentials ephemeral.

If someone (Keystone developer?) could point us in the right direction, that 
would be great.

Thanks in advance.

>> 
>> The reason is that (usually) enterprise credentials also give access to
>> a whole lot of systems other than OpenStack itself. And it goes without
>> saying that I'm not fond of the idea of storing my password in plain
>> text to be used by some scripts I created.
>> 
>> What's your opinion/suggestion? Do you guys have a second credential
>> system solely used for OpenStack?
>> 
>> --
>> Mathieu
>> 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [cinder] Driver Filters and Goodness Weighers... Who plans to use this?

2015-02-11 Thread Marc Heckmann
Hi,

On Wed, 2015-02-11 at 12:39 +0100, Christian Berendt wrote:
> On 02/11/2015 07:54 AM, Mike Perez wrote:
> > Proposed Documentation:
> > http://thing.ee/x/doc-20150210/content/driver_filter_weighing.html
> 
> Thank you for bringing this to the attention of this list. There is a
> pending review for this documentation available at
> https://review.openstack.org/#/c/152325/.
> 
> Christian.
> 

I can definitely relate to the problem that it's trying to solve. We
have a backend that has both thin provisioning and compression. Right
now, we have no easy way to filter which storage node a volume should
be created on when compression and thin provisioning are to be taken
into account.
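
For our case, I'd picture something like the two expressions below in
the backend section of cinder.conf (a sketch only; the capability names
thin_provisioning_support and compression_support are my assumption
about what the driver would report, not something I've verified):

    # Sketch of the cinder.conf expressions I have in mind; the
    # capability names are assumptions about what our driver reports.
    filter_function = (
        "capabilities.thin_provisioning_support == True and "
        "capabilities.compression_support == True")

    # goodness_function is expected to yield 0-100; score by % free space.
    goodness_function = (
        "(capabilities.free_capacity_gb / capabilities.total_capacity_gb) * 100")

    print("filter_function = %s" % filter_function)
    print("goodness_function = %s" % goodness_function)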

That being said, I'm not sure about the implementation as such. I could
see how it needs to remain flexible to accommodate a bunch of different
types of drivers.

I suppose that it's a start.

-m



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Marc Heckmann
On Thu, 2015-02-05 at 06:39 -0800, Abel Lopez wrote:
> That is a very real concern. This stems from images being named
> very uniquely, with versions, dates, etc. To the end user this is
> ALMOST as hard as a UUID. 
> Easy/generic names encourage users to use them, but there is an aspect
> of documentation and user training/education on the proper use of
> name-based automation. 

Yup, I agree with that. The best approach is probably a tweaked version
of what you proposed: use a generic name for the latest image and rename
the outdated ones with something like a "-OLD-$date" suffix, but don't
make them private. The documentation that you provide to your end users
should clearly tell them to use image names rather than UUIDs, and should
discourage them from using OLD images. For those who don't read the doc,
the naming alone will discourage bad practices. Some sort of automated
motd with a big fat warning if the image is older than a certain date
would help as well.
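
To make the renaming part concrete, the rotation itself is only a few
lines (a sketch using the openstacksdk; the cloud entry, image name and
suffix format are just examples):

    import datetime
    import openstack

    conn = openstack.connect(cloud="mycloud")   # clouds.yaml entry, illustrative

    generic_name = "ubuntu-14.04"               # example generic image name
    suffix = datetime.date.today().strftime("-OLD-%Y%m%d")

    # Move the current image out of the way but leave it public, so
    # UUID-based scripts and existing instances keep working.
    current = conn.image.find_image(generic_name)
    if current:
        conn.image.update_image(current, name=generic_name + suffix)

    # The freshly built image is then uploaded under the generic name by
    # whatever produces the images (left out here).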

> 
> On Thursday, February 5, 2015, Marc Heckmann
>  wrote:
> On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:
> > I always recommend the following:
> > All public images are named generically enough that they can
> be
> > replaced with a new version of the same name. This helps new
> instances
> > booting.
> > The prior image is renamed with -OLD-$date. This lets users
> know that
> > their image has been deprecated. This image is made private
> so no new
> > instances can be launched.
> > All images include an updated motd that indicates available
> security
> > updates.
> 
> I like this approach, but I have the following caveat: What if
> users are
> using the uuid of the image instead of the name in some
> automation
> scripts that they might have? If we make the "-OLD-$date"
> images
> private, then we just broke their scripts.
> 
> >
> >
> > We're discussing baking the images with automatic updates,
> but still
> > haven't reached an agreement.
> >
> > On Thursday, February 5, 2015, Tim Bell 
> wrote:
> > > -Original Message-
> > > From: George Shuklin
> [mailto:george.shuk...@gmail.com]
> > > Sent: 05 February 2015 14:10
> > > To: openstack-operators@lists.openstack.org
> > > Subject: [Openstack-operators] How to handle
> updates of
> > public images?
> > >
> > > Hello everyone.
> > >
> > > We are updating our public images regularly (to
> provide them
> > to customers in
> > > up-to-date state). But there is a problem: If some
> instance
> > starts from image it
> > > becomes 'used'. That means:
> > > * That image is used as _base for nova
> > > * If instance is reverted this image is used to
> recreate
> > instance's disk
> > > * If instance is rescued this image is used as
> rescue base
> > > * It is redownloaded during resize/migration (on a
> new
> > compute node)
> > >
> > > One more (our specific):
> > > We're using raw disks with _base on slow SATA
> drives (in
> > comparison to fast SSD
> > > for disks), and if that SATA fails, we replace it
> (and nova
> > redownloads stuff in
> > > _base).
> > >
> > > If image is deleted, it causes problems with nova
> (nova
> > can't download _base).
> > >
> > > The second part of the problem: glance disallows
> to update
> > image (upload new
> > > image with same ID), so we're forced to upload
> updated image
> > with new ID and
> > > to remove the old one. This causes problems
> described above.
> > > And if tenant boots from own snapshot and removes
> snapshot
> >   

Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread Marc Heckmann
On Thu, 2015-02-05 at 06:02 -0800, Abel Lopez wrote:
> I always recommend the following:
> All public images are named generically enough that they can be
> replaced with a new version of the same name. This helps new instances
> booting. 
> The prior image is renamed with -OLD-$date. This lets users know that
> their image has been deprecated. This image is made private so no new
> instances can be launched. 
> All images include an updated motd that indicates available security
> updates. 

I like this approach, but I have the following caveat: what if users are
using the UUID of the image instead of the name in some automation
scripts that they might have? If we make the "-OLD-$date" images
private, then we've just broken their scripts.

> 
> 
> We're discussing baking the images with automatic updates, but still
> haven't reached an agreement. 
> 
> On Thursday, February 5, 2015, Tim Bell  wrote:
> > -Original Message-
> > From: George Shuklin [mailto:george.shuk...@gmail.com]
> > Sent: 05 February 2015 14:10
> > To: openstack-operators@lists.openstack.org
> > Subject: [Openstack-operators] How to handle updates of
> public images?
> >
> > Hello everyone.
> >
> > We are updating our public images regularly (to provide them
> to customers in
> > up-to-date state). But there is a problem: If some instance
> starts from image it
> > becomes 'used'. That means:
> > * That image is used as _base for nova
> > * If instance is reverted this image is used to recreate
> instance's disk
> > * If instance is rescued this image is used as rescue base
> > * It is redownloaded during resize/migration (on a new
> compute node)
> >
> > One more (our specific):
> > We're using raw disks with _base on slow SATA drives (in
> comparison to fast SSD
> > for disks), and if that SATA fails, we replace it (and nova
> redownloads stuff in
> > _base).
> >
> > If image is deleted, it causes problems with nova (nova
> can't download _base).
> >
> > The second part of the problem: glance disallows to update
> image (upload new
> > image with same ID), so we're forced to upload updated image
> with new ID and
> > to remove the old one. This causes problems described above.
> > And if tenant boots from own snapshot and removes snapshot
> without removing
> > instance, it causes same problem even without our activity.
> >
> > How do you handle public image updates in your case?
> >
> 
> We have a similar problem. For the Horizon based end users,
> we've defined a panel using image meta data. Details are at
> 
> http://openstack-in-production.blogspot.ch/2015/02/choosing-right-image.html.
> 
> For the CLI users, we propose to use the sort options from
> Glance to find the latest image of a particular OS.
> 
> It would be good if there was a way of marking an image as
> hidden so that it can still be used for snapshots/migration
> but would not be shown in image list operations.
> 
> > Thanks!
> >


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators