Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report
Excerpts from Steven Hardy's message of 2014-08-27 10:08:36 -0700: On Wed, Aug 27, 2014 at 09:40:31AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700: On 27/08/14 11:04, Steven Hardy wrote: On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:

I am a little bit skeptical about using Swift for this use case because of its eventual consistency issue. I am not sure a Swift cluster is good to be used for this kind of problem. Please note that a Swift cluster may give you old data at some point in time.

This is probably not a major problem, but it's certainly worth considering. My assumption is that the latency of making the replicas consistent will be small relative to the timeout for things like SoftwareDeployments, so all we need is to ensure that instances eventually get the new data, and act on it.

That part is fine, but if they get the new data and then later get the old data back again... that would not be so good.

Agreed, and I had not considered that this can happen. There is a not-so-simple answer though:

* Heat inserts this as initial metadata: {metadata: {}, update-url: xx, version: 0}
* Polling goes to update-url and ignores metadata while version = 0
* Polling finds new metadata in the same format, and continues the loop without talking to Heat

However, this makes me rethink why we are having performance problems. MOST of the performance problems have two root causes:

* We parse the entire stack to show metadata, because we have to see if there are custom access controls defined in any of the resources used. I actually worked on a patch set to deprecate this part of the resource plugin API because it is impossible to scale this way.
* We rely on the engine to respond because of the parsing issue.

If however we could just push metadata into the db fully resolved whenever things in the stack change, and cache the response in the API using Last-Modified/Etag headers, I think we'd be less inclined to care so much about swift for polling.
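Clint's version-guard loop is easy to make concrete. A minimal sketch, assuming a metadata document shaped like the one he describes ({metadata: ..., update-url: ..., version: N}); the function and key names here are illustrative, not Heat's actual API:

```python
# Sketch of a polling agent that tolerates Swift eventual consistency:
# a stale replica's answer is ignored because the version number only
# ever moves forward. Names are invented for illustration.

def poll_once(fetch, state):
    """fetch(url) -> dict like {'metadata': ..., 'update-url': u, 'version': n}.

    Returns new metadata to act on, or None if the read was stale.
    """
    doc = fetch(state['update-url'])
    if doc['version'] <= state['version']:
        return None  # old data came back; act on nothing
    state['version'] = doc['version']
    state['metadata'] = doc['metadata']
    # follow the chain without going back to the Heat engine
    state['update-url'] = doc.get('update-url', state['update-url'])
    return doc['metadata']
```

The key property is the last assertion below: replaying an already-seen document is a no-op, so "new data then old data again" can no longer cause a regression.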
However we are still left with the many thousands of keystone users being created vs. thousands of swift tempurls.

There's probably a few relatively simple optimisations we can do if the keystone user thing becomes the bottleneck:
- Make the user an attribute of the stack and only create one per stack/tree-of-stacks
- Make the user an attribute of each server resource (probably more secure, but less optimal if your optimal is fewer keystone users).

I don't think the many keystone users thing is actually a problem right now though, or is it?

1000 servers means 1000 keystone users to manage, and all of the tokens and backend churn that implies. It's not a problem, but it is quite a bit heavier than tempurls.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
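For scale comparison, a Swift tempurl costs one HMAC computation per server instead of one Keystone user. The tempurl middleware's signature scheme is an HMAC-SHA1 over the method, expiry timestamp, and object path; the account key and path below are made up:

```python
import hmac
import time
from hashlib import sha1

def make_temp_url(key, method, path, lifetime_secs):
    """Sign a Swift object path the way the tempurl middleware expects."""
    expires = int(time.time()) + lifetime_secs
    body = '%s\n%d\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%d' % (path, sig, expires)
```

No token, no service user, no backend churn: the server just GETs the signed URL, and Swift validates the signature against the account's X-Account-Meta-Temp-URL-Key.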
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Thierry Carrez's message of 2014-08-27 05:51:55 -0700:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

I like it. The only thing I would add is that it would be quite useful if the use of pods were at least partially enhanced by an unconference-style interest list. What I mean is, on day 1 have people suggest topics and vote on suggested topics to discuss at the pods, and from then on the pods can host these topics. This is for the other things that aren't well defined until the summit and don't have their own rooms for days 2 and 3.

This is driven by the fact that the pods in Atlanta were almost always busy doing something other than whatever the track that owned them wanted. A few projects' pods grew to 30-40 people a few times, eating up all the chairs for the surrounding pods. TripleO often sat at the Heat pod because of this, for instance.

I don't think they should be fully scheduled. They're also just great places to gather and have a good discussion, but it would be useful to plan for topic flexibility and help coalesce interested parties, rather than have them be silos that get taken over randomly. Especially since there is a temptation to push the other topics to them already.
Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda. Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar). Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

Love this. Please, if we can also fully enclose these meetups and the session rooms in dry-erase boards, that would be ideal.
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Sean Dague's message of 2014-08-27 06:26:38 -0700: On 08/27/2014 08:51 AM, Thierry Carrez wrote:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda. Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar).
Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

I think this proposal makes the best use of our setup: discuss clear cross-project issues, address key specific topics which need face-to-face time and broader attendance, then try to replicate the success of midcycle meetup-like open unscheduled time to discuss whatever is hot at this point. There are still details to work out (is it possible to split the space, should we use the usual design summit CFP website to organize the scheduled time...), but I would first like to have your feedback on this format. Also if you have alternative proposals that would make better use of our 4 days, let me know.

I definitely like this approach. I think it will be really interesting to collect feedback from people about the value they got from days 2-3 vs. day 4. I also wonder if we should lose a slot from days 1-3 and expand the hallway time. Hallway track is always pretty interesting, and honestly a lot of interesting ideas spring up there. The 10 minute transitions often seem to feel like you are rushing between places too quickly sometimes.

Yes please. I'd also be fine with just giving back 5 minutes from each session to facilitate this.
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Anita Kuno's message of 2014-08-27 13:48:25 -0700: On 08/27/2014 02:46 PM, John Griffith wrote: On Wed, Aug 27, 2014 at 9:25 AM, Flavio Percoco fla...@redhat.com wrote: On 08/27/2014 03:26 PM, Sean Dague wrote: On 08/27/2014 08:51 AM, Thierry Carrez wrote:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda.
Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar). Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

I think this proposal makes the best use of our setup: discuss clear cross-project issues, address key specific topics which need face-to-face time and broader attendance, then try to replicate the success of midcycle meetup-like open unscheduled time to discuss whatever is hot at this point. There are still details to work out (is it possible to split the space, should we use the usual design summit CFP website to organize the scheduled time...), but I would first like to have your feedback on this format. Also if you have alternative proposals that would make better use of our 4 days, let me know.

I definitely like this approach. I think it will be really interesting to collect feedback from people about the value they got from days 2-3 vs. day 4. I also wonder if we should lose a slot from days 1-3 and expand the hallway time. Hallway track is always pretty interesting, and honestly a lot of interesting ideas spring up there. The 10 minute transitions often seem to feel like you are rushing between places too quickly sometimes.

+1 Last summit, it was basically impossible to do any hallway talking and even meet some folks face-2-face. Other than that, I think the proposal is great and makes sense to me. Flavio -- @flaper87 Flavio Percoco

Sounds like a great idea to me: +1

I think this is a great direction.
Here is my dilemma, and it might just affect me. I attended 3 mid-cycles this release: one of Neutron's (there were 2), QA/Infra and Cinder. The Neutron and Cinder ones were mostly in pursuit of figuring out third party and exchanging information surrounding that (which I feel was successful). The QA/Infra one was, well even though I feel like I have been awol, I still consider this my home. From my perspective (and check with Neutron and Cinder to see if they agree), having at least one person from qa/infra at a mid-cycle helps in small ways. At both I worked with folks to help them make more efficient use of their review time by exploring gerrit queries (there were people who didn't know this magic, nor did they think to ask
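For anyone else who hasn't met that magic: Gerrit's search operators (status:, project:, branch:, label:, reviewer:, age:) compose freely, and the same query strings work against the REST /changes/ endpoint as in the UI search box. A small sketch; the host and project here are only examples:

```python
# Build a Gerrit REST search URL from standard Gerrit query operators.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2, current at the time

def gerrit_search_url(host, *terms):
    """Join Gerrit search operators into a /changes/?q=... REST URL."""
    return 'https://%s/changes/?q=%s' % (host, quote(' '.join(terms)))

# e.g. all open Cinder reviews on master
url = gerrit_search_url('review.openstack.org',
                        'status:open', 'project:openstack/cinder',
                        'branch:master')
```

Fetching that URL returns JSON (prefixed with Gerrit's XSSI guard line), so the same queries that save review time interactively can also drive scripts and dashboards.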
Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone
Excerpts from Adam Young's message of 2014-08-24 20:17:34 -0700: On 08/23/2014 02:01 AM, Clint Byrum wrote:

I don't know how Zaqar does its magic, but I'd love to see simple signed URLs rather than users/passwords. This would work for Heat as well. That way we only have to pass in a single predictably formatted string.

Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700: Here's an interesting fact about Zaqar (the project formerly known as Marconi) that I hadn't thought about before this week: it's probably the first OpenStack project where a major part of the API primarily faces

Nah, this is the direction we are headed. Service users (out of LDAP!) are going to be the norm with a recent feature add to Keystone: http://adam.younglogic.com/2014/08/getting-service-users-out-of-ldap/

This complicates the case by requiring me to get tokens and present them, to cache them, etc. I just want to fetch and/or send messages.
Re: [openstack-dev] [qa][all][Heat] Packaging of functional tests
Excerpts from Steve Baker's message of 2014-08-26 14:25:46 -0700: On 27/08/14 03:18, David Kranz wrote: On 08/26/2014 10:14 AM, Zane Bitter wrote: Steve Baker has started the process of moving Heat tests out of the Tempest repository and into the Heat repository, and we're looking for some guidance on how they should be packaged in a consistent way. Apparently there are a few projects already packaging functional tests in the package projectname.tests.functional (alongside projectname.tests.unit for the unit tests). That strikes me as odd in our context, because while the unit tests run against the code in the package in which they are embedded, the functional tests run against some entirely different code - whatever OpenStack cloud you give it the auth URL and credentials for. So these tests run from the outside, just like their ancestors in Tempest do. There's all kinds of potential confusion here for users and packagers. None of it is fatal and all of it can be worked around, but if we refrain from doing the thing that makes zero conceptual sense then there will be no problem to work around :) Thanks, Zane. The point of moving functional tests to projects is to be able to run more of them in gate jobs for those projects, and allow tempest to survive being stretched-to-breaking horizontally as we scale to more projects. At the same time, there are benefits to the tempest-as-all-in-one-functional-and-integration-suite that we should try not to lose: 1. Strong integration testing without thinking too hard about the actual dependencies 2. Protection from mistaken or unwise api changes (tempest two-step required) 3. Exportability as a complete blackbox functional test suite that can be used by Rally, RefStack, deployment validation, etc. I think (1) may be the most challenging because tests that are moved out of tempest might be testing some integration that is not being covered by a scenario. 
We will need to make sure that tempest actually has a complete enough set of tests to validate integration. Even if this is all implemented in a way where tempest can see in-project tests as plugins, there will still not be time to run them all as part of tempest on every commit to every project, so a selection will have to be made. (2) is quite difficult. In Atlanta we talked about taking a copy of functional tests into tempest for stable apis. I don't know how workable that is but don't see any other real options except vigilance in reviews of patches that change functional tests. (3) is what Zane was addressing. The in-project functional tests need to be written in a way that they can, at least in some configuration, run against a real cloud. I suspect from reading the previous thread about In-tree functional test vision that we may actually be dealing with three categories of test here rather than two: * Unit tests that run against the package they are embedded in * Functional tests that run against the package they are embedded in * Integration tests that run against a specified cloud i.e. the tests we are now trying to add to Heat might be qualitatively different from the projectname.tests.functional suites that already exist in a few projects. Perhaps someone from Neutron and/or Swift can confirm? That seems right, except that I would call the third functional tests and not integration tests, because the purpose is not really integration but deep testing of a particular service. Tempest would continue to focus on integration testing. Is there some controversy about that? The second category could include whitebox tests. I don't know about swift, but in neutron the intent was to have these tests be configurable to run against a real cloud, or not. Maru Newby would have details. 
I'd like to propose that tests of the third type get their own top-level package with a name of the form projectname-integrationtests (second choice: projectname-tempest on the principle that they're essentially plugins for Tempest). How would people feel about standardising that across OpenStack?

+1 But I would not call it integrationtests for the reason given above.

Because all heat does is interact with other services, what we call functional tests are actually integration tests. Sure, we could mock at the REST API level, but integration coverage is what we need most. (I'd call that faking, not mocking, but both could apply.) This lets us verify things like:

- how heat handles races in other services leading to resources going into ERROR

(A fake that predictably fails, and thus tests failure handling, will result in better coverage than a real service that only fails when that real service is broken. What's frustrating is that _both_ are needed to catch bugs.)

- connectivity and interaction between heat and agents on orchestrated servers

That is definitely
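The "fake that predictably fails" point can be shown with a toy: unlike a real service, the fake goes to ERROR exactly when told to, so the failure-handling path is exercised on every test run. The class and function names below are invented for the sketch, not Heat's real test doubles:

```python
class FlakyServerFake:
    """Stands in for a Nova-like API; returns ERROR for a chosen
    number of initial status polls, then ACTIVE."""

    def __init__(self, fail_first=0):
        self.calls = 0
        self.fail_first = fail_first

    def get_server_status(self):
        self.calls += 1
        return 'ERROR' if self.calls <= self.fail_first else 'ACTIVE'


def wait_for_active(client, attempts=3):
    """Toy convergence loop: poll until ACTIVE or attempts run out."""
    for _ in range(attempts):
        if client.get_server_status() == 'ACTIVE':
            return True
    return False
```

With a real cloud this code path only gets covered when the cloud actually misbehaves; with the fake, both the transient-error recovery and the give-up path are deterministic.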
Re: [openstack-dev] [heat] heat.conf.sample is not up to date
Guessing this is due to the new tox feature which randomizes python's hash seed. Excerpts from Mike Spreitzer's message of 2014-08-24 00:10:42 -0700: What is going on with this? If I do a fresh clone of heat and run `tox -epep8` then I get that complaint. If I then run the recommended command to fix it, and then `tox -epep8` again, I get the same complaint again --- and with different differences exhibited! The email below carries a typescript showing this. What I really need to know is what to do when committing a change that really does require a change in the sample configuration file. Of course I tried running generate_sample.sh, but `tox -epep8` still complains. What is the right procedure to get a correct sample committed? BTW, I am doing the following admittedly risky thing: I run DevStack, and make my changes in /opt/stack/heat/. Thanks, Mike - Forwarded by Mike Spreitzer/Watson/IBM on 08/24/2014 03:03 AM - From: ubuntu@mjs-dstk-821a (Ubuntu) To: Mike Spreitzer/Watson/IBM@IBMUS, Date: 08/24/2014 02:55 AM Subject:fresh flake fail ubuntu@mjs-dstk-821a:~/code$ git clone git://git.openstack.org/openstack/heat.git Cloning into 'heat'... remote: Counting objects: 49690, done. remote: Compressing objects: 100% (19765/19765), done. remote: Total 49690 (delta 36660), reused 39014 (delta 26526) Receiving objects: 100% (49690/49690), 7.92 MiB | 7.29 MiB/s, done. Resolving deltas: 100% (36660/36660), done. Checking connectivity... done. 
ubuntu@mjs-dstk-821a:~/code$ cd heat ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8 pep8 create: /home/ubuntu/code/heat/.tox/pep8 pep8 installdeps: -r/home/ubuntu/code/heat/requirements.txt, -r/home/ubuntu/code/heat/test-requirements.txt pep8 develop-inst: /home/ubuntu/code/heat pep8 runtests: PYTHONHASHSEED='0' pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib pep8 runtests: commands[1] | /home/ubuntu/code/heat/tools/config/check_uptodate.sh --- /tmp/heat.ep2CBe/heat.conf.sample2014-08-24 06:52:54.16484 + +++ etc/heat/heat.conf.sample2014-08-24 06:48:13.66484 + @@ -164,7 +164,7 @@ #allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions # Qpid broker hostname. (string value) -#qpid_hostname=heat +#qpid_hostname=localhost # Qpid broker port. (integer value) #qpid_port=5672 @@ -221,7 +221,7 @@ # The RabbitMQ broker address where a single node is used. # (string value) -#rabbit_host=heat +#rabbit_host=localhost # The RabbitMQ broker port where a single node is used. # (integer value) check_uptodate.sh: heat.conf.sample is not up to date. check_uptodate.sh: Please run /home/ubuntu/code/heat/tools/config/generate_sample.sh. ERROR: InvocationError: '/home/ubuntu/code/heat/tools/config/check_uptodate.sh' pep8 runtests: commands[2] | /home/ubuntu/code/heat/tools/requirements_style_check.sh requirements.txt test-requirements.txt pep8 runtests: commands[3] | bash -c find heat -type f -regex '.*\.pot?' 
-print0|xargs -0 -n 1 msgfmt --check-format -o /dev/null ___ summary ERROR: pep8: commands failed ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ tools/config/generate_sample.sh ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8 pep8 develop-inst-noop: /home/ubuntu/code/heat pep8 runtests: PYTHONHASHSEED='0' pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib pep8 runtests: commands[1] | /home/ubuntu/code/heat/tools/config/check_uptodate.sh --- /tmp/heat.DqIhK5/heat.conf.sample2014-08-24 06:54:34.62884 + +++ etc/heat/heat.conf.sample2014-08-24 06:53:51.54084 + @@ -159,10 +159,6 @@ # Size of RPC connection pool. (integer value) #rpc_conn_pool_size=30 -# Modules of exceptions that are permitted to be recreated -# upon receiving exception data from an rpc call. (list value) -#allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions - # Qpid broker hostname. (string value) #qpid_hostname=heat @@ -301,15 +297,6 @@ # Heartbeat time-to-live. (integer value) #matchmaker_heartbeat_ttl=600 -# Host to locate redis. (string value) -#host=127.0.0.1 - -# Use this port to connect to redis host. (integer value) -#port=6379 - -# Password for Redis server (optional). (string value) -#password=None - # Size of RPC greenthread pool. (integer value) #rpc_thread_pool_size=64 @@ -1229,6 +1216,22 @@ #hash_algorithms=md5 +[matchmaker_redis] + +# +# Options defined in oslo.messaging +# + +#
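Clint's guess fits the transcript: tox pins `PYTHONHASHSEED='0'` for its run, while a bare `tools/config/generate_sample.sh` invocation uses whatever seed the interpreter picks, so a generator that walks options in dict order produces differently ordered files each time. The durable fix is to make emission order independent of the hash seed. A toy illustration of the principle, not the actual oslo config generator code:

```python
def render_sample(options):
    """Emit a sample config deterministically regardless of PYTHONHASHSEED.

    `options` maps option name -> default value; iterating the dict
    directly would depend on hash order, so we sort the names first.
    """
    lines = []
    for name in sorted(options):  # sorted(): same output on every run
        lines.append('#%s=%s' % (name, options[name]))
    return '\n'.join(lines)
```

The short-term workaround is simply to regenerate under the same seed the check uses, e.g. `PYTHONHASHSEED=0 tools/config/generate_sample.sh` (assuming the script honours the environment the way tox does).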
Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone
I don't know how Zaqar does its magic, but I'd love to see simple signed URLs rather than users/passwords. This would work for Heat as well. That way we only have to pass in a single predictably formatted string. Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700: Here's an interesting fact about Zaqar (the project formerly known as Marconi) that I hadn't thought about before this week: it's probably the first OpenStack project where a major part of the API primarily faces software running in the cloud rather than facing the user. That is to say, nobody is going to be sending themselves messages on their laptop, from their laptop, via a cloud. At least one end of any given queue is likely to be on a VM in the cloud. That makes me wonder: how does Zaqar authenticate users who are sending and receiving messages (as opposed to setting up the queues in the first place)? Presumably using Keystone, in which case it will run into a problem we've been struggling with in Heat since the very early days. Keystone is generally a front end for an identity store with a 1:1 correspondence between users and actual natural persons. Only the operator can add or remove accounts. This breaks down as soon as you need to authenticate automated services running in the cloud - in particular, you never ever want to store the credentials belonging to an actual natural person in a server in the cloud. Heat has managed to work around this to some extent (for those running the Keystone v3 API) by creating users in a separate domain and more or less doing our own authorisation for them. However, this requires action on the part of the operator, and isn't an option for the end user. I guess Zaqar could do something similar and pass out sets of credentials good only for reading and writing to queues (respectively), but it seems like it would be better if the user could create the keystone accounts and set their own access control rules on the queues. 
On AWS the very first thing a user does is create a bunch of IAM accounts so that they virtually never have to use the credentials associated with their natural person ever again. There are both user accounts and service accounts - the latter IIUC have automatically-rotating keys. Is there anything like this planned in Keystone? Zaqar is likely only the first (I guess second, if you count Heat) of many services that will need it. I have this irrational fear that somebody is going to tell me that this issue is the reason for the hierarchical-multitenancy idea - fear because that both sounds like it requires intrusive changes in every OpenStack project and fails to solve the problem. I hope somebody will disabuse me of that notion in 3... 2... 1... cheers, Zane.
Re: [openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs
Excerpts from Dolph Mathews's message of 2014-08-22 09:45:37 -0700: On Fri, Aug 22, 2014 at 11:32 AM, Zane Bitter zbit...@redhat.com wrote: On 22/08/14 11:19, Thierry Carrez wrote: Zane Bitter wrote: On 22/08/14 08:33, Thierry Carrez wrote: We also still need someone to have the final say in case of deadlocked issues. -1 we really don't. I know we disagree on that :) No problem, you and I work in different programs so we can both get our way ;) People say we don't have that many deadlocks in OpenStack for which the PTL ultimate power is needed, so we could get rid of them. I'd argue that the main reason we don't have that many deadlocks in OpenStack is precisely *because* we have a system to break them if they arise. s/that many/any/ IME and I think that threatening to break a deadlock by fiat is just as bad as actually doing it. And by 'bad' I mean community-poisoningly, trust-destroyingly bad. I guess I've been active in too many dysfunctional free and open source software projects -- I put a very high value on the ability to make a final decision. Not being able to make a decision is about as community-poisoning, and also results in inability to make any significant change or decision. I'm all for getting a final decision, but a 'final' decision that has been imposed from outside rather than internalised by the participants is... rarely final. The expectation of a PTL isn't to stomp around and make final decisions, it's to step in when necessary and help both sides find the best solution. To moderate. Have we had many instances where a project's community divided into two camps and dug in to the point where they actually needed active moderation? And in those cases, was the PTL not already on one side of said argument? I'd prefer specific examples here. I have yet to see a deadlock in Heat that wasn't resolved by better communication. Moderation == bettering communication. 
I'm under the impression that you and Thierry are agreeing here, just from opposite ends of the same spectrum. I agree as well. PTL is a servant of the community, as any good leader is. If the PTL feels they have to drop the hammer, or if an impasse is reached where they are asked to, it is because they have failed to get everyone communicating effectively, not because that's their job.
Re: [openstack-dev] [Glance][Heat] Murano split discussion
Excerpts from Angus Salkeld's message of 2014-08-21 20:14:12 -0700: On Fri, Aug 22, 2014 at 12:34 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700: During last Atlanta summit there were couple discussions about Application Catalog and Application space projects in OpenStack. These cross-project discussions occurred as a result of Murano incubation request [1] during Icehouse cycle. On the TC meeting devoted to Murano incubation there was an idea about splitting the Murano into parts which might belong to different programs[2]. Today, I would like to initiate a discussion about potential splitting of Murano between two or three programs. *App Catalog API to Catalog Program* Application Catalog part can belong to Catalog program, the package repository will move to artifacts repository part where Murano team already participates. API part of App Catalog will add a thin layer of API methods specific to Murano applications and potentially can be implemented as a plugin to artifacts repository. Also this API layer will expose other 3rd party systems API like CloudFoundry ServiceBroker API which is used by CloudFoundry marketplace feature to provide an integration layer between OpenStack Application packages and 3rd party PaaS tools. I thought this was basically already agreed upon, and that Glance was just growing the ability to store more than just images. *Murano Engine to Orchestration Program* Murano engine orchestrates the Heat template generation. Complementary to a Heat declarative approach, Murano engine uses imperative approach so that it is possible to control the whole flow of the template generation. The engine uses Heat updates to update Heat templates to reflect changes in applications layout. Murano engine has a concept of actions - special flows which can be called at any time after application deployment to change application parameters or update stacks. 
The engine is actually complementary to Heat engine and adds the following: - orchestrate multiple Heat stacks - DR deployments, HA setups, multiple datacenters deployment These sound like features already requested directly in Heat. - Initiates and controls stack updates on application specific events Sounds like workflow. :) - Error handling and self-healing - being imperative Murano allows you to handle issues and implement additional logic around error handling and self-healing. Also sounds like workflow. I think we need to re-think what a program is before we consider this. I really don't know much about Murano. I have no interest in it at get off my lawn ;) And turn down that music! Sorry for the fist shaking, but I want to highlight that I'm happy to consider it, just not with programs working the way they do now. http://stackalytics.com/?project_type=all&module=murano-group HP seems to be involved, you should check it out. HP is involved in a lot of OpenStack things. It's a bit hard for me to keep my eyes on everything we do. Good to know that others have been able to take some time and buy into it a bit. +1 for distributing the load. :) all, and nobody has come to me saying If we only had Murano in our orchestration toolbox, we'd solve xxx. But making them part of the I thought you were saying that opsworks was neat the other day? Murano from what I understand was partly inspired from opsworks, yes it's a layer up, but still really the same field. I was saying that OpsWorks is reportedly popular, yes. I did not make the connection at all from OpsWorks to Murano, and nobody had pointed that out to me until now. Orchestration program would imply that we'll do design sessions together, that we'll share the same mission statement, and that we'll have just This is exactly what I hope will happen. Which sessions from last summit would we want to give up to make room for the Murano-only focused sessions?
How much time in our IRC meeting should we give to Murano-only concerns? Forgive me for being harsh. We have a cloud to deploy using Heat, and it is taking far too long to get Heat to do that in an acceptable manner already. Adding load to our PTL and increasing the burden on our communication channels doesn't really seem like something that will increase our velocity. I could be dead wrong though, Murano could be exactly what we need. I just don't see it, and I'm sorry to be so direct about saying that. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Michael Chapman's message of 2014-08-21 23:30:44 -0700: On Fri, Aug 22, 2014 at 2:57 AM, Jay Pipes jaypi...@gmail.com wrote: On 08/19/2014 11:28 PM, Robert Collins wrote: On 20 August 2014 02:37, Jay Pipes jaypi...@gmail.com wrote: ... I'd like to see more unification of implementations in TripleO - but I still believe our basic principle of using OpenStack technologies that already exist in preference to third party ones is still sound, and offers substantial dogfood and virtuous circle benefits. No doubt Triple-O serves a valuable dogfood and virtuous cycle purpose. However, I would move that the Deployment Program should welcome the many projects currently in the stackforge/ code namespace that do deployment of OpenStack using traditional configuration management tools like Chef, Puppet, and Ansible. It cannot be argued that these configuration management systems are the de-facto way that OpenStack is deployed outside of HP, and they belong in the Deployment Program, IMO. I think you mean it 'can be argued'... ;). No, I definitely mean cannot be argued :) HP is the only company I know of that is deploying OpenStack using Triple-O. The vast majority of deployers I know of are deploying OpenStack using configuration management platforms and various systems or glue code for baremetal provisioning. Note that I am not saying that Triple-O is bad in any way! I'm only saying that it does not represent the way that the majority of real-world deployments are done. And I'd be happy if folk in those communities want to join in the deployment program and have code repositories in openstack/. To date, none have asked. My point in this thread has been and continues to be that by having the TC bless a certain project as The OpenStack Way of X, that we implicitly are saying to other valid alternatives Sorry, no need to apply here.. 
As a TC member, I would welcome someone from the Chef community proposing the Chef cookbooks for inclusion in the Deployment program, to live under the openstack/ code namespace. Same for the Puppet modules. While you may personally welcome the Chef community to propose joining the deployment Program and living under the openstack/ code namespace, I'm just saying that the impression our governance model and policies create is one of exclusion, not inclusion. Hope that clarifies better what I've been getting at. (As one of the core reviewers for the Puppet modules) Without a standardised package build process it's quite difficult to test trunk Puppet modules vs trunk official projects. This means we cut release branches some time after the projects themselves to give people a chance to test. Until this changes and the modules can be released with the same cadence as the integrated release I believe they should remain on Stackforge. Seems like the distros that build the packages are all doing lots of daily-build type stuff that could somehow be leveraged to get over that.
[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
It has been brought to my attention that Ironic uses the biggest hammer in the IPMI toolbox to control chassis power: https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142 Which is ret = ipmicmd.set_power('off', wait) This is the most abrupt form, where the system power should be flipped off at a hardware level. The short press on the power button would be 'shutdown' instead of 'off'. I also understand that this has been brought up before, and that the answer given was SSH in and shut it down yourself. I can respect that position, but I have run into a bit of a pickle using it. Observe:

- ssh box.ip poweroff
- poll ironic until power state is off.
- This is a race. Ironic is asserting the power. As soon as it sees that the power is off, it will turn it back on.

- ssh box.ip halt
- NO way to know that this has worked. Once SSH is off and the network stack is gone, I cannot actually verify that the disks were unmounted properly, which is the primary area of concern that I have.

This is particularly important if I'm issuing a rebuild + preserve ephemeral, as it is likely I will have lots of I/O going on, and I want to make sure that it is all quiesced before I reboot to replace the software. Perhaps I missed something. If so, please do educate me on how I can achieve this without hacking around it. Currently my workaround is to manually unmount the state partition, which is something system shutdown is supposed to do and may become problematic if system processes are holding it open. It seems to me that Ironic should at least try to use the graceful shutdown. There can be a timeout, but it would need to be something a user can disable so if graceful never works we never just dump the power on the box. Even a journaled filesystem will take quite a bit to do a full fsck.
The inability to gracefully shutdown in a reasonable amount of time is an error state really, and I need to go to the box and inspect it, which is precisely the reason we have ERROR states. Thanks for your time. :)
Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700: On 08/22/2014 01:48 PM, Clint Byrum wrote: [Clint's full message, quoted verbatim in the reply, appears above; snipped here.]
The inability to gracefully shutdown in a reasonable amount of time is an error state really, and I need to go to the box and inspect it, which is precisely the reason we have ERROR states. What about placing a runlevel script in /etc/init.d/ and symlinking it to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount the state partition in that script which would ensure disk state was quiesced, no? That's already what OS's do in their rc0.d. My point is, I don't have any way to know that process happened, without the box turning itself off after it succeeded.
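What Clint is asking for could be sketched roughly as follows: try the soft 'shutdown' first, poll the power state, and only fall back to the hard 'off' (or an ERROR state) after a timeout. This is an illustrative sketch only, not Ironic code; set_power and get_power are hypothetical stand-ins for driver calls like ipmicmd.set_power:

```python
import time

def power_off_gracefully(set_power, get_power, timeout=120.0,
                         hard_off_on_timeout=False, poll_interval=1.0):
    """Soft-shutdown a node, polling until it powers off.

    set_power/get_power are hypothetical stand-ins for the driver's
    IPMI calls. Returns 'off' on success, 'error' if the node never
    quiesced and the operator disallowed falling back to hard power-off.
    """
    set_power('shutdown')  # short power-button press: let the OS unmount disks
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_power() == 'off':
            return 'off'  # the box turned itself off; disks are quiesced
        time.sleep(poll_interval)
    if hard_off_on_timeout:
        set_power('off')  # the big hammer, only if explicitly enabled
        return 'off'
    return 'error'  # surface an ERROR state so someone inspects the box
```

The key property is the last branch: when graceful shutdown does not complete in time and the operator has not opted into the hard fallback, the node lands in an inspectable error state instead of having its power dumped.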
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Duncan Thomas's message of 2014-08-21 09:21:06 -0700: On 21 August 2014 14:27, Jay Pipes jaypi...@gmail.com wrote: Specifically for Triple-O, by making the Deployment program == Triple-O, the TC has picked the disk-image-based deployment of an undercloud design as The OpenStack Way of Deployment. And as I've said previously in this thread, I believe that the deployment space is similarly unsettled, and that it would be more appropriate to let the Chef cookbooks and Puppet modules currently sitting in the stackforge/ code namespace live in the openstack/ code namespace. Totally agree with Jay here, I know people who gave up on trying to get any official project around deployment because they were told they had to do it under the TripleO umbrella This was why the _program_ versus _project_ distinction was made. But I think we ended up being 1:1 anyway. Perhaps the deployment program's mission statement is too narrow, and we should iterate on that. That others took their ball and went home, instead of asking for a review of that ruling, is a bit disconcerting. That probably strikes to the heart of the current crisis. If we were being reasonable, alternatives to an official OpenStack program's mission statement would be debated and considered thoughtfully. I know I made the mistake early on of pushing the narrow _TripleO_ vision into what should have been a much broader Deployment program. I'm not entirely sure why that seemed o-k to me at the time, or why it was allowed to continue, but I think it may be a good exercise to review those events and try to come up with a few theories or even conclusions as to what we could do better.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from David Kranz's message of 2014-08-21 12:45:05 -0700: On 08/21/2014 02:39 PM, gordon chung wrote: The point I've been making is that by the TC continuing to bless only the Ceilometer project as the OpenStack Way of Metering, I think we do a disservice to our users by picking a winner in a space that is clearly still unsettled. can we avoid using the word 'blessed' -- it's extremely vague and seems controversial. from what i know, no one is being told project x's services are the be all end all and based on experience, companies (should) know this. i've worked with other alternatives even though i contribute to ceilometer. Totally agree with Jay here, I know people who gave up on trying to get any official project around deployment because they were told they had to do it under the TripleO umbrella from the pov of a project that seems to be brought up constantly and maybe it's my naivety, i don't really understand the fascination with branding and the stigma people have placed on non-'openstack'/stackforge projects. it can't be a legal thing because i've gone through that potential mess. also, it's just as easy to contribute to 'non-openstack' projects as 'openstack' projects (even easier if we're honest). Yes, we should be honest. The even easier part is what Sandy cited as the primary motivation for pursuing stacktach instead of ceilometer. I think we need to consider the difference between why OpenStack wants to bless a project, and why a project might want to be blessed by OpenStack. Many folks believe that for OpenStack to be successful it needs to present itself as a stack that can be tested and deployed, not a sack of parts that only the most extremely clever people can manage to assemble into an actual cloud. In order to have such a stack, some code (or, alternatively, dare I say API...) needs to be blessed. 
Reasonable debates will continue about which pieces are essential to this stack, and which should be left to deployers, but metering was seen as such a component and therefore something needed to be blessed. The hope was that every one would jump on that and make it great but it seems that didn't quite happen (at least yet). Though Open Source has many advantages over proprietary development, the ability to choose a direction and marshal resources for efficient delivery is the biggest advantage of proprietary development like what AWS does. The TC process of blessing is, IMO, an attempt to compensate for that in an OpenSource project. Of course if the wrong code is blessed, the negative impact can be significant. Blessing APIs would be Hm, I wonder if the only difference there is when AWS blesses the wrong thing, they evaluate the business impact, and respond by going in a different direction, all behind closed doors. The shame is limited to that inner circle. Here, with full transparency, calling something the wrong thing is pretty much public humiliation for the team involved. So it stands to reason that we shouldn't call something the right thing if we aren't comfortable with the potential public shaming.
Re: [openstack-dev] [Glance][Heat] Murano split discussion
Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700: During the last Atlanta summit there were a couple of discussions about Application Catalog and Application space projects in OpenStack. These cross-project discussions occurred as a result of the Murano incubation request [1] during the Icehouse cycle. At the TC meeting devoted to Murano incubation there was an idea about splitting Murano into parts which might belong to different programs[2]. Today, I would like to initiate a discussion about potentially splitting Murano between two or three programs. *App Catalog API to Catalog Program* The Application Catalog part can belong to the Catalog program; the package repository will move to the artifacts repository part, where the Murano team already participates. The API part of App Catalog will add a thin layer of API methods specific to Murano applications and potentially can be implemented as a plugin to the artifacts repository. Also this API layer will expose other 3rd party systems' APIs, like the CloudFoundry ServiceBroker API, which is used by the CloudFoundry marketplace feature to provide an integration layer between OpenStack Application packages and 3rd party PaaS tools. I thought this was basically already agreed upon, and that Glance was just growing the ability to store more than just images. *Murano Engine to Orchestration Program* The Murano engine orchestrates the Heat template generation. Complementary to Heat's declarative approach, the Murano engine uses an imperative approach so that it is possible to control the whole flow of the template generation. The engine uses Heat updates to update Heat templates to reflect changes in application layout. The Murano engine has a concept of actions - special flows which can be called at any time after application deployment to change application parameters or update stacks.
The engine is actually complementary to Heat engine and adds the following: - orchestrate multiple Heat stacks - DR deployments, HA setups, multiple datacenters deployment These sound like features already requested directly in Heat. - Initiates and controls stack updates on application specific events Sounds like workflow. :) - Error handling and self-healing - being imperative Murano allows you to handle issues and implement additional logic around error handling and self-healing. Also sounds like workflow. I think we need to re-think what a program is before we consider this. I really don't know much about Murano. I have no interest in it at all, and nobody has come to me saying If we only had Murano in our orchestration toolbox, we'd solve xxx. But making them part of the Orchestration program would imply that we'll do design sessions together, that we'll share the same mission statement, and that we'll have just one PTL. I fail to see why they're not another, higher level program that builds on top of the other services. *Murano UI to Dashboard Program* Application Catalog requires a UI focused on user experience. Currently there is a Horizon plugin for Murano App Catalog which adds an Application Catalog page to browse, search and filter applications. It also adds dynamic UI functionality to render Horizon forms without writing actual code. I feel like putting all the UI plugins in Horizon is the same mistake as putting all of the functional tests in Tempest. It doesn't have the effect of breaking the gate but it probably is a lot of burden on a single team.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Robert Collins's message of 2014-08-18 23:41:20 -0700: On 18 August 2014 09:32, Clint Byrum cl...@fewbar.com wrote: I can see your perspective but I don't think it's internally consistent... Here's why folk are questioning Ceilometer: Nova is a set of tools to abstract virtualization implementations. With a big chunk of local things - local image storage (now in glance), scheduling, rebalancing, ACLs and quotas. Other implementations that abstract over VM's at various layers already existed when Nova started - some bad (some very bad!) and others actually quite ok. The fact that we have local implementations of domain specific things is irrelevant to the difference I'm trying to point out. Glance needs to work with the same authentication semantics and share a common access catalog to work well with Nova. It's unlikely there's a generic image catalog that would ever fit this bill. In many ways glance is just an abstraction of file storage backends and a database to track a certain domain of files (images, and soon, templates and other such things). The point of mentioning Nova is, we didn't write libvirt, or xen, we wrote an abstraction so that users could consume them via a REST API that shares these useful automated backends like glance. Neutron is a set of tools to abstract SDN/NFV implementations. And implements a DHCP service, DNS service, overlay networking: it's much more than an abstraction-over-other-implementations. Native DHCP and overlay? Last I checked Neutron used dnsmasq and openvswitch, but it has been a few months, and I know that is an eon in OpenStack time. Cinder is a set of tools to abstract block-device implementations. Trove is a set of tools to simplify consumption of existing databases. Sahara is a set of tools to simplify Hadoop consumption. Swift is a feature-complete implementation of object storage, none of which existed when it was started.
Swift was started in 2009; Eucalyptus goes back to 2007, with Walrus part of that - I haven't checked precise dates, but I'm pretty sure that it existed and was usable by the start of 2009. There may well be other object storage implementations too - I simply haven't checked. Indeed, and MogileFS was sort of like Swift but not HTTP based. Perhaps Walrus was evaluated and found inadequate for the CloudFiles product requirements? I don't know. But there weren't de-facto object stores at the time because object stores were just becoming popular. Keystone supports all of the above, unifying their auth. And implementing an IdP (which I know they want to stop doing ;)). And in fact lots of OpenStack projects, for various reasons, support *not* using Keystone (something that bugs me, but that's a different discussion). My point was it is justified to have a whole implementation and not just abstraction because it is meant to enable the ecosystem, not _be_ the ecosystem. I actually think Keystone is problematic too, and I often wonder why we haven't just done OAuth, but I'm not trying to throw every project under the bus. I'm trying to state that we accept Keystone because it has grown organically to support the needs of all the other pieces. Horizon supports all of the above, unifying their GUI. Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already. I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as there are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those.
It is my hope we'll push more into the unifying the implementations space and withdraw a bit from the implementing stuff space. So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist. If the other examples we had were a lot purer, this explanation would make sense. I think there's more to it than that though :). If purity is required to show a difference, then I don't think I know how to demonstrate what I think is obvious to most of us: Ceilometer is an end to end implementation of things that exist in many battle tested implementations. I struggle to think of another component of OpenStack that has this distinction. What exactly, I don't know, but it's just too easy an answer, and one that doesn't stand up to non-trivial examination :(. I'd like to see more unification of implementations in TripleO - but I still believe our basic principle of using OpenStack technologies that already exist in preference to third party ones is still sound, and offers substantial dogfood and virtuous circle benefits.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Jay Pipes's message of 2014-08-20 14:53:22 -0700: On 08/20/2014 05:06 PM, Chris Friesen wrote: On 08/20/2014 07:21 AM, Jay Pipes wrote: Hi Thierry, thanks for the reply. Comments inline. :) On 08/20/2014 06:32 AM, Thierry Carrez wrote: If we want to follow your model, we probably would have to dissolve programs as they stand right now, and have blessed categories on one side, and teams on the other (with projects from some teams being blessed as the current solution). Why do we have to have blessed categories at all? I'd like to think of a day when the TC isn't picking winners or losers at all. Level the playing field and let the quality of the projects themselves determine the winner in the space. Stop the incubation and graduation madness and change the role of the TC to instead play an advisory role to upcoming (and existing!) projects on the best ways to integrate with other OpenStack projects, if integration is something that is natural for the project to work towards. It seems to me that at some point you need to have a recommended way of doing things, otherwise it's going to be *really hard* for someone to bring up an OpenStack installation. Why can't there be multiple recommended ways of setting up an OpenStack installation? Matter of fact, in reality, there already are multiple recommended ways of setting up an OpenStack installation, aren't there? There's multiple distributions of OpenStack, multiple ways of doing bare-metal deployment, multiple ways of deploying different message queues and DBs, multiple ways of establishing networking, multiple open and proprietary monitoring systems to choose from, etc. And I don't really see anything wrong with that. This is an argument for loosely coupling things, rather than tightly integrating things. You will almost always win my vote with that sort of movement, and you have here. +1. We already run into issues with something as basic as competing SQL databases. 
If the TC suddenly said Only MySQL will be supported, that would not mean that the greater OpenStack community would be served better. It would just unnecessarily take options away from deployers. This is really where supported becomes the mutex binding us all. The more supported options, the larger the matrix, the more complex a user's decision process becomes. If every component has several competing implementations and none of them are official how many more interaction issues are going to trip us up? IMO, OpenStack should be about choice. Choice of hypervisor, choice of DB and MQ infrastructure, choice of operating systems, choice of storage vendors, choice of networking vendors. Err, uh. I think OpenStack should be about users. If having 400 choices means users are just confused, then OpenStack becomes nothing and everything all at once. Choices should be part of the whole not when 1% of the market wants a choice, but when 20%+ of the market _requires_ a choice. What we shouldn't do is harm that 1%'s ability to be successful. We should foster it and help it grow, but we don't just pull it into the program and say You're ALSO in OpenStack now! and we also don't want to force those users to make a hard choice because the better solution is not blessed. If there are multiple actively-developed projects that address the same problem space, I think it serves our OpenStack users best to let the projects work things out themselves and let the cream rise to the top. If the cream ends up being one of those projects, so be it. If the cream ends up being a mix of both projects, so be it. The production community will end up determining what that cream should be based on what it deploys into its clouds and what input it supplies to the teams working on competing implementations. I'm really not a fan of making it a competitive market. If a space has a diverse set of problems, we can expect it will have a diverse set of solutions that overlap. 
But that doesn't mean they both need to drive toward making that overlap all-encompassing. Sometimes that happens and it is good, and sometimes that happens and it causes horrible bloat. And who knows... what works or is recommended by one deployer may not be what is best for another type of deployer and I believe we (the TC/governance) do a disservice to our user community by picking a winner in a space too early (or continuing to pick a winner in a clearly unsettled space). Right, I think our current situation crowds out diversity, when what we want to do is enable it, without confusing the users.
Re: [openstack-dev] [all] The future of the integrated release
Here's why folk are questioning Ceilometer: Nova is a set of tools to abstract virtualization implementations. Neutron is a set of tools to abstract SDN/NFV implementations. Cinder is a set of tools to abstract block-device implementations. Trove is a set of tools to simplify consumption of existing databases. Sahara is a set of tools to simplify Hadoop consumption. Swift is a feature-complete implementation of object storage, none of which existed when it was started. Keystone supports all of the above, unifying their auth. Horizon supports all of the above, unifying their GUI. Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already. I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as there are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those. It is my hope we'll push more into the unifying the implementations space and withdraw a bit from the implementing stuff space. So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist. Excerpts from Nadya Privalova's message of 2014-08-17 11:11:34 -0700: Hello all, As a Ceilometer's core, I'd like to add my 0.02$. During previous discussions several projects were mentioned which were started or continued to be developed after Ceilometer became integrated. The main question I'm thinking of is why it was impossible to contribute to the existing integrated project? Is it because of Ceilometer's architecture, the team or there are some other (maybe political) reasons?
I think it's a very sad situation when we have 3-4 Ceilometer-like projects from different companies instead of a single one that satisfies everybody. (We don't see it in other projects. Though, maybe there are several Novas or Neutrons on StackForge and I don't know about it...) Of course, sometimes it's much easier to start the project from scratch. But there should be strong reasons for doing this if we are talking about an integrated project. IMHO the idea, the role is the most important thing when we are talking about an integrated project. And if Ceilometer's role is really needed (and I think it is) then we should improve the existing implementation, merge all needs into the one project and the result will still be Ceilometer. Thanks, Nadya On Fri, Aug 15, 2014 at 12:41 AM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Aug 13, 2014 at 12:24 PM, Doug Hellmann d...@doughellmann.com wrote: On Aug 13, 2014, at 3:05 PM, Eoghan Glynn egl...@redhat.com wrote: At the end of the day, that's probably going to mean saying No to more things. Every time I turn around everyone wants the TC to say No to things, just not to their particular thing. :) Which is human nature. But I think if we don't start saying No to more things we're going to end up with a pile of mud that no one is happy with. That we're being so abstract about all of this is frustrating. I get that no-one wants to start a flamewar, but can someone be concrete about what they feel we should say 'no' to but are likely to say 'yes' to? I'll bite, but please note this is a strawman. No: * Accepting any more projects into incubation until we are comfortable with the state of things again * Marconi * Ceilometer Well -1 to that, obviously, from me. Ceilometer is on track to fully execute on the gap analysis coverage plan agreed with the TC at the outset of this cycle, and has an active plan in progress to address architectural debt.
Yes, there seems to be an attitude among several people in the community that the Ceilometer team denies that there are issues and refuses to work on them. Neither of those things is the case from our perspective. Totally agree. Can you be more specific about the shortcomings you see in the project that aren't being addressed? Once again, this is just a strawman. I'm just not sure OpenStack has 'blessed' the best solution out there. https://wiki.openstack.org/wiki/Ceilometer/Graduation#Why_we_think_we.27re_ready - Successfully passed the challenge of being adopted by 3 related projects which have agreed to join or use ceilometer: - Synaps - Healthnmon - StackTach https://wiki.openstack.org/w/index.php?title=StackTach&action=edit&redlink=1 StackTach seems to still be under active development ( http://git.openstack.org/cgit/stackforge/stacktach/log/), is used by
Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories
Excerpts from Jyoti Ranjan's message of 2014-08-16 00:57:52 -0700: We will have to be a little bit cautious in using glob because of its inherent usage pattern. For example, a file starting with . will not get matched. That is a separate bug, but I think the answer to that is to use rsync instead of mv and globs. So this: mv $tmp/./* $destdir becomes this: rsync -a --remove-source-files $tmp/ $destdir ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Time to Samba! :-)
Excerpts from Martinx - ジェームズ's message of 2014-08-16 12:03:20 -0700: Hey Stackers, I'm wondering here... Samba4 is pretty solid (the upcoming 4.2 rocks), I'm using it on a daily basis as an AD DC, for both Windows and Linux instances! With replication, file system ACLs - cifs, built-in LDAP, dynamic DNS with Bind9 as a backend (no netbios) and etc... Pretty cool! In the OpenStack ecosystem, there are awesome solutions like Trove, Solum, Designate and etc... Amazing times BTW! So, why not try to integrate Samba4, working as an AD DC, within OpenStack itself?! But, if we did that, what would be left for us to reinvent in our own slightly different way? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories
Excerpts from Brownell, Jonathan C (Corvallis)'s message of 2014-08-15 08:11:18 -0700: The current DIB element support for downloading tarballs via source-repository allows an entry in the following form: name tar targetdir url Today, this feature is used only by the mysql DIB element. You can see how it's used here: https://github.com/openstack/tripleo-image-elements/blob/master/elements/mysql/source-repository-mysql However, the underlying diskimage-builder implementation of tarball handling is rather odd and inflexible. After downloading the file (or retrieving it from cache) and unpacking it into a tmp directory, it performs: mv $tmp/*/* $targetdir This does work as long as the tarball follows a structure where all its files/directories are contained within a single directory, but it fails if the tarball contains no subdirectories. (Even worse is when it contains some files and some subdirectories, in which case the files are lost and the contents of all subdirs get lumped together in the output folder.) Since this tarball support is only used today by the mysql DIB element, I would love to fix this in both diskimage-builder and tripleo-image-elements by changing to simply: mv $tmp/* $targetdir And then manually tweaking the directory structure of $targetdir from a new install.d script in the mysql element to restore the desired layout. However, it's important to note that this will break backwards compatibility if tarball support is used in its current fashion by users with private DIB elements. Personally, I consider the current behavior so egregious that it really needs to be fixed across the board rather than preserving backwards compatibility. Do others agree? If not, do you have suggestions as to how to improve this mechanism cleanly without sacrificing backwards compatibility? 
How about we add a glob argument, like this: mysql tar /usr/local/mysql http://someplace/mysql.tar.gz mysql-5.* That would result in mv $tmp/mysql-5.*/* $targetdir And then we would warn that assuming the glob will be '*' is deprecated, to be changed in a later release. Users who want your proposed behavior would use . until the default changes. That would result in mv $tmp/./* $targetdir ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
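The proposed deprecation path could be sketched as a small shell function. The function name, warning text, and argument order below are made up for illustration; this is not actual diskimage-builder code:

```shell
#!/bin/sh
# Hypothetical sketch of glob-aware tarball extraction with a
# deprecation warning for the implicit '*' default.
extract_tarball() {
    tmp=$1; targetdir=$2; glob=$3
    if [ -z "$glob" ]; then
        echo "WARNING: defaulting the tarball glob to '*' is deprecated;" \
             "pass '.' to install the tarball root as-is" >&2
        glob='*'
    fi
    mkdir -p "$targetdir"
    # $glob is deliberately unquoted so it expands, e.g.
    #   mv $tmp/mysql-5.*/* $targetdir
    mv "$tmp"/$glob/* "$targetdir"/
}
```

With the glob 'mysql-5.*' this reproduces today's top-directory-stripping behavior for the mysql element, while '.' gives the flat-tarball behavior proposed above.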
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Thierry Carrez's message of 2014-08-13 02:54:58 -0700: Rochelle.RochelleGrober wrote: [...] So, with all that prologue, here is what I propose (and please consider proposing your improvements/changes to it). I would like to see for Kilo: - IRC meetings and mailing list meetings beginning with the Juno release and continuing through the summit that focus on core project needs (what Thierry calls strategic) that as a set would be considered the primary focus of the Kilo release for each project. This could include high priority bugs, refactoring projects, small improvement projects, high interest extensions and new features, specs that didn't make it into Juno, etc. - Develop the list and prioritize it into Needs and Wants. Consider these the feeder projects for the two runways if you like. - Discuss the lists. Maybe have a community vote? The vote will freeze the list, but as in most development project freezes, it can be a soft freeze that the core, or drivers, or TC can amend (or throw out for that matter). [...] One thing we've been unable to do so far is to set release goals at the beginning of a release cycle and stick to them. It used to be because we were so fast moving that new awesome stuff was proposed mid-cycle and ended up being a key feature (sometimes THE key feature) for the project. Now it's because there is so much proposed that no one knows what will actually get completed. So while I agree that what you propose is the ultimate solution (and the workflow I've pushed PTLs to follow every single OpenStack release so far), we have struggled to have the visibility, long-term thinking and discipline to stick to it in the past. If you look at the post-summit plans and compare them to what we end up with in a release, you'll see quite a lot of differences :) I think that shows agility, and isn't actually a problem. 6 months is quite a long time in the future for some business models. 
Strategic improvements for the project should be able to stick to a 6 month schedule, but companies will likely be tactical about where their developer resources are directed for feature work. The fact that those resources land code upstream is one of the greatest strengths of OpenStack. Any potential impact on how that happens should be carefully considered when making any changes to process and governance. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Steve Baker's message of 2014-08-10 15:33:26 -0700: On 02/08/14 04:07, Allison Randal wrote: A few of us have been independently experimenting with Ansible as a backend for TripleO, and have just decided to try experimenting together. I've chatted with Robert, and he says that TripleO was always intended to have pluggable backends (CM layer), and just never had anyone interested in working on them. (I see it now, even in the early docs and talks, I guess I just couldn't see the forest for the trees.) So, the work is in line with the overall goals of the TripleO project. We're starting with a tiny scope, focused only on updating a running TripleO deployment, so our first work is in: - Create an Ansible Dynamic Inventory plugin to extract metadata from Heat - Improve/extend the Ansible nova_compute Cloud Module (or create a new one), for Nova rebuild - Develop a minimal handoff from Heat to Ansible, particularly focused on the interactions between os-collect-config and Ansible We're merging our work in this repo, until we figure out where it should live: https://github.com/allisonrandal/tripleo-ansible We've set ourselves one week as the first sanity-check to see whether this idea is going anywhere, and we may scrap it all at that point. But, it seems best to be totally transparent about the idea from the start, so no-one is surprised later. Having pluggable backends for configuration seems like a good idea, and Ansible is a great choice for the first alternative backend. TripleO is intended to be loosely coupled for many components, not just in-instance configuration. However, what this repo seems to be doing at the moment is bypassing heat to do a stack update, and I can only assume there is an eventual goal to not use heat at all for stack orchestration too. Granted, until blueprint update-failure-recovery lands[1] then doing a stack-update is about as much fun as Russian roulette. 
But this effort is tactical rather than strategic, especially given TripleO's mission statement. We intend to stay modular. Ansible won't replace Heat from end to end. Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. We remain committed to driving these things into Heat, which will allow us to address these things the way a large scale operation will need to. But until we can land those things in Heat, we need something more flexible like Ansible to go around Heat and do things in the exact order we need them done. Ansible doesn't have a REST API, which is a non-starter for modern automation, but the need to control workflow is greater than the need to have a REST API at this point. If I were to use Ansible for TripleO configuration I would start with something like the following: * Install an ansible software-config hook onto the image to be triggered by os-refresh-config[2][3] * Incrementally replace StructuredConfig resources in tripleo-heat-templates with SoftwareConfig resources that include the ansible playbooks via get_file * The above can start in a fork of tripleo-heat-templates, but can eventually be structured using resource providers so that the deployer chooses what configuration backend to use by selecting the environment file that contains the appropriate config resources Now you have a cloud orchestrated by heat and configured by Ansible. If it is still deemed necessary to do an out-of-band update to the stack then you're in a much better position to do an ansible push, since you can use the same playbook files that heat used to bring up the stack. That would be a good plan if we wanted to fix issues with os-*-config, but that is the opposite of reality. 
We are working around Heat orchestration issues with Ansible. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. We haven't been able to come back around to those yet, but you can see where this has turned into a bit of a rat hole of optimization. action-aware-sw-config is sort of what we want for rebuild. We collaborated with the trove devs on how to also address it for resize a while back but I have lost track of that work as it has taken a back seat to more pressing issues. Addressing groups is a general problem that I've had a hard time articulating in the past. Tomas Sedovic has done a good job with this TripleO spec, but I don't know that we've asked for an explicit change in a bug or spec in Heat just yet: https://review.openstack.org/#/c/97939/ There are a number of other issues noted in that spec which are already addressed in Heat, but require refactoring in TripleO's templates and tools, and that work continues. 
The point remains: we need something that works now, and doing an alternate implementation for updates is actually faster than addressing all of these issues. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700: On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. Related to this, and a discussion we had recently at the TripleO meetup is this spec I raised today: https://review.openstack.org/#/c/113296/ It's following up on the idea that we could potentially address (or at least mitigate, pending the fully convergence-ified heat) some of these scalability concerns, if TripleO moves from the one-giant-template model to a more modular nested-stack/provider model (e.g what Tomas has been working on) I've not got into enough detail on that yet to be sure if it's achievable for Juno, but it seems initially to be complex-but-doable. I'd welcome feedback on that idea and how it may fit in with the more granular convergence-engine model. Can you link to the eventlet/forking issues bug please? 
I thought since bug #1321303 was fixed that multiple engines and multiple workers should work OK, and obviously that being true is a precondition to expending significant effort on the nested stack decoupling plan above. That was the issue. So we fixed that bug, but we never un-reverted the patch that forks enough engines to use up all the CPU's on a box by default. That would likely help a lot with metadata access speed (we could manually do it in TripleO but we tend to push defaults. :) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Zane Bitter's message of 2014-08-11 13:35:44 -0700: On 11/08/14 14:49, Clint Byrum wrote: Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700: On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU This sounds like the same figure I heard at the design summit; did the DB call optimisation work that Steve Baker did immediately after that not have any effect? Steve's work got us to 40. From 7. to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. 
Related to this, and a discussion we had recently at the TripleO meetup is this spec I raised today: https://review.openstack.org/#/c/113296/ It's following up on the idea that we could potentially address (or at least mitigate, pending the fully convergence-ified heat) some of these scalability concerns, if TripleO moves from the one-giant-template model to a more modular nested-stack/provider model (e.g what Tomas has been working on) I've not got into enough detail on that yet to be sure if it's achievable for Juno, but it seems initially to be complex-but-doable. I'd welcome feedback on that idea and how it may fit in with the more granular convergence-engine model. Can you link to the eventlet/forking issues bug please? I thought since bug #1321303 was fixed that multiple engines and multiple workers should work OK, and obviously that being true is a precondition to expending significant effort on the nested stack decoupling plan above. That was the issue. So we fixed that bug, but we never un-reverted the patch that forks enough engines to use up all the CPU's on a box by default. That would likely help a lot with metadata access speed (we could manually do it in TripleO but we tend to push defaults. :) Right, and we decided we wouldn't because it's wrong to do that to people by default. In some cases the optimal running configuration for TripleO will differ from the friendliest out-of-the-box configuration for Heat users in general, and in those cases - of which this is one - TripleO will need to specify the configuration. Whether or not the default should be to fork 1 process per CPU is a debate for another time. The point is, we can safely use the forking in Heat now to perhaps improve performance of metadata polling. Chasing that, and other optimizations, has not led us to a place where we can get to, say, 100 real nodes _today_. We're chasing another way to get to the scale and capability we need _today_, in much the same way we did with merge.py. 
We'll find the way to get it done more elegantly as time permits. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] a small experiment with Ansible in TripleO
I've been fiddling on github. This repo is unfortunately named the same but does not share the same ancestry as yours. Anyway, the branch 'fiddling' has a working Heat inventory plugin which should give you a hostvar of 'heat_metadata' per host in the given stack. https://github.com/SpamapS/tripleo-ansible/blob/fiddling/plugins/inventory/heat.py Note that in the root there is a 'heat-ansible-inventory.conf' that is an example config (works w/ devstack) to query a heat stack and turn it into an ansible inventory. That uses oslo.config so all of the usual patterns for loading configs in openstack should apply. Excerpts from Allison Randal's message of 2014-08-01 09:07:44 -0700: A few of us have been independently experimenting with Ansible as a backend for TripleO, and have just decided to try experimenting together. I've chatted with Robert, and he says that TripleO was always intended to have pluggable backends (CM layer), and just never had anyone interested in working on them. (I see it now, even in the early docs and talks, I guess I just couldn't see the forest for the trees.) So, the work is in line with the overall goals of the TripleO project. We're starting with a tiny scope, focused only on updating a running TripleO deployment, so our first work is in: - Create an Ansible Dynamic Inventory plugin to extract metadata from Heat - Improve/extend the Ansible nova_compute Cloud Module (or create a new one), for Nova rebuild - Develop a minimal handoff from Heat to Ansible, particularly focused on the interactions between os-collect-config and Ansible We're merging our work in this repo, until we figure out where it should live: https://github.com/allisonrandal/tripleo-ansible We've set ourselves one week as the first sanity-check to see whether this idea is going anywhere, and we may scrap it all at that point. But, it seems best to be totally transparent about the idea from the start, so no-one is surprised later. 
Cheers, Allison ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
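For readers unfamiliar with Ansible dynamic inventory plugins: their job reduces to printing JSON in a fixed shape, with per-host variables under a "_meta" key. Below is a minimal standalone sketch of that shape, with made-up host data standing in for what the heat.py plugin above would fetch from a Heat stack (an illustration of the format, not the actual plugin code):

```python
import json


def build_inventory(servers):
    """Turn (hostname, ip, metadata) tuples fetched from a Heat stack into
    Ansible dynamic-inventory JSON. The group name 'heat_stack' and the
    'heat_metadata' hostvar are assumptions for this sketch."""
    inventory = {"heat_stack": {"hosts": []}, "_meta": {"hostvars": {}}}
    for name, ip, metadata in servers:
        inventory["heat_stack"]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {
            "ansible_host": ip,
            # Expose the raw Heat metadata to playbooks as a hostvar.
            "heat_metadata": metadata,
        }
    return inventory


if __name__ == "__main__":
    # Made-up data; the real plugin would query the Heat API here.
    servers = [("notcompute", "10.0.0.5", {"role": "controller"})]
    print(json.dumps(build_inventory(servers), indent=2))
```

Ansible invokes such a script with --list and consumes the printed JSON, so the whole handoff from Heat is just "resolve stack, emit this structure".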
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Please do not re-invent locking... the way we reinvented locking in Heat. ;) There are well known distributed coordination services such as Zookeeper and etcd, and there is an abstraction for them already called tooz: https://git.openstack.org/cgit/stackforge/tooz/ Excerpts from Elena Ezhova's message of 2014-07-30 09:09:27 -0700: Hello everyone! Some recent change requests ([1], [2]) show that there are a number of issues with locking db resources in Neutron. One of them is initialization of drivers, which can be performed simultaneously by several neutron servers. In this case locking is essential for avoiding conflicts, which is now mostly done using SQLAlchemy's with_lockmode() method, which emits SELECT..FOR UPDATE, resulting in rows being locked within a transaction. As has already been stated by Mike Bayer [3], this statement is not supported by Galera and, what's more, by PostgreSQL, for which a lock doesn't work in the case when a table is empty. That is why there is a need for an easy solution that would allow cross-server locking and would work for every backend. The first thing that comes to mind is to create a table which would contain all locks acquired by various pieces of code. Each time, code that wishes to access a table that needs locking would have to perform the following steps: 1. Check whether a lock is already acquired by using SELECT lock_name FROM the cross_server_locks table. 2. If the SELECT returned None, acquire a lock by inserting it into the cross_server_locks table. Otherwise, wait and then try again until a timeout is reached. 3. After the code has executed, it should release the lock by deleting the corresponding entry from the cross_server_locks table. The locking process can be implemented by decorating a function that performs a transaction with a special function, or as a context manager. 
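The three steps above can be sketched as a context manager. This standalone illustration uses sqlite3 as a stand-in for the real database, and the table/column names are assumptions; note it also collapses steps 1 and 2 into a single INSERT guarded by a unique constraint, because a separate SELECT-then-INSERT is itself racy between two servers:

```python
import sqlite3
import time


class CrossServerLock:
    """Sketch of the lock-table approach: acquire by inserting a row,
    release by deleting it. Not Neutron code; names are illustrative."""

    def __init__(self, conn, name, timeout=5.0, interval=0.1):
        self.conn, self.name = conn, name
        self.timeout, self.interval = timeout, interval

    def __enter__(self):
        deadline = time.time() + self.timeout
        while True:
            try:
                # The INSERT doubles as check-and-acquire: the PRIMARY KEY
                # constraint makes a concurrent acquire fail atomically.
                self.conn.execute(
                    "INSERT INTO cross_server_locks (lock_name) VALUES (?)",
                    (self.name,))
                self.conn.commit()
                return self
            except sqlite3.IntegrityError:
                if time.time() > deadline:
                    raise TimeoutError("lock %r is busy" % self.name)
                time.sleep(self.interval)

    def __exit__(self, *exc):
        self.conn.execute(
            "DELETE FROM cross_server_locks WHERE lock_name = ?",
            (self.name,))
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cross_server_locks (lock_name TEXT PRIMARY KEY)")
with CrossServerLock(conn, "init-sdn-driver"):
    pass  # serialized section goes here
```

A real deployment would also need lock stealing for crashed holders (e.g. an acquired-at timestamp), which is exactly the complexity that tooz already handles.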
Thus, I wanted to ask the community whether this approach deserves consideration and, if so, decide on the format of an entry in the cross_server_locks table: how a lock_name should be formed, whether to support different locking modes, etc. [1] https://review.openstack.org/#/c/101982/ [2] https://review.openstack.org/#/c/107350/ [3] https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Excerpts from Doug Wiegley's message of 2014-07-30 09:48:17 -0700: I'd have to look at the Neutron code, but I suspect that a simple strategy of issuing the UPDATE SQL statement with a WHERE condition that I'm assuming the locking is for serializing code, whereas for what you describe above, is there some reason we wouldn't just use a transaction? I believe the code in question is doing something like this: 1) Check DB for initialized SDN controller driver 2) Not initialized - initialize the SDN controller via its API 3) Record in DB that it is initialized Step (2) above needs serialization, not (3). Compare and update will end up working like a distributed lock anyway, because the db model will have to be changed to have an initializing state, and then if initializing fails, you'll have to have a timeout.. and stealing for stuck processes. Sometimes a distributed lock is actually a simpler solution. Tooz will need work, no doubt. Perhaps if we call it 'oslo.locking' it will make more sense. Anyway, my point stands: trust the experts, avoid reinventing locking. And if you don't like tooz, extract the locking code from Heat and turn it into an oslo.locking library or something. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [Solum] Stack update and raw_template backup
Excerpts from Anant Patil's message of 2014-07-29 23:21:05 -0700: On 28-Jul-14 22:37, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-07-28 07:25:24 -0700: On 26/07/14 00:04, Anant Patil wrote: When the stack is updated, a diff of updated template and current template can be stored to optimize database. And perhaps Heat should have an API to retrieve this history of templates for inspection etc. when the stack admin needs it. If there's a demand for that feature we could implement it, but it doesn't easily fall out of the current implementation any more. We are never going to do it even 1/10th as well as git. In fact we won't even do it 1/0th as well as CVS. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Zane, I am working on the defect you had filed, which would clean up the backup stack along with the resources, templates and other data. However, I simply don't want to delete the templates, for the same reason we don't hard-delete the stack. Anyone who deploys a stack and updates it over time would want to view the updates in the templates for debugging or auditing reasons. It is not fair to assume that every user has a VCS with them to store the templates. It is kind of an inconvenience for me to not have the ability to view my updates in templates. Sounds like a nice-to-have feature. I'd suggest you propose it as a blueprint and spec. I will personally be against us spending time and adding complexity for such a feature when it is so much better served by VCS. And I would also suggest that we _can_ assume that users have VCS. When is the last time you encountered a developer or ops professional that did not use at least some kind of VCS? For me, it was 2003, and it took approximately 20 minutes to implement. And if we want that as a service, I believe Solum is working on doing that. 
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Excerpts from Jay Pipes's message of 2014-07-30 13:53:38 -0700: On 07/30/2014 12:21 PM, Kevin Benton wrote: Maybe I misunderstood your approach then. I thought you were suggesting where a node performs an UPDATE record WHERE record = last_state_node_saw query and then checks the number of affected rows. That's optimistic locking by every definition I've heard of it. It matches the following statement from the wiki article you linked to as well: The latter situation (optimistic locking) is only appropriate when there is less chance of someone needing to access the record while it is locked; otherwise it cannot be certain that the update will succeed because the attempt to update the record will fail if another user updates the record first. Did I misinterpret how your approach works? The record is never locked in my approach, which is why I don't like to think of it as optimistic locking. It's more like optimistic read and update with retry if certain conditions continue to be met... :) To be very precise, the record is never locked explicitly -- either through the use of SELECT FOR UPDATE or some explicit file or distributed lock. InnoDB won't even hold a lock on anything, as it will simply add a new version to the row using its MGCC (sometimes called MVCC) methods. The technique I am showing in the patch relies on the behaviour of the SQL UPDATE statement with a WHERE clause that contains certain columns and values from the original view of the record. The behaviour of the UPDATE statement will be a NOOP when some other thread has updated the record in between the time that the first thread read the record, and the time the first thread attempted to update the record. The caller of UPDATE can detect this NOOP by checking the number of affected rows, and retry the UPDATE if certain conditions remain kosher... 
So, there are actually no locks taken in the entire process, which is why I object to the term optimistic locking :) I think where the confusion has been is that the initial SELECT and the following UPDATE statements are *not* done in the context of a single SQL transaction... This is all true at a low level, Jay. But if you're serializing something outside the DB by using the 'doing it' versus 'done it' state, it still acts like a lock. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
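The compare-and-update technique discussed above fits in a few lines. This is a standalone sketch with sqlite3 standing in for the real database; the table and column names are made up, not Neutron code:

```python
import sqlite3


def compare_and_update(conn, row_id, new_status, seen_status):
    """Issue an UPDATE whose WHERE clause includes the value we read
    earlier. If another writer changed the row in the meantime, the
    UPDATE matches zero rows and is a no-op; the caller detects that
    via the affected-row count and can re-read and retry."""
    cur = conn.execute(
        "UPDATE records SET status = ? WHERE id = ? AND status = ?",
        (new_status, row_id, seen_status))
    conn.commit()
    return cur.rowcount == 1  # True means our update won


# Tiny demo of the winning and losing cases:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'PENDING')")
assert compare_and_update(conn, 1, "ACTIVE", "PENDING")       # fresh read: wins
assert not compare_and_update(conn, 1, "DELETED", "PENDING")  # stale read: no-op
```

As the reply notes, no row or table lock is ever held; the serialization comes entirely from detecting the no-op and retrying.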
Re: [openstack-dev] [heat] Saving the original raw template in the DB
Excerpts from Ton Ngo's message of 2014-07-29 13:53:12 -0700: Hi everyone, The raw template saved in the DB used to be the original template that a user submits. With the recent fix for stack update, it now reflects the template that is actually deployed, so it may be different from the original template because some resources may fail to deploy. I would like to solicit some feedback on saving the original template in the DB separately from the deployed template. I can think of two use cases for retrieving the original template: Debugging: running stack-update using the same template after fixing environmental problems. The CLI and API can be extended to allow reusing the original template without having to provide it again. Convergence or retry: some initial resource deployment may fail intermittently, but the user can retry later. I believe this use case is far better handled via VCS. We need the template to parse the current state of the stack. The user will have their intended template and can have their intended parameter values all included in a VCS. Are there other potential use cases? The cost would be an extra copy of the template in the raw_template table for each stack if there is failure, and a new column in the stack table to hold the id. We can argue that the user should have the original template to resubmit, but it seems useful and convenient to save it in the DB. Ton Ngo Additional cost is the additional complexity of code to manage the data. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] Stack update and raw_template backup
Excerpts from Zane Bitter's message of 2014-07-28 07:25:24 -0700: On 26/07/14 00:04, Anant Patil wrote: When the stack is updated, a diff of the updated template and the current template can be stored to optimize the database. And perhaps Heat should have an API to retrieve this history of templates for inspection etc. when the stack admin needs it. If there's a demand for that feature we could implement it, but it doesn't easily fall out of the current implementation any more. We are never going to do it even 1/10th as well as git. In fact we won't even do it 1/100th as well as CVS.
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from John Griffith's message of 2014-07-25 06:59:38 -0700: On Fri, Jul 25, 2014 at 7:38 AM, Kerrin, Michael michael.ker...@hp.com wrote: Coming back to this. I have updated the review https://review.openstack.org/#/c/90134/ so that it is passing CI for Ubuntu (obviously failing on Fedora) and I am happy with it. In order to close this off, my plan is to get feedback on the mysql element in this review. Any changes that people request in the next few days I will make and test via the CI and internally. Next I will rename mysql to percona and restore the old mysql in this review. At which point the percona code will not be tested via CI, so I don't want to make any more changes at that point, so I hope it will get approved. So this review will move to adding a percona element. Then following the mariadb integration I would like to get this https://review.openstack.org/#/c/109415/ change to tripleo-incubator through; that will include the new percona element in ubuntu images. So in the CI, Fedora will use mariadb and Ubuntu will use percona. Looking forward to any feedback, Michael On 09 July 2014 14:44:15, Sullivan, Jon Paul wrote: -Original Message- From: Giulio Fidente [mailto:gfide...@redhat.com] Sent: 04 July 2014 14:37 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora On 07/01/2014 05:47 PM, Michael Kerrin wrote: I propose making mysql an abstract element and the user must choose either the percona or mariadb-rpm element. CI must be set up correctly +1 seems a cleaner and more sustainable approach There was some concern from lifeless around recreating package-style dependencies in dib with element-provides/element-deps, in particular a suggestion that meta-elements are not desirable [1] (I hope I am paraphrasing you correctly, Rob).
That said, this is exactly the reason that element-provides was brought in, so that the definition of the image could have mysql as an element, but that the DIB_*_EXTRA_ARGS variable would provide the correct one, which would then list itself as providing mysql. This would not prevent the sharing of common code through a differently-named element, such as mysql-common. [1] see comments on April 10th in https://review.openstack.org/#/c/85776/ -- Giulio Fidente GPG KEY: 08D733BA Thanks, Jon-Paul Sullivan So this all sounds like an interesting mess. I'm not even really sure I follow all that's going on in the database area with the exception of the design, which it seems is something that takes no account of testing or commonality across platforms (pretty bad IMO), but I don't have any insight there so I'll butt out. The LIO versus tgt thing however is a bit troubling. Is there a reason that TripleO decided to do the exact opposite of what the defaults are in the rest of OpenStack here?
Also, any reason why, if there was a valid justification for this, it didn't seem like it might be worthwhile to work with the rest of the OpenStack community and share what they considered to be the better solution here? John, please be specific when you say "the defaults in the rest of OpenStack". We have a stated goal to deploy _with the defaults_. The default iscsi_helper is tgtadm. We deploy with that unless another is selected. As you see below, nothing is asserted there unless a value is set: https://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/cinder/os-apply-config/etc/cinder/cinder.conf#n41 And the default in the Heat templates that will set that value matches cinder's current default:
Re: [openstack-dev] [heat]Heat Db Model updates
Excerpts from Zane Bitter's message of 2014-07-24 12:09:39 -0700: On 17/07/14 07:51, Ryan Brown wrote: On 07/17/2014 03:33 AM, Steven Hardy wrote: On Thu, Jul 17, 2014 at 12:31:05AM -0400, Zane Bitter wrote: On 16/07/14 23:48, Manickam, Kanagaraj wrote: SNIP *Resource* Status action should be enum of predefined status +1 Rsrc_metadata - make full name resource_metadata -0. I don't see any benefit here. Agreed I'd actually be in favor of the change from rsrc to resource; I feel like rsrc is a pretty opaque abbreviation. I'd just like to remind everyone that these changes are not free. Database migrations are a pain to manage, and every new one slows down our unit tests. We now support multiple heat-engines connected to the same database and people want to upgrade their installations, so that means we have to be able to handle different versions talking to the same database. Unless somebody has a bright idea I haven't thought of, I assume that means carrying code to handle both versions for 6 months before actually being able to implement the migration. Or are we saying that you have to completely shut down all instances of Heat to do an upgrade? The name of the nova_instance column is so egregiously misleading that it's probably worth the pain. Using an enumeration for the states will save a lot of space in the database (though it would be a much more obvious win if we were querying on those columns). Changing a random prefix that was added to avoid a namespace conflict to a slightly different random prefix is well below the cost-benefit line IMO. In past lives managing apps like Heat, we've always kept supporting the previous schema in new code versions. So the process is: * Upgrade all code * Restart all services * Upgrade database schema * Wait a bit for reverts * Remove backward compatibility Now this was always in more of a continuous delivery environment, so there was not more than a few weeks of waiting for reverts.
In OpenStack we'd have a single release to wait. We're not special though; doesn't Nova have some sort of object versioning code that helps them manage the versions of each type of data for this very purpose?
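The Nova-style versioned-data idea mentioned above can be sketched as a read-time translation: records carry a version, and newer code upgrades older records on load until the compatibility window closes. The helper and field names here are hypothetical; the nova_instance rename just mirrors the column complaint earlier in the thread.

```python
# Hypothetical sketch of versioned records: code that understands
# version 2 can still read version 1 rows by translating them on load.
def upgrade_record(record):
    version = record.get("version", 1)
    if version == 1:
        # the old schema used the misleading 'nova_instance' column name
        record = dict(record)
        record["physical_resource_id"] = record.pop("nova_instance")
        record["version"] = 2
    return record

old = {"version": 1, "nova_instance": "server-abc-123"}
print(upgrade_record(old)["physical_resource_id"])
```

Once all engines run code that writes version 2, a one-shot migration can rewrite the remaining version-1 rows and the translation branch can be deleted.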
Re: [openstack-dev] [gate] The gate: a failure analysis
Thanks Matthew for the analysis. I think you missed something though. Right now the frustration is that unrelated intermittent bugs stop your presumably good change from getting in. Without gating, the result would be that even more bugs, many of them not intermittent at all, would get in. Right now, the one random developer who has to hunt down the rechecks and do them is inconvenienced. But without a gate, _every single_ developer will be inconvenienced until the fix is merged. The false negative rate is _way_ too high. Nobody would disagree there. However, adding more false negatives and allowing more people to ignore the ones we already have, seems like it would have the opposite effect: Now instead of annoying the people who hit the random intermittent bugs, we'll be annoying _everybody_ as they hit the non-intermittent ones. Excerpts from Matthew Booth's message of 2014-07-21 03:38:07 -0700: On Friday evening I had a dependent series of 5 changes all with approval waiting to be merged. These were all refactor changes in the VMware driver. The changes were:

* VMware: DatastorePath join() and __eq__() https://review.openstack.org/#/c/103949/
* VMware: use datastore classes get_allowed_datastores/_sub_folder https://review.openstack.org/#/c/103950/
* VMware: use datastore classes in file_move/delete/exists, mkdir https://review.openstack.org/#/c/103951/
* VMware: Trivial indentation cleanups in vmops https://review.openstack.org/#/c/104149/
* VMware: Convert vmops to use instance as an object https://review.openstack.org/#/c/104144/

The last change merged this morning. In order to merge these changes, over the weekend I manually submitted:

* 35 rechecks due to false negatives, an average of 7 per change
* 19 resubmissions after a change passed, but its dependency did not

Other interesting numbers:

* 16 unique bugs
* An 87% false negative rate
* 0 bugs found in the change under test

Because we don't fail fast, that is an average of at least 7.3 hours in the gate.
Much more in fact, because some runs fail on the second pass, not the first. Because we don't resubmit automatically, that is only if a developer is actively monitoring the process continuously and resubmits immediately on failure. In practice this is much longer, because sometimes we have to sleep. All of the above numbers are counted from the change receiving an approval +2 until final merging. There were far more failures than this during the approval process. Why do we test individual changes in the gate? The purpose is to find errors *in the change under test*. By the above numbers, it has failed to achieve this at least 16 times previously.

Probability of finding a bug in the change under test: Small
Cost of testing: High
Opportunity cost of slowing development: High

and for comparison:

Cost of reverting rare false positives: Small

The current process expends a lot of resources, and does not achieve its goal of finding bugs *in the changes under test*. In addition to using a lot of technical resources, it also prevents good changes from making their way into the project and, not unimportantly, saps the will to live of its victims. The cost of the process is overwhelmingly greater than its benefits. The gate process as it stands is a significant net negative to the project. Does this mean that it is worthless to run these tests? Absolutely not! These tests are vital to highlight a severe quality deficiency in OpenStack. Not addressing this is, imho, an existential risk to the project. However, the current approach is to pick contributors from the community at random and hold them personally responsible for project bugs selected at random. Not only has this approach failed, it is impractical, unreasonable, and poisonous to the community at large. It is also unrelated to the purpose of gate testing, which is to find bugs *in the changes under test*. I would like to make the radical proposal that we stop gating on CI failures.
We will continue to run them on every change, but only after the change has been successfully merged.

Benefits:
* Without rechecks, the gate will use 8 times fewer resources.
* Log analysis is still available to indicate the emergence of races.
* Fixes can be merged quicker.
* Vastly less developer time spent monitoring gate failures.

Costs:
* A rare class of merge bug will make it into master.

Note that the benefits above will also offset the cost of resolving this rare class of merge bug. Of course, we still have the problem of finding resources to monitor and fix CI failures. An additional benefit of not gating on CI will be that we can no longer pretend that picking developers for project-affecting bugs by lottery is likely to achieve results. As a project we need to understand the importance of CI failures. We need a proper
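The failure numbers quoted in this thread are internally consistent, as a quick back-of-envelope check shows: an 87% false negative rate means each gate run passes with probability about 0.13, so attempts per change follow a geometric distribution whose mean is 1/0.13, in line with the roughly 7 rechecks per change actually observed.

```python
# Back-of-envelope check of the quoted gate numbers: with an 87% false
# negative rate, the expected number of attempts per change is the mean
# of a geometric distribution, 1 / p_pass.
p_pass = 1 - 0.87
mean_attempts = 1 / p_pass
print(round(mean_attempts, 1))  # close to the ~7 rechecks per change observed
```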
Re: [openstack-dev] [Tripleo][Heat] Heat is not able to create swift cloud server
Excerpts from Peeyush Gupta's message of 2014-07-20 23:13:16 -0700: Hi all, I have been trying to set up tripleo using instack with RDO. Now, when deploying overcloud, the script is failing consistently with CREATE_FAILED error:

+ heat stack-create -f overcloud.yaml -P AdminToken=efe958561450ba61d7ef8249d29b0be1ba95dc11 -P AdminPassword=2b919f2ac7790ca1053ac58bc4621ca0967a0cba -P CinderPassword=e7d61883a573a3dffc65a5fb958c94686baac848 -P GlancePassword=cb896d6392e08241d504f3a0a2b489fc6f2612dd -P HeatPassword=7a3138ef58365bb666cb30c8377447b74e75a0ef -P NeutronPassword=4480ec8f2e004be4b06d14e1e228d882e18b3c2c -P NovaPassword=e4a34b6caeeb7dbc497fb1c557a396c422b4d103 -P NeutronPublicInterface=eth0 -P SwiftPassword=ed3761a03959e0d636b8d6fc826103734069f9dc -P SwiftHashSuffix=1a26593813bb7d6b38418db747b4243d4f1b5a56 -P NovaComputeLibvirtType=qemu -P 'GlanceLogFile='\'''\''' -P NeutronDnsmasqOptions=dhcp-option-force=26,1400 overcloud

+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 737ada9f-aa45-45b6-a42b-c0a496d2407e | overcloud  | CREATE_IN_PROGRESS | 2014-07-21T06:02:22Z |
+--------------------------------------+------------+--------------------+----------------------+

+ tripleo wait_for_stack_ready 220 10 overcloud
Command output matched 'CREATE_FAILED'. Exiting...
Here is the heat log:

2014-07-18 06:51:11.884 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:12.921 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:16.058 30750 ERROR heat.engine.resource [-] CREATE : Server SwiftStorage0 [07e42c3d-0f1b-4bb9-b980-ffbb74ac770d] Stack overcloud [0ca028e7-682b-41ef-8af0-b2eb67bee272]
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource Traceback (most recent call last):
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 420, in _do_action
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     while not check(handle_data):
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/server.py", line 545, in check_create_complete
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     return self._check_active(server)
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/server.py", line 561, in _check_active
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     raise exc
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource Error: Creation of server overcloud-SwiftStorage0-qdjqbif6peva failed.
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource
2014-07-18 06:51:16.255 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:16.939 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:17.368 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:17.638 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:18.158 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:18.613 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:19.113 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:19.765 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:20.247 30750 WARNING heat.engine.service [-] Stack create failed, status FAILED

How can I resolve this? Heat is just responding to Nova. You need to look at nova and find out why that server failed. 'nova show overcloud-SwiftStorage0-qdjqbif6peva' should work.
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Dan Prince's message of 2014-07-20 11:51:27 -0700: On Thu, 2014-07-17 at 15:54 +0100, Michael Kerrin wrote: On Thursday 26 June 2014 12:20:30 Clint Byrum wrote: Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this, including:

* Modifying os-collect-config to auto-run os-refresh-config on a regular basis
* Setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes

I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late!
So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 I have been looking into bug #1334804 and I have a review up to resolve it. I want to highlight something. Currently on a reboot we start all services via upstart (on debian anyways) and there have been quite a lot of issues around this - missing upstart scripts and timing issues. I don't know the issues on fedora. So with a fix to #1334804, on a reboot upstart will start all the services first (with potentially out-of-date configuration), then o-c-c will start o-r-c and will now configure all services and restart them or start them if upstart isn't configured properly. I would like to turn off all boot scripts for services we configure and leave all this to o-r-c. I think this will simplify things and put us in control of starting services. I believe that it will also narrow the gap between fedora and debian or debian and debian so what works on one should work on the other and make it easier for developers. I'm not sold on this approach. At the very least I think we want to make this optional because not all deployments may want to have o-r-c be the central service starting agent. So I'm opposed to this being our (only!) default... I felt this way too. However, I'm open to it because I am worried that it is a bit idealistic without much justification for being so. We know o-r-c will be there, and really must be there. We're already saying it needs to run to assert ephemeral state, and one thing ephemeral is things started. 
Now, we can, and maybe even should, take a hard line long term that o-r-c does not do this. That it stores everything in system level configs that are started in the normal system boot. I _want_ this to be the case. But thus far, we've failed to assert that and things have occasionally been very broken on reboot. Short of forcing a reboot in every CI run, we're going to have trouble detecting this. So, I think we have two options: 1) O-r-c doing the asserting, with which we can more or less predict that subsequent boots will work in the same manner as the first boot. 2
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-18 09:12:21 -0700: Thomas Herve thomas.he...@enovance.com wrote on 07/17/2014 02:06:13 AM: There are 4 resources related to neutron load balancing. OS::Neutron::LoadBalancer is probably the least useful and the one you can *not* use, as it's only there for compatibility with AWS::AutoScaling::AutoScalingGroup. OS::Neutron::HealthMonitor does the health checking part, although maybe not in the way you want it. OK, let's work with these. My current view is this: supposing the Convergence work delivers monitoring of health according to a member's status in its service and reacts accordingly, the gaps (compared to AWS functionality) are the abilities to (1) get member health from application level pings (e.g., URL polling) and (2) accept member health declarations from an external system, with consistent reaction to health information from all sources. Convergence will not deliver monitoring, though I understand how one might have that misunderstanding. Convergence will check with the API that controls a physical resource to determine what Heat should consider its status to be for the purpose of ongoing orchestration. Source (1) is what an OS::Neutron::HealthMonitor specifies, and an OS::Neutron::Pool is the thing that takes such a spec. So we could complete the (1) part if there were a way to tell a scaling group to poll the member health information developed by an OS::Neutron::Pool. Does that look like the right approach? For (2), this would amount to having an API that an external system (with proper authorization) can use to declare member health. In the grand and glorious future when scaling groups have true APIs rather than being Heat hacks, such a thing would be part of those APIs. In the immediate future we could simply add this to the Heat API. 
Such an operation would take something like a stack name or UUID, the name or UUID of a resource that is a scaling group, and the member name or UUID of the Resource whose health is being declared, and health_status=unhealthy. Does that look about right? Isn't (2) covered already by the cloudwatch API in Heat? I am going to claim ignorance of it a bit, as I've never used it, but it seems like the same thing.
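The operation proposed above might look something like the sketch below. The URL layout, the 'mark_unhealthy' action name, and the payload keys are all assumptions for illustration; no such Heat API existed at the time of this thread.

```python
import json

# Purely hypothetical sketch: an external monitoring system declaring a
# scaling-group member unhealthy, using the inputs listed above (stack,
# scaling-group resource, member). Nothing here is an existing Heat API.
def mark_unhealthy_request(stack_id, group_name, member_id):
    return {
        "method": "POST",
        "path": "/v1/stacks/%s/resources/%s/members/%s/actions"
                % (stack_id, group_name, member_id),
        "body": json.dumps(
            {"mark_unhealthy": {"health_status": "unhealthy"}}),
    }

req = mark_unhealthy_request("stack-uuid", "my_asg", "member-0")
print(req["path"])
```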
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-18 10:38:32 -0700: Clint Byrum cl...@fewbar.com wrote on 07/18/2014 12:56:32 PM: Excerpts from Mike Spreitzer's message of 2014-07-18 09:12:21 -0700: ... OK, let's work with these. My current view is this: supposing the Convergence work delivers monitoring of health according to a member's status in its service and reacts accordingly, the gaps (compared to AWS functionality) are the abilities to (1) get member health from application level pings (e.g., URL polling) and (2) accept member health declarations from an external system, with consistent reaction to health information from all sources. Convergence will not deliver monitoring, though I understand how one might have that misunderstanding. Convergence will check with the API that controls a physical resource to determine what Heat should consider its status to be for the purpose of ongoing orchestration. If I understand correctly, your point is that healing is not automatic. Since a scaling group is a nested stack, the observing part of Convergence will automatically note in the DB when the physical resource behind a scaling group member (in its role as a stack resource) is deleted. And when convergence engine gets around to acting on that Resource, the backing physical resource will be automatically re-created. But there is nothing that automatically links the notice of divergence to the converging action. Have I got that right? Yes you have it right. I just wanted to be clear, that is not monitoring.
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
Excerpts from Chris Friesen's message of 2014-07-16 11:38:44 -0700: On 07/16/2014 11:59 AM, Monty Taylor wrote: On 07/16/2014 07:27 PM, Vishvananda Ishaya wrote: This is a really good point. As someone who has to deal with packaging issues constantly, it is odd to me that libvirt is one of the few places where we depend on upstream packaging. We constantly pull in new python dependencies from pypi that are not packaged in ubuntu. If we had to wait for packaging before merging the whole system would grind to a halt. I think we should be updating our libvirt version more frequently by installing from source or our own ppa instead of waiting for the ubuntu team to package it. Shrinking in terror from what I'm about to say ... but I actually agree with this. There are SEVERAL logistical issues we'd need to sort, not the least of which involve the actual mechanics of us doing that and properly gating, etc. But I think that, like the python depends where we tell distros what version we _need_ rather than using what version they have, libvirt, qemu, ovs and maybe one or two other things are areas in which we may want or need to have a strongish opinion. I'll bring this up in the room tomorrow at the Infra/QA meetup, and will probably be flayed alive for it - but maybe I can put forward a straw-man proposal on how this might work. How would this work...would you have them uninstall the distro-provided libvirt/qemu and replace them with newer ones? (In which case what happens if the version desired by OpenStack has bugs in features that OpenStack doesn't use, but that some other software that the user wants to run does use?) Or would you have OpenStack versions of them installed in parallel in an alternate location? Yes. See: docker, lxc, chroot. (Listed in descending hipsterness order).
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Michael Kerrin's message of 2014-07-17 07:54:26 -0700: On Thursday 26 June 2014 12:20:30 Clint Byrum wrote: Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this, including:

* Modifying os-collect-config to auto-run os-refresh-config on a regular basis
* Setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes

I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late! So I see two problems highlighted above.
1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go; we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 I have been looking into bug #1334804 and I have a review up to resolve it. I want to highlight something. Currently on a reboot we start all services via upstart (on debian anyways) and there have been quite a lot of issues around this - missing upstart scripts and timing issues. I don't know the issues on fedora. So with a fix to #1334804, on a reboot upstart will start all the services first (with potentially out-of-date configuration), then o-c-c will start o-r-c and will now configure all services and restart them or start them if upstart isn't configured properly. I would like to turn off all boot scripts for services we configure and leave all this to o-r-c. I think this will simplify things and put us in control of starting services. I believe that it will also narrow the gap between fedora and debian or debian and debian so what works on one should work on the other and make it easier for developers. Agreed, and that is actually really simple. I hate to steal your thunder, but this is the patch: https://review.openstack.org/107772 Having the ability to service nova-api stop|start|restart is very handy but this will be a manual thing and I intend to leave that there. What do people think and how best do I push this forward? I feel that this leads into the re-assert-system-state spec but mainly I think this is a bug and doesn't require a spec.
I will be at the tripleo mid-cycle meetup next and am willing to discuss this with anyone interested in this and put together the necessary bits to make this happen. As I said, it is simple. :) I suggest testing the patch above and adding anything I missed to it. Systemd-based systems will likely need something different. I'm still burying my head in the sand and not learning systemd, but perhaps a follow-up patch from somebody who understands it can make those systems do the same thing.
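The sentinel-file behaviour discussed in this thread (re-run the command on boot only if no success marker exists) can be sketched as follows. The paths, function names, and messages are illustrative, not os-collect-config's actual implementation.

```python
import os
import tempfile

# Sketch of the /var/run sentinel idea: the marker is written only on a
# successful run, and /var/run being cleared on boot is what triggers the
# re-run. A temp dir stands in for /var/run here so the sketch is runnable.
SENTINEL = os.path.join(tempfile.gettempdir(), "occ-demo.ran")

def maybe_run_command():
    if os.path.exists(SENTINEL):
        return "skipped: already ran since boot"
    # ... run os-refresh-config here; mark success only if it exits 0 ...
    open(SENTINEL, "w").close()
    return "ran os-refresh-config"

if os.path.exists(SENTINEL):
    os.remove(SENTINEL)  # simulate a fresh boot clearing /var/run
print(maybe_run_command())
print(maybe_run_command())
```

Because success is recorded only after the command completes, a run that fails partway leaves no marker and will be retried, which is the property a plain @reboot cron entry would not give you.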
Re: [openstack-dev] [heat]Heat Db Model updates
Excerpts from Manickam, Kanagaraj's message of 2014-07-16 20:48:04 -0700: Event Why are uuid and id both used? The event uuid is the user-facing ID. However, we need to return events to the user in insertion order. So we use an auto-increment primary key, and order by that in 'heat event-list stack_name'. We don't want to expose that integer to the user though, because knowing the rate at which these integers increase would reveal a lot about the goings-on inside Heat. Resource_action is being used in both the event and resource tables, so it should be moved to a common table If we're joining to resource already, OK, but it is worth noting that there is a desire to not use a SQL table for event storage. Maintaining those events on a large, busy stack will be expensive. The simpler solution is to just write batches of event files into swift. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
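The uuid-vs-id split described above can be illustrated with a minimal schema (a sketch, not Heat's actual table definition):

```python
import sqlite3
import uuid

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE event ('
             '  id INTEGER PRIMARY KEY AUTOINCREMENT,'  # internal insertion order
             '  uuid TEXT NOT NULL,'                    # opaque, user-facing ID
             '  name TEXT)')
for name in ('CREATE_IN_PROGRESS', 'CREATE_COMPLETE'):
    conn.execute('INSERT INTO event (uuid, name) VALUES (?, ?)',
                 (str(uuid.uuid4()), name))

# Order by the hidden integer key, but only ever hand the uuid to users,
# so nothing about insertion rates leaks out of the API.
rows = conn.execute('SELECT uuid, name FROM event ORDER BY id').fetchall()
```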
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-16 10:50:42 -0700: Clint Byrum cl...@fewbar.com wrote on 07/02/2014 01:54:49 PM: Excerpts from Qiming Teng's message of 2014-07-02 00:02:14 -0700: Just some random thoughts below ... On Tue, Jul 01, 2014 at 03:47:03PM -0400, Mike Spreitzer wrote: In AWS, an autoscaling group includes health maintenance functionality --- both an ability to detect basic forms of failures and an ability to react properly to failures detected by itself or by a load balancer. What is the thinking about how to get this functionality in OpenStack? We are prototyping a solution to this problem at IBM Research - China lab. The idea is to leverage oslo.messaging and ceilometer events for instance (and possibly other resources such as port, securitygroup ...) failure detection and handling. Hm.. perhaps you should be contributing some reviews here as you may have some real insight: https://review.openstack.org/#/c/100012/ This sounds a lot like what we're working on for continuous convergence. I noticed that health checking in AWS goes beyond convergence. In AWS an ELB can be configured with a URL to ping, for application-level health checking. And an ASG can simply be *told* the health of a member by a user's own external health system. I think we should have analogous functionality in OpenStack. Does that make sense to you? If so, do you have any opinion on the right way to integrate, so that we do not have three completely independent health maintenance systems? The check URL is already a part of Neutron LBaaS IIRC. What may not be a part is notifications for when all members are reporting down (which might be something to trigger scale-up). If we don't have push checks in our auto scaling implementation then we don't have a proper auto scaling implementation. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Cinder coverage
Excerpts from Dan Prince's message of 2014-07-16 09:50:51 -0700: Hi TripleO! It would appear that we have no coverage in devtest which ensures that Cinder consistently works in the overcloud. As such the TripleO Cinder elements are often broken (as of today I can't fully use lio or tgt w/ upstream TripleO elements). How do people feel about swapping out our single 'nova boot' command for one that boots from a volume? Something like this: https://review.openstack.org/#/c/107437 There is a bit of a tradeoff here in that the conversion will take a bit of time (qemu-img has to run). Also our boot code path won't be exactly the same as booting from an image. Long term we want to run Tempest but due to resource constraints we can't do that today. Until then this sort of deep systems test (running a command that exercises more code) might serve us well and give us the Cinder coverage we need. Thoughts? Tempest is a stretch goal. Given our long test times, until we get them down, I don't know if we can even flirt with tempest other than the most basic smoke tests. So yes, I like the idea of having our one smoke test be as wide as possible. Later on we can add Heat coverage by putting said smoke test into a Heat template. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Victor Stinner's message of 2014-07-10 05:57:38 -0700: Le jeudi 10 juillet 2014, 14:48:04 Yuriy Taraday a écrit : I'm not suggesting that taskflow is useless and asyncio is better (apples vs. oranges). I'm saying that using coroutines (asyncio) can improve the ways we can use taskflow and provide a clearer method of developing these flows. This was mostly a response to the "this is impossible with coroutines". I say it is possible and it can even be better. It would be nice to modify taskflow to support trollius coroutines. Coroutines support asynchronous operations and have a better syntax than callbacks. You mean like this: https://review.openstack.org/#/c/90881/1/taskflow/engines/action_engine/executor.py Abandoned, but I think Josh is looking at it. :) For Mark's spec, add a new greenio executor to Oslo Messaging: I don't see the direct link to taskflow. taskflow can use Oslo Messaging to call RPC, but I don't see how to use taskflow internally to read a socket (driver), wait for the completion of the callback and then send back the result to the socket (driver). So oslo and the other low-level bits are going to need to be modified to support coroutines. That is definitely something that will make them more generally useful anyway. I don't think Josh or I meant to get in the way of that. However, having this available is a step toward removing eventlet and doing the painful work to switch to asyncio. Josh's original email was in essence a reminder that we should consider a layer on top of asyncio and eventlet alike, so that the large-scale code changes only happen once. I see trollius as a low-level tool to handle simple asynchronous operations, whereas taskflow is more high-level, to chain more complex operations together correctly. _yes_, trollius and taskflow must not be exclusive options; they should cooperate, as we plan to support trollius coroutines in Oslo Messaging. In fact they are emphatically not exclusive. 
However, considering the order of adoption should produce a little less chaos for the project. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700: On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: I think Clint's response was likely better than what I can write here, but I'll add on a few things. How do you write such code using taskflow?

@asyncio.coroutine
def foo(self):
    result = yield from some_async_op(...)
    return do_stuff(result)

The idea (at a very high level) is that users don't write this; what users do write is a workflow, maybe the following (pseudocode):

# Define the pieces of your workflow.
TaskA():
    def execute():
        # Do whatever some_async_op did here.
    def revert():
        # If execute had any side-effects, undo them here.

TaskFoo():
    ...

# Compose them together
flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"), TaskFoo("my-foo"))

I wouldn't consider this composition very user-friendly. I find it extremely user-friendly when I consider that it gives you clear lines of delineation between "the way it should work" and "what to do when it breaks."

# Submit the workflow to an engine, let the engine do the work to execute it (and transfer any state between tasks as needed).

The idea here is that when things like this are declaratively specified, the only thing that matters is that the engine respects that declaration; not whether it uses asyncio, eventlet, pigeons, threads, remote workers[1]. It also adds some things that are not (IMHO) possible with coroutines (in part since they are at such a low level), like stopping the engine after 'my-task-a' runs and shutting off the software, upgrading it, restarting it and then picking back up at 'my-foo'. It's absolutely possible with coroutines and might provide an even clearer view of what's going on. Like this:

@asyncio.coroutine
def my_workflow(ctx, ...):
    project = yield from ctx.run_task(create_project())
    # Hey, we don't want to be linear. How about parallel tasks?
    volume, network = yield from asyncio.gather(
        ctx.run_task(create_volume(project)),
        ctx.run_task(create_network(project)),
    )
    # We can put anything here - why not branch a bit?
    if create_one_vm:
        yield from ctx.run_task(create_vm(project, network))
    else:
        # Or even loops - why not?
        for i in range(network.num_ips()):
            yield from ctx.run_task(create_vm(project, network))

Sorry, but the code above is nothing like the code that Josh shared. When create_network(project) fails, how do we revert its side effects? If we want to resume this flow after a reboot, how does that work? I understand that there is a desire to write everything in beautiful python yields, try's, finally's, and excepts. But the reality is that Python's stack is lost the moment the process segfaults, the power goes out on that PDU, or the admin rolls out a new kernel. We're not saying "asyncio vs. taskflow" - I've seen that mistake twice already in this thread. Josh and I are suggesting that if there is a movement to think about coroutines, there should also be some time spent thinking at a high level: how do we resume tasks, revert side effects, and control flow? If we embed taskflow deep in the code, we get those things, and we can treat tasks as coroutines and let taskflow's event loop be asyncio just the same. If we embed asyncio deep into the code, we don't get any of the high-level functions and we get just as much code churn. There's no limit to coroutine usage. The only problem is the library that would bind everything together. In my example run_task will have to be really smart, keeping track of all started tasks and the results of all finished ones, skipping all tasks that have already been done (and substituting the already-generated results). But all of this is doable. And I find this way of declaring workflows way more understandable than whatever it would look like with Flow.add's. The way the flow is declared is important, as it leads to more isolated code. 
The single place where the flow is declared in Josh's example means that the flow can be imported, the state deserialized and inspected, and resumed by any piece of code: an API call, a daemon start up, an admin command, etc. I may be wrong, but it appears to me that the context that you built in your code example is hard, maybe impossible, to resume after a process restart unless _every_ task is entirely idempotent and thus can just be repeated over and over. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
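The resume-after-restart property being argued for here can be shown with a toy checkpointing runner - a stdlib sketch of the idea only, not taskflow itself (taskflow's engines do this with proper persistence backends):

```python
import json
import os
import tempfile

def run_with_checkpoints(tasks, state_file):
    """Run (name, fn) pairs in order, persisting each result; on a
    re-run after a crash, finished tasks are skipped instead of
    re-executed, so tasks need not be idempotent."""
    done = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = json.load(f)
    for name, fn in tasks:
        if name in done:
            continue  # completed before the restart
        done[name] = fn()
        with open(state_file, 'w') as f:
            json.dump(done, f)  # checkpoint after every task
    return done

state = os.path.join(tempfile.mkdtemp(), 'flow.json')
calls = []
tasks = [('create_project', lambda: calls.append('p') or 'proj-1'),
         ('create_network', lambda: calls.append('n') or 'net-1')]
first = run_with_checkpoints(tasks, state)
second = run_with_checkpoints(tasks, state)  # simulated restart: no re-runs
```

Because the flow and its state live outside the Python stack, any process - an API call handler, a daemon at startup, an admin command - can load the state file and carry on.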
[openstack-dev] [TripleO] Proposal to add Jon Paul Sullivan and Alexis Lee to core review team
Hello! I've been looking at the statistics, and doing a bit of review of the reviewers, and I think we have an opportunity to expand the core reviewer team in TripleO. We absolutely need the help, and I think these two individuals are well positioned to do that. I would like to draw your attention to this page: http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt Specifically these two lines:

|     Reviewer     | Reviews  -2  -1  +1  +2  +A   +/- % | Disagreements* |
| jonpaul-sullivan |     188   0  43 145   0   0   77.1% |   28 ( 14.9%)  |
|      lxsli       |     186   0  23 163   0   0   87.6% |   27 ( 14.5%)  |

Note that they are right at the level we expect, 3 per work day. And I've looked through their reviews and code contributions: it is clear that they understand what we're trying to do in TripleO, and how it all works. I am a little dismayed at the slightly high disagreement rate, but looking through the disagreements, most of them were jp and lxsli being more demanding of submitters, so I am less dismayed. So, I propose that we add jonpaul-sullivan and lxsli to the TripleO core reviewer team. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] stevedore plugins (and wait conditions)
Excerpts from Randall Burt's message of 2014-07-09 15:33:26 -0700: On Jul 9, 2014, at 4:38 PM, Zane Bitter zbit...@redhat.com wrote: On 08/07/14 17:17, Steven Hardy wrote: Regarding forcing deployers to make a one-time decision, I have a question re cost (money and performance) of the Swift approach vs just hitting the Heat API - if folks use the Swift resource and it stores data associated with the signal in Swift, does that incur a cost to the user in a public cloud scenario? Good question. I believe the way WaitConditions work in AWS is that it sets up a pre-signed URL in a bucket owned by CloudFormation. If we went with that approach we would probably want some sort of quota, I imagine. Just to clarify, you suggest that the swift-based signal mechanism use containers that Heat owns rather than ones owned by the user? +1, don't hide it. The other approach is to set up a new container, owned by the user, every time. In that case, a provider selecting this implementation would need to make it clear to customers if they would be billed for a WaitCondition resource. I'd prefer to avoid this scenario though (regardless of the plug-point). Why? If we won't let the user choose, then why wouldn't we let the provider make this choice? I don't think it's wise of us to make decisions based on what a theoretical operator may theoretically do. If the same theoretical provider were to also charge users to create a trust, would we then be concerned about that implementation as well? What if said provider decides to charge the user per resource in a stack, regardless of what they are? Having Heat own the container(s) as suggested above doesn't preclude that operator from charging the stack owner for those either. This is a nice use case for preview. A user should be able to preview a stack and know what will be consumed. Wait conditions will show a swift container if preview is worth anything. 
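For reference, Swift's TempURL middleware builds its pre-signed URLs from an HMAC-SHA1 over the method, expiry and object path - roughly as follows (the account/container/object path and key are placeholders):

```python
import hmac
import time
from hashlib import sha1

def swift_temp_url(method, path, key, ttl):
    """Pre-signed Swift URL (TempURL middleware): anyone holding the
    URL may hit the object until it expires -- no auth token needed,
    which is what makes it usable as a wait-condition signal endpoint."""
    expires = int(time.time()) + ttl
    body = '%s\n%d\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%d' % (path, sig, expires)

url = swift_temp_url('PUT', '/v1/AUTH_test/c/o', 'secret', 600)
```

The full URL handed to the instance would be the cluster endpoint plus this path and query string; the secret key stays with the account that signed it.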
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Victor Stinner's message of 2014-07-08 05:47:36 -0700: Hi Joshua, You asked a lot of questions. I will try to answer. Le lundi 7 juillet 2014, 17:41:34 Joshua Harlow a écrit : * Why focus on a replacement low-level execution model integration instead of higher-level workflow library or service (taskflow, mistral... other) integration? I don't know taskflow, so I cannot answer this question. How do you write such code using taskflow?

@asyncio.coroutine
def foo(self):
    result = yield from some_async_op(...)
    return do_stuff(result)

Victor, this is a low-level piece of code, which highlights the problem that taskflow's higher-level structure is meant to address. In writing OpenStack, we want to accomplish tasks based on a number of events. Users, errors, etc. We don't explicitly want to run coroutines, we want to attach volumes, spawn vms, and store files. See this: http://docs.openstack.org/developer/taskflow/examples.html The result is consumed in the next task in the flow. Meanwhile we get a clear definition of work-flow and very clear methods for resumption, retry, etc. So the expression is not as tightly bound as the code above, but that is the point, because we want to break things up into tasks which are clearly defined and then be able to resume each one individually. So what I think Josh is getting at is that we could add asyncio support into taskflow as an abstraction for tasks that want to be non-blocking, and then we can focus on refactoring the code around high-level work-flow expression rather than low-level asyncio and coroutines. * Was the heat (asyncio-like) execution model[1] examined and learned from before considering moving to asyncio? I looked at Heat coroutines, but it has a design very different from asyncio. In short, asyncio uses an event loop running somewhere in the background, whereas Heat explicitly schedules the execution of some tasks (with TaskRunner), blocks until it gets the result and then stops its event loop completely. 
It's possible to implement that with asyncio; there is for example a run_until_complete() method stopping the event loop when a future is done. But the asyncio event loop is designed to run forever, so various projects can run tasks at the same time, not only a very specific section of the code to run a set of tasks. asyncio is not only designed to schedule callbacks, it's also designed to manage file descriptors (especially sockets). It can also spawn and manage subprocesses. This is not supported by the Heat scheduler. IMO the Heat scheduler is too specific, it cannot be used widely in OpenStack. This is sort of backwards to what Josh was suggesting. Heat can't continue with the current approach, which is coroutine based, because we need the execution stack to not be in RAM on a single engine. We are going to achieve even more concurrency than we have now through an even higher level of task abstraction as part of the move to a convergence model. We will likely use taskflow to express these tasks so that they are more resumable and generally resilient to failure. Along a related question, seeing that openstack needs to support py2.x and py3.x, will this mean that trollius will be required to be used in 3.x (as it is the least common denominator, not new syntax like 'yield from' that won't exist in 2.x)? Does this mean that libraries that will now be required to change will be required to use trollius (the pulsar[6] framework seemed to mesh these two nicely); is this understood by those authors? It *is* possible to write code working on asyncio and trollius: http://trollius.readthedocs.org/#write-code-working-on-trollius-and-tulip There are different options for that. There are already projects supporting asyncio and trollius. Is this the direction we want to go down (if we stay focused on ensuring py3.x compatibility, then why not just jump to py3.x in the first place)? FYI OpenStack does not support Python 3 right now. 
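A minimal illustration of the run_until_complete() pattern being discussed (written with modern async def syntax, which postdates this thread; the names are illustrative):

```python
import asyncio

async def run_set_of_tasks():
    # stand-in for "a very specific section of the code" that Heat runs
    await asyncio.sleep(0)
    return 'done'

# run_until_complete() drives the loop only until the future resolves and
# then stops it -- the Heat-style usage -- whereas asyncio's intended mode
# is a loop that runs forever serving many callers at once.
loop = asyncio.new_event_loop()
result = loop.run_until_complete(run_set_of_tasks())
loop.close()
```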
I'm working on porting OpenStack to Python 3, we made huge progress, but it's not done yet. Anyway, the new RHEL 7 release doesn't provide Python 3.3 in the default system, you have to enable the SCL repository (which provides Python 3.3). And Python 2.7 or even 2.6 is still used in production. I would also prefer to use directly yield from and just drop Python 2 support. But dropping Python 2 support is not going to happen before at least 2 years. Long term porting is important, however, we have immediate needs for improvements in resilience and scalability. We cannot hang _any_ of that on Python 3. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Heat] [Marconi] Heat and concurrent signal processing needs some deep thought
I just noticed this review: https://review.openstack.org/#/c/90325/ And gave it some real thought. This will likely break any large scale usage of signals, and I think breaks the user expectations. Nobody expects to get a failure for a signal. It is one of those things that you fire and forget. I'm done, deal with it. If we start returning errors, or 409's or 503's, I don't think users are writing their in-instance initialization tooling to retry. I think we need to accept it and reliably deliver it. Does anybody have any good ideas for how to go forward with this? I'd much rather borrow a solution from some other project than try to invent something for Heat. I've added Marconi as I suspect there has already been some thought put into how a user-facing set of tools would send messages. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
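If the API can return 409s or 503s, the in-instance tooling would need retry logic along these lines (a hypothetical sketch; post() stands in for whatever HTTP call actually delivers the signal):

```python
import time

def deliver_signal(post, attempts=5, base_delay=0.5):
    """Users treat signals as fire-and-forget, so the tooling must
    absorb transient failures itself: retry with exponential backoff
    rather than surfacing an error from an "I'm done" notification."""
    for i in range(attempts):
        try:
            return post()
        except Exception:
            if i == attempts - 1:
                raise  # out of retries; give up loudly
            time.sleep(base_delay * (2 ** i))

# Simulated endpoint that fails twice with a 503 before accepting:
calls = {'n': 0}
def flaky_post():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('503 Service Unavailable')
    return 'accepted'

result = deliver_signal(flaky_post, attempts=5, base_delay=0)
```

The point of the thread stands, though: this only helps if every piece of in-instance tooling is written this way, which today it is not.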
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Joshua Harlow's message of 2014-07-07 10:41:34 -0700: So I've been thinking how to respond to this email, and here goes (shields up!), First things first; thanks mark and victor for the detailed plan and making it visible to all. It's very nicely put together and the amount of thought put into it is great to see. I always welcome an effort to move toward a new structured explicit programming model (which asyncio clearly helps make possible and strongly encourages/requires). I too appreciate the level of detail in the proposal. I think I understand where it wants to go. So now to some questions that I've been thinking about how to address/raise/ask (if any of these appear as FUD, they were not meant to be): * Why focus on a replacement low level execution model integration instead of higher level workflow library or service (taskflow, mistral... other) integration? Since pretty much all of openstack is focused around workflows that get triggered by some API activated by some user/entity having a new execution model (asyncio) IMHO doesn't seem to be shifting the needle in the direction that improves the scalability, robustness and crash-tolerance of those workflows (and the associated projects those workflows are currently defined reside in). I *mostly* understand why we want to move to asyncio (py3, getting rid of eventlet, better performance? new awesomeness...) but it doesn't feel that important to actually accomplish seeing the big holes that openstack has right now with scalability, robustness... 
Let's imagine a different view on this: if all openstack projects declaratively define the workflows their APIs trigger (nova is working on task APIs, cinder is getting there too...), and in the future the projects are *only* responsible for composing those workflows and handling the API inputs and responses, then the need for asyncio or other technology can move out from the individual projects and into something else (possibly something that is being built and used as we speak). With this kind of approach the execution model can be an internal implementation detail of the workflow 'engine/processor' (it will also be responsible for fault-tolerant, robust and scalable execution). If this seems reasonable, then why not focus on integrating said thing into openstack and move the projects to a model that is independent of eventlet, asyncio (or the next greatest thing) instead? This seems to push the needle in the right direction and IMHO (and hopefully in others' opinions) has a much bigger potential to improve the various projects than just switching to a new underlying execution model. * Was the heat (asyncio-like) execution model[1] examined and learned from before considering moving to asyncio? I will try not to put words into the heat developers' mouths (I can't do it justice anyway; hopefully they can chime in here) but I believe that heat has a system that is very similar to asyncio and coroutines right now, and they are actively moving to a different model due to problems in part caused by using that coroutine model in heat. So if they are moving somewhat away from that model (to a more declarative workflow model that can be interrupted and converged upon [2]), why would it be beneficial for other projects to move toward the model they are moving away from (instead of repeating the issues the heat team had with coroutines, e.g. visibility into stack/coroutine state, scale limitations, interruptibility...)? 
I'd like to hear Zane's opinions as he developed the rather light weight code that we use. It has been quite a learning curve for me but I do understand how to use the task scheduler we have in Heat now. Heat's model is similar to asyncio, but is entirely limited in scope. I think it has stayed relatively manageable because it is really only used for a few explicit tasks where a high degree of concurrency makes a lot of sense. We are not using it for I/O concurrency (eventlet still does that) but rather for request concurrency. So we tell nova to boot 100 servers with 100 coroutines that have 100 other coroutines to block further execution until those servers are active. We are by no means using it as a general purpose concurrency programming model. That said, as somebody working on the specification to move toward a more taskflow-like (perhaps even entirely taskflow-based) model in Heat, I think that is the way to go. The fact that we already have an event loop that doesn't need to be explicit except at the very lowest levels makes me want to keep that model. And we clearly need help with how to define workflows, which something like taskflow will do nicely. * A side-question, how do asyncio and/or trollius support debugging, do they support tracing individual co-routines? What about introspecting the state a coroutine has associated with it? Eventlet at least has http://eventlet.net/doc/modules/debug.html (which is
Re: [openstack-dev] [Heat] Upwards-compatibility for HOT
Excerpts from Zane Bitter's message of 2014-07-07 14:25:50 -0700: With the Icehouse release we announced that there would be no further backwards-incompatible changes to HOT without a revision bump. However, I notice that we've already made an upward-incompatible change in Juno: https://review.openstack.org/#/c/102718/ So a user will be able to create a valid template for a Juno (or later) version of Heat with the version heat_template_version: 2013-05-23, but the same template may break on an Icehouse installation of Heat with the stable HOT parser. IMO this is almost equally as bad as breaking backwards compatibility, since a user moving between clouds will generally have no idea whether they are going forward or backward in version terms. Sounds like a bug in Juno that we need to fix. I agree, this is a new template version. (Note: AWS doesn't use the version field this way, because there is only one AWS and therefore in theory they don't have this problem. This implies that we might need a more sophisticated versioning system.) A good manual with a "this was introduced in version X" and "this was changed in version Y" would, IMO, be enough to help users not go crazy and help us know whether something is a bug or not. We can probably achieve this entirely in the in-code template guide. I'd like to propose a policy that we bump the revision of HOT whenever we make a change from the previous stable version, and that we declare the new version stable at the end of each release cycle. Maybe we can post-date it to indicate the policy more clearly. (I'd also like to propose that the Juno version drops cfn-style function support.) Agreed. I'm also curious if we're going to reject a template with version 2013-05-23 that includes list_join. If we don't reject it, we probably need to look at how to show the user warnings about version/feature skew. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][specs] Please stop doing specs for any changes in projects
Excerpts from Dolph Mathews's message of 2014-07-01 10:02:13 -0700: The argument has been made in the past that small features will require correspondingly small specs. If there's a counter-argument to this example (a small feature requiring a relatively large amount of spec effort), I'd love to have links to both the spec and the resulting implementation so we can discuss exactly why the spec was an unnecessary additional effort. Indeed. The line to be drawn isn't around the size, IMO, but around communication. Nobody has the bandwidth to watch all of the git logs. Nobody has the bandwidth to poll all of the developers what has changed in the interfaces available. So the line for me is whether or not users and operators will need to know something is under way and may want to comment _before_ a change to an interface is made. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from Michael Kerrin's message of 2014-06-30 02:16:07 -0700: I am trying to finish off https://review.openstack.org/#/c/90134 - percona xtradb cluster for debian-based systems. I have read into this thread that I can error out on Redhat systems when trying to install percona and tell them to use mariadb instead - percona isn't supported here. Is this correct? Probably. But if CI for Fedora breaks as a result you'll need a solution first. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][specs] Please stop doing specs for any changes in projects
Excerpts from Boris Pavlovic's message of 2014-06-30 14:11:08 -0700: Hi all, Specs are an interesting idea that may be really useful when you need to discuss large topics: 1) work on APIs 2) large refactorings 3) large features 4) performance, scale, HA, and security issues that require big changes And I really dislike the idea of adding a spec for every patch. Especially when changes (features) are small, don't affect too much, and are optional. It really kills OpenStack. And it will drastically slow down the process of contribution and reduce the number of contributors. Who says there needs to be a spec for every patch? I agree with your items above. Any other change is likely just fixing a bug. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-27 00:14:49 -0700: Hi Clint, -Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: 26 June 2014 20:21 To: openstack-dev Subject: Re: [openstack-dev] [TripleO] os-refresh-config run frequency So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 Cool, I'm more than happy for this to be done elsewhere. I'm glad that people are in agreement with me on the concept and that work has already started on this. I'll add some notes to the bug if needed later on today. 2) We don't re-assert any state on a regular basis. So one reason we haven't focused on this is that we have a stretch goal of running with a readonly root partition. It's gotten lost in a lot of the craziness of "just get it working", but with rebuilds blowing away root now, leading to anything not on the state drive (/mnt currently), there's a good chance that this will work relatively well. Now, since people get root, they can always override the readonly root and make changes. <golem>we hates thiss!</golem> I'm open to ideas, however, os-refresh-config is definitely not the place to solve this. It is intended as a non-resident command to be called when it is time to assert state. os-collect-config is intended to gather configurations, and expose them to a command that it runs, and thus should be the mechanism by which os-refresh-config is run. 
I'd like to keep this conversation separate from one in which we discuss more mechanisms to make os-refresh-config robust. There are a bunch of things we can do, but I think we should focus just on "how do we re-assert state?". OK, that's fair enough. Because we're able to say right now that it is only for running when config changes, we can wave our hands and say it's ok that we restart everything on every run. As Jan alluded to, that won't work so well if we run it every 20 minutes. Agreed, and chatting with Jan and a couple of others yesterday we came to the conclusion that whatever we do here, it will require tweaking of a number of elements to safely restart services. So, I wonder if we can introduce a config version into os-collect-config. Basically os-collect-config would keep a version along with its cache. Whenever a new version is detected, os-collect-config would set a value in the environment that informs the command this is a new version of config. From that, scripts can do things like this:

if [ -n "$OS_CONFIG_NEW_VERSION" ]; then
    service X restart
elif ! service X status; then
    service X start
fi

This would lay the groundwork for future abilities to compare old/new so we can take shortcuts by diffing the two config versions. For instance, if we look at old vs. new and we don't see any of the keys we care about changed, we can skip restarting. I like this approach - does this require a new spec? If so, I'll start an etherpad to collect thoughts on it before writing it up for approval. I think this should be a tripleo spec. If you're volunteering to write it, hooray \o/. It will require several work items. Off the top of my head: - Add version awareness to os-collect-config - Add version awareness to all os-refresh-config scripts that do disruptive things - Add a periodic command run to os-collect-config Let's call it 're-assert-system-state'. Sound good? 
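The old-vs-new diffing shortcut mentioned above could look something like this (a hypothetical helper; the key names are purely illustrative):

```python
def needs_restart(old, new, watched_keys):
    """Restart only when a key this service actually consumes changed
    between the cached config version and the newly collected one."""
    return any(old.get(k) != new.get(k) for k in watched_keys)

old = {'db_host': 'a', 'workers': 4, 'unrelated': 1}
new = {'db_host': 'a', 'workers': 8, 'unrelated': 2}
```

A service watching only 'workers' would restart here; one watching only 'db_host' would not, even though the config as a whole changed.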
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from James Slagle's message of 2014-06-27 12:59:36 -0700: Things are a bit confusing right now, especially with what's been proposed. Let me try and clarify (even if just for my own sake). Currently the choices offered are:

1. mysql percona with the percona tarball (Percona XtraDB Cluster, not mysql percona)
2. mariadb galera with mariadb.org packages
3. mariadb galera with rdo packages

And, we're proposing to add:

4. mysql percona with percona packages: https://review.openstack.org/#/c/90134
5. mariadb galera with fedora packages: https://review.openstack.org/#/c/102815/

4 replaces 1, but only for Ubuntu/Debian; it doesn't work on Fedora/RH. 5 replaces 3 (neither of which work on Ubuntu/Debian, obviously). Do we still need 1? Fedora/RH + percona tarball. I personally don't think so. Do we still need 2? Fedora/RH or Ubuntu/Debian with galera packages from mariadb.org. For the Fedora/RH case, I doubt it, people will just use 5. 3 will be gone (replaced by 5). So, yes, I'd like to see 5 as the default for Fedora/RH and 4 as the default for Ubuntu/Debian, and both those tested in CI. And get rid of (or deprecate) 1-3.

I'm actually more confused now than before I read this. The use of numbers is just making my head spin. It can be stated this way I think:

On RPM systems, use MariaDB Galera packages. If packages are in the distro, use distro packages. If packages are not in the distro, use RDO packages.

On DEB systems, use Percona XtraDB Cluster packages. If packages are in the distro, use distro packages. If packages are not in the distro, use upstream packages.

If anything doesn't match those principles, it is a bug. 
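The two principles above fit in a few lines of code. Here is a hypothetical sketch (the function and return labels are made up for illustration, not anything from tripleo-image-elements):

```python
def pick_db_packages(distro_family, in_distro_repos):
    """Choose database packaging per the principles stated above.

    distro_family: "rpm" (Fedora/RH) or "deb" (Ubuntu/Debian).
    in_distro_repos: True if the distro itself ships the packages.
    """
    if distro_family == "rpm":
        # RPM systems: MariaDB Galera, preferring distro packages over RDO.
        return "mariadb-galera (distro)" if in_distro_repos else "mariadb-galera (RDO)"
    if distro_family == "deb":
        # DEB systems: Percona XtraDB Cluster, preferring distro over upstream.
        return ("percona-xtradb-cluster (distro)" if in_distro_repos
                else "percona-xtradb-cluster (upstream)")
    raise ValueError("unknown distro family: %r" % distro_family)
```

Anything the elements do that disagrees with this decision table would, per the message above, be a bug.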
On Thu, Jun 26, 2014 at 5:30 PM, Giulio Fidente gfide...@redhat.com wrote: On 06/26/2014 11:11 AM, Jan Provaznik wrote: On 06/25/2014 06:58 PM, Giulio Fidente wrote: On 06/16/2014 11:14 PM, Clint Byrum wrote: Excerpts from Gregory Haynes's message of 2014-06-16 14:04:19 -0700: Excerpts from Jan Provazník's message of 2014-06-16 20:28:29 +: Hi, MariaDB is now included in Fedora repositories, this makes it easier to install and a more stable option for Fedora installations. Currently MariaDB can be used by including the mariadb (use mariadb.org pkgs) or mariadb-rdo (use redhat RDO pkgs) element when building an image. What do you think about using MariaDB as the default option for Fedora when running devtest scripts? (first, I believe Jan means that MariaDB _Galera_ is now in Fedora) I think so too. I'd like to give this a try. This does start to change us from being a deployment of openstack to being a deployment per distro but IMO that's a reasonable position. I'd also like to propose that if we decide against doing this then these elements should not live in tripleo-image-elements. I'm not so sure I agree. We have lio and tgt because lio is on RHEL but everywhere else is still using tgt IIRC. However, I also am not so sure that it is actually a good idea for people to ship on MariaDB since it is not in the gate. As it diverges from MySQL (starting in earnest with 10.x), there will undoubtedly be subtle issues that arise. So I'd say having MariaDB get tested along with Fedora will actually improve those users' test coverage, which is a good thing. I am favourable to the idea of switching to mariadb for fedora based distros. Currently the default mysql element seems to be switching [1], yet for ubuntu/debian only, from the percona provided binary tarball of mysql to the percona provided packaged version of mysql. 
In theory we could further update it to use percona provided packages of mysql on fedora too, but I'm not sure there is much interest in using that combination where people get mariadb and galera from the official repos. IIRC fedora packages for percona xtradb cluster are not provided (unless something has changed recently). I see, so on fedora it will be definitely easier and safer to just use the mariadb/galera packages provided in the official repo ... and this further reinforces my idea that it is the best option to use that by default for fedora -- Giulio Fidente GPG KEY: 08D733BA ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this including Modifying os-collect-config to auto-run os-refresh-config on a regular basis or setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late! So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. 
The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804

2) We don't re-assert any state on a regular basis. So one reason we haven't focused on this, is that we have a stretch goal of running with a readonly root partition. It's gotten lost in a lot of the craziness of "just get it working", but with rebuilds blowing away root now, leading to anything not on the state drive (/mnt currently), there's a good chance that this will work relatively well. Now, since people get root, they can always override the readonly root and make changes. <golem>we hates thiss!</golem>

I'm open to ideas, however, os-refresh-config is definitely not the place to solve this. It is intended as a non-resident command to be called when it is time to assert state. os-collect-config is intended to gather configurations, and expose them to a command that it runs, and thus should be the mechanism by which os-refresh-config is run. I'd like to keep this conversation separate from one in which we discuss more mechanisms to make os-refresh-config robust. There are a bunch of things we can do, but I think we should focus just on "how do we re-assert state?". Because we're able to say right now that it is only for running when config changes, we can wave our hands and say it's ok that we restart everything on every run. As Jan alluded to, that won't work so well if we run it every 20 minutes. So, I wonder if we can introduce a config version into os-collect-config. Basically os-collect-config would keep a version along with its cache. Whenever a new version is detected, os-collect-config would set a value in the environment that informs the command this is a new version of config. 
From that, scripts can do things like this:

  if [ -n "$OS_CONFIG_NEW_VERSION" ] ; then
    service X restart
  elif ! service X status ; then
    service X start
  fi

This would lay the groundwork for future abilities to compare old/new so we can take shortcuts by diffing the two config versions. For instance if we look at old vs. new and we don't see any of the keys we care about changed, we can skip restarting. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
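The version-tracking idea above can be sketched in a few lines of Python. This is a hypothetical illustration of the cache-plus-version mechanism, not the real os-collect-config implementation; the cache file format and the OS_CONFIG_CHANGED_KEYS variable are assumptions:

```python
import json
import os


def changed_keys(old, new):
    """Return the top-level keys whose values differ between two config dicts."""
    keys = set(old) | set(new)
    return {k for k in keys if old.get(k) != new.get(k)}


def build_command_env(cache_path, new_config, command_env=None):
    """Compare new_config against the cached copy; when the version changed,
    mark the environment that would be handed to the refresh command and
    update the cache. new_config is {"version": N, "config": {...}}."""
    env = dict(os.environ if command_env is None else command_env)
    try:
        with open(cache_path) as f:
            cached = json.load(f)
    except (IOError, OSError, ValueError):
        cached = {"version": None, "config": {}}
    if new_config["version"] != cached["version"]:
        env["OS_CONFIG_NEW_VERSION"] = str(new_config["version"])
        # Expose the diff so scripts can skip restarts for irrelevant changes.
        env["OS_CONFIG_CHANGED_KEYS"] = ",".join(
            sorted(changed_keys(cached["config"], new_config["config"])))
        with open(cache_path, "w") as f:
            json.dump(new_config, f)
    return env
```

A script run with this environment could then restart a service only when OS_CONFIG_CHANGED_KEYS mentions a key it cares about, which is exactly the "diff the two config versions" shortcut described above.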
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Mark McLoughlin's message of 2014-06-24 12:49:52 -0700: On Tue, 2014-06-24 at 09:51 -0700, Clint Byrum wrote: Excerpts from Monty Taylor's message of 2014-06-24 06:48:06 -0700: On 06/22/2014 02:49 PM, Duncan Thomas wrote: On 22 June 2014 14:41, Amrith Kumar amr...@tesora.com wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. -1. There are some /really/ bad commit messages out there, and some of us try to use the commit messages to usefully sort through the changes (i.e. I often -1 in cinder a change only affects one driver and that isn't clear from the summary). If the perceived problem is grammatical, I'm a bit more on board with it not a reason to rev a patch, but core reviewers can +2/A over the top of a -1 anyway... 100% agree. Spelling and grammar are rude to review on - especially since we have (and want) a LOT of non-native English speakers. It's not our job to teach people better grammar. Heck - we have people from different English backgrounds with differing disagreements on what good grammar _IS_ We shouldn't quibble over _anything_ grammatical in a commit message. If there is a disagreement about it, the comments should be ignored. There are definitely a few grammar rules that are loose and those should be largely ignored. However, we should correct grammar when there is a clear solution, as those same people who do not speak English as their first language are likely to be confused by poor grammar. We're not doing it to teach grammar. We're doing it to ensure readability. The importance of clear English varies with context, but commit messages are a place where we should try hard to just let it go, particularly with those who do not speak English as their first language. 
Commit messages stick around forever and it's important that they are useful, but they will be read by a small number of people who are going to be in a position to spend a small amount of time getting over whatever dissonance is caused by a typo or imperfect grammar. The times that one is reading git messages are often the most stressful, such as when a regression has occurred in production. Given that, I believe it is entirely worth it to me that the commit messages on my patches are accurate and understandable. I embrace all feedback which leads to them being more clear. I will of course stand back from grammar correcting and not block patches if there are many who disagree. I think specs are pretty similar and don't warrant much additional grammar nitpicking. Sure, they're longer pieces of text and slightly more people will rely on them for information, but they're not intended to be complete documentation. Disagree. I will only state this one more time as I think everyone knows how I feel: if we are going to grow beyond the english-as-a-first-language world we simply cannot assume that those reading specs will be native speakers. Good spelling and grammar helps us grow. Bad spelling and grammar holds us back. Where grammar is so poor that readers would be easily misled in important ways, then sure that should be fixed. But there comes a point when we're no longer working to avoid confusion and instead just being pedants. Taking issue[1] with "whatever scaling mechanism Heat and we end up going with" because it has a dangling preposition is an example of going way beyond the point of productive pedantry IMHO :-) I actually agree that it would not at all be a reason to block a patch. However, there is some ambiguity in that sentence that may not be clear to a native speaker. It is not 100% clear if we are going with Heat, or with the scaling mechanism. That is the only reason for the dangling preposition debate. 
However, there is a debate, and thus I would _never_ block a patch based on this rule. It was feedback.. just as sometimes there is feedback in commit messages that isn't taken and doesn't lead to a -1. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Mark McLoughlin's message of 2014-06-22 00:39:29 -0700: The main point is that this is something worth addressing as a wider community rather than in individual reviews with a limited audience. And that doing it with a bit of humor might help take the sting out of it. Yes, a private message saying Hey fellow earthling, do we really care whether httplib is grouped with os or eventlet? is a productive thing indeed! However, turning that into something that we can all publicly laugh about requires some real skill, and trust between us all. Given our digital communication mediums, it is not likely we can do it all that regularly. I think at best we could gather them all into a couple of keynote slides with the authors' blessings (oh please, somebody do this!) However, if we can all look inward and just do it with self deprecation I'm all for it. The main point I am making is that the less grey areas we have, the less of this we ever have to worry about. It is worth it, to me, to look into keeping this rule alive so that we never ever have to discuss import grouping. (BTW, how many release cycles does one have to deprecate themselves before they remove themselves?) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Christopher Yeoh's message of 2014-06-22 18:46:59 -0700: On Mon, Jun 23, 2014 at 4:43 AM, Jay Pipes jaypi...@gmail.com wrote: On 06/22/2014 09:41 AM, Amrith Kumar wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. Would this improve the situation? I actually *do* think a very poor commit message for a substantial patch deserves a -1. The git commit message is our history for the patch, and it is important in its own right. Now, nits like a single misspelled word or the commit summary being 60 characters instead of 50 are not what I'm talking about, of course. I'm speaking only about when a commit message blatantly disregards the best practices of commit message writing [1] and doesn't offer anything of value to the reviewer. +1. Minor typos and grammatical errors I don't care about (but will put in suggested fixes if the patch needs to be updated anyway). However, commit messages are very important for future debugging. One or two line vague commit messages can make life a lot harder for others in the future when writing a short description is not what I'd consider an excessive burden. And there should be no assumption that the person reading the commit message will have easy access to the bug database. We've had this discussion already, but just remember that not everybody reading those commit messages will be a native English speaker. The more incorrect the grammar and punctuation is, the more confusing it will be to somebody who is already struggling with those concepts. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ceilometer] [Heat] Ceilometer aware people, please advise us on processing notifications..
Hello! I would like to turn your attention to this specification draft that I've written: https://review.openstack.org/#/c/100012/1/specs/convergence-continuous-observer.rst Angus has suggested that perhaps Ceilometer is a better place to handle this. Can you please comment on the review, or can we have a brief mailing list discussion about how best to filter notifications? Basically in Heat when a user boots an instance, we would like to act as soon as it is active, and not have to poll the nova API to know when that is. Angus has suggested that perhaps we can just tell ceilometer to hit Heat with a web hook when that happens. Thanks! ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Sean Dague's message of 2014-06-21 05:08:01 -0700: On 06/20/2014 09:26 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2014-06-20 11:07:39 -0700: H803 - First line of a commit message must *not* end in a period. This was mostly a response to an unreasonable core reviewer that was -1ing people for not having periods. I think any core reviewer that -1s for this either way should be thrown off the island, or at least made fun of, a lot. Again, the clarity of a commit message is not made or lost by the lack or existence of a period at the end of the first line. Perhaps we can make a bot that writes disparaging remarks on any -1's that mention period in the line after the short commit message. :) [For the reader: read the comments below, and then come back to this] Note that I'm not at all unaware of the irony I created by making the statement above and then the statements below. I feel like I'm a Fox News reporter being called out on the daily show actually. ;) H305 - Enforcement of libraries fitting correctly into stdlib, 3rdparty, our tree. The biggest issue here is it's built in a world where there was only 1 viable python version, 2.7. Python's stdlib is actually pretty dynamic and grows over time. As we embrace more python 3, and as distros start to make python3 be front and center, what does this even mean? The current enforcement can't pass on both python2 and python3 at the same time in many cases because of that. I think we should find a way to make this work. Like it or not, this will garner -1's by people for stylistic reasons and I'd rather it be the bots than the humans do it. The algorithm is something like this pseudo python:

  for block in import_blocks:
      if is_this_set_in_a_known_lib_collection(block):
          continue
      if is_this_set_entirely_local(block):
          continue
      if is_this_set_entirely_installed_libs(block):
          continue
      raise AnError(block)

And just make the python2 and python3 stdlibs both be a match. 
Basically I'm saying, let's just be more forgiving but keep the check so we can avoid most of the -1 please group libs and stdlibs separately patches. You can avoid that by yelling at reviewers if that's the *only* feedback they are giving. I totally agree we can do that. Pedantic reviewers that are reviewing for this kind of thing only should be scorned. I realistically like the idea markmc came up with - https://twitter.com/markmc_/status/480073387600269312 I also agree it is really fun to think about shaming those annoying actions. It is also not fun _at all_ to be publicly shamed. In fact I'd say it is at least an order of magnitude less fun. There is an old saying, praise in public, punish in private. It is one reason the -1 comments I give always include praise for whatever is right for new contributors. Not everyone is a grizzled veteran. It is far more interesting to me to solve the grouping problem in a way that works for us long term (python 2 and 3) than it is to develop a culture that builds any of its core activities on negative emotional feedback. That's not to say we can't say hey you're doing it wrong. I mean to say that direct feedback like that belongs in private IRC messages or email, not in public everyone can see that reviews. Give people a chance to save face. Meanwhile, the less we have to have one on one negative feedback, the easier the job of reviewers is. The last thing we want to do is have more reasons for people to NOT do reviews. I no longer buy the theory that something like this is saving time. What it's actually doing is training a new generation of reviewers that the right thing to do it review for nits. That's not actually what we want, we want people reviewing for how this could go wrong. I'm not sure how hacking is training reviewers. I feel like hacking is training developers. Reviewers don't even need to look at it until the pep8 tox job passes. 
It's really instructive to realize that we've definitely gone beyond shared culture with what's in hacking. Look at how much of it is turned off in projects. It's pretty high. If this project is going to remain useful at all it really needs to prune back to what's actually shared culture. I think having things turned off at the project level is o-k. The more strict a project's automated style rules, the less they have to quibble and train new reviewers on the fact that we don't do that here. However, I don't think rules being turned off is evidence that rules are unhelpful. It most likely means that those rules didn't exist when the code base was created and they turned them off because of incubation or a new set of rules arrived and they didn't have time to land the new patches. That is a per-project choice and should remain so, but I don't think that choice means that those rules wouldn't have a long term positive effect of stopping
Re: [openstack-dev] 答复: [Heat] fine grained quotas
I started to type the same response as Duncan last night, and I do have the same concern. The fine grained quotas in nova, for instance, can be used to measure potential use of the whole system _exactly_. You can give a bit more to one tenant while you're building out your infrastructure for more tenants to come on board at the lower quotas and know that the one more demanding tenant will still be happy. But how much RAM does it cost to have 1000 stacks creating all at once? How much CPU does it cost? Those are not really 1:1 correlated, and so I also question whether one can really use these quotas to do such fine grained planning. Excerpts from Duncan Thomas's message of 2014-06-20 05:12:44 -0700: There's a maintenance and testing cost to the added complexity, and as far as I can tell, no solid use-case. Under what circumstance would a cloud provider want different limits for different tenants? What concrete problem does it solve? On 20 June 2014 04:35, Huangtianhua huangtian...@huawei.com wrote: Hi, Clint, Thank you for your comments on my BP and code! The BP I proposed is all about putting dynamic, admin-configurable limitations on stack number per tenant and stack complexity. Therefore, you can consider my BP as an extension to your config file-based limitation mechanism. If the admin does not want to configure fine-grained, tenant-specific limits, the values in config become the default values of those limits. And just like only an Admin can configure the limit items in the config file, the limit update and delete APIs I proposed are also Admin-only. Therefore, users can not set those values by themselves to break the anti-DoS capability you mentioned. 
The reason I want to introduce the APIs and the dynamic configurable capability to those limits mainly lies in that, since various tenants have various underlying resource quotas, and even various template/stack complexity requirements, I think a global, statically-configured limitation mechanism could be refined to echo user requirements better. Your idea? By the way, I do think that the DoS problem is interesting in Heat. Can we have more discussion on that? Thanks again! -Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: 20 June 2014 6:33 To: openstack-dev Subject: Re: [openstack-dev] [Heat] fine grained quotas Excerpts from Randall Burt's message of 2014-06-19 15:21:14 -0700: On Jun 19, 2014, at 4:17 PM, Clint Byrum cl...@fewbar.com wrote: I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. What's more, I don't think this is something we should expose via API other than to perhaps query what those quota values are. It is possible that some provider would want to bill on number of stacks, etc (I personally agree with Clint here), it seems that is something that could/should be handled external to Heat itself. Far be it from any of us to dictate a single business model. However, Heat is a tool which encourages consumption of billable resources by making it easier to tie them together. 
This is why FedEx gives away envelopes and will come pick up your packages for free. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] CI status update for this week
Excerpts from Charles Crouch's message of 2014-06-20 13:51:49 -0700: - Original Message - Not a great week for TripleO CI. We had 3 different failures related to: Nova [1]: we were using a deprecated config option Heat [2]: missing heat data obtained from the Heat CFN API Neutron [3]: a broken GRE overlay network setup The last two are bugs, but is there anything tripleo can do about avoiding the first one in the future?: e.g. reviewing a list of deprecated options and seeing when they will be removed. do the integrated projects have a protocol for when an option is deprecated and at what point it can be removed? e.g. if I make something deprecated in icehouse I can remove it in juno, but if I make something deprecated at the start of juno I can't remove it at the end of juno? Was this being logged as deprecated for a while? I think we probably should aspire to fail CI if something starts printing out deprecation warnings. We have a few more sprinkled here and there that I see in logs; those are just ticking time bombs. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
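The "fail CI if something starts printing deprecation warnings" idea could be as small as a log scan at the end of the job. A minimal sketch, assuming collected service logs land under a logs/ directory (the path, pattern, and script name are assumptions; a real gate would also want a whitelist for known warnings that are not yet fixed):

```shell
#!/bin/sh
# check-deprecations.sh: scan collected logs and fail the run if any
# service printed a deprecation warning. Assumes logs under ./logs/.
set -u
if grep -ri "deprecated" logs/ > deprecations.txt 2>/dev/null; then
    echo "Deprecation warnings found; failing the run:" >&2
    cat deprecations.txt >&2
    exit 1
fi
echo "No deprecation warnings found."
```

Run as a final CI step, this would have caught the removed Nova config option while it was still only a warning, instead of after its removal broke the job.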
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Sean Dague's message of 2014-06-20 11:07:39 -0700: After seeing a bunch of code changes to enforce new hacking rules, I'd like to propose dropping some of the rules we have. The overall patch series is here - https://review.openstack.org/#/q/status:open+project:openstack-dev/hacking+branch:master+topic:be_less_silly,n,z H402 - 1 line doc strings should end in punctuation. The real statement is this should be a summary sentence. A sentence is not just a set of words that end in a period. "Squirrel fast bob." It's something deeper. This rule thus isn't really semantically useful, especially when you are talking about at 69 character maximum (79 - 4 space indent - 6 quote characters). Yes. I despise this one. H803 - First line of a commit message must *not* end in a period. This was mostly a response to an unreasonable core reviewer that was -1ing people for not having periods. I think any core reviewer that -1s for this either way should be thrown off the island, or at least made fun of, a lot. Again, the clarity of a commit message is not made or lost by the lack or existence of a period at the end of the first line. Perhaps we can make a bot that writes disparaging remarks on any -1's that mention period in the line after the short commit message. :) H305 - Enforcement of libraries fitting correctly into stdlib, 3rdparty, our tree. The biggest issue here is it's built in a world where there was only 1 viable python version, 2.7. Python's stdlib is actually pretty dynamic and grows over time. As we embrace more python 3, and as distros start to make python3 be front and center, what does this even mean? The current enforcement can't pass on both python2 and python3 at the same time in many cases because of that. I think we should find a way to make this work. Like it or not, this will garner -1's by people for stylistic reasons and I'd rather it be the bots than the humans do it. 
The algorithm is something like this pseudo python:

  for block in import_blocks:
      if is_this_set_in_a_known_lib_collection(block):
          continue
      if is_this_set_entirely_local(block):
          continue
      if is_this_set_entirely_installed_libs(block):
          continue
      raise AnError(block)

And just make the python2 and python3 stdlibs both be a match. Basically I'm saying, let's just be more forgiving but keep the check so we can avoid most of the "please group libs and stdlibs separately" patches. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
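The pseudocode above could be fleshed out roughly as follows. The key point is the "both stdlibs match" rule: an import block passes if any single category (python2 stdlib, python3 stdlib, local tree, or installed third-party libs) covers every import in it. The module sets below are tiny illustrative stand-ins, not real inventories:

```python
# Hypothetical, heavily-abridged stand-ins for the real module inventories.
PY2_STDLIB = {"os", "sys", "httplib", "urllib2", "ConfigParser"}
PY3_STDLIB = {"os", "sys", "http", "urllib", "configparser"}
LOCAL_TOP_LEVEL = "mytree"  # the project's own package name


def categories(module, installed_libs):
    """Return every category the top-level package of `module` could belong to."""
    top = module.split(".")[0]
    result = set()
    if top in PY2_STDLIB or top in PY3_STDLIB:
        result.add("stdlib")  # either interpreter's stdlib counts
    if top == LOCAL_TOP_LEVEL:
        result.add("local")
    if top in installed_libs:
        result.add("thirdparty")
    return result


def block_ok(block, installed_libs):
    """A block passes if one category covers all of its imports."""
    possible = categories(block[0], installed_libs)
    for module in block[1:]:
        possible &= categories(module, installed_libs)
    return bool(possible)
```

Under this scheme a block mixing `httplib` (python2-only stdlib) with `os` still passes, which is the forgiveness the message asks for, while a block mixing `os` with a third-party library is still flagged.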
[openstack-dev] [Heat] fine grained quotas
I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] fine grained quotas
Excerpts from Randall Burt's message of 2014-06-19 15:21:14 -0700: On Jun 19, 2014, at 4:17 PM, Clint Byrum cl...@fewbar.com wrote: I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. What's more, I don't think this is something we should expose via API other than to perhaps query what those quota values are. It is possible that some provider would want to bill on number of stacks, etc (I personally agree with Clint here), it seems that is something that could/should be handled external to Heat itself. Far be it from any of us to dictate a single business model. However, Heat is a tool which encourages consumption of billable resources by making it easier to tie them together. This is why FedEx gives away envelopes and will come pick up your packages for free. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Distributed locking
Excerpts from Matthew Booth's message of 2014-06-17 01:36:11 -0700: On 17/06/14 00:28, Joshua Harlow wrote: So this is a reader/writer lock then? I have seen https://github.com/python-zk/kazoo/pull/141 come up in the kazoo (zookeeper python library) but there was a lack of a maintainer for that 'recipe', perhaps if we really find this needed we can help get that pull request 'sponsored' so that it can be used for this purpose? As far as resiliency, the thing I was thinking about was how correct do you want this lock to be? If you go with memcached and a locking mechanism using it, this will not be correct, but it might work well enough under normal usage. So that's why I was wondering what level of correctness you want and what you want to happen if a server that is maintaining the lock record dies. In memcached's case this will literally be 1 server, even if sharding is being used, since a key hashes to one server. So if that one server goes down (or a network split happens) then it is possible for two entities to believe they own the same lock (and if the network split recovers this gets even weirder); so that's what I was wondering about when mentioning resiliency and how much incorrectness you are willing to tolerate. From my POV, the most important things are: * 2 nodes must never believe they hold the same lock * A node must eventually get the lock If these are musts, then memcache is a no-go for locking. memcached is likely to delete anything it is storing in its RAM, at any time. Also if you have several memcache servers, a momentary network blip could lead to acquiring the lock erroneously. The only thing it is useful for is coalescing, where a broken lock just means wasted resources, erroneous errors, etc. If consistency is needed, then you need a consistent backend.
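The failure mode described above can be sketched in a few lines. This is a toy stand-in for memcached (names invented for the example), showing why a store that may silently drop its lock record cannot satisfy "2 nodes must never believe they hold the same lock":

```python
# Toy illustration -- NOT real memcached. EvictingStore simulates a
# memcached server where add() succeeds only if the key is absent, and
# where any entry may be evicted at any time without notifying holders.

class EvictingStore:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """memcached-style add: acquire the lock iff nobody holds it."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def evict(self, key):
        # memcached may drop entries under memory pressure, on restart,
        # or during a network partition -- the lock holder never knows.
        self._data.pop(key, None)


store = EvictingStore()
assert store.add("lock/image-123", "node-a")      # node A acquires
assert not store.add("lock/image-123", "node-b")  # node B correctly blocked

store.evict("lock/image-123")                     # record silently lost
assert store.add("lock/image-123", "node-b")      # node B "acquires" too --
# node A still believes it holds the lock: the unsafe double-holder case.
```

A ZooKeeper-style backend avoids this because the lock record is replicated with consensus and is tied to a session that the holder actively maintains.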
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Tomas Sedovic's message of 2014-06-17 04:56:24 -0700: On 16/06/14 18:51, Clint Byrum wrote: Excerpts from Tomas Sedovic's message of 2014-06-16 09:19:40 -0700: All, After having proposed some changes[1][2] to tripleo-heat-templates[3], reviewers suggested adding a deprecation period for the merge.py script. While TripleO is an official OpenStack program, none of the projects under its umbrella (including tripleo-heat-templates) have gone through incubation and integration nor have they been shipped with Icehouse. So there is no implicit compatibility guarantee and I have not found anything about maintaining backwards compatibility on either the TripleO wiki page[4], tripleo-heat-templates' readme[5], or tripleo-incubator's readme[6]. The Release Management wiki page[7] suggests that we follow Semantic Versioning[8], under which prior to 1.0.0 (t-h-t is ) anything goes. According to that wiki, we are using a stronger guarantee where we do promise to bump the minor version on incompatible changes -- but this again suggests that we do not promise to maintain backwards compatibility -- just that we document whenever we break it. I think there are no guarantees, and no promises. I also think that we've kept tripleo_heat_merge pretty narrow in surface area since making it into a module, so I'm not concerned that it will be incredibly difficult to keep those features alive for a while. According to Robert, there are now downstreams that have shipped things (with the implication that they don't expect things to change without a deprecation period) so there's clearly a disconnect here. I think it is more of a "we will cause them extra work" thing. If we can make a best effort and deprecate for a few releases (as in, a few releases of t-h-t, not OpenStack), they'll likely appreciate that. If we can't do it without a lot of effort, we shouldn't bother. Oh. I did assume we were talking about OpenStack releases, not t-h-t, sorry.
I have nothing against making a new t-h-t release that deprecates the features we're no longer using and dropping them for good in a later release. What do you suggest would be a reasonable waiting period? Say a month or so? I think it would be good if we could remove all the deprecated stuff before we start porting our templates to HOT. If we do promise backwards compatibility, we should document it somewhere and if we don't we should probably make that more visible, too, so people know what to expect. I prefer the latter, because it will make the merge.py cleanup easier and every published bit of information I could find suggests that's our current stance anyway. This is more about good will than promising. If it is easy enough to just keep the code around and have it complain to us if we accidentally resurrect a feature, that should be enough. We could even introduce a switch to the CLI like --strict that we can run in our gate and that won't allow us to keep using deprecated features. So I'd like to see us deprecate not because we have to, but because we can do it with only a small amount of effort. Right, that's fair enough. I've thought about adding a strict switch, too, but I'd like to start removing code from merge.py, not adding more :-). Let's just leave the capability forever. We're not adding things to merge.py or taking it in any new directions. Keeping the code does not cost us anything. Some day merge.py won't be used, and then it will be like we deleted the whole thing.
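The deprecate-but-keep approach discussed above is cheap to implement. A minimal sketch, assuming nothing about merge.py's actual internals (the function and feature names here are invented): the feature keeps working, complains when used, and a --strict mode (a boolean here) turns the complaint into a hard failure for the gate:

```python
# Hypothetical sketch of "keep the code, complain on use, fail in --strict".
# use_deprecated_feature and the feature name are illustrative only.
import warnings


def use_deprecated_feature(name, strict=False):
    msg = "%s is deprecated and will be removed from merge.py" % name
    if strict:
        # The gate runs with --strict so we can't resurrect old features.
        raise RuntimeError(msg)
    warnings.warn(msg, DeprecationWarning)
    # ... feature continues to work for downstream users ...


# Normal (downstream) usage: works, but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    use_deprecated_feature("SomeMergeFeature")
assert any(issubclass(w.category, DeprecationWarning) for w in caught)

# Gate usage: fails fast.
try:
    use_deprecated_feature("SomeMergeFeature", strict=True)
    raised = False
except RuntimeError:
    raised = True
assert raised
```

This matches the "costs us nothing" position: the deprecated path stays intact, and the only ongoing maintenance is the warning itself.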
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Tomas Sedovic's message of 2014-06-16 09:19:40 -0700: All, After having proposed some changes[1][2] to tripleo-heat-templates[3], reviewers suggested adding a deprecation period for the merge.py script. While TripleO is an official OpenStack program, none of the projects under its umbrella (including tripleo-heat-templates) have gone through incubation and integration nor have they been shipped with Icehouse. So there is no implicit compatibility guarantee and I have not found anything about maintaining backwards compatibility on either the TripleO wiki page[4], tripleo-heat-templates' readme[5], or tripleo-incubator's readme[6]. The Release Management wiki page[7] suggests that we follow Semantic Versioning[8], under which prior to 1.0.0 (t-h-t is ) anything goes. According to that wiki, we are using a stronger guarantee where we do promise to bump the minor version on incompatible changes -- but this again suggests that we do not promise to maintain backwards compatibility -- just that we document whenever we break it. I think there are no guarantees, and no promises. I also think that we've kept tripleo_heat_merge pretty narrow in surface area since making it into a module, so I'm not concerned that it will be incredibly difficult to keep those features alive for a while. According to Robert, there are now downstreams that have shipped things (with the implication that they don't expect things to change without a deprecation period) so there's clearly a disconnect here. I think it is more of a "we will cause them extra work" thing. If we can make a best effort and deprecate for a few releases (as in, a few releases of t-h-t, not OpenStack), they'll likely appreciate that. If we can't do it without a lot of effort, we shouldn't bother. If we do promise backwards compatibility, we should document it somewhere and if we don't we should probably make that more visible, too, so people know what to expect.
I prefer the latter, because it will make the merge.py cleanup easier and every published bit of information I could find suggests that's our current stance anyway. This is more about good will than promising. If it is easy enough to just keep the code around and have it complain to us if we accidentally resurrect a feature, that should be enough. We could even introduce a switch to the CLI like --strict that we can run in our gate and that won't allow us to keep using deprecated features. So I'd like to see us deprecate not because we have to, but because we can do it with only a small amount of effort.
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Duncan Thomas's message of 2014-06-16 09:41:49 -0700: On 16 June 2014 17:30, Jason Rist jr...@redhat.com wrote: I'm going to have to agree with Tomas here. There doesn't seem to be any reasonable expectation of backwards compatibility for the reasons he outlined, despite some downstream releases that may be impacted. Backward compatibility is a hard habit to get into, and easy to put off. If you're not making any guarantees now, when are you going to start making them? How much breakage can users expect? Without wanting to look entirely like a troll, should TripleO be dropped as an official program until it can start making such guarantees? I think every other official OpenStack project has a stable API policy of some kind, even if they don't entirely match... I actually agree with the sentiment of your statement, which is that backward compatibility matters. However, there is one thing that is inaccurate in your statements: TripleO is not a project, it is a program. These tools are products of that program's mission, which is to deploy OpenStack using itself as much as possible. Where there are holes, we fill them with existing tools or we write minimal tools such as the tripleo_heat_merge Heat template pre-processor. This particular tool is marked for death as soon as Heat grows the appropriate capabilities to allow that. This tool never wants to be integrated into the release. So it is a little hard to justify bending over backwards for BC. But I don't think that is what anybody is requesting. We're not looking for this tool to remain super agile and grow, thus making any existing code and interfaces a burden. So I think it is pretty easy to just start marking features as deprecated and raising deprecation warnings when they're used.
Re: [openstack-dev] revert hacking to 0.8 series
Excerpts from Sean Dague's message of 2014-06-16 05:15:54 -0700: Hacking 0.9 series was released pretty late for Juno. The entire check queue was flooded this morning with requirements proposals failing pep8 because of it (so at 6am EST we were waiting 1.5 hrs for a check node). The previous soft policy with pep8 updates was that we set a pep8 version basically release week, and changes stopped being done for style after first milestone. I think in the spirit of that we should revert the hacking requirements update back to the 0.8 series for Juno. We're past milestone 1, so shouldn't be working on style only fixes at this point. Proposed review here - https://review.openstack.org/#/c/100231/ I also think in future hacking major releases need to happen within one week of release, or not at all for that series. +1. Hacking is supposed to help us avoid redundant nit-picking in reviews. If it places any large burden on developers, whether by merge conflicting or backing up CI, it is a failure IMO.
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Duncan Thomas's message of 2014-06-16 10:46:12 -0700: Hi Clint This looks like a special pleading here - all OpenStack projects (or 'program' if you prefer - I'm honestly not seeing a difference) have bits that they've written quickly and would rather not have to maintain, but in order to allow people to make use of them downstream have to do that work. Ask the cinder team about how much I try to stay on top of any back-compat issues. I don't just prefer program. It is an entirely different thing: https://wiki.openstack.org/wiki/Programs https://wiki.openstack.org/wiki/Governance/NewProjects If TripleO is not ready to take up that burden, then IMO it shouldn't be an official project. If the bits that make it up are too immature to actually be maintained with reasonable guarantees that they won't just pull the rug out from any consumers, then their use needs to be re-thought. Currently, tripleO enjoys huge benefits from its official status, but isn't delivering to that standard. No other project has a hope of coming in as an official deployment tool while tripleO holds that niche. Despite this, tripleO is barely usable, and doesn't seem to be maturing towards taking up the responsibilities that other projects have had forced upon them. If it isn't ready for that, should it go back to incubation and give some other team or technology a fair chance to step up to the plate? TripleO _isn't_ an official project. It is a program to make OpenStack deploy itself. This is the same as the infra program, which has a mission to support development. We're not calling for Zuul to be integrated into the release, we are just expecting it to keep supporting the goals of the infra program and OpenStack in general. What is the official deployment tool you mention? There isn't one. The tool we've been debating is something that enables OpenStack to be deployed using its own component, Heat, but that is sort of like oslo-incubator.. 
it is driving a proof of concept for inclusion into an official project. Ironic was spun out very early on because it was clear there was a need for an integrated project to manage baremetal. This is an example where pieces used for TripleO have been pushed into the integrated release. However, Heat already exists, and that is where the responsibility lies to orchestrate applications. We are driving quite a bit into Heat right now, with a massive refactor of the core to be more resilient to the types of challenges a datacenter environment will present. The features we get from the tripleo_heat_merge pre-processor that is in question will be the next thing to go into Heat. Expecting us to commit resources to both of those efforts doesn't make much sense. The program is driving its mission, and the tools will be incubated and integrated when that makes sense. Meanwhile, it turns out OpenStack _is not_ currently able to deploy itself. Users have to bolt things on, whether it is our tools, or salt/puppet/chef/ansible artifacts, users cannot use just what is in OpenStack to deploy OpenStack. But we need to be able to test from one end to the other while we get things landed in OpenStack.. and so, we use the pre-release model while we get to a releasable thing. I don't want to look like I'm specifically beating on tripleO here, but it is the first openstack component I've worked with that seems to have this little concern for downstream users *and* no apparent plans to fix it. Which component specifically are you referring to? Our plan, nay, our mission, is to fix it by pushing the necessary features into the relevant projects. Also, we actually take on a _higher_ burden of backward compatibility with some of our tools that we do want to release. They're not integrated, and we intend to keep them working with all releases of OpenStack because we intend to keep their interfaces stable for as long as those interfaces are relevant. 
diskimage-builder, os-apply-config, os-collect-config, os-refresh-config, are all stable, and don't need to be integrated into the OpenStack release because they're not even OpenStack specific. That's without going into all of the other difficulties myself and fellow developers have had trying to get involved with tripleO, which I'll go into at some other point. I would be quite interested in any feedback you can give us on how hard it might be to join the effort. It is a large effort, and I know new contributors can often get lost in a sea of possibilities if we, the long time contributors, aren't careful to get them bootstrapped. It is possible there are other places with similar problems, but this is the first I've run into - I'll call out any others I run into, since I think it is important, and discussing it publicly keeps everyone honest. If I've got the wrong expectations, I'd at least like to have the correction on record. I do think that there is a misunderstanding that TripleO is some kind of tool.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Doug Wiegley's message of 2014-06-10 14:41:29 -0700: Of what use is a database that randomly deletes rows? That is, in effect, what you’re allowing. The secrets are only useful when paired with a service. And unless I’m mistaken, there’s no undo. So you’re letting users shoot themselves in the foot, for what reason, exactly? How do you expect OpenStack to rely on a data store that is fundamentally random at the whim of users? Every single service that uses Barbican will now have to hack in a defense mechanism of some kind, because they can’t trust that the secret they rely on will still be there later. Which defeats the purpose of this mission statement: “Barbican is a ReST API designed for the secure storage, provisioning and management of secrets.” (And I don’t think anyone is suggesting that blind refcounts are the answer. At least, I hope not.) Anyway, I hear this has already been decided, so, so be it. Sounds like we’ll hack around it. Doug, nobody is calling Barbican a database. It is a place to store secrets. The idea is to loosely couple things, and if you need more assurances, use something like Heat to manage the relationships.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Doug Wiegley's message of 2014-06-16 13:22:26 -0700: nobody is calling Barbican a database. It is a place to store … did you at least feel a heavy sense of irony as you typed those two statements? “It’s not a database, it just stores things!” :-) Not at all, though I understand that, clipped as so, it may look a bit ironic. I was using shorthand of database to mean a general purpose database. I should have qualified it to avoid any confusion. It is a narrow purpose storage service with strong access controls. We can call that a database if you like, but I think it has one very tiny role, and that is to audit and control access to secrets. The real irony here is that in this rather firm stand of keeping the user in control of their secrets, you are actually making the user LESS in control of their secrets. Copies of secrets will have to be made, whether stored under another tenant, or shadow copied somewhere. And the user will have no way to delete them, or even know that they exist. Why would you need to make copies outside of the in-RAM copy that is kept while the service runs? You're trying to do too much instead of operating in a nice loosely coupled fashion. The force flag would eliminate the common mistake cases enough that I’d wager lbaas and most others would cease to worry, not duplicate, and just reference barbican ids and nothing else. (Not including backends that will already make a copy of the secret, but things like servicevm will not need to dup it.) The earlier assertion that we have to deal with the missing secrets case even with a force flag is, I think, false, because once the common errors have been eliminated, the potential window of accidental pain is reduced to those that really ask for it. The accidental pain thing makes no sense to me. I'm a user and I take responsibility for my data.
If I don't want to have that responsibility, I will use less privileged users and delegate the higher amount of privilege to a system that does manage those relationships for me. Do we have mandatory file locking in Unix? No, we don't. Why? Because some users want the power to remove files _no matter what_. We build in the expectation that things may disappear no matter what you do to prevent it. I think your LBaaS should be written with the same assumption. It will be more resilient and useful to more people if they do not have to play complicated games to remove a secret. Anyway, nobody has answered this. What user would indiscriminately delete their own data and expect that things depending on that data will continue to work indefinitely?
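The "write your service assuming the secret may vanish" stance above can be sketched concretely. This is a toy, not Barbican's or Neutron LBaaS's actual API — the store is a plain dict standing in for Barbican and all class and method names are invented: the service keeps an in-RAM copy while running and, on restart, degrades explicitly instead of crashing or making a hidden copy.

```python
# Hypothetical sketch: a load-balancer listener that tolerates secret
# deletion. The dict `store` stands in for Barbican; SecretGone and
# Listener are invented names for illustration.

class SecretGone(Exception):
    """Raised when the referenced secret no longer exists."""


class Listener:
    def __init__(self, store, secret_id):
        self._store = store
        self._secret_id = secret_id
        self._tls_key = None  # in-RAM copy, fetched on (re)start only

    def start(self):
        try:
            self._tls_key = self._store[self._secret_id]
        except KeyError:
            # The owner deleted their secret: surface a clear error
            # state instead of serving with a stale key or keeping a
            # shadow copy the user cannot see or delete.
            raise SecretGone(self._secret_id)


store = {"s-1": b"-----BEGIN PRIVATE KEY-----..."}
lb = Listener(store, "s-1")
lb.start()                    # normal path: key cached in RAM

del store["s-1"]              # user exercises their right to delete
status = "ACTIVE"
try:
    Listener(store, "s-1").start()   # e.g. restart after power failure
except SecretGone:
    status = "DEGRADED"       # operator-visible, recoverable state
assert status == "DEGRADED"
```

The design point is that the failure is explicit and attributable to the user's own action, which is exactly the Unix-file-deletion analogy in the message above.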
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from Gregory Haynes's message of 2014-06-16 14:04:19 -0700: Excerpts from Jan Provazník's message of 2014-06-16 20:28:29 +: Hi, MariaDB is now included in Fedora repositories, which makes it an easier to install and more stable option for Fedora installations. Currently MariaDB can be used by including the mariadb (use mariadb.org pkgs) or mariadb-rdo (use Red Hat RDO pkgs) element when building an image. What do you think about using MariaDB as the default option for Fedora when running devtest scripts? (first, I believe Jan means that MariaDB _Galera_ is now in Fedora) I'd like to give this a try. This does start to change us from being a deployment of OpenStack to being a deployment per distro, but IMO that's a reasonable position. I'd also like to propose that if we decide against doing this then these elements should not live in tripleo-image-elements. I'm not so sure I agree. We have lio and tgt because lio is on RHEL but everywhere else is still using tgt IIRC. However, I also am not so sure that it is actually a good idea for people to ship on MariaDB since it is not in the gate. As it diverges from MySQL (starting in earnest with 10.x), there will undoubtedly be subtle issues that arise. So I'd say having MariaDB get tested along with Fedora will actually improve those users' test coverage, which is a good thing.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Carlos Garza's message of 2014-06-16 16:25:10 -0700: On Jun 16, 2014, at 4:06 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Doug Wiegley's message of 2014-06-16 13:22:26 -0700: nobody is calling Barbican a database. It is a place to store … did you at least feel a heavy sense of irony as you typed those two statements? “It’s not a database, it just stores things!” :-) Not at all, though I understand that, clipped as so, it may look a bit ironic. I was using shorthand of database to mean a general purpose database. I should have qualified it to avoid any confusion. It is a narrow purpose storage service with strong access controls. We can call that a database if you like, but I think it has one very tiny role, and that is to audit and control access to secrets. The real irony here is that in this rather firm stand of keeping the user in control of their secrets, you are actually making the user LESS in control of their secrets. Copies of secrets will have to be made, whether stored under another tenant, or shadow copied somewhere. And the user will have no way to delete them, or even know that they exist. Why would you need to make copies outside of the in-RAM copy that is kept while the service runs? You're trying to do too much instead of operating in a nice loosely coupled fashion. Because the service may be restarted? The force flag would eliminate the common mistake cases enough that I’d wager lbaas and most others would cease to worry, not duplicate, and just reference barbican ids and nothing else. (Not including backends that will already make a copy of the secret, but things like servicevm will not need to dup it.) The earlier assertion that we have to deal with the missing secrets case even with a force flag is, I think, false, because once the common errors have been eliminated, the potential window of accidental pain is reduced to those that really ask for it. The accidental pain thing makes no sense to me.
I'm a user and I take responsibility for my data. If I don't want to have that responsibility, I will use less privileged users and delegate the higher amount of privilege to a system that does manage those relationships for me. Do we have mandatory file locking in Unix? No, we don't. Why? Because some users want the power to remove files _no matter what_. We build in the expectation that things may disappear no matter what you do to prevent it. I think your LBaaS should be written with the same assumption. It will be more resilient and useful to more people if they do not have to play complicated games to remove a secret. Anyway, nobody has answered this. What user would indiscriminately delete their own data and expect that things depending on that data will continue to work indefinitely? Users that are expecting barbican operations to only occur during the initial loadbalancer provisioning. I.e., users that don't realize their LB configs don't natively store the private keys and would be retrieving keys on the fly during every migration, HA spin-up, service restart (from power failure), etc. But I agree we shouldn't do force flag locking as the barbican team has already dismissed the possibility of adding policy enforcement on behalf of other services. Shadow copying (into an lbaas-owned account on Barbican) was just so that our lbaas backend can access the keys outside of the user's control if need be. :| I'm not sure what that means, but perhaps this is a nice use case for trusts, which would let the user hand LBaaS a revokable secret that gives LBaaS rights to impersonate the user for a specific keystone role.
Re: [openstack-dev] [heat] How to avoid property revalidation?
Excerpts from Steven Hardy's message of 2014-06-15 02:40:14 -0700: Hi all, So, I stumbled accross an issue while fixing up some tests, which is that AFAICS since Icehouse we continually revalidate every property every time they are accessed: https://github.com/openstack/heat/blob/stable/havana/heat/engine/properties.py#L716 This means that, for example, we revalidate every property every time an event is created: https://github.com/openstack/heat/blob/stable/havana/heat/engine/event.py#L44 And obviously also every time the property is accessed in the code implementing whatever action we're handling, and potentially also before the action (e.g the explicit validate before create/update). This repeated revalidation seems like it could get very expensive - for example there are several resources (Instance/Server resources in particular) which validate against glance via a custom constraint, so we're probably doing at least 6 calls to glance validating the image every create. My suspicion is this is one of the reasons for the performance regression observed in bug #1324102. I've been experimenting with some code which implements local caching of the validated properties, but according to the tests this introduces some problems where the cached value doesn't always match what is expected, still investigating why but I guess it's updates where we need to re-resolve what is cached during the update. Does anyone (and in particular Zane and Thomas who I know have deep experience in this area) have any ideas on what strategy we might employ to reduce this revalidation overhead? tl;dr: I think we should only validate structure in validate, and leave runtime validation to preview. I've been wondering about what we want to achieve with validation recently. It seems to me that the goal is to assist template authors in finding obvious issues in structure and content before they cause a runtime failure. 
But the error messages are so unhelpful we basically get this: http://cdn.memegenerator.net/instances/500x/50964597.jpg What holds us back from improving that is the complexity of doing runtime validation. To me, runtime is more of a 'preview' problem than a validate problem. A template that validates once should continue to validate on any version that supports the template format. But a preview will actually want to measure runtime things and use parameters, and thus is where runtime concerns belong. I wonder if we could move validation out of any runtime context, and remove any attempts to validate runtime things like image names/ids and such. That would allow us to remove all but the pre-action validation calls.
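The proposed split above can be sketched as two independent passes. This is a toy, not Heat's actual Properties API — the schema format and the glance lookup are invented stand-ins — but it shows why the structural pass can run cheaply and repeatedly while the runtime pass is confined to preview/pre-action:

```python
# Hypothetical sketch of "validate structure" vs "runtime (preview) checks".
# The schema dict and image_exists callable are invented for illustration.

def validate_structure(props, schema):
    """Cheap and deterministic: a template that passes once always passes."""
    errors = []
    for name, spec in schema.items():
        if spec.get("required") and name not in props:
            errors.append("missing required property: %s" % name)
        elif name in props and not isinstance(props[name], spec["type"]):
            errors.append("%s: expected %s" % (name, spec["type"].__name__))
    return errors


def preview_runtime(props, image_exists):
    """Expensive and environment-dependent: consult glance once, at
    preview or immediately before the action -- not on every property
    access or event creation."""
    if not image_exists(props["image"]):
        return ["image %r not found in glance" % props["image"]]
    return []


schema = {"image": {"type": str, "required": True},
          "flavor": {"type": str, "required": True}}
props = {"image": "fedora-20", "flavor": "m1.small"}

assert validate_structure(props, schema) == []
assert validate_structure({"image": "x"}, schema) == \
    ["missing required property: flavor"]
# the service round-trip happens exactly once, in the preview pass:
assert preview_runtime(props, lambda name: name == "fedora-20") == []
```

With this split, repeated property access (events, action handlers) would only ever re-run the cheap structural pass, addressing the repeated-glance-call cost described in the message.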
Re: [openstack-dev] [nova] Distributed locking
Excerpts from Matthew Booth's message of 2014-06-13 01:40:30 -0700: On 12/06/14 21:38, Joshua Harlow wrote: So just a few thoughts before going too far down this path, Can we make sure we really really understand the use-case where we think this is needed. I think it's fine that this use-case exists, but I just want to make it very clear to others why it's needed and why distributed locking is the only *correct* way. An example use of this would be side-loading an image from another node's image cache rather than fetching it from glance, which would have very significant performance benefits in the VMware driver, and possibly other places. The copier must take a read lock on the image to prevent the owner from ageing it during the copy. Holding a read lock would also assure the copier that the image it is copying is complete. Really? Usually in the unix-inspired world we just open a file and it stays around until we close it.
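The "just open a file" point relies on POSIX unlink semantics: removing a file's name does not remove its data while a descriptor is still open, so an open descriptor acts like an implicit read lock against the owner ageing the image out. A small demonstration (POSIX-only; this will not hold on Windows):

```python
# Demonstrates POSIX open-after-unlink semantics: the "copier" keeps
# reading the image even after the "owner" deletes it; the blocks are
# freed only when the last descriptor is closed.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"image-bytes")

with open(path, "rb") as reader:       # copier opens the image
    os.unlink(path)                    # owner ages the image out
    assert not os.path.exists(path)    # the name is gone...
    assert reader.read() == b"image-bytes"  # ...but the data is intact

os.close(fd)
```

Note this only covers the ageing race on a local filesystem; it does not by itself tell the copier that the image is *complete*, nor does it work across nodes, which is why the thread is discussing distributed locks at all.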
Re: [openstack-dev] [Neutron] Starting contributing to project
Excerpts from Sławek Kapłoński's message of 2014-06-15 13:10:56 -0700: Hello, I want to start contributing to the neutron project. I found a bug which I want to try to fix: https://bugs.launchpad.net/neutron/+bug/1204956 and I have a question about the workflow in such a case. Should I clone the neutron repository and base my changes on the master branch, or should I start my changes from some other branch? And once I have a patch for this bug, what should I do next? Thanks in advance for any help and explanation This should explain everything you need to know: https://wiki.openstack.org/wiki/Gerrit_Workflow
Re: [openstack-dev] Fwd: Fwd: Debian people don't like bash8 as a project name (Bug#748383: ITP: bash8 -- bash script style guide checker)
Excerpts from Thomas Goirand's message of 2014-06-13 03:04:07 -0700: On 06/13/2014 06:53 AM, Morgan Fainberg wrote: Hi Thomas, I felt a couple sentences here were reasonable to add (more than “don’t care” from before). I understand your concerns here, and I totally get what you’re driving at, but in the packaging world wouldn’t it make sense to call it python-bash8? Yes, this is what will happen. Now the binary, I can agree (for reasons outlined) should probably not be named ‘bash8’, but the name of the “command” could be separate from the packaging / project name. If upstream chooses /usr/bin/bash8, I'll have to follow. I don't want to carry patches which I'd have to maintain. Beyond a relatively minor change to the resulting “binary” name [sure bash-tidy, or whatever we come up with], is there something more that really is awful (rather than just silly) about the naming? Renaming python-bash8 into something else is not possible, because the Debian standard is to use, as the Debian name, what is used for the import. So if we have import xyz, then the package will be python-xyz. For Python _libraries_, yes. But for a utility which happens to import that library, naming the package after what upstream calls it is a de facto standard.
Re: [openstack-dev] [Heat]Heat template parameters encryption
I tend to agree with you Keith, securing Heat is Heat's problem. Securing Nova is nova's problem. And I too would expect that those with admin access to Heat, would not have admin access to Nova. That is why we split these things up with API's. I still prefer that users encrypt secrets on the client side, and store said secrets in Barbican, passing only a temporary handle into templates for consumption. But until we have that, just encrypting hidden parameters would be simple to do and I wouldn't even mind it being on by default in devstack because only a small percentage of parameters are hidden. My initial reluctance to the plan was in encrypting everything, as that makes verifying things a lot harder. But just encrypting the passwords.. I think that's a decent plan. A couple of ideas: * Provide a utility to change the key (must update the entire database). * Allow multiple decryption keys (to enable tool above to work slowly). Excerpts from Keith Bray's message of 2014-06-11 22:29:13 -0700: On 6/11/14 2:43 AM, Steven Hardy sha...@redhat.com wrote: IMO, when a template author marks a parameter as hidden/secret, it seems incorrect to store that information in plain text. Well I'd still question why we're doing this, as my previous questions have not been answered: - AFAIK nova user-data is not encrypted, so surely you're just shifting the attack vector from one DB to another in nearly all cases? Having one system (e.g. Nova) not as secure as it could be isn't a reason to not secure another system as best we can. For every attack vector you close, you have another one to chase. I'm concerned that the merit of the feature is being debated, so let me see if I can address that: We want to use Heat to launch customer facing stacks. In a UI, we would prompt customers for Template inputs, including for example: Desired Wordpress Admin Password, Desired MySQL password, etc. The UI then makes an API call to Heat to orchestrate instantiation of the stack. 
With Heat as it is today, these customer specified credentials (as template parameters) would be stored in Heat's database in plain text. As a Heat Service Administrator, I do not need nor do I want the customer's Wordpress application password to be accessible to me. The application belongs to the customer, not to the infrastructure provider. Sure, I could blow the customer's entire instance away as the service provider. But, if I get fired or leave the company, I could no longer blow away their instance... If I leave the company, however, I could have taken a copy of the Heat DB with me, or have looked that info up in the Heat DB before my exit, and I could then externally attack the customer's Wordpress instance. It makes no sense for us to store user specified creds unencrypted unless we are administering the customer's Wordpress instance for them, which we are not. We are administering the infrastructure only. I realize the encryption key could also be stolen, but in a production system the encryption key access gets locked down to a VERY small set of folks and not all the people that administer Heat (that's part of good security practices and makes auditing of a leaked encryption key much easier). - Is there any known way for heat to leak sensitive user data, other than a cloud operator with admin access to the DB stealing it? Surely cloud operators can trivially access all your resources anyway, including instances and the nova DB/API so they have this data anyway. Encrypting the data in the DB also helps in case a leak of arbitrary DB data does surface in Heat. We are not aware of any issues with Heat today that could leak that data... But, we never know what vulnerabilities will be introduced or discovered in the future. At Rackspace, individual cloud operators can not trivially access all customer cloud resources. When operating a large cloud at scale, service administrators' operations and capabilities are limited to the systems they work on.
While I could impersonate a user via Heat and do lots of bad things across many of their resources, each of the other systems (Nova, Databases, Auth, etc.) audits who is doing what on behalf of which customer, so I can't do something malicious to a customer's Nova instance without the Auth System Administrators ensuring that HR knows I would be the person to blame. Similarly, a Nova system administrator can't delete a customer's Heat stack without our Heat administrators knowing who is to blame. We have checks and balances across our systems and purposefully segment our possible attack vectors. Leaving sensitive customer data unencrypted at rest provides many more options for that data to get in the wrong hands or be taken outside the company. It is quick and easy to do a MySQL dump if the DB linux system is compromised, which has nothing to do with Heat having a vulnerability. Our ask is to
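The two ideas floated earlier in the thread (a utility to change the key, plus accepting multiple decryption keys so that utility can work slowly) can be sketched with MultiFernet from the third-party `cryptography` package. This is a sketch only: the parameter layout and names here are illustrative, not Heat's actual schema.

```python
# Sketch, assuming the `cryptography` package is available; not Heat code.
import json

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()


def encrypt_params(params, crypt):
    # Encrypt all the hidden parameters together as a single JSON blob,
    # per the suggestion above, rather than one ciphertext per parameter.
    return crypt.encrypt(json.dumps(params).encode())


def decrypt_params(blob, crypt):
    return json.loads(crypt.decrypt(blob).decode())


# A row written before the key change, under the old key only:
blob = encrypt_params({"wordpress_password": "s3cret"},
                      MultiFernet([Fernet(old_key)]))

# After rotation: encrypt with the new key first, but keep accepting the
# old key, so a background utility can re-encrypt the database slowly.
crypt = MultiFernet([Fernet(new_key), Fernet(old_key)])
assert decrypt_params(blob, crypt) == {"wordpress_password": "s3cret"}
blob = encrypt_params(decrypt_params(blob, crypt), crypt)  # re-key one row
```

MultiFernet encrypts with the first key in the list and tries each key in turn on decrypt, which is exactly the "allow multiple decryption keys" behavior the re-keying tool needs.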
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
Excerpts from Matt Riedemann's message of 2014-06-12 08:15:46 -0700: On 6/12/2014 9:38 AM, Mike Bayer wrote: On 6/12/14, 8:26 AM, Julien Danjou wrote: On Thu, Jun 12 2014, Sean Dague wrote: That's not catchable in unit or functional tests? Not in an accurate manner, no. Keeping jobs alive based on the theory that they might one day be useful is something we just don't have the liberty to do any more. We've not seen an idle node in zuul in 2 days... and we're only at j-1. j-3 will be at least +50% of this load. Sure, I'm not saying we don't have a problem. I'm just saying it's not a good solution to fix that problem IMHO. Just my 2c without having a full understanding of all of OpenStack's CI environment, Postgresql is definitely different enough that MySQL strict mode could still allow issues to slip through quite easily, and also as far as capacity issues, this might be longer term but I'm hoping to get database-related tests to be lots faster if we can move to a model that spends much less time creating databases and schemas. Is there some organization out there that uses PostgreSQL in production that could stand up 3rd party CI with it? I know that at least for the DB2 support we're adding across the projects we're doing 3rd party CI for that. Granted it's a proprietary DB unlike PG but if we're talking about spending resources on testing for something that's not widely used, but there is a niche set of users that rely on it, we could/should move that to 3rd party CI. I'd much rather see us spend our test resources on getting multi-node testing running in the gate so we can test migrations in Nova. I think this is really the answer. To paraphrase the wise and well experienced engineer, Beyoncé: If you like it then you shoulda put CI on it.
The project will succumb to a tragedy of the commons if it bends over backwards for every deployment variation available. But 3rd parties who care can always contribute resources and (if they play nice...) votes. I think there are a tiny number of things that will cause corner case bugs that could creep in, but as Sean says, we haven't actually seen these.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Adam Harwell's message of 2014-06-10 12:04:41 -0700: So, it looks like any sort of validation on Deletes in Barbican is going to be a no-go. I'd like to propose a third option, which might be the safest route to take for LBaaS while still providing some of the convenience of using Barbican as a central certificate store. Here is a diagram of the interaction sequence to create a loadbalancer: http://bit.ly/1pgAC7G Summary: Pass the Barbican TLS Container ID to the LBaaS create call, get the container from Barbican, and store a shadow-copy of the container again in Barbican, this time on the LBaaS service account. The secret will now be duplicated (it still exists on the original tenant, but also exists on the LBaaS tenant), but we're not talking about a huge amount of data here -- just a few kilobytes. With this approach, we retain most of the advantages we wanted to get from using Barbican -- we don't need to worry about taking secret data through the LBaaS API (we still just take a barbicanID from the user), and the user can still use a single barbicanID (the original one they created -- the copies are invisible to them) when passing their TLS info to other services. We gain the additional advantage that it no longer matters what happens to the original TLS container -- it could be deleted and it would not impact our service. What do you guys think of that option? A user hands LBaaS an ID, and then deletes it, and expects that LBaaS can continue working indefinitely? How is that user's reckless action LBaaS's problem? Do one thing: Be a good load balancer. Let users orchestrate your APIs according to their use case and tools.
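The shadow-copy flow Adam describes can be sketched in a few lines, with a stub standing in for the real python-barbicanclient calls against the two accounts (every name here is illustrative, not a real API):

```python
# Stub standing in for a per-tenant Barbican endpoint; a real implementation
# would use python-barbicanclient scoped to each account.
class StubBarbican(object):
    def __init__(self):
        self._store = {}
        self._seq = 0

    def store(self, payload):
        self._seq += 1
        ref = "container-%d" % self._seq
        self._store[ref] = payload
        return ref

    def get(self, ref):
        return self._store[ref]

    def delete(self, ref):
        del self._store[ref]


def create_listener(user_ref, user_barbican, lbaas_barbican):
    # Fetch the user's TLS container once, then keep a private copy on the
    # LBaaS service account so deleting the original cannot break the LB.
    tls_payload = user_barbican.get(user_ref)
    shadow_ref = lbaas_barbican.store(tls_payload)
    return {"user_ref": user_ref, "shadow_ref": shadow_ref}


user_b, lbaas_b = StubBarbican(), StubBarbican()
ref = user_b.store({"certificate": "CERT", "private_key": "KEY"})
listener = create_listener(ref, user_b, lbaas_b)
user_b.delete(ref)  # the user's "reckless" delete...
assert lbaas_b.get(listener["shadow_ref"])["certificate"] == "CERT"  # ...LB unaffected
```

The trade-off Clint objects to is visible here too: the service silently pins a copy of state the user believes they deleted.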
Re: [openstack-dev] [marconi] Reconsidering the unified API model
Excerpts from Janczuk, Tomasz's message of 2014-06-11 10:05:54 -0700: On 6/11/14, 2:43 AM, Gordon Sim g...@redhat.com wrote: On 06/10/2014 09:57 PM, Janczuk, Tomasz wrote: Using processes to isolate tenants is certainly possible. There is a range of isolation mechanisms that can be used, from VM level isolation (basically a separate deployment of the broker per-tenant), to process level isolation, to sub-process isolation. The higher the density the lower the overall marginal cost of adding a tenant to the system, and overall cost of operating it. From the cost perspective it is therefore desired to provide sub-process multi-tenancy mechanism; at the same time this is the most challenging approach. Where does the increased cost for process level isolation come from? Is it simply the extra process required (implying an eventual limit for a given VM)? With sub-process isolation you have to consider the fairness of scheduling between operations for different tenants, i.e. potentially limiting the processing done on behalf of any given tenant in a given period. You would also need to limit the memory used on behalf of any given tenant. Wouldn't you end up reinventing much of what the operating system does? Process level isolation is more costly than sub-process level isolation primarily due to larger memory consumption. For example, CGI has worse cost characteristics than FastCGI when scaled out. But the example closer to Marconi's use case is database systems: I can't put my finger on a single one that would isolate queries executed by its users using processes. There's at least one, and it is fairly popular: http://www.postgresql.org/docs/9.3/static/tutorial-arch.html The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (forks) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process.
Re: [openstack-dev] [TripleO] [Ironic] [Heat] Mid-cycle collaborative meetup
Excerpts from Jaromir Coufal's message of 2014-06-08 16:44:58 -0700: Hi, it looks like there is no more activity on the survey for mid-cycle dates, so I went forward to evaluate it. I created a table view in the etherpad [0] and the results are as follows: * option1 (Jul 28 - Aug 1): 27 attendees - collides with Nova/Ironic * option2 (Jul 21-25) : 27 attendees * option3 (Jul 25-29) : 17 attendees - collides with Nova/Ironic * option4 (Aug 11-15) : 13 attendees I think that we can remove options 3 and 4 from consideration, because there are a lot of people who can't make it. So we have option1 and option2 left. Since Robert and Devananda (PTLs on the projects) can't make option1, which also conflicts with the Nova/Ironic meetup, I think it is pretty straightforward. Based on the survey the winning date for the mid-cycle meetup is option2: July 21st - 25th. Does anybody have a very strong reason why we shouldn't fix the date for option2 and proceed forward with the organization of the meetup? July 21-25 is also the shortest notice. I will not be able to attend as plans have already been made for the summer and I've already been travelling quite a bit recently; after all, we were all just at the summit a few weeks ago. I question the reasoning that being close to FF is a bad thing, and suggest adding much later dates. But I understand since the chosen dates are so close, there is a need to make a decision immediately. Alternatively, I suggest that we split Heat out of this, and aim at later dates in August.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Vijay Venkatachalam's message of 2014-06-09 21:48:43 -0700: My vote is for option #2 (without the registration). It is simpler to start with this approach. How is delete handled though? Ex. What is the expectation when user attempts to delete a certificate/container which is referred by an entity like LBaaS listener? 1. Will there be validation in Barbican to prevent this? *OR* 2. LBaaS listener will have a dangling reference/pointer to certificate? Dangling reference. To avoid that, one should update all references before deleting.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Douglas Mendizabal's message of 2014-06-09 16:08:02 -0700: Hi all, I’m strongly in favor of having immutable TLS-typed containers, and very much opposed to storing every revision of changes done to a container. I think that storing versioned containers would add too much complexity to Barbican, where immutable containers would work well. Agree completely. Create a new one for new values. Keep the old ones while they're still active. I’m still not sold on the idea of registering services with Barbican, even though (or maybe especially because) Barbican would not be using this data for anything. I understand the problem that we’re trying to solve by associating different resources across projects, but I don’t feel like Barbican is the right place to do this. Agreed also, this is simply not Barbican or Neutron's role. Be a REST API for secrets and networking, not all-dancing, all-singing nannies that prevent any possibly dangerous behavior with said APIs. It seems we’re leaning towards option #2, but I would argue that orchestration of services is outside the scope of Barbican’s role as a secret-store. I think this is a problem that may need to be solved at a higher level. Maybe an openstack-wide registry of dependent entities across services? An optional openstack-wide registry of dependent entities is called Heat.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Eichberger, German's message of 2014-06-06 15:52:54 -0700: Jorge + John, I am most concerned with a user changing his secret in barbican and then the LB trying to update and causing downtime. Some users like to control when the downtime occurs. Couldn't you allow a user to have multiple credentials, the way basically every key-based user access system works (for an example, see SSH)? Users changing their credentials would create new ones, reference them in the appropriate consuming service, and dereference old ones when they are believed to be out of service. I see both specified options as overly complicated attempts to work around what would be solved gracefully with a many-to-one relationship of keys to users. For #1 it was suggested that once the event is delivered it would be up to a user to enable an auto-update flag. In the case of #2 I am a bit worried about error cases: e.g. uploading the certificates succeeds but registering the loadbalancer(s) fails. So using the barbican system for those warnings might not be as foolproof as we are hoping. One thing I like about #2 over #1 is that it pushes a lot of the information to Barbican. I think a user would expect when he uploads a new certificate to Barbican that the system warns him right away about load balancers using the old cert. With #1 he might get an e-mail from LBaaS telling him things changed (and we helpfully updated all affected load balancers) -- which isn't as immediate as #2. If we implement an auto-update flag for #1 we can have both. Users who like #2 just hit the flag. Then the discussion changes to what we should implement first and I agree with Jorge + John that this should likely be #2. IMO you're doing way too much and tending toward tight coupling which will make the system brittle. If you want to give the user orchestration, there is Heat. A template will manage the sort of things that you want, such as automatic replacement and dereferencing/deleting of older credentials.
But not if your service doesn't support having n+1 active credentials at one time.
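The many-to-one model argued for above can be sketched as a toy listener that tolerates n+1 active credentials during rotation (all names here are illustrative, not LBaaS API):

```python
class Listener(object):
    """Toy LBaaS listener that holds several credential references at once,
    so rotation never has a zero-credential (downtime) window."""

    def __init__(self):
        self.credential_refs = []

    def add_credential(self, ref):
        # Rotation step 1: reference the new credential alongside the old.
        self.credential_refs.append(ref)

    def retire_credential(self, ref):
        # Rotation step 2: dereference the old credential only once the
        # user believes it is out of service.
        self.credential_refs.remove(ref)


listener = Listener()
listener.add_credential("cert-v1")
listener.add_credential("cert-v2")     # both active: no downtime window
listener.retire_credential("cert-v1")  # user chooses when to cut over
assert listener.credential_refs == ["cert-v2"]
```

This is the SSH-authorized-keys pattern the mail alludes to: the user, not the service, controls when the cutover happens.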
Re: [openstack-dev] [Nova] Mid cycle meetup
Excerpts from Devananda van der Veen's message of 2014-06-06 12:04:08 -0700: I have just announced the Ironic mid-cycle in Beaverton, co-located with Nova. That's the main one for Ironic. However, there are many folks working on both TripleO and Ironic, so I wouldn't be surprised if there is a (small?) group at the TripleO sprint hacking on Ironic, even if there's nothing official, and even if the dates overlap (which I really hope they don't). I'm going to try to attend the TripleO sprint if at all possible, as that project remains one of the largest users of Ironic that I'm aware of. Yes, we desperately need expertise as our intention is to push forward on scale testing, and we'll need experts on Ironic's internals to push optimizations where they're needed. I hope that the Ironic team is large enough that there can be some at the Nova sprint, and some at the TripleO sprint if they happen to be concurrent. I believe we would like for the TripleO sprint to be a bit later in the cycle though, and I'm seeing dates proposed that would reflect that.
Re: [openstack-dev] [Heat]Heat template parameters encryption
Excerpts from Steven Hardy's message of 2014-06-05 02:23:40 -0700: On Thu, Jun 05, 2014 at 12:17:07AM +, Randall Burt wrote: On Jun 4, 2014, at 7:05 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Zane Bitter's message of 2014-06-04 16:19:05 -0700: On 04/06/14 15:58, Vijendar Komalla wrote: Hi Devs, I have submitted a WIP review (https://review.openstack.org/#/c/97900/) for the Heat parameters encryption blueprint https://blueprints.launchpad.net/heat/+spec/encrypt-hidden-parameters This quick and dirty implementation encrypts all the parameters on Stack 'store' and decrypts on Stack 'load'. Following are a couple of improvements I am thinking about; 1. Instead of encrypting individual parameters, on Stack 'store' encrypt all the parameters together as a dictionary [something like crypt.encrypt(json.dumps(param_dictionary))] Yeah, definitely don't encrypt them individually. 2. Just encrypt parameters that were marked as 'hidden', instead of encrypting all parameters I would like to hear your feedback/suggestions. Just as a heads-up, we will soon need to store the properties of resources too, at which point parameters become the least of our problems. (In fact, in theory we wouldn't even need to store parameters... and probably by the time convergence is completely implemented, we won't.) Which is to say that there's almost certainly no point in discriminating between hidden and non-hidden parameters. I'll refrain from commenting on whether the extra security this affords is worth the giant pain it causes in debugging, except to say that IMO there should be a config option to disable the feature (and if it's enabled by default, it should probably be disabled by default in e.g. devstack). Storing secrets seems like a job for Barbican. That handles the giant pain problem because in devstack you can just tell Barbican to have an open read policy. I'd rather see good hooks for Barbican than blanket encryption.
I've worked with a few things like this and they are despised and worked around universally because of the reason Zane has expressed concern about: debugging gets ridiculous. How about this: parameters: secrets: type: sensitive resources: sensitive_deployment: type: OS::Heat::StructuredDeployment properties: config: weverConfig server: myserver input_values: secret_handle: { get_param: secrets } The sensitive type would, on the client side, store the value in Barbican, never in Heat. Instead it would just pass in a handle which the user can then build policy around. Obviously this implies the user would set up Barbican's in-instance tools to access the secrets value. But the idea is, let Heat worry about being high performing and introspectable, and then let Barbican worry about sensitive things. While certainly ideal, it doesn't solve the current problem since we can't yet guarantee Barbican will even be available in a given release of OpenStack. In the meantime, Heat continues to store sensitive user information unencrypted in its database. Once Barbican is integrated, I'd be all for changing this implementation, but until then, we do need an interim solution. Sure, debugging is a pain and as developers we can certainly grumble, but leaking sensitive user information because we were too fussed to protect data at rest seems worse IMO. Additionally, the solution as described sounds like we're imposing a pretty awkward process on a user to save ourselves from having to decrypt some data in the cases where we can't access the stack information directly from the API or via debugging running Heat code (where the data isn't encrypted anymore). Under what circumstances are we leaking sensitive user information? Are you just trying to mitigate a potential attack vector, in the event of a bug which leaks data from the DB? If so, is the user-data encrypted in the nova DB? 
It seems to me that this will only be a worthwhile exercise if the sensitive stuff is encrypted everywhere, and many/most use-cases I can think of which require sensitive data involve that data ending up in nova user|meta-data? I tend to agree Steve. The strategy to move things into a system with strong policy controls like Barbican will mitigate these risks, as even compromise of the given secret access information may not yield access to the actual secrets. Basically, let's help facilitate end-to-end encryption and access control, not just mitigate one attack vector because the end-to-end one is hard. Until then, our DBs will have sensitive information, and such is life. (Of course, this also reminds me that I think we should probably add a one-time-pad type of access method that we can use to prevent compromise of our credentials
Re: [openstack-dev] [ironic bare metal installation issue]
Excerpts from 严超's message of 2014-06-03 21:23:25 -0700: Hi, All: I've deployed my ironic following this link: http://ma.ttwagner.com/bare-metal-deploys-with-devstack-and-ironic/ , and all steps are completed. Now node-show for one of my nodes reports provision_state as active. But why is this node still in the installation state, as follows? [image: inline image 1] Ironic has done all that it can for the machine. That is, it booted the kernel and ramdisk from the image, and Ironic has no real way to check that this deploy succeeds. It is on the same level as checking to see if your VM actually boots after kvm has been spawned.
Re: [openstack-dev] [Spam] [heat] Resource action API
Excerpts from yang zhang's message of 2014-06-04 00:01:41 -0700: Hi all, Now heat only supports suspending/resuming a whole stack; all the resources of the stack will be suspended/resumed, but sometimes we just want to suspend or resume only a part of the resources in the stack, so I think adding a resource-action API for heat is necessary. This API will be helpful in solving 2 problems: - If we want to suspend/resume the resources of the stack, you need to get the phy_id first and then call the API of other services, and this won't update the status of the resource in heat, which often causes some unexpected problems. - This API could offer a turn on/off function for some native resources, e.g., we can turn on/off the autoscaling group or a single policy with the API; this is like the suspend/resume services feature[1] in AWS. I registered a bp for it, and you are welcome to discuss it. https://blueprints.launchpad.net/heat/+spec/resource-action-api [1] http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html Regards! Zhang Yang Hi zhang. I'd rather we model the intended states of each resource, and ensure that Heat can assert them. Actions are tricky things to model. So if you want your nova server to be stopped, how about resources: server1: type: OS::Nova::Server properties: flavor: superbig image: TheBestOS state: STOPPED We don't really need to model actions then, just the APIs we have available.
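Asserting a declared state instead of modeling an action could look roughly like this on the engine side. This is a toy sketch under stated assumptions: the `state` property and the stub client are illustrative, not Heat's real resource API.

```python
class StubNovaServer(object):
    """Stands in for a nova client scoped to one server."""

    def __init__(self, status):
        self.status = status

    def stop(self):
        self.status = "STOPPED"

    def start(self):
        self.status = "ACTIVE"


def assert_server_state(desired, server):
    # Compare the declared state with the observed state and issue the
    # minimal API call needed; a no-op when they already match.
    if server.status == desired:
        return
    if desired == "STOPPED":
        server.stop()
    elif desired == "ACTIVE":
        server.start()


server = StubNovaServer("ACTIVE")
assert_server_state("STOPPED", server)  # template declares state: STOPPED
assert server.status == "STOPPED"
assert_server_state("STOPPED", server)  # idempotent: nothing left to do
```

The appeal of this shape is that a stack update is just a new declaration; the engine converges toward it without a separate action API.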
Re: [openstack-dev] [Spam] Re: [Spam] [heat] Resource action API
Excerpts from yang zhang's message of 2014-06-04 01:14:37 -0700: From: cl...@fewbar.com To: openstack-dev@lists.openstack.org Date: Wed, 4 Jun 2014 00:09:39 -0700 Subject: Re: [openstack-dev] [Spam] [heat] Resource action API Excerpts from yang zhang's message of 2014-06-04 00:01:41 -0700: Hi all, Now heat only supports suspending/resuming a whole stack; all the resources of the stack will be suspended/resumed, but sometimes we just want to suspend or resume only a part of the resources in the stack, so I think adding a resource-action API for heat is necessary. This API will be helpful in solving 2 problems: - If we want to suspend/resume the resources of the stack, you need to get the phy_id first and then call the API of other services, and this won't update the status of the resource in heat, which often causes some unexpected problems. - This API could offer a turn on/off function for some native resources, e.g., we can turn on/off the autoscaling group or a single policy with the API; this is like the suspend/resume services feature[1] in AWS. I registered a bp for it, and you are welcome to discuss it. https://blueprints.launchpad.net/heat/+spec/resource-action-api [1] http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html Regards! Zhang Yang Hi zhang. I'd rather we model the intended states of each resource, and ensure that Heat can assert them. Actions are tricky things to model. So if you want your nova server to be stopped, how about resources: server1: type: OS::Nova::Server properties: flavor: superbig image: TheBestOS state: STOPPED We don't really need to model actions then, just the APIs we have available.
At first, I wanted to do it like this, using a resource parameter, but that requires updating the stack in order to suspend the resource. It means we can't stop another resource while one resource is stopping, but that seems not a big deal; stopping a resource is usually quick. Compared to a new API, using a resource parameter is easy to implement thanks to the mature stack-update code, so we could finish it in a short period. Does anyone else have good ideas? It's a bit far off, but the eventual goal of the convergence effort is to make it so you _can_ update two things concurrently, since updates will just be recording intended state in the db, not waiting for all of that to complete.
Re: [openstack-dev] [Heat]Heat template parameters encryption
Excerpts from Zane Bitter's message of 2014-06-04 16:19:05 -0700: On 04/06/14 15:58, Vijendar Komalla wrote: Hi Devs, I have submitted a WIP review (https://review.openstack.org/#/c/97900/) for the Heat parameters encryption blueprint https://blueprints.launchpad.net/heat/+spec/encrypt-hidden-parameters This quick and dirty implementation encrypts all the parameters on Stack 'store' and decrypts on Stack 'load'. Following are a couple of improvements I am thinking about; 1. Instead of encrypting individual parameters, on Stack 'store' encrypt all the parameters together as a dictionary [something like crypt.encrypt(json.dumps(param_dictionary))] Yeah, definitely don't encrypt them individually. 2. Just encrypt parameters that were marked as 'hidden', instead of encrypting all parameters I would like to hear your feedback/suggestions. Just as a heads-up, we will soon need to store the properties of resources too, at which point parameters become the least of our problems. (In fact, in theory we wouldn't even need to store parameters... and probably by the time convergence is completely implemented, we won't.) Which is to say that there's almost certainly no point in discriminating between hidden and non-hidden parameters. I'll refrain from commenting on whether the extra security this affords is worth the giant pain it causes in debugging, except to say that IMO there should be a config option to disable the feature (and if it's enabled by default, it should probably be disabled by default in e.g. devstack). Storing secrets seems like a job for Barbican. That handles the giant pain problem because in devstack you can just tell Barbican to have an open read policy. I'd rather see good hooks for Barbican than blanket encryption. I've worked with a few things like this and they are despised and worked around universally because of the reason Zane has expressed concern about: debugging gets ridiculous.
How about this: parameters: secrets: type: sensitive resources: sensitive_deployment: type: OS::Heat::StructuredDeployment properties: config: weverConfig server: myserver input_values: secret_handle: { get_param: secrets } The sensitive type would, on the client side, store the value in Barbican, never in Heat. Instead it would just pass in a handle which the user can then build policy around. Obviously this implies the user would set up Barbican's in-instance tools to access the secrets value. But the idea is, let Heat worry about being high performing and introspectable, and then let Barbican worry about sensitive things.
Re: [openstack-dev] [ironic workflow question]
Excerpts from 严超's message of 2014-06-04 20:34:01 -0700: BTW, if I run sudo ./bin/disk-image-create -a amd64 ubuntu deploy-ironic -o /tmp/deploy-ramdisk-ubuntu, what is the username/password for the image deploy-ramdisk-ubuntu? There isn't one. You can write an element if you want to include a backdoor user. Otherwise, just use nova's SSH keypair capability when you deploy your image onto boxes.
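For the element route, diskimage-builder ships a devuser element that bakes in such a user; a hedged sketch of the invocation follows (not runnable here since it needs diskimage-builder and network access, and you should confirm your diskimage-builder version includes devuser; the variable values are examples):

```shell
# Example only: requires diskimage-builder; DIB_DEV_USER_* values are
# placeholders you would choose yourself.
export DIB_DEV_USER_USERNAME=debug
export DIB_DEV_USER_PASSWORD=secret
export DIB_DEV_USER_PWDLESS_SUDO=yes
sudo -E ./bin/disk-image-create -a amd64 ubuntu deploy-ironic devuser \
    -o /tmp/deploy-ramdisk-ubuntu
```

As the reply notes, the keypair route is preferable for anything beyond debugging, since a baked-in password ships in every image built from the element.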
Re: [openstack-dev] [Heat] Short term scaling strategies for large Heat stacks
Excerpts from Steve Baker's message of 2014-06-02 14:37:25 -0700:

On 31/05/14 07:01, Zane Bitter wrote: On 29/05/14 19:52, Clint Byrum wrote:

update-failure-recovery
===

This is a blueprint I believe Zane is working on to land in Juno. It will allow us to retry a failed create or update action. Combined with the separate controller/compute node strategy, this may be our best option, but it is unclear whether that code will be available soon or not. The chunking is definitely required, because with 500 compute nodes, if node #250 fails, the remaining 249 nodes that are IN_PROGRESS will be cancelled, which makes the impact of a transient failure quite extreme. Also, without chunking we'll suffer from some of the performance problems we've seen where a single engine process has to do all of the work to bring up a stack.

Pros:
* Uses the blessed strategy

Cons:
* Implementation is not complete
* Still suffers from the heavy impact of failure
* Requires chunking to be feasible

I've already started working on this and I'm expecting to have it ready some time between the j-1 and j-2 milestones. I think these two strategies combined could probably get you a long way in the short term, though obviously they are not a replacement for the convergence strategy in the long term.

BTW, you missed another strategy that we have discussed in the past, and which I think Steve Baker might(?) be working on: retrying failed calls at the client level.

As part of the client-plugins blueprint I'm planning on implementing retry policies on API calls. So where currently we call:

    self.nova().servers.create(**kwargs)

This will soon be:

    self.client().servers.create(**kwargs)

And with a retry policy (assuming the default unique-ish server name is used):

    self.client_plugin().call_with_retry_policy(
        'cleanup_yr_mess_and_try_again',
        self.client().servers.create, **kwargs)

This should be suitable for handling transient errors on API calls such as 500s, response timeouts or token expiration.
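The retry-policy idea above can be sketched as a generic wrapper: retry the call on transient exceptions, run an optional cleanup hook between attempts, and re-raise once the attempts are exhausted. Everything here (`call_with_retry` and its parameters) is a hypothetical illustration of the concept, not the actual client-plugin API:

```python
import time


def call_with_retry(fn, *args, retries=3, delay=0.01,
                    retry_on=(Exception,), cleanup=None, **kwargs):
    """Call fn, retrying on transient errors, with an optional cleanup hook
    run between attempts (e.g. deleting a half-created server)."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            if cleanup is not None:
                cleanup()
            time.sleep(delay)  # real policies would back off exponentially


# Simulate an API that returns a 500 twice, then succeeds.
calls = []

def flaky_create():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient 500")
    return "server-id"

assert call_with_retry(flaky_create, retries=5, retry_on=(IOError,)) == "server-id"
assert len(calls) == 3  # two failures absorbed, third attempt succeeded
```

The cleanup hook is the interesting part for servers: a failed `servers.create` can leave a half-built instance behind, so the policy has to delete it before retrying, which is presumably what the 'cleanup_yr_mess_and_try_again' name refers to.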
It shouldn't be used for resources which later come up in an ERROR state; convergence or update-failure-recovery would be better for that.

Steve, this is fantastic work and sorely needed. Thank you for working on it. Unfortunately, resources ending up in an ERROR state are the majority of our problem. IPMI and PXE can be unreliable in some environments, and sometimes machines are broken in subtle ways. Also, the odd bug in Neutron, Nova, or Ironic will cause this. Convergence is not available to us in the short term, and update-failure-recovery is really some time off too, so unfortunately we need more solutions.