Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report
Excerpts from Steven Hardy's message of 2014-08-27 10:08:36 -0700: On Wed, Aug 27, 2014 at 09:40:31AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700: On 27/08/14 11:04, Steven Hardy wrote: On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:

I am a little bit skeptical about using Swift for this use case because of its eventual consistency issue. I am not sure a Swift cluster is good to be used for this kind of problem. Please note that a Swift cluster may give you old data at some point in time.

This is probably not a major problem, but it's certainly worth considering. My assumption is that the latency of making the replicas consistent will be small relative to the timeout for things like SoftwareDeployments, so all we need is to ensure that instances eventually get the new data, and act on it.

That part is fine, but if they get the new data and then later get the old data back again... that would not be so good.

Agreed, and I had not considered that this can happen. There is a not-so-simple answer though:

* Heat inserts this as initial metadata: {metadata: {}, update-url: xx, version: 0}
* Polling goes to update-url and ignores metadata while version = 0
* Polling finds new metadata in the same format, and continues the loop without talking to Heat

However, this makes me rethink why we are having performance problems. MOST of the performance problems have two root causes:

* We parse the entire stack to show metadata, because we have to see if there are custom access controls defined in any of the resources used. I actually worked on a patch set to deprecate this part of the resource plugin API because it is impossible to scale this way.
* We rely on the engine to respond because of the parsing issue.

If however we could just push metadata into the db fully resolved whenever things in the stack change, and cache the response in the API using Last-Modified/Etag headers, I think we'd be less inclined to care so much about swift for polling.
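Clint's version-guard loop is easy to make concrete. A minimal sketch, assuming a metadata document shaped like the one he describes ({metadata: ..., update-url: ..., version: N}); the function and key names here are illustrative, not Heat's actual API:

```python
# Sketch of a polling agent that tolerates Swift eventual consistency:
# a stale replica's answer is ignored because the version number only
# ever moves forward. Names are invented for illustration.

def poll_once(fetch, state):
    """fetch(url) -> dict like {'metadata': ..., 'update-url': u, 'version': n}.

    Returns new metadata to act on, or None if the read was stale.
    """
    doc = fetch(state['update-url'])
    if doc['version'] <= state['version']:
        return None  # old data came back; act on nothing
    state['version'] = doc['version']
    state['metadata'] = doc['metadata']
    # follow the chain without going back to the Heat engine
    state['update-url'] = doc.get('update-url', state['update-url'])
    return doc['metadata']
```

The key property is the last assertion below: replaying an already-seen document is a no-op, so "new data then old data again" can no longer cause a regression.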
However we are still left with the many thousands of keystone users being created vs. thousands of swift tempurls.

There's probably a few relatively simple optimisations we can do if the keystone user thing becomes the bottleneck:
- Make the user an attribute of the stack and only create one per stack/tree-of-stacks
- Make the user an attribute of each server resource (probably more secure, but less optimal if your optimal is fewer keystone users).

I don't think the many keystone users thing is actually a problem right now though, or is it?

1000 servers means 1000 keystone users to manage, and all of the tokens and backend churn that implies. It's not a problem, but it is quite a bit heavier than tempurls.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
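For scale comparison, a Swift tempurl costs one HMAC computation per server instead of one Keystone user. The tempurl middleware's signature scheme is an HMAC-SHA1 over the method, expiry timestamp, and object path; the account key and path below are made up:

```python
import hmac
import time
from hashlib import sha1

def make_temp_url(key, method, path, lifetime_secs):
    """Sign a Swift object path the way the tempurl middleware expects."""
    expires = int(time.time()) + lifetime_secs
    body = '%s\n%d\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%d' % (path, sig, expires)
```

No token, no service user, no backend churn: the server just GETs the signed URL, and Swift validates the signature against the account's X-Account-Meta-Temp-URL-Key.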
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Thierry Carrez's message of 2014-08-27 05:51:55 -0700:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

I like it. The only thing I would add is that it would be quite useful if the use of pods were at least partially enhanced by an unconference-style interest list. What I mean is, on day 1 have people suggest topics and vote on suggested topics to discuss at the pods, and from then on the pods can host these topics. This is for the other things that aren't well defined until the summit and don't have their own rooms for days 2 and 3.

This is driven by the fact that the pods in Atlanta were almost always busy doing something other than whatever the track that owned them wanted. A few projects' pods grew to 30-40 people a few times, eating up all the chairs for the surrounding pods. TripleO often sat at the Heat pod because of this, for instance.

I don't think they should be fully scheduled. They're also just great places to gather and have a good discussion, but it would be useful to plan for topic flexibility and help coalesce interested parties, rather than have them be silos that get taken over randomly. Especially since there is a temptation to push the other topics to them already.
Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda. Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar). Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

Love this. Please, if we can also fully enclose these meetups and the session rooms in dry-erase boards, that would be ideal.
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Sean Dague's message of 2014-08-27 06:26:38 -0700: On 08/27/2014 08:51 AM, Thierry Carrez wrote:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda. Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar).
Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

I think this proposal makes the best use of our setup: discuss clear cross-project issues, address key specific topics which need face-to-face time and broader attendance, then try to replicate the success of midcycle meetup-like open unscheduled time to discuss whatever is hot at this point. There are still details to work out (is it possible to split the space, should we use the usual design summit CFP website to organize the scheduled time...), but I would first like to have your feedback on this format. Also if you have alternative proposals that would make better use of our 4 days, let me know.

I definitely like this approach. I think it will be really interesting to collect feedback from people about the value they got from days 2-3 vs. day 4. I also wonder if we should lose a slot from days 1-3 and expand the hallway time. Hallway track is always pretty interesting, and honestly a lot of interesting ideas spring up there. The 10 minute transitions often seem to feel like you are rushing between places too quickly sometimes.

Yes please. I'd also be fine with just giving back 5 minutes from each session to facilitate this.
Re: [openstack-dev] [all] Design Summit reloaded
Excerpts from Anita Kuno's message of 2014-08-27 13:48:25 -0700: On 08/27/2014 02:46 PM, John Griffith wrote: On Wed, Aug 27, 2014 at 9:25 AM, Flavio Percoco fla...@redhat.com wrote: On 08/27/2014 03:26 PM, Sean Dague wrote: On 08/27/2014 08:51 AM, Thierry Carrez wrote:

Hi everyone, I've been thinking about what changes we can bring to the Design Summit format to make it more productive. I've heard the feedback from the mid-cycle meetups and would like to apply some of those ideas for Paris, within the constraints we have (already booked space and time). Here is something we could do:

Day 1. Cross-project sessions / incubated projects / other projects

I think that worked well last time. 3 parallel rooms where we can address top cross-project questions, discuss the results of the various experiments we conducted during juno. Don't hesitate to schedule 2 slots for discussions, so that we have time to get to the bottom of those issues. Incubated projects (and maybe other projects, if space allows) occupy the remaining space on day 1, and could occupy pods on the other days.

Day 2 and Day 3. Scheduled sessions for various programs

That's our traditional scheduled space. We'll have 33% fewer slots available. So, rather than trying to cover all the scope, the idea would be to focus those sessions on specific issues which really require face-to-face discussion (which can't be solved on the ML or using spec discussion) *or* require a lot of user feedback. That way, appearing in the general schedule is very helpful. This will require us to be a lot stricter on what we accept there and what we don't -- we won't have space for courtesy sessions anymore, and traditional/unnecessary sessions (like my traditional release schedule one) should just move to the mailing-list.

Day 4. Contributors meetups

On the last day, we could try to split the space so that we can conduct parallel midcycle-meetup-like contributors gatherings, with no time boundaries and an open agenda.
Large projects could get a full day, smaller projects would get half a day (but could continue the discussion in a local bar). Ideally that meetup would end with some alignment on release goals, but the idea is to make the best of that time together to solve the issues you have. Friday would finish with the design summit feedback session, for those who are still around.

I think this proposal makes the best use of our setup: discuss clear cross-project issues, address key specific topics which need face-to-face time and broader attendance, then try to replicate the success of midcycle meetup-like open unscheduled time to discuss whatever is hot at this point. There are still details to work out (is it possible to split the space, should we use the usual design summit CFP website to organize the scheduled time...), but I would first like to have your feedback on this format. Also if you have alternative proposals that would make better use of our 4 days, let me know.

I definitely like this approach. I think it will be really interesting to collect feedback from people about the value they got from days 2-3 vs. day 4. I also wonder if we should lose a slot from days 1-3 and expand the hallway time. Hallway track is always pretty interesting, and honestly a lot of interesting ideas spring up there. The 10 minute transitions often seem to feel like you are rushing between places too quickly sometimes.

+1 Last summit, it was basically impossible to do any hallway talking and even meet some folks face-2-face. Other than that, I think the proposal is great and makes sense to me. Flavio -- @flaper87 Flavio Percoco

Sounds like a great idea to me: +1

I think this is a great direction.
Here is my dilemma, and it might just affect me. I attended 3 mid-cycles this release: one of Neutron's (there were 2), QA/Infra and Cinder. The Neutron and Cinder ones were mostly in pursuit of figuring out third party and exchanging information surrounding that (which I feel was successful). The QA/Infra one was, well even though I feel like I have been awol, I still consider this my home. From my perspective (and check with Neutron and Cinder to see if they agree), having at least one person from qa/infra at a mid-cycle helps in small ways. At both I worked with folks to help them make more efficient use of their review time by exploring gerrit queries (there were people who didn't know this magic, nor did they think to ask
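For anyone else who hasn't met that magic: Gerrit's search operators (status:, project:, branch:, label:, reviewer:, age:) compose freely, and the same query strings work against the REST /changes/ endpoint as in the UI search box. A small sketch; the host and project here are only examples:

```python
# Build a Gerrit REST search URL from standard Gerrit query operators.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2, current at the time

def gerrit_search_url(host, *terms):
    """Join Gerrit search operators into a /changes/?q=... REST URL."""
    return 'https://%s/changes/?q=%s' % (host, quote(' '.join(terms)))

# e.g. all open Cinder reviews on master
url = gerrit_search_url('review.openstack.org',
                        'status:open', 'project:openstack/cinder',
                        'branch:master')
```

Fetching that URL returns JSON (prefixed with Gerrit's XSSI guard line), so the same queries that save review time interactively can also drive scripts and dashboards.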
Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone
Excerpts from Adam Young's message of 2014-08-24 20:17:34 -0700: On 08/23/2014 02:01 AM, Clint Byrum wrote:

I don't know how Zaqar does its magic, but I'd love to see simple signed URLs rather than users/passwords. This would work for Heat as well. That way we only have to pass in a single predictably formatted string.

Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700: Here's an interesting fact about Zaqar (the project formerly known as Marconi) that I hadn't thought about before this week: it's probably the first OpenStack project where a major part of the API primarily faces

Nah, this is the direction we are headed. Service users (out of LDAP!) are going to be the norm with a recent feature add to Keystone: http://adam.younglogic.com/2014/08/getting-service-users-out-of-ldap/

This complicates the case by requiring me to get tokens and present them, to cache them, etc. I just want to fetch and/or send messages.
Re: [openstack-dev] [qa][all][Heat] Packaging of functional tests
Excerpts from Steve Baker's message of 2014-08-26 14:25:46 -0700: On 27/08/14 03:18, David Kranz wrote: On 08/26/2014 10:14 AM, Zane Bitter wrote: Steve Baker has started the process of moving Heat tests out of the Tempest repository and into the Heat repository, and we're looking for some guidance on how they should be packaged in a consistent way. Apparently there are a few projects already packaging functional tests in the package projectname.tests.functional (alongside projectname.tests.unit for the unit tests). That strikes me as odd in our context, because while the unit tests run against the code in the package in which they are embedded, the functional tests run against some entirely different code - whatever OpenStack cloud you give it the auth URL and credentials for. So these tests run from the outside, just like their ancestors in Tempest do. There's all kinds of potential confusion here for users and packagers. None of it is fatal and all of it can be worked around, but if we refrain from doing the thing that makes zero conceptual sense then there will be no problem to work around :) Thanks, Zane. The point of moving functional tests to projects is to be able to run more of them in gate jobs for those projects, and allow tempest to survive being stretched-to-breaking horizontally as we scale to more projects. At the same time, there are benefits to the tempest-as-all-in-one-functional-and-integration-suite that we should try not to lose: 1. Strong integration testing without thinking too hard about the actual dependencies 2. Protection from mistaken or unwise api changes (tempest two-step required) 3. Exportability as a complete blackbox functional test suite that can be used by Rally, RefStack, deployment validation, etc. I think (1) may be the most challenging because tests that are moved out of tempest might be testing some integration that is not being covered by a scenario. 
We will need to make sure that tempest actually has a complete enough set of tests to validate integration. Even if this is all implemented in a way where tempest can see in-project tests as plugins, there will still not be time to run them all as part of tempest on every commit to every project, so a selection will have to be made. (2) is quite difficult. In Atlanta we talked about taking a copy of functional tests into tempest for stable apis. I don't know how workable that is but don't see any other real options except vigilance in reviews of patches that change functional tests. (3) is what Zane was addressing. The in-project functional tests need to be written in a way that they can, at least in some configuration, run against a real cloud. I suspect from reading the previous thread about In-tree functional test vision that we may actually be dealing with three categories of test here rather than two: * Unit tests that run against the package they are embedded in * Functional tests that run against the package they are embedded in * Integration tests that run against a specified cloud i.e. the tests we are now trying to add to Heat might be qualitatively different from the projectname.tests.functional suites that already exist in a few projects. Perhaps someone from Neutron and/or Swift can confirm? That seems right, except that I would call the third functional tests and not integration tests, because the purpose is not really integration but deep testing of a particular service. Tempest would continue to focus on integration testing. Is there some controversy about that? The second category could include whitebox tests. I don't know about swift, but in neutron the intent was to have these tests be configurable to run against a real cloud, or not. Maru Newby would have details. 
I'd like to propose that tests of the third type get their own top-level package with a name of the form projectname-integrationtests (second choice: projectname-tempest on the principle that they're essentially plugins for Tempest). How would people feel about standardising that across OpenStack?

+1 But I would not call it integrationtests for the reason given above.

Because all heat does is interact with other services, what we call functional tests are actually integration tests. Sure, we could mock at the REST API level, but integration coverage is what we need most. (I'd call that faking, not mocking, but both could apply.) This lets us verify things like:

- how heat handles races in other services leading to resources going into ERROR

(A fake that predictably fails, and thus tests failure handling, will result in better coverage than a real service that only fails when that real service is broken. What's frustrating is that _both_ are needed to catch bugs.)

- connectivity and interaction between heat and agents on orchestrated servers

That is definitely
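The "fake that predictably fails" point can be shown with a toy: unlike a real service, the fake goes to ERROR exactly when told to, so the failure-handling path is exercised on every test run. The class and function names below are invented for the sketch, not Heat's real test doubles:

```python
class FlakyServerFake:
    """Stands in for a Nova-like API; returns ERROR for a chosen
    number of initial status polls, then ACTIVE."""

    def __init__(self, fail_first=0):
        self.calls = 0
        self.fail_first = fail_first

    def get_server_status(self):
        self.calls += 1
        return 'ERROR' if self.calls <= self.fail_first else 'ACTIVE'


def wait_for_active(client, attempts=3):
    """Toy convergence loop: poll until ACTIVE or attempts run out."""
    for _ in range(attempts):
        if client.get_server_status() == 'ACTIVE':
            return True
    return False
```

With a real cloud this code path only gets covered when the cloud actually misbehaves; with the fake, both the transient-error recovery and the give-up path are deterministic.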
Re: [openstack-dev] [heat] heat.conf.sample is not up to date
Guessing this is due to the new tox feature which randomizes python's hash seed. Excerpts from Mike Spreitzer's message of 2014-08-24 00:10:42 -0700: What is going on with this? If I do a fresh clone of heat and run `tox -epep8` then I get that complaint. If I then run the recommended command to fix it, and then `tox -epep8` again, I get the same complaint again --- and with different differences exhibited! The email below carries a typescript showing this. What I really need to know is what to do when committing a change that really does require a change in the sample configuration file. Of course I tried running generate_sample.sh, but `tox -epep8` still complains. What is the right procedure to get a correct sample committed? BTW, I am doing the following admittedly risky thing: I run DevStack, and make my changes in /opt/stack/heat/. Thanks, Mike - Forwarded by Mike Spreitzer/Watson/IBM on 08/24/2014 03:03 AM - From: ubuntu@mjs-dstk-821a (Ubuntu) To: Mike Spreitzer/Watson/IBM@IBMUS, Date: 08/24/2014 02:55 AM Subject:fresh flake fail ubuntu@mjs-dstk-821a:~/code$ git clone git://git.openstack.org/openstack/heat.git Cloning into 'heat'... remote: Counting objects: 49690, done. remote: Compressing objects: 100% (19765/19765), done. remote: Total 49690 (delta 36660), reused 39014 (delta 26526) Receiving objects: 100% (49690/49690), 7.92 MiB | 7.29 MiB/s, done. Resolving deltas: 100% (36660/36660), done. Checking connectivity... done. 
ubuntu@mjs-dstk-821a:~/code$ cd heat ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8 pep8 create: /home/ubuntu/code/heat/.tox/pep8 pep8 installdeps: -r/home/ubuntu/code/heat/requirements.txt, -r/home/ubuntu/code/heat/test-requirements.txt pep8 develop-inst: /home/ubuntu/code/heat pep8 runtests: PYTHONHASHSEED='0' pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib pep8 runtests: commands[1] | /home/ubuntu/code/heat/tools/config/check_uptodate.sh --- /tmp/heat.ep2CBe/heat.conf.sample2014-08-24 06:52:54.16484 + +++ etc/heat/heat.conf.sample2014-08-24 06:48:13.66484 + @@ -164,7 +164,7 @@ #allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions # Qpid broker hostname. (string value) -#qpid_hostname=heat +#qpid_hostname=localhost # Qpid broker port. (integer value) #qpid_port=5672 @@ -221,7 +221,7 @@ # The RabbitMQ broker address where a single node is used. # (string value) -#rabbit_host=heat +#rabbit_host=localhost # The RabbitMQ broker port where a single node is used. # (integer value) check_uptodate.sh: heat.conf.sample is not up to date. check_uptodate.sh: Please run /home/ubuntu/code/heat/tools/config/generate_sample.sh. ERROR: InvocationError: '/home/ubuntu/code/heat/tools/config/check_uptodate.sh' pep8 runtests: commands[2] | /home/ubuntu/code/heat/tools/requirements_style_check.sh requirements.txt test-requirements.txt pep8 runtests: commands[3] | bash -c find heat -type f -regex '.*\.pot?' 
-print0|xargs -0 -n 1 msgfmt --check-format -o /dev/null ___ summary ERROR: pep8: commands failed ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ tools/config/generate_sample.sh ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8 pep8 develop-inst-noop: /home/ubuntu/code/heat pep8 runtests: PYTHONHASHSEED='0' pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib pep8 runtests: commands[1] | /home/ubuntu/code/heat/tools/config/check_uptodate.sh --- /tmp/heat.DqIhK5/heat.conf.sample2014-08-24 06:54:34.62884 + +++ etc/heat/heat.conf.sample2014-08-24 06:53:51.54084 + @@ -159,10 +159,6 @@ # Size of RPC connection pool. (integer value) #rpc_conn_pool_size=30 -# Modules of exceptions that are permitted to be recreated -# upon receiving exception data from an rpc call. (list value) -#allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions - # Qpid broker hostname. (string value) #qpid_hostname=heat @@ -301,15 +297,6 @@ # Heartbeat time-to-live. (integer value) #matchmaker_heartbeat_ttl=600 -# Host to locate redis. (string value) -#host=127.0.0.1 - -# Use this port to connect to redis host. (integer value) -#port=6379 - -# Password for Redis server (optional). (string value) -#password=None - # Size of RPC greenthread pool. (integer value) #rpc_thread_pool_size=64 @@ -1229,6 +1216,22 @@ #hash_algorithms=md5 +[matchmaker_redis] + +# +# Options defined in oslo.messaging +# + +#
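Clint's guess fits the transcript: tox pins `PYTHONHASHSEED='0'` for its run, while a bare `tools/config/generate_sample.sh` invocation uses whatever seed the interpreter picks, so a generator that walks options in dict order produces differently ordered files each time. The durable fix is to make emission order independent of the hash seed. A toy illustration of the principle, not the actual oslo config generator code:

```python
def render_sample(options):
    """Emit a sample config deterministically regardless of PYTHONHASHSEED.

    `options` maps option name -> default value; iterating the dict
    directly would depend on hash order, so we sort the names first.
    """
    lines = []
    for name in sorted(options):  # sorted(): same output on every run
        lines.append('#%s=%s' % (name, options[name]))
    return '\n'.join(lines)
```

The short-term workaround is simply to regenerate under the same seed the check uses, e.g. `PYTHONHASHSEED=0 tools/config/generate_sample.sh` (assuming the script honours the environment the way tox does).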
Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone
I don't know how Zaqar does its magic, but I'd love to see simple signed URLs rather than users/passwords. This would work for Heat as well. That way we only have to pass in a single predictably formatted string. Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700: Here's an interesting fact about Zaqar (the project formerly known as Marconi) that I hadn't thought about before this week: it's probably the first OpenStack project where a major part of the API primarily faces software running in the cloud rather than facing the user. That is to say, nobody is going to be sending themselves messages on their laptop, from their laptop, via a cloud. At least one end of any given queue is likely to be on a VM in the cloud. That makes me wonder: how does Zaqar authenticate users who are sending and receiving messages (as opposed to setting up the queues in the first place)? Presumably using Keystone, in which case it will run into a problem we've been struggling with in Heat since the very early days. Keystone is generally a front end for an identity store with a 1:1 correspondence between users and actual natural persons. Only the operator can add or remove accounts. This breaks down as soon as you need to authenticate automated services running in the cloud - in particular, you never ever want to store the credentials belonging to an actual natural person in a server in the cloud. Heat has managed to work around this to some extent (for those running the Keystone v3 API) by creating users in a separate domain and more or less doing our own authorisation for them. However, this requires action on the part of the operator, and isn't an option for the end user. I guess Zaqar could do something similar and pass out sets of credentials good only for reading and writing to queues (respectively), but it seems like it would be better if the user could create the keystone accounts and set their own access control rules on the queues. 
On AWS the very first thing a user does is create a bunch of IAM accounts so that they virtually never have to use the credentials associated with their natural person ever again. There are both user accounts and service accounts - the latter IIUC have automatically-rotating keys. Is there anything like this planned in Keystone? Zaqar is likely only the first (I guess second, if you count Heat) of many services that will need it. I have this irrational fear that somebody is going to tell me that this issue is the reason for the hierarchical-multitenancy idea - fear because that both sounds like it requires intrusive changes in every OpenStack project and fails to solve the problem. I hope somebody will disabuse me of that notion in 3... 2... 1... cheers, Zane.
Re: [openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs
Excerpts from Dolph Mathews's message of 2014-08-22 09:45:37 -0700: On Fri, Aug 22, 2014 at 11:32 AM, Zane Bitter zbit...@redhat.com wrote: On 22/08/14 11:19, Thierry Carrez wrote: Zane Bitter wrote: On 22/08/14 08:33, Thierry Carrez wrote: We also still need someone to have the final say in case of deadlocked issues. -1 we really don't. I know we disagree on that :) No problem, you and I work in different programs so we can both get our way ;) People say we don't have that many deadlocks in OpenStack for which the PTL ultimate power is needed, so we could get rid of them. I'd argue that the main reason we don't have that many deadlocks in OpenStack is precisely *because* we have a system to break them if they arise. s/that many/any/ IME and I think that threatening to break a deadlock by fiat is just as bad as actually doing it. And by 'bad' I mean community-poisoningly, trust-destroyingly bad. I guess I've been active in too many dysfunctional free and open source software projects -- I put a very high value on the ability to make a final decision. Not being able to make a decision is about as community-poisoning, and also results in inability to make any significant change or decision. I'm all for getting a final decision, but a 'final' decision that has been imposed from outside rather than internalised by the participants is... rarely final. The expectation of a PTL isn't to stomp around and make final decisions, it's to step in when necessary and help both sides find the best solution. To moderate. Have we had many instances where a project's community divided into two camps and dug in to the point where they actually needed active moderation? And in those cases, was the PTL not already on one side of said argument? I'd prefer specific examples here. I have yet to see a deadlock in Heat that wasn't resolved by better communication. Moderation == bettering communication. 
I'm under the impression that you and Thierry are agreeing here, just from opposite ends of the same spectrum. I agree as well. PTL is a servant of the community, as any good leader is. If the PTL feels they have to drop the hammer, or if an impasse is reached where they are asked to, it is because they have failed to get everyone communicating effectively, not because that's their job.
Re: [openstack-dev] [Glance][Heat] Murano split discussion
Excerpts from Angus Salkeld's message of 2014-08-21 20:14:12 -0700: On Fri, Aug 22, 2014 at 12:34 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700: During last Atlanta summit there were couple discussions about Application Catalog and Application space projects in OpenStack. These cross-project discussions occurred as a result of Murano incubation request [1] during Icehouse cycle. On the TC meeting devoted to Murano incubation there was an idea about splitting the Murano into parts which might belong to different programs[2]. Today, I would like to initiate a discussion about potential splitting of Murano between two or three programs. *App Catalog API to Catalog Program* Application Catalog part can belong to Catalog program, the package repository will move to artifacts repository part where Murano team already participates. API part of App Catalog will add a thin layer of API methods specific to Murano applications and potentially can be implemented as a plugin to artifacts repository. Also this API layer will expose other 3rd party systems API like CloudFoundry ServiceBroker API which is used by CloudFoundry marketplace feature to provide an integration layer between OpenStack Application packages and 3rd party PaaS tools. I thought this was basically already agreed upon, and that Glance was just growing the ability to store more than just images. *Murano Engine to Orchestration Program* Murano engine orchestrates the Heat template generation. Complementary to a Heat declarative approach, Murano engine uses imperative approach so that it is possible to control the whole flow of the template generation. The engine uses Heat updates to update Heat templates to reflect changes in applications layout. Murano engine has a concept of actions - special flows which can be called at any time after application deployment to change application parameters or update stacks. 
The engine is actually complementary to Heat engine and adds the following: - orchestrate multiple Heat stacks - DR deployments, HA setups, multiple datacenters deployment These sound like features already requested directly in Heat. - Initiates and controls stack updates on application specific events Sounds like workflow. :) - Error handling and self-healing - being imperative Murano allows you to handle issues and implement additional logic around error handling and self-healing. Also sounds like workflow. I think we need to re-think what a program is before we consider this. I really don't know much about Murano. I have no interest in it at get off my lawn ;) And turn down that music! Sorry for the fist shaking, but I want to highlight that I'm happy to consider it, just not with programs working the way they do now. http://stackalytics.com/?project_type=all&module=murano-group HP seems to be involved, you should check it out. HP is involved in a lot of OpenStack things. It's a bit hard for me to keep my eyes on everything we do. Good to know that others have been able to take some time and buy into it a bit. +1 for distributing the load. :) all, and nobody has come to me saying If we only had Murano in our orchestration toolbox, we'd solve xxx. But making them part of the I thought you were saying that opsworks was neat the other day? Murano from what I understand was partly inspired from opsworks, yes it's a layer up, but still really the same field. I was saying that OpsWorks is reportedly popular, yes. I did not make the connection at all from OpsWorks to Murano, and nobody had pointed that out to me until now. Orchestration program would imply that we'll do design sessions together, that we'll share the same mission statement, and that we'll have just This is exactly what I hope will happen. Which sessions from last summit would we want to give up to make room for the Murano-only focused sessions?
How much time in our IRC meeting should we give to Murano-only concerns? Forgive me for being harsh. We have a cloud to deploy using Heat, and it is taking far too long to get Heat to do that in an acceptable manner already. Adding load to our PTL and increasing the burden on our communication channels doesn't really seem like something that will increase our velocity. I could be dead wrong though, Murano could be exactly what we need. I just don't see it, and I'm sorry to be so direct about saying that. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Michael Chapman's message of 2014-08-21 23:30:44 -0700: On Fri, Aug 22, 2014 at 2:57 AM, Jay Pipes jaypi...@gmail.com wrote: On 08/19/2014 11:28 PM, Robert Collins wrote: On 20 August 2014 02:37, Jay Pipes jaypi...@gmail.com wrote: ... I'd like to see more unification of implementations in TripleO - but I still believe our basic principle of using OpenStack technologies that already exist in preference to third party ones is still sound, and offers substantial dogfood and virtuous circle benefits. No doubt Triple-O serves a valuable dogfood and virtuous cycle purpose. However, I would move that the Deployment Program should welcome the many projects currently in the stackforge/ code namespace that do deployment of OpenStack using traditional configuration management tools like Chef, Puppet, and Ansible. It cannot be argued that these configuration management systems are the de-facto way that OpenStack is deployed outside of HP, and they belong in the Deployment Program, IMO. I think you mean it 'can be argued'... ;). No, I definitely mean cannot be argued :) HP is the only company I know of that is deploying OpenStack using Triple-O. The vast majority of deployers I know of are deploying OpenStack using configuration management platforms and various systems or glue code for baremetal provisioning. Note that I am not saying that Triple-O is bad in any way! I'm only saying that it does not represent the way that the majority of real-world deployments are done. And I'd be happy if folk in those communities want to join in the deployment program and have code repositories in openstack/. To date, none have asked. My point in this thread has been and continues to be that by having the TC bless a certain project as The OpenStack Way of X, that we implicitly are saying to other valid alternatives Sorry, no need to apply here.. 
As a TC member, I would welcome someone from the Chef community proposing the Chef cookbooks for inclusion in the Deployment program, to live under the openstack/ code namespace. Same for the Puppet modules. While you may personally welcome the Chef community to propose joining the deployment Program and living under the openstack/ code namespace, I'm just saying that the impression our governance model and policies create is one of exclusion, not inclusion. Hope that clarifies better what I've been getting at. (As one of the core reviewers for the Puppet modules) Without a standardised package build process it's quite difficult to test trunk Puppet modules vs trunk official projects. This means we cut release branches some time after the projects themselves to give people a chance to test. Until this changes and the modules can be released with the same cadence as the integrated release I believe they should remain on Stackforge. Seems like the distros that build the packages are all doing lots of daily-build type stuff that could somehow be leveraged to get over that.
[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
It has been brought to my attention that Ironic uses the biggest hammer in the IPMI toolbox to control chassis power: https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142 Which is ret = ipmicmd.set_power('off', wait) This is the most abrupt form, where the system power should be flipped off at a hardware level. The short press on the power button would be 'shutdown' instead of 'off'. I also understand that this has been brought up before, and that the answer given was SSH in and shut it down yourself. I can respect that position, but I have run into a bit of a pickle using it. Observe:

- ssh box.ip poweroff
- poll ironic until power state is off.
- This is a race. Ironic is asserting the power. As soon as it sees that the power is off, it will turn it back on.

- ssh box.ip halt
- NO way to know that this has worked. Once SSH is off and the network stack is gone, I cannot actually verify that the disks were unmounted properly, which is the primary area of concern that I have.

This is particularly important if I'm issuing a rebuild + preserve ephemeral, as it is likely I will have lots of I/O going on, and I want to make sure that it is all quiesced before I reboot to replace the software. Perhaps I missed something. If so, please do educate me on how I can achieve this without hacking around it. Currently my workaround is to manually unmount the state partition, which is something system shutdown is supposed to do and may become problematic if system processes are holding it open. It seems to me that Ironic should at least try to use the graceful shutdown. There can be a timeout, but it would need to be something a user can disable so if graceful never works we never just dump the power on the box. Even a journaled filesystem will take quite a bit to do a full fsck.
The inability to gracefully shutdown in a reasonable amount of time is an error state really, and I need to go to the box and inspect it, which is precisely the reason we have ERROR states. Thanks for your time. :)
Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700: On 08/22/2014 01:48 PM, Clint Byrum wrote: [Clint's full message, quoted verbatim in the reply, appears above; snipped here.]
The inability to gracefully shutdown in a reasonable amount of time is an error state really, and I need to go to the box and inspect it, which is precisely the reason we have ERROR states. What about placing a runlevel script in /etc/init.d/ and symlinking it to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount the state partition in that script which would ensure disk state was quiesced, no? That's already what OS's do in their rc0.d. My point is, I don't have any way to know that process happened, without the box turning itself off after it succeeded.
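What Clint is asking for could be sketched roughly as follows: try the soft 'shutdown' first, poll the power state, and only fall back to the hard 'off' (or an ERROR state) after a timeout. This is an illustrative sketch only, not Ironic code; set_power and get_power are hypothetical stand-ins for driver calls like ipmicmd.set_power:

```python
import time

def power_off_gracefully(set_power, get_power, timeout=120.0,
                         hard_off_on_timeout=False, poll_interval=1.0):
    """Soft-shutdown a node, polling until it powers off.

    set_power/get_power are hypothetical stand-ins for the driver's
    IPMI calls. Returns 'off' on success, 'error' if the node never
    quiesced and the operator disallowed falling back to hard power-off.
    """
    set_power('shutdown')  # short power-button press: let the OS unmount disks
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_power() == 'off':
            return 'off'  # the box turned itself off; disks are quiesced
        time.sleep(poll_interval)
    if hard_off_on_timeout:
        set_power('off')  # the big hammer, only if explicitly enabled
        return 'off'
    return 'error'  # surface an ERROR state so someone inspects the box
```

The key property is the last branch: when graceful shutdown does not complete in time and the operator has not opted into the hard fallback, the node lands in an inspectable error state instead of having its power dumped.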
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Duncan Thomas's message of 2014-08-21 09:21:06 -0700: On 21 August 2014 14:27, Jay Pipes jaypi...@gmail.com wrote: Specifically for Triple-O, by making the Deployment program == Triple-O, the TC has picked the disk-image-based deployment of an undercloud design as The OpenStack Way of Deployment. And as I've said previously in this thread, I believe that the deployment space is similarly unsettled, and that it would be more appropriate to let the Chef cookbooks and Puppet modules currently sitting in the stackforge/ code namespace live in the openstack/ code namespace. Totally agree with Jay here, I know people who gave up on trying to get any official project around deployment because they were told they had to do it under the TripleO umbrella This was why the _program_ versus _project_ distinction was made. But I think we ended up being 1:1 anyway. Perhaps the deployment program's mission statement is too narrow, and we should iterate on that. That others took their ball and went home, instead of asking for a review of that ruling, is a bit disconcerting. That probably strikes to the heart of the current crisis. If we were being reasonable, alternatives to an official OpenStack program's mission statement would be debated and considered thoughtfully. I know I made the mistake early on of pushing the narrow _TripleO_ vision into what should have been a much broader Deployment program. I'm not entirely sure why that seemed o-k to me at the time, or why it was allowed to continue, but I think it may be a good exercise to review those events and try to come up with a few theories or even conclusions as to what we could do better.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from David Kranz's message of 2014-08-21 12:45:05 -0700: On 08/21/2014 02:39 PM, gordon chung wrote: The point I've been making is that by the TC continuing to bless only the Ceilometer project as the OpenStack Way of Metering, I think we do a disservice to our users by picking a winner in a space that is clearly still unsettled. can we avoid using the word 'blessed' -- it's extremely vague and seems controversial. from what i know, no one is being told project x's services are the be all end all and based on experience, companies (should) know this. i've worked with other alternatives even though i contribute to ceilometer. Totally agree with Jay here, I know people who gave up on trying to get any official project around deployment because they were told they had to do it under the TripleO umbrella from the pov of a project that seems to be brought up constantly and maybe it's my naivety, i don't really understand the fascination with branding and the stigma people have placed on non-'openstack'/stackforge projects. it can't be a legal thing because i've gone through that potential mess. also, it's just as easy to contribute to 'non-openstack' projects as 'openstack' projects (even easier if we're honest). Yes, we should be honest. The even easier part is what Sandy cited as the primary motivation for pursuing stacktach instead of ceilometer. I think we need to consider the difference between why OpenStack wants to bless a project, and why a project might want to be blessed by OpenStack. Many folks believe that for OpenStack to be successful it needs to present itself as a stack that can be tested and deployed, not a sack of parts that only the most extremely clever people can manage to assemble into an actual cloud. In order to have such a stack, some code (or, alternatively, dare I say API...) needs to be blessed. 
Reasonable debates will continue about which pieces are essential to this stack, and which should be left to deployers, but metering was seen as such a component and therefore something needed to be blessed. The hope was that every one would jump on that and make it great but it seems that didn't quite happen (at least yet). Though Open Source has many advantages over proprietary development, the ability to choose a direction and marshal resources for efficient delivery is the biggest advantage of proprietary development like what AWS does. The TC process of blessing is, IMO, an attempt to compensate for that in an OpenSource project. Of course if the wrong code is blessed, the negative impact can be significant. Blessing APIs would be Hm, I wonder if the only difference there is when AWS blesses the wrong thing, they evaluate the business impact, and respond by going in a different direction, all behind closed doors. The shame is limited to that inner circle. Here, with full transparency, calling something the wrong thing is pretty much public humiliation for the team involved. So it stands to reason that we shouldn't call something the right thing if we aren't comfortable with the potential public shaming.
Re: [openstack-dev] [Glance][Heat] Murano split discussion
Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700: During the last Atlanta summit there were a couple of discussions about Application Catalog and Application space projects in OpenStack. These cross-project discussions occurred as a result of the Murano incubation request [1] during the Icehouse cycle. At the TC meeting devoted to Murano incubation there was an idea about splitting Murano into parts which might belong to different programs[2]. Today, I would like to initiate a discussion about potentially splitting Murano between two or three programs. *App Catalog API to Catalog Program* The Application Catalog part can belong to the Catalog program; the package repository will move to the artifacts repository part, where the Murano team already participates. The API part of App Catalog will add a thin layer of API methods specific to Murano applications and potentially can be implemented as a plugin to the artifacts repository. Also this API layer will expose other 3rd party systems' APIs, like the CloudFoundry ServiceBroker API, which is used by the CloudFoundry marketplace feature to provide an integration layer between OpenStack Application packages and 3rd party PaaS tools. I thought this was basically already agreed upon, and that Glance was just growing the ability to store more than just images. *Murano Engine to Orchestration Program* The Murano engine orchestrates the Heat template generation. Complementary to Heat's declarative approach, the Murano engine uses an imperative approach so that it is possible to control the whole flow of the template generation. The engine uses Heat updates to update Heat templates to reflect changes in application layout. The Murano engine has a concept of actions - special flows which can be called at any time after application deployment to change application parameters or update stacks.
The engine is actually complementary to Heat engine and adds the following: - orchestrate multiple Heat stacks - DR deployments, HA setups, multiple datacenters deployment These sound like features already requested directly in Heat. - Initiates and controls stack updates on application specific events Sounds like workflow. :) - Error handling and self-healing - being imperative Murano allows you to handle issues and implement additional logic around error handling and self-healing. Also sounds like workflow. I think we need to re-think what a program is before we consider this. I really don't know much about Murano. I have no interest in it at all, and nobody has come to me saying If we only had Murano in our orchestration toolbox, we'd solve xxx. But making them part of the Orchestration program would imply that we'll do design sessions together, that we'll share the same mission statement, and that we'll have just one PTL. I fail to see why they're not another, higher level program that builds on top of the other services. *Murano UI to Dashboard Program* Application Catalog requires a UI focused on user experience. Currently there is a Horizon plugin for Murano App Catalog which adds an Application Catalog page to browse, search and filter applications. It also adds dynamic UI functionality to render Horizon forms without writing actual code. I feel like putting all the UI plugins in Horizon is the same mistake as putting all of the functional tests in Tempest. It doesn't have the effect of breaking the gate but it probably is a lot of burden on a single team.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Robert Collins's message of 2014-08-18 23:41:20 -0700: On 18 August 2014 09:32, Clint Byrum cl...@fewbar.com wrote: I can see your perspective but I don't think it's internally consistent... Here's why folk are questioning Ceilometer: Nova is a set of tools to abstract virtualization implementations. With a big chunk of local things - local image storage (now in glance), scheduling, rebalancing, ACLs and quotas. Other implementations that abstract over VM's at various layers already existed when Nova started - some bad (some very bad!) and others actually quite ok. The fact that we have local implementations of domain specific things is irrelevant to the difference I'm trying to point out. Glance needs to work with the same authentication semantics and share a common access catalog to work well with Nova. It's unlikely there's a generic image catalog that would ever fit this bill. In many ways glance is just an abstraction of file storage backends and a database to track a certain domain of files (images, and soon, templates and other such things). The point of mentioning Nova is, we didn't write libvirt, or xen, we wrote an abstraction so that users could consume them via a REST API that shares these useful automated backends like glance. Neutron is a set of tools to abstract SDN/NFV implementations. And implements a DHCP service, DNS service, overlay networking: it's much more than an abstraction-over-other-implementations. Native DHCP and overlay? Last I checked Neutron used dnsmasq and openvswitch, but it has been a few months, and I know that is an eon in OpenStack time. Cinder is a set of tools to abstract block-device implementations. Trove is a set of tools to simplify consumption of existing databases. Sahara is a set of tools to simplify Hadoop consumption. Swift is a feature-complete implementation of object storage, none of which existed when it was started.
Swift was started in 2009; Eucalyptus goes back to 2007, with Walrus part of that - I haven't checked precise dates, but I'm pretty sure that it existed and was usable by the start of 2009. There may well be other object storage implementations too - I simply haven't checked. Indeed, and MogileFS was sort of like Swift but not HTTP based. Perhaps Walrus was evaluated and found inadequate for the CloudFiles product requirements? I don't know. But there weren't de-facto object stores at the time because object stores were just becoming popular. Keystone supports all of the above, unifying their auth. And implementing an IdP (which I know they want to stop doing ;)). And in fact lots of OpenStack projects, for various reasons, support *not* using Keystone (something that bugs me, but that's a different discussion). My point was it is justified to have a whole implementation and not just abstraction because it is meant to enable the ecosystem, not _be_ the ecosystem. I actually think Keystone is problematic too, and I often wonder why we haven't just done OAuth, but I'm not trying to throw every project under the bus. I'm trying to state that we accept Keystone because it has grown organically to support the needs of all the other pieces. Horizon supports all of the above, unifying their GUI. Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already. I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as there are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those.
It is my hope we'll push more into the unifying the implementations space and withdraw a bit from the implementing stuff space. So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist. If the other examples we had were a lot purer, this explanation would make sense. I think there's more to it than that though :). If purity is required to show a difference, then I don't think I know how to demonstrate what I think is obvious to most of us: Ceilometer is an end to end implementation of things that exist in many battle tested implementations. I struggle to think of another component of OpenStack that has this distinction. What exactly, I don't know, but it's just too easy an answer, and one that doesn't stand up to non-trivial examination :(. I'd like to see more unification of implementations in TripleO - but I still believe our basic principle of using OpenStack technologies that already exist in preference to third party ones is still sound, and offers substantial dogfood and virtuous circle benefits.
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Jay Pipes's message of 2014-08-20 14:53:22 -0700: On 08/20/2014 05:06 PM, Chris Friesen wrote: On 08/20/2014 07:21 AM, Jay Pipes wrote: Hi Thierry, thanks for the reply. Comments inline. :) On 08/20/2014 06:32 AM, Thierry Carrez wrote: If we want to follow your model, we probably would have to dissolve programs as they stand right now, and have blessed categories on one side, and teams on the other (with projects from some teams being blessed as the current solution). Why do we have to have blessed categories at all? I'd like to think of a day when the TC isn't picking winners or losers at all. Level the playing field and let the quality of the projects themselves determine the winner in the space. Stop the incubation and graduation madness and change the role of the TC to instead play an advisory role to upcoming (and existing!) projects on the best ways to integrate with other OpenStack projects, if integration is something that is natural for the project to work towards. It seems to me that at some point you need to have a recommended way of doing things, otherwise it's going to be *really hard* for someone to bring up an OpenStack installation. Why can't there be multiple recommended ways of setting up an OpenStack installation? Matter of fact, in reality, there already are multiple recommended ways of setting up an OpenStack installation, aren't there? There's multiple distributions of OpenStack, multiple ways of doing bare-metal deployment, multiple ways of deploying different message queues and DBs, multiple ways of establishing networking, multiple open and proprietary monitoring systems to choose from, etc. And I don't really see anything wrong with that. This is an argument for loosely coupling things, rather than tightly integrating things. You will almost always win my vote with that sort of movement, and you have here. +1. We already run into issues with something as basic as competing SQL databases. 
If the TC suddenly said Only MySQL will be supported, that would not mean that the greater OpenStack community would be served better. It would just unnecessarily take options away from deployers. This is really where supported becomes the mutex binding us all. The more supported options, the larger the matrix, the more complex a user's decision process becomes. If every component has several competing implementations and none of them are official how many more interaction issues are going to trip us up? IMO, OpenStack should be about choice. Choice of hypervisor, choice of DB and MQ infrastructure, choice of operating systems, choice of storage vendors, choice of networking vendors. Err, uh. I think OpenStack should be about users. If having 400 choices means users are just confused, then OpenStack becomes nothing and everything all at once. Choices should be part of the whole not when 1% of the market wants a choice, but when 20%+ of the market _requires_ a choice. What we shouldn't do is harm that 1%'s ability to be successful. We should foster it and help it grow, but we don't just pull it into the program and say You're ALSO in OpenStack now! and we also don't want to force those users to make a hard choice because the better solution is not blessed. If there are multiple actively-developed projects that address the same problem space, I think it serves our OpenStack users best to let the projects work things out themselves and let the cream rise to the top. If the cream ends up being one of those projects, so be it. If the cream ends up being a mix of both projects, so be it. The production community will end up determining what that cream should be based on what it deploys into its clouds and what input it supplies to the teams working on competing implementations. I'm really not a fan of making it a competitive market. If a space has a diverse set of problems, we can expect it will have a diverse set of solutions that overlap. 
But that doesn't mean they both need to drive toward making that overlap all-encompassing. Sometimes that happens and it is good, and sometimes that happens and it causes horrible bloat. And who knows... what works or is recommended by one deployer may not be what is best for another type of deployer and I believe we (the TC/governance) do a disservice to our user community by picking a winner in a space too early (or continuing to pick a winner in a clearly unsettled space). Right, I think our current situation crowds out diversity, when what we want to do is enable it, without confusing the users.
Re: [openstack-dev] [all] The future of the integrated release
Here's why folk are questioning Ceilometer: Nova is a set of tools to abstract virtualization implementations. Neutron is a set of tools to abstract SDN/NFV implementations. Cinder is a set of tools to abstract block-device implementations. Trove is a set of tools to simplify consumption of existing databases. Sahara is a set of tools to simplify Hadoop consumption. Swift is a feature-complete implementation of object storage, none of which existed when it was started. Keystone supports all of the above, unifying their auth. Horizon supports all of the above, unifying their GUI. Ceilometer is a complete implementation of data collection and alerting. There is no shortage of implementations that exist already. I'm also core on two projects that are getting some push back these days: Heat is a complete implementation of orchestration. There are at least a few of these already in existence, though not as many as there are data collection and alerting systems. TripleO is an attempt to deploy OpenStack using tools that OpenStack provides. There are already quite a few other tools that _can_ deploy OpenStack, so it stands to reason that people will question why we don't just use those. It is my hope we'll push more into the unifying the implementations space and withdraw a bit from the implementing stuff space. So, you see, people are happy to unify around a single abstraction, but not so much around a brand new implementation of things that already exist. Excerpts from Nadya Privalova's message of 2014-08-17 11:11:34 -0700: Hello all, As a Ceilometer's core, I'd like to add my 0.02$. During previous discussions several projects were mentioned which were started or continued to be developed after Ceilometer became integrated. The main question I'm thinking of is why it was impossible to contribute to the existing integrated project? Is it because of Ceilometer's architecture, the team or there are some other (maybe political) reasons?
I think it's a very sad situation when we have 3-4 Ceilometer-like projects from different companies instead of a single one that satisfies everybody. (We don't see it in other projects. Though, maybe there are several Novas or Neutrons on StackForge and I don't know about it...) Of course, sometimes it's much easier to start the project from scratch. But there should be strong reasons for doing this if we are talking about an integrated project. IMHO the idea, the role is the most important thing when we are talking about an integrated project. And if Ceilometer's role is really needed (and I think it is) then we should improve the existing implementation, merge all needs into the one project and the result will still be Ceilometer. Thanks, Nadya On Fri, Aug 15, 2014 at 12:41 AM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Aug 13, 2014 at 12:24 PM, Doug Hellmann d...@doughellmann.com wrote: On Aug 13, 2014, at 3:05 PM, Eoghan Glynn egl...@redhat.com wrote: At the end of the day, that's probably going to mean saying No to more things. Every time I turn around everyone wants the TC to say No to things, just not to their particular thing. :) Which is human nature. But I think if we don't start saying No to more things we're going to end up with a pile of mud that no one is happy with. That we're being so abstract about all of this is frustrating. I get that no-one wants to start a flamewar, but can someone be concrete about what they feel we should say 'no' to but are likely to say 'yes' to? I'll bite, but please note this is a strawman. No: * Accepting any more projects into incubation until we are comfortable with the state of things again * Marconi * Ceilometer Well -1 to that, obviously, from me. Ceilometer is on track to fully execute on the gap analysis coverage plan agreed with the TC at the outset of this cycle, and has an active plan in progress to address architectural debt.
Yes, there seems to be an attitude among several people in the community that the Ceilometer team denies that there are issues and refuses to work on them. Neither of those things is the case from our perspective. Totally agree. Can you be more specific about the shortcomings you see in the project that aren't being addressed? Once again, this is just a strawman. I'm just not sure OpenStack has 'blessed' the best solution out there. https://wiki.openstack.org/wiki/Ceilometer/Graduation#Why_we_think_we.27re_ready - Successfully passed the challenge of being adopted by 3 related projects which have agreed to join or use ceilometer: - Synaps - Healthnmon - StackTach https://wiki.openstack.org/w/index.php?title=StackTach&action=edit&redlink=1 StackTach seems to still be under active development ( http://git.openstack.org/cgit/stackforge/stacktach/log/), is used by
Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories
Excerpts from Jyoti Ranjan's message of 2014-08-16 00:57:52 -0700: We will have to be a little bit cautious in using glob because of its inherent usage pattern. For example, a file starting with . will not get matched. That is a separate bug, but I think the answer to that is to use rsync instead of mv and globs. So this: mv $tmp/./* $destdir becomes this: rsync -a --remove-source-files $tmp/ $destdir ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Time to Samba! :-)
Excerpts from Martinx - ジェームズ's message of 2014-08-16 12:03:20 -0700: Hey Stackers, I'm wondering here... Samba4 is pretty solid (the upcoming 4.2 rocks), I'm using it on a daily basis as an AD DC, for both Windows and Linux instances! With replication, file system ACLs - cifs, built-in LDAP, dynamic DNS with Bind9 as a backend (no netbios) and etc... Pretty cool! In the OpenStack ecosystem, there are awesome solutions like Trove, Solum, Designate and etc... Amazing times BTW! So, why not try to integrate Samba4, working as an AD DC, within OpenStack itself?! But, if we did that, what would be left for us to reinvent in our own slightly different way? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories
Excerpts from Brownell, Jonathan C (Corvallis)'s message of 2014-08-15 08:11:18 -0700: The current DIB element support for downloading tarballs via source-repository allows an entry in the following form: name tar targetdir url Today, this feature is used only by the mysql DIB element. You can see how it's used here: https://github.com/openstack/tripleo-image-elements/blob/master/elements/mysql/source-repository-mysql However, the underlying diskimage-builder implementation of tarball handling is rather odd and inflexible. After downloading the file (or retrieving it from cache) and unpacking it into a tmp directory, it performs: mv $tmp/*/* $targetdir This does work as long as the tarball follows a structure where all its files/directories are contained within a single directory, but it fails if the tarball contains no subdirectories. (Even worse is when it contains some files and some subdirectories, in which case the files are lost and the contents of all subdirs get lumped together in the output folder.) Since this tarball support is only used today by the mysql DIB element, I would love to fix this in both diskimage-builder and tripleo-image-elements by changing to simply: mv $tmp/* $targetdir And then manually tweaking the directory structure of $targetdir from a new install.d script in the mysql element to restore the desired layout. However, it's important to note that this will break backwards compatibility if tarball support is used in its current fashion by users with private DIB elements. Personally, I consider the current behavior so egregious that it really needs to be fixed across the board rather than preserving backwards compatibility. Do others agree? If not, do you have suggestions as to how to improve this mechanism cleanly without sacrificing backwards compatibility? 
How about we add a glob argument, like this: mysql tar /usr/local/mysql http://someplace/mysql.tar.gz mysql-5.* That would result in mv $tmp/mysql-5.*/* $targetdir And then we would warn that assuming the glob will be '*' is deprecated, to be changed in a later release. Users who want your proposed behavior would use . until the default changes. That would result in mv $tmp/./* $targetdir ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
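The proposed deprecation path could be sketched as a small shell function. The function name, warning text, and argument order below are made up for illustration; this is not actual diskimage-builder code:

```shell
#!/bin/sh
# Hypothetical sketch of glob-aware tarball extraction with a
# deprecation warning for the implicit '*' default.
extract_tarball() {
    tmp=$1; targetdir=$2; glob=$3
    if [ -z "$glob" ]; then
        echo "WARNING: defaulting the tarball glob to '*' is deprecated;" \
             "pass '.' to install the tarball root as-is" >&2
        glob='*'
    fi
    mkdir -p "$targetdir"
    # $glob is deliberately unquoted so it expands, e.g.
    #   mv $tmp/mysql-5.*/* $targetdir
    mv "$tmp"/$glob/* "$targetdir"/
}
```

With the glob 'mysql-5.*' this reproduces today's top-directory-stripping behavior for the mysql element, while '.' gives the flat-tarball behavior proposed above.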
Re: [openstack-dev] [all] The future of the integrated release
Excerpts from Thierry Carrez's message of 2014-08-13 02:54:58 -0700: Rochelle.RochelleGrober wrote: [...] So, with all that prologue, here is what I propose (and please consider proposing your improvements/changes to it). I would like to see for Kilo: - IRC meetings and mailing list meetings beginning with the Juno release and continuing through the summit that focus on core project needs (what Thierry calls strategic) that as a set would be considered the primary focus of the Kilo release for each project. This could include high priority bugs, refactoring projects, small improvement projects, high interest extensions and new features, specs that didn't make it into Juno, etc. - Develop the list and prioritize it into Needs and Wants. Consider these the feeder projects for the two runways if you like. - Discuss the lists. Maybe have a community vote? The vote will freeze the list, but as in most development project freezes, it can be a soft freeze that the core, or drivers, or TC can amend (or throw out for that matter). [...] One thing we've been unable to do so far is to set release goals at the beginning of a release cycle and stick to them. It used to be because we were so fast moving that new awesome stuff was proposed mid-cycle and ended up being a key feature (sometimes THE key feature) for the project. Now it's because there is so much proposed that no one knows what will actually get completed. So while I agree that what you propose is the ultimate solution (and the workflow I've pushed PTLs to follow every single OpenStack release so far), we have struggled to have the visibility, long-term thinking and discipline to stick to it in the past. If you look at the post-summit plans and compare them to what we end up with in a release, you'll see quite a lot of differences :) I think that shows agility, and isn't actually a problem. 6 months is quite a long time in the future for some business models. 
Strategic improvements for the project should be able to stick to a 6 month schedule, but companies will likely be tactical about where their developer resources are directed for feature work. The fact that those resources land code upstream is one of the greatest strengths of OpenStack. Any potential impact on how that happens should be carefully considered when making any changes to process and governance. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Steve Baker's message of 2014-08-10 15:33:26 -0700: On 02/08/14 04:07, Allison Randal wrote: A few of us have been independently experimenting with Ansible as a backend for TripleO, and have just decided to try experimenting together. I've chatted with Robert, and he says that TripleO was always intended to have pluggable backends (CM layer), and just never had anyone interested in working on them. (I see it now, even in the early docs and talks, I guess I just couldn't see the forest for the trees.) So, the work is in line with the overall goals of the TripleO project. We're starting with a tiny scope, focused only on updating a running TripleO deployment, so our first work is in: - Create an Ansible Dynamic Inventory plugin to extract metadata from Heat - Improve/extend the Ansible nova_compute Cloud Module (or create a new one), for Nova rebuild - Develop a minimal handoff from Heat to Ansible, particularly focused on the interactions between os-collect-config and Ansible We're merging our work in this repo, until we figure out where it should live: https://github.com/allisonrandal/tripleo-ansible We've set ourselves one week as the first sanity-check to see whether this idea is going anywhere, and we may scrap it all at that point. But, it seems best to be totally transparent about the idea from the start, so no-one is surprised later. Having pluggable backends for configuration seems like a good idea, and Ansible is a great choice for the first alternative backend. TripleO is intended to be loosely coupled for many components, not just in-instance configuration. However, what this repo seems to be doing at the moment is bypassing heat to do a stack update, and I can only assume there is an eventual goal to not use heat at all for stack orchestration too. Granted, until blueprint update-failure-recovery lands[1] then doing a stack-update is about as much fun as Russian roulette. 
But this effort is tactical rather than strategic, especially given TripleO's mission statement. We intend to stay modular. Ansible won't replace Heat from end to end. Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. We remain committed to driving these things into Heat, which will allow us to address these things the way a large scale operation will need to. But until we can land those things in Heat, we need something more flexible like Ansible to go around Heat and do things in the exact order we need them done. Ansible doesn't have a REST API, which is a non-starter for modern automation, but the need to control workflow is greater than the need to have a REST API at this point. If I were to use Ansible for TripleO configuration I would start with something like the following: * Install an ansible software-config hook onto the image to be triggered by os-refresh-config[2][3] * Incrementally replace StructuredConfig resources in tripleo-heat-templates with SoftwareConfig resources that include the ansible playbooks via get_file * The above can start in a fork of tripleo-heat-templates, but can eventually be structured using resource providers so that the deployer chooses what configuration backend to use by selecting the environment file that contains the appropriate config resources Now you have a cloud orchestrated by heat and configured by Ansible. If it is still deemed necessary to do an out-of-band update to the stack then you're in a much better position to do an ansible push, since you can use the same playbook files that heat used to bring up the stack. That would be a good plan if we wanted to fix issues with os-*-config, but that is the opposite of reality. 
We are working around Heat orchestration issues with Ansible. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. We haven't been able to come back around to those yet, but you can see where this has turned into a bit of a rat hole of optimization. action-aware-sw-config is sort of what we want for rebuild. We collaborated with the trove devs on how to also address it for resize a while back but I have lost track of that work as it has taken a back seat to more pressing issues. Addressing groups is a general problem that I've had a hard time articulating in the past. Tomas Sedovic has done a good job with this TripleO spec, but I don't know that we've asked for an explicit change in a bug or spec in Heat just yet: https://review.openstack.org/#/c/97939/ There are a number of other issues noted in that spec which are already addressed in Heat, but require refactoring in TripleO's templates and tools, and that work continues. 
The point remains: we need something that works now, and doing an alternate implementation for updates is actually faster than addressing all of these issues. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700: On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. Related to this, and a discussion we had recently at the TripleO meetup is this spec I raised today: https://review.openstack.org/#/c/113296/ It's following up on the idea that we could potentially address (or at least mitigate, pending the fully convergence-ified heat) some of these scalability concerns, if TripleO moves from the one-giant-template model to a more modular nested-stack/provider model (e.g what Tomas has been working on) I've not got into enough detail on that yet to be sure if it's achievable for Juno, but it seems initially to be complex-but-doable. I'd welcome feedback on that idea and how it may fit in with the more granular convergence-engine model. Can you link to the eventlet/forking issues bug please? 
I thought since bug #1321303 was fixed that multiple engines and multiple workers should work OK, and obviously that being true is a precondition to expending significant effort on the nested stack decoupling plan above. That was the issue. So we fixed that bug, but we never un-reverted the patch that forks enough engines to use up all the CPU's on a box by default. That would likely help a lot with metadata access speed (we could manually do it in TripleO but we tend to push defaults. :) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
Excerpts from Zane Bitter's message of 2014-08-11 13:35:44 -0700: On 11/08/14 14:49, Clint Byrum wrote: Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700: On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700: On 11/08/14 10:46, Clint Byrum wrote: Right now we're stuck with an update that just doesn't work. It isn't just about update-failure-recovery, which is coming along nicely, but it is also about the lack of signals to control rebuild, poor support for addressing machines as groups, and unacceptable performance in large stacks. Are there blueprints/bugs filed for all of these issues? Convergence addresses the poor performance for large stacks in general. We also have this: https://bugs.launchpad.net/heat/+bug/1306743 Which shows how slow metadata access can get. I have worked on patches but haven't been able to complete them. We made big strides but we are at a point where 40 nodes polling Heat every 30s is too much for one CPU This sounds like the same figure I heard at the design summit; did the DB call optimisation work that Steve Baker did immediately after that not have any effect? Steve's work got us to 40. From 7. to handle. When we scaled Heat out onto more CPUs on one box by forking we ran into eventlet issues. We also ran into issues because even with many processes we can only use one to resolve templates for a single stack during update, which was also excessively slow. 
Related to this, and a discussion we had recently at the TripleO meetup is this spec I raised today: https://review.openstack.org/#/c/113296/ It's following up on the idea that we could potentially address (or at least mitigate, pending the fully convergence-ified heat) some of these scalability concerns, if TripleO moves from the one-giant-template model to a more modular nested-stack/provider model (e.g what Tomas has been working on) I've not got into enough detail on that yet to be sure if it's achievable for Juno, but it seems initially to be complex-but-doable. I'd welcome feedback on that idea and how it may fit in with the more granular convergence-engine model. Can you link to the eventlet/forking issues bug please? I thought since bug #1321303 was fixed that multiple engines and multiple workers should work OK, and obviously that being true is a precondition to expending significant effort on the nested stack decoupling plan above. That was the issue. So we fixed that bug, but we never un-reverted the patch that forks enough engines to use up all the CPU's on a box by default. That would likely help a lot with metadata access speed (we could manually do it in TripleO but we tend to push defaults. :) Right, and we decided we wouldn't because it's wrong to do that to people by default. In some cases the optimal running configuration for TripleO will differ from the friendliest out-of-the-box configuration for Heat users in general, and in those cases - of which this is one - TripleO will need to specify the configuration. Whether or not the default should be to fork 1 process per CPU is a debate for another time. The point is, we can safely use the forking in Heat now to perhaps improve performance of metadata polling. Chasing that, and other optimizations, has not led us to a place where we can get to, say, 100 real nodes _today_. We're chasing another way to get to the scale and capability we need _today_, in much the same way we did with merge.py. 
We'll find the way to get it done more elegantly as time permits. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] a small experiment with Ansible in TripleO
I've been fiddling on github. This repo is unfortunately named the same but does not share the same ancestry as yours. Anyway, the branch 'fiddling' has a working Heat inventory plugin which should give you a hostvar of 'heat_metadata' per host in the given stack. https://github.com/SpamapS/tripleo-ansible/blob/fiddling/plugins/inventory/heat.py Note that in the root there is a 'heat-ansible-inventory.conf' that is an example config (works w/ devstack) to query a heat stack and turn it into an ansible inventory. That uses oslo.config so all of the usual patterns for loading configs in openstack should apply. Excerpts from Allison Randal's message of 2014-08-01 09:07:44 -0700: A few of us have been independently experimenting with Ansible as a backend for TripleO, and have just decided to try experimenting together. I've chatted with Robert, and he says that TripleO was always intended to have pluggable backends (CM layer), and just never had anyone interested in working on them. (I see it now, even in the early docs and talks, I guess I just couldn't see the forest for the trees.) So, the work is in line with the overall goals of the TripleO project. We're starting with a tiny scope, focused only on updating a running TripleO deployment, so our first work is in: - Create an Ansible Dynamic Inventory plugin to extract metadata from Heat - Improve/extend the Ansible nova_compute Cloud Module (or create a new one), for Nova rebuild - Develop a minimal handoff from Heat to Ansible, particularly focused on the interactions between os-collect-config and Ansible We're merging our work in this repo, until we figure out where it should live: https://github.com/allisonrandal/tripleo-ansible We've set ourselves one week as the first sanity-check to see whether this idea is going anywhere, and we may scrap it all at that point. But, it seems best to be totally transparent about the idea from the start, so no-one is surprised later. 
Cheers, Allison ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
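For readers unfamiliar with Ansible dynamic inventory plugins: their job reduces to printing JSON in a fixed shape, with per-host variables under a "_meta" key. Below is a minimal standalone sketch of that shape, with made-up host data standing in for what the heat.py plugin above would fetch from a Heat stack (an illustration of the format, not the actual plugin code):

```python
import json


def build_inventory(servers):
    """Turn (hostname, ip, metadata) tuples fetched from a Heat stack into
    Ansible dynamic-inventory JSON. The group name 'heat_stack' and the
    'heat_metadata' hostvar are assumptions for this sketch."""
    inventory = {"heat_stack": {"hosts": []}, "_meta": {"hostvars": {}}}
    for name, ip, metadata in servers:
        inventory["heat_stack"]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {
            "ansible_host": ip,
            # Expose the raw Heat metadata to playbooks as a hostvar.
            "heat_metadata": metadata,
        }
    return inventory


if __name__ == "__main__":
    # Made-up data; the real plugin would query the Heat API here.
    servers = [("notcompute", "10.0.0.5", {"role": "controller"})]
    print(json.dumps(build_inventory(servers), indent=2))
```

Ansible invokes such a script with --list and consumes the printed JSON, so the whole handoff from Heat is just "resolve stack, emit this structure".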
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Please do not re-invent locking... the way we reinvented locking in Heat. ;) There are well known distributed coordination services such as Zookeeper and etcd, and there is an abstraction for them already called tooz: https://git.openstack.org/cgit/stackforge/tooz/ Excerpts from Elena Ezhova's message of 2014-07-30 09:09:27 -0700: Hello everyone! Some recent change requests ([1], [2]) show that there are a number of issues with locking db resources in Neutron. One of them is initialization of drivers, which can be performed simultaneously by several neutron servers. In this case locking is essential for avoiding conflicts, which is now mostly done using SQLAlchemy's with_lockmode() method, which emits SELECT..FOR UPDATE, resulting in rows being locked within a transaction. As has already been stated by Mike Bayer [3], this statement is not supported by Galera and, what's more, by PostgreSQL, for which a lock doesn't work in the case when a table is empty. That is why there is a need for an easy solution that would allow cross-server locking and would work for every backend. The first thing that comes to mind is to create a table which would contain all locks acquired by various pieces of code. Each time, code that wishes to access a table that needs locking would have to perform the following steps: 1. Check whether a lock is already acquired by using SELECT lock_name FROM the cross_server_locks table. 2. If the SELECT returned None, acquire a lock by inserting it into the cross_server_locks table. Otherwise, wait and then try again until a timeout is reached. 3. After the code has executed, it should release the lock by deleting the corresponding entry from the cross_server_locks table. The locking process can be implemented by decorating a function that performs a transaction with a special function, or as a context manager. 
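The three steps above can be sketched as a context manager. This standalone illustration uses sqlite3 as a stand-in for the real database, and the table/column names are assumptions; note it also collapses steps 1 and 2 into a single INSERT guarded by a unique constraint, because a separate SELECT-then-INSERT is itself racy between two servers:

```python
import sqlite3
import time


class CrossServerLock:
    """Sketch of the lock-table approach: acquire by inserting a row,
    release by deleting it. Not Neutron code; names are illustrative."""

    def __init__(self, conn, name, timeout=5.0, interval=0.1):
        self.conn, self.name = conn, name
        self.timeout, self.interval = timeout, interval

    def __enter__(self):
        deadline = time.time() + self.timeout
        while True:
            try:
                # The INSERT doubles as check-and-acquire: the PRIMARY KEY
                # constraint makes a concurrent acquire fail atomically.
                self.conn.execute(
                    "INSERT INTO cross_server_locks (lock_name) VALUES (?)",
                    (self.name,))
                self.conn.commit()
                return self
            except sqlite3.IntegrityError:
                if time.time() > deadline:
                    raise TimeoutError("lock %r is busy" % self.name)
                time.sleep(self.interval)

    def __exit__(self, *exc):
        self.conn.execute(
            "DELETE FROM cross_server_locks WHERE lock_name = ?",
            (self.name,))
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cross_server_locks (lock_name TEXT PRIMARY KEY)")
with CrossServerLock(conn, "init-sdn-driver"):
    pass  # serialized section goes here
```

A real deployment would also need lock stealing for crashed holders (e.g. an acquired-at timestamp), which is exactly the complexity that tooz already handles.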
Thus, I wanted to ask the community whether this approach deserves consideration and, if so, decide on the format of an entry in the cross_server_locks table: how a lock_name should be formed, whether to support different locking modes, etc. [1] https://review.openstack.org/#/c/101982/ [2] https://review.openstack.org/#/c/107350/ [3] https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Pessimistic_Locking_-_SELECT_FOR_UPDATE ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Excerpts from Doug Wiegley's message of 2014-07-30 09:48:17 -0700: I'd have to look at the Neutron code, but I suspect that a simple strategy of issuing the UPDATE SQL statement with a WHERE condition that I'm assuming the locking is for serializing code, whereas for what you describe above, is there some reason we wouldn't just use a transaction? I believe the code in question is doing something like this: 1) Check DB for initialized SDN controller driver 2) Not initialized - initialize the SDN controller via its API 3) Record in DB that it is initialized Step (2) above needs serialization, not (3). Compare and update will end up working like a distributed lock anyway, because the db model will have to be changed to have an initializing state, and then if initializing fails, you'll have to have a timeout.. and stealing for stuck processes. Sometimes a distributed lock is actually a simpler solution. Tooz will need work, no doubt. Perhaps if we call it 'oslo.locking' it will make more sense. Anyway, my point stands: trust the experts, avoid reinventing locking. And if you don't like tooz, extract the locking code from Heat and turn it into an oslo.locking library or something. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [Solum] Stack update and raw_template backup
Excerpts from Anant Patil's message of 2014-07-29 23:21:05 -0700: On 28-Jul-14 22:37, Clint Byrum wrote: Excerpts from Zane Bitter's message of 2014-07-28 07:25:24 -0700: On 26/07/14 00:04, Anant Patil wrote: When the stack is updated, a diff of updated template and current template can be stored to optimize database. And perhaps Heat should have an API to retrieve this history of templates for inspection etc. when the stack admin needs it. If there's a demand for that feature we could implement it, but it doesn't easily fall out of the current implementation any more. We are never going to do it even 1/10th as well as git. In fact we won't even do it 1/0th as well as CVS. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Zane, I am working on the defect you had filed, which would clean up the backup stack along with the resources, templates and other data. However, I simply don't want to delete the templates, for the same reason we don't hard-delete the stack. Anyone who deploys a stack and updates it over time would want to view the updates in the templates for debugging or auditing reasons. It is not fair to assume that every user has a VCS with them to store the templates. It is kind of an inconvenience for me to not have the ability to view my updates in templates. Sounds like a nice-to-have feature. I'd suggest you propose it as a blueprint and spec. I will personally be against us spending time and adding complexity for such a feature when it is so much better served by VCS. And I would also suggest that we _can_ assume that users have VCS. When is the last time you encountered a developer or ops professional that did not use at least some kind of VCS? For me, it was 2003, and it took approximately 20 minutes to implement. And if we want that as a service, I believe Solum is working on doing that. 
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
Excerpts from Jay Pipes's message of 2014-07-30 13:53:38 -0700: On 07/30/2014 12:21 PM, Kevin Benton wrote: Maybe I misunderstood your approach then. I thought you were suggesting where a node performs an UPDATE record WHERE record = last_state_node_saw query and then checks the number of affected rows. That's optimistic locking by every definition I've heard of it. It matches the following statement from the wiki article you linked to as well: The latter situation (optimistic locking) is only appropriate when there is less chance of someone needing to access the record while it is locked; otherwise it cannot be certain that the update will succeed because the attempt to update the record will fail if another user updates the record first. Did I misinterpret how your approach works? The record is never locked in my approach, which is why I don't like to think of it as optimistic locking. It's more like optimistic read and update with retry if certain conditions continue to be met... :) To be very precise, the record is never locked explicitly -- either through the use of SELECT FOR UPDATE or some explicit file or distributed lock. InnoDB won't even hold a lock on anything, as it will simply add a new version to the row using its MGCC (sometimes called MVCC) methods. The technique I am showing in the patch relies on the behaviour of the SQL UPDATE statement with a WHERE clause that contains certain columns and values from the original view of the record. The behaviour of the UPDATE statement will be a NOOP when some other thread has updated the record in between the time that the first thread read the record, and the time the first thread attempted to update the record. The caller of UPDATE can detect this NOOP by checking the number of affected rows, and retry the UPDATE if certain conditions remain kosher... 
So, there are actually no locks taken in the entire process, which is why I object to the term optimistic locking :) I think where the confusion has been is that the initial SELECT and the following UPDATE statements are *not* done in the context of a single SQL transaction... This is all true at a low level, Jay. But if you're serializing something outside the DB by using the 'doing it' versus 'done it' state, it still acts like a lock. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
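The compare-and-update technique discussed above fits in a few lines. This is a standalone sketch with sqlite3 standing in for the real database; the table and column names are made up, not Neutron code:

```python
import sqlite3


def compare_and_update(conn, row_id, new_status, seen_status):
    """Issue an UPDATE whose WHERE clause includes the value we read
    earlier. If another writer changed the row in the meantime, the
    UPDATE matches zero rows and is a no-op; the caller detects that
    via the affected-row count and can re-read and retry."""
    cur = conn.execute(
        "UPDATE records SET status = ? WHERE id = ? AND status = ?",
        (new_status, row_id, seen_status))
    conn.commit()
    return cur.rowcount == 1  # True means our update won


# Tiny demo of the winning and losing cases:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'PENDING')")
assert compare_and_update(conn, 1, "ACTIVE", "PENDING")       # fresh read: wins
assert not compare_and_update(conn, 1, "DELETED", "PENDING")  # stale read: no-op
```

As the reply notes, no row or table lock is ever held; the serialization comes entirely from detecting the no-op and retrying.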
Re: [openstack-dev] [heat] Saving the original raw template in the DB
Excerpts from Ton Ngo's message of 2014-07-29 13:53:12 -0700: Hi everyone, The raw template saved in the DB used to be the original template that a user submits. With the recent fix for stack update, it now reflects the template that is actually deployed, so it may be different from the original template because some resources may fail to deploy. I would like to solicit some feedback on saving the original template in the DB separately from the deployed template. I can think of two use cases for retrieving the original template: Debugging: running stack-update using the same template after fixing environmental problems. The CLI and API can be extended to allow reusing the original template without having to provide it again. Convergence or retry: some initial resource deployment may fail intermittently, but the user can retry later. I believe this use case is far better handled via VCS. We need the template to parse the current state of the stack. The user will have their intended template and can have their intended parameter values all included in a VCS. Are there other potential use cases? The cost would be an extra copy of the template in the raw_template table for each stack if there is failure, and a new column in the stack table to hold the id. We can argue that the user should have the original template to resubmit, but it seems useful and convenient to save it in the DB. Ton Ngo Additional cost is the additional complexity of code to manage the data. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] Stack update and raw_template backup
Excerpts from Zane Bitter's message of 2014-07-28 07:25:24 -0700: On 26/07/14 00:04, Anant Patil wrote: When the stack is updated, a diff of the updated template and the current template can be stored to optimize the database. And perhaps Heat should have an API to retrieve this history of templates for inspection etc. when the stack admin needs it. If there's a demand for that feature we could implement it, but it doesn't easily fall out of the current implementation any more. We are never going to do it even 1/10th as well as git. In fact we won't even do it 1/100th as well as CVS.
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from John Griffith's message of 2014-07-25 06:59:38 -0700: On Fri, Jul 25, 2014 at 7:38 AM, Kerrin, Michael michael.ker...@hp.com wrote: Coming back to this. I have updated the review https://review.openstack.org/#/c/90134/ so that it is passing CI for Ubuntu (obviously failing on Fedora) and I am happy with it. In order to close this off, my plan is to get feedback on the mysql element in this review. Any changes that people request in the next few days I will make and test via the CI and internally. Next I will rename mysql to percona and restore the old mysql in this review. At which point the percona code will not be tested via CI, so I don't want to make any more changes at that point, so I hope it will get approved. So this review will move to adding a percona element. Then following the mariadb integration I would like to get this https://review.openstack.org/#/c/109415/ change to tripleo-incubator through; that will include the new percona element in ubuntu images. So in the CI, Fedora will use mariadb and Ubuntu will use percona. Looking forward to any feedback, Michael On 09 July 2014 14:44:15, Sullivan, Jon Paul wrote: -Original Message- From: Giulio Fidente [mailto:gfide...@redhat.com] Sent: 04 July 2014 14:37 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora On 07/01/2014 05:47 PM, Michael Kerrin wrote: I propose making mysql an abstract element and the user must choose either the percona or mariadb-rpm element. CI must be set up correctly +1 seems a cleaner and more sustainable approach There was some concern from lifeless around recreating package-style dependencies in dib with element-provides/element-deps, in particular a suggestion that meta-elements are not desirable [1] (I hope I am paraphrasing you correctly, Rob).
That said, this is exactly the reason that element-provides was brought in, so that the definition of the image could have mysql as an element, but that the DIB_*_EXTRA_ARGS variable would provide the correct one, which would then list itself as providing mysql. This would not prevent the sharing of common code through a differently-named element, such as mysql-common. [1] see comments on April 10th in https://review.openstack.org/#/c/85776/ -- Giulio Fidente GPG KEY: 08D733BA Thanks, Jon-Paul Sullivan So this all sounds like an interesting mess. I'm not even really sure I follow all that's going on in the database area with the exception of the design, which it seems is something that takes no account of testing or commonality across platforms (pretty bad IMO), but I don't have any insight there so I'll butt out. The LIO versus tgt thing however is a bit troubling. Is there a reason that TripleO decided to do the exact opposite of what the defaults are in the rest of OpenStack here?
Also, any reason why, if there was a valid justification for this, it didn't seem like it might be worthwhile to work with the rest of the OpenStack community and share what they considered to be the better solution here? John, please be specific when you say "the defaults in the rest of OpenStack". We have a stated goal to deploy _with the defaults_. The default iscsi_helper is tgtadm. We deploy with that unless another is selected. As you see below, nothing is asserted there unless a value is set: https://git.openstack.org/cgit/openstack/tripleo-image-elements/tree/elements/cinder/os-apply-config/etc/cinder/cinder.conf#n41 And the default in the Heat templates that will set that value matches cinder's current default:
Re: [openstack-dev] [heat]Heat Db Model updates
Excerpts from Zane Bitter's message of 2014-07-24 12:09:39 -0700: On 17/07/14 07:51, Ryan Brown wrote: On 07/17/2014 03:33 AM, Steven Hardy wrote: On Thu, Jul 17, 2014 at 12:31:05AM -0400, Zane Bitter wrote: On 16/07/14 23:48, Manickam, Kanagaraj wrote: SNIP *Resource* Status action should be enum of predefined status +1 Rsrc_metadata - make full name resource_metadata -0. I don't see any benefit here. Agreed I'd actually be in favor of the change from rsrc to resource; I feel like rsrc is a pretty opaque abbreviation. I'd just like to remind everyone that these changes are not free. Database migrations are a pain to manage, and every new one slows down our unit tests. We now support multiple heat-engines connected to the same database and people want to upgrade their installations, so that means we have to be able to handle different versions talking to the same database. Unless somebody has a bright idea I haven't thought of, I assume that means carrying code to handle both versions for 6 months before actually being able to implement the migration. Or are we saying that you have to completely shut down all instances of Heat to do an upgrade? The name of the nova_instance column is so egregiously misleading that it's probably worth the pain. Using an enumeration for the states will save a lot of space in the database (though it would be a much more obvious win if we were querying on those columns). Changing a random prefix that was added to avoid a namespace conflict to a slightly different random prefix is well below the cost-benefit line IMO. In past lives managing apps like Heat, we've always kept supporting the previous schema in new code versions. So the process is: * Upgrade all code * Restart all services * Upgrade database schema * Wait a bit for reverts * Remove backward compatibility Now this was always in more of a continuous delivery environment, so there was not more than a few weeks of waiting for reverts.
In OpenStack we'd have a single release to wait. We're not special though; doesn't Nova have some sort of object versioning code that helps them manage the versions of each type of data for this very purpose?
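The Nova-style versioned-data idea mentioned above can be sketched as a read-time translation: records carry a version, and newer code upgrades older records on load until the compatibility window closes. The helper and field names here are hypothetical; the nova_instance rename just mirrors the column complaint earlier in the thread.

```python
# Hypothetical sketch of versioned records: code that understands
# version 2 can still read version 1 rows by translating them on load.
def upgrade_record(record):
    version = record.get("version", 1)
    if version == 1:
        # the old schema used the misleading 'nova_instance' column name
        record = dict(record)
        record["physical_resource_id"] = record.pop("nova_instance")
        record["version"] = 2
    return record

old = {"version": 1, "nova_instance": "server-abc-123"}
print(upgrade_record(old)["physical_resource_id"])
```

Once all engines run code that writes version 2, a one-shot migration can rewrite the remaining version-1 rows and the translation branch can be deleted.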
Re: [openstack-dev] [gate] The gate: a failure analysis
Thanks Matthew for the analysis. I think you missed something though. Right now the frustration is that unrelated intermittent bugs stop your presumably good change from getting in. Without gating, the result would be that even more bugs, many of them not intermittent at all, would get in. Right now, the one random developer who has to hunt down the rechecks and do them is inconvenienced. But without a gate, _every single_ developer will be inconvenienced until the fix is merged. The false negative rate is _way_ too high. Nobody would disagree there. However, adding more false negatives and allowing more people to ignore the ones we already have, seems like it would have the opposite effect: Now instead of annoying the people who hit the random intermittent bugs, we'll be annoying _everybody_ as they hit the non-intermittent ones. Excerpts from Matthew Booth's message of 2014-07-21 03:38:07 -0700: On Friday evening I had a dependent series of 5 changes all with approval waiting to be merged. These were all refactor changes in the VMware driver. The changes were:

* VMware: DatastorePath join() and __eq__() https://review.openstack.org/#/c/103949/
* VMware: use datastore classes get_allowed_datastores/_sub_folder https://review.openstack.org/#/c/103950/
* VMware: use datastore classes in file_move/delete/exists, mkdir https://review.openstack.org/#/c/103951/
* VMware: Trivial indentation cleanups in vmops https://review.openstack.org/#/c/104149/
* VMware: Convert vmops to use instance as an object https://review.openstack.org/#/c/104144/

The last change merged this morning. In order to merge these changes, over the weekend I manually submitted:

* 35 rechecks due to false negatives, an average of 7 per change
* 19 resubmissions after a change passed, but its dependency did not

Other interesting numbers:

* 16 unique bugs
* An 87% false negative rate
* 0 bugs found in the change under test

Because we don't fail fast, that is an average of at least 7.3 hours in the gate.
Much more in fact, because some runs fail on the second pass, not the first. Because we don't resubmit automatically, that is only if a developer is actively monitoring the process continuously and resubmits immediately on failure. In practice this is much longer, because sometimes we have to sleep. All of the above numbers are counted from the change receiving an approval +2 until final merging. There were far more failures than this during the approval process. Why do we test individual changes in the gate? The purpose is to find errors *in the change under test*. By the above numbers, it has failed to achieve this at least 16 times previously.

Probability of finding a bug in the change under test: Small
Cost of testing: High
Opportunity cost of slowing development: High

and for comparison:

Cost of reverting rare false positives: Small

The current process expends a lot of resources, and does not achieve its goal of finding bugs *in the changes under test*. In addition to using a lot of technical resources, it also prevents good changes from making their way into the project and, not unimportantly, saps the will to live of its victims. The cost of the process is overwhelmingly greater than its benefits. The gate process as it stands is a significant net negative to the project. Does this mean that it is worthless to run these tests? Absolutely not! These tests are vital to highlight a severe quality deficiency in OpenStack. Not addressing this is, imho, an existential risk to the project. However, the current approach is to pick contributors from the community at random and hold them personally responsible for project bugs selected at random. Not only has this approach failed, it is impractical, unreasonable, and poisonous to the community at large. It is also unrelated to the purpose of gate testing, which is to find bugs *in the changes under test*. I would like to make the radical proposal that we stop gating on CI failures.
We will continue to run them on every change, but only after the change has been successfully merged.

Benefits:
* Without rechecks, the gate will use 8 times fewer resources.
* Log analysis is still available to indicate the emergence of races.
* Fixes can be merged quicker.
* Vastly less developer time spent monitoring gate failures.

Costs:
* A rare class of merge bug will make it into master.

Note that the benefits above will also offset the cost of resolving this rare class of merge bug. Of course, we still have the problem of finding resources to monitor and fix CI failures. An additional benefit of not gating on CI will be that we can no longer pretend that picking developers for project-affecting bugs by lottery is likely to achieve results. As a project we need to understand the importance of CI failures. We need a proper
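The failure numbers quoted in this thread are internally consistent, as a quick back-of-envelope check shows: an 87% false negative rate means each gate run passes with probability about 0.13, so attempts per change follow a geometric distribution whose mean is 1/0.13, in line with the roughly 7 rechecks per change actually observed.

```python
# Back-of-envelope check of the quoted gate numbers: with an 87% false
# negative rate, the expected number of attempts per change is the mean
# of a geometric distribution, 1 / p_pass.
p_pass = 1 - 0.87
mean_attempts = 1 / p_pass
print(round(mean_attempts, 1))  # close to the ~7 rechecks per change observed
```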
Re: [openstack-dev] [Tripleo][Heat] Heat is not able to create swift cloud server
Excerpts from Peeyush Gupta's message of 2014-07-20 23:13:16 -0700: Hi all, I have been trying to set up tripleo using instack with RDO. Now, when deploying overcloud, the script is failing consistently with CREATE_FAILED error:

+ heat stack-create -f overcloud.yaml -P AdminToken=efe958561450ba61d7ef8249d29b0be1ba95dc11 -P AdminPassword=2b919f2ac7790ca1053ac58bc4621ca0967a0cba -P CinderPassword=e7d61883a573a3dffc65a5fb958c94686baac848 -P GlancePassword=cb896d6392e08241d504f3a0a2b489fc6f2612dd -P HeatPassword=7a3138ef58365bb666cb30c8377447b74e75a0ef -P NeutronPassword=4480ec8f2e004be4b06d14e1e228d882e18b3c2c -P NovaPassword=e4a34b6caeeb7dbc497fb1c557a396c422b4d103 -P NeutronPublicInterface=eth0 -P SwiftPassword=ed3761a03959e0d636b8d6fc826103734069f9dc -P SwiftHashSuffix=1a26593813bb7d6b38418db747b4243d4f1b5a56 -P NovaComputeLibvirtType=qemu -P 'GlanceLogFile='\'''\''' -P NeutronDnsmasqOptions=dhcp-option-force=26,1400 overcloud

+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 737ada9f-aa45-45b6-a42b-c0a496d2407e | overcloud  | CREATE_IN_PROGRESS | 2014-07-21T06:02:22Z |
+--------------------------------------+------------+--------------------+----------------------+

+ tripleo wait_for_stack_ready 220 10 overcloud
Command output matched 'CREATE_FAILED'. Exiting...
Here is the heat log:

2014-07-18 06:51:11.884 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:12.921 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:16.058 30750 ERROR heat.engine.resource [-] CREATE : Server SwiftStorage0 [07e42c3d-0f1b-4bb9-b980-ffbb74ac770d] Stack overcloud [0ca028e7-682b-41ef-8af0-b2eb67bee272]
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource Traceback (most recent call last):
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 420, in _do_action
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     while not check(handle_data):
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/server.py", line 545, in check_create_complete
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     return self._check_active(server)
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/server.py", line 561, in _check_active
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource     raise exc
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource Error: Creation of server overcloud-SwiftStorage0-qdjqbif6peva failed.
2014-07-18 06:51:16.058 30750 TRACE heat.engine.resource
2014-07-18 06:51:16.255 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:16.939 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:17.368 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:17.638 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:18.158 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:18.613 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:19.113 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:19.765 30750 WARNING heat.common.keystoneclient [-] stack_user_domain ID not set in heat.conf falling back to using default
2014-07-18 06:51:20.247 30750 WARNING heat.engine.service [-] Stack create failed, status FAILED

How can I resolve this? Heat is just responding to Nova. You need to look at nova and find out why that server failed. 'nova show overcloud-SwiftStorage0-qdjqbif6peva' should work.
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Dan Prince's message of 2014-07-20 11:51:27 -0700: On Thu, 2014-07-17 at 15:54 +0100, Michael Kerrin wrote: On Thursday 26 June 2014 12:20:30 Clint Byrum wrote: Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this, including:

* Modifying os-collect-config to auto-run os-refresh-config on a regular basis
* Setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes

I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late!
So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 I have been looking into bug #1334804 and I have a review up to resolve it. I want to highlight something. Currently on a reboot we start all services via upstart (on debian anyways) and there have been quite a lot of issues around this - missing upstart scripts and timing issues. I don't know the issues on fedora. So with a fix to #1334804, on a reboot upstart will start all the services first (with potentially out-of-date configuration), then o-c-c will start o-r-c and will now configure all services and restart them or start them if upstart isn't configured properly. I would like to turn off all boot scripts for services we configure and leave all this to o-r-c. I think this will simplify things and put us in control of starting services. I believe that it will also narrow the gap between fedora and debian or debian and debian so what works on one should work on the other and make it easier for developers. I'm not sold on this approach. At the very least I think we want to make this optional because not all deployments may want to have o-r-c be the central service starting agent. So I'm opposed to this being our (only!) default... I felt this way too. However, I'm open to it because I am worried that it is a bit idealistic without much justification for being so. We know o-r-c will be there, and really must be there. We're already saying it needs to run to assert ephemeral state, and one thing ephemeral is things started. 
Now, we can, and maybe even should, take a hard line long term that o-r-c does not do this. That it stores everything in system level configs that are started in the normal system boot. I _want_ this to be the case. But thus far, we've failed to assert that and things have occasionally been very broken on reboot. Short of forcing a reboot in every CI run, we're going to have trouble detecting this. So, I think we have two options: 1) O-r-c doing the asserting, with which we can more or less predict that subsequent boots will work in the same manner as the first boot. 2
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-18 09:12:21 -0700: Thomas Herve thomas.he...@enovance.com wrote on 07/17/2014 02:06:13 AM: There are 4 resources related to neutron load balancing. OS::Neutron::LoadBalancer is probably the least useful and the one you can *not* use, as it's only there for compatibility with AWS::AutoScaling::AutoScalingGroup. OS::Neutron::HealthMonitor does the health checking part, although maybe not in the way you want it. OK, let's work with these. My current view is this: supposing the Convergence work delivers monitoring of health according to a member's status in its service and reacts accordingly, the gaps (compared to AWS functionality) are the abilities to (1) get member health from application level pings (e.g., URL polling) and (2) accept member health declarations from an external system, with consistent reaction to health information from all sources. Convergence will not deliver monitoring, though I understand how one might have that misunderstanding. Convergence will check with the API that controls a physical resource to determine what Heat should consider its status to be for the purpose of ongoing orchestration. Source (1) is what an OS::Neutron::HealthMonitor specifies, and an OS::Neutron::Pool is the thing that takes such a spec. So we could complete the (1) part if there were a way to tell a scaling group to poll the member health information developed by an OS::Neutron::Pool. Does that look like the right approach? For (2), this would amount to having an API that an external system (with proper authorization) can use to declare member health. In the grand and glorious future when scaling groups have true APIs rather than being Heat hacks, such a thing would be part of those APIs. In the immediate future we could simply add this to the Heat API. 
Such an operation would take something like a stack name or UUID, the name or UUID of a resource that is a scaling group, and the member name or UUID of the Resource whose health is being declared, and health_status=unhealthy. Does that look about right? Isn't (2) covered already by the cloudwatch API in Heat? I am going to claim ignorance of it a bit, as I've never used it, but it seems like the same thing.
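The operation proposed above might look something like the sketch below. The URL layout, the 'mark_unhealthy' action name, and the payload keys are all assumptions for illustration; no such Heat API existed at the time of this thread.

```python
import json

# Purely hypothetical sketch: an external monitoring system declaring a
# scaling-group member unhealthy, using the inputs listed above (stack,
# scaling-group resource, member). Nothing here is an existing Heat API.
def mark_unhealthy_request(stack_id, group_name, member_id):
    return {
        "method": "POST",
        "path": "/v1/stacks/%s/resources/%s/members/%s/actions"
                % (stack_id, group_name, member_id),
        "body": json.dumps(
            {"mark_unhealthy": {"health_status": "unhealthy"}}),
    }

req = mark_unhealthy_request("stack-uuid", "my_asg", "member-0")
print(req["path"])
```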
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-18 10:38:32 -0700: Clint Byrum cl...@fewbar.com wrote on 07/18/2014 12:56:32 PM: Excerpts from Mike Spreitzer's message of 2014-07-18 09:12:21 -0700: ... OK, let's work with these. My current view is this: supposing the Convergence work delivers monitoring of health according to a member's status in its service and reacts accordingly, the gaps (compared to AWS functionality) are the abilities to (1) get member health from application level pings (e.g., URL polling) and (2) accept member health declarations from an external system, with consistent reaction to health information from all sources. Convergence will not deliver monitoring, though I understand how one might have that misunderstanding. Convergence will check with the API that controls a physical resource to determine what Heat should consider its status to be for the purpose of ongoing orchestration. If I understand correctly, your point is that healing is not automatic. Since a scaling group is a nested stack, the observing part of Convergence will automatically note in the DB when the physical resource behind a scaling group member (in its role as a stack resource) is deleted. And when convergence engine gets around to acting on that Resource, the backing physical resource will be automatically re-created. But there is nothing that automatically links the notice of divergence to the converging action. Have I got that right? Yes you have it right. I just wanted to be clear, that is not monitoring.
Re: [openstack-dev] [nova] fair standards for all hypervisor drivers
Excerpts from Chris Friesen's message of 2014-07-16 11:38:44 -0700: On 07/16/2014 11:59 AM, Monty Taylor wrote: On 07/16/2014 07:27 PM, Vishvananda Ishaya wrote: This is a really good point. As someone who has to deal with packaging issues constantly, it is odd to me that libvirt is one of the few places where we depend on upstream packaging. We constantly pull in new python dependencies from pypi that are not packaged in ubuntu. If we had to wait for packaging before merging the whole system would grind to a halt. I think we should be updating our libvirt version more frequently by installing from source or our own ppa instead of waiting for the ubuntu team to package it. Shrinking in terror from what I'm about to say ... but I actually agree with this. There are SEVERAL logistical issues we'd need to sort, not the least of which involve the actual mechanics of us doing that and properly gating, etc. But I think that, like the python depends where we tell distros what version we _need_ rather than using what version they have, libvirt, qemu, ovs and maybe one or two other things are areas in which we may want or need to have a strongish opinion. I'll bring this up in the room tomorrow at the Infra/QA meetup, and will probably be flayed alive for it - but maybe I can put forward a straw-man proposal on how this might work. How would this work...would you have them uninstall the distro-provided libvirt/qemu and replace them with newer ones? (In which case what happens if the version desired by OpenStack has bugs in features that OpenStack doesn't use, but that some other software that the user wants to run does use?) Or would you have OpenStack versions of them installed in parallel in an alternate location? Yes. See: docker, lxc, chroot. (Listed in descending hipsterness order).
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Michael Kerrin's message of 2014-07-17 07:54:26 -0700: On Thursday 26 June 2014 12:20:30 Clint Byrum wrote: Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this, including:

* Modifying os-collect-config to auto-run os-refresh-config on a regular basis
* Setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes

I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late! So I see two problems highlighted above.
1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go; we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 I have been looking into bug #1334804 and I have a review up to resolve it. I want to highlight something. Currently on a reboot we start all services via upstart (on debian anyways) and there have been quite a lot of issues around this - missing upstart scripts and timing issues. I don't know the issues on fedora. So with a fix to #1334804, on a reboot upstart will start all the services first (with potentially out-of-date configuration), then o-c-c will start o-r-c and will now configure all services and restart them or start them if upstart isn't configured properly. I would like to turn off all boot scripts for services we configure and leave all this to o-r-c. I think this will simplify things and put us in control of starting services. I believe that it will also narrow the gap between fedora and debian or debian and debian so what works on one should work on the other and make it easier for developers. Agreed, and that is actually really simple. I hate to steal your thunder, but this is the patch: https://review.openstack.org/107772 Having the ability to service nova-api stop|start|restart is very handy but this will be a manual thing and I intend to leave that there. What do people think and how best do I push this forward? I feel that this leads into the re-assert-system-state spec but mainly I think this is a bug and doesn't require a spec.
I will be at the tripleo mid-cycle meetup next and am willing to discuss this with anyone interested in this and put together the necessary bits to make this happen. As I said, it is simple. :) I suggest testing the patch above and adding anything I missed to it. Systemd-based systems will likely need something different. I'm still burying my head in the sand and not learning systemd, but perhaps a follow-up patch from somebody who understands it can make those systems do the same thing.
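The sentinel-file behaviour discussed in this thread (re-run the command on boot only if no success marker exists) can be sketched as follows. The paths, function names, and messages are illustrative, not os-collect-config's actual implementation.

```python
import os
import tempfile

# Sketch of the /var/run sentinel idea: the marker is written only on a
# successful run, and /var/run being cleared on boot is what triggers the
# re-run. A temp dir stands in for /var/run here so the sketch is runnable.
SENTINEL = os.path.join(tempfile.gettempdir(), "occ-demo.ran")

def maybe_run_command():
    if os.path.exists(SENTINEL):
        return "skipped: already ran since boot"
    # ... run os-refresh-config here; mark success only if it exits 0 ...
    open(SENTINEL, "w").close()
    return "ran os-refresh-config"

if os.path.exists(SENTINEL):
    os.remove(SENTINEL)  # simulate a fresh boot clearing /var/run
print(maybe_run_command())
print(maybe_run_command())
```

Because success is recorded only after the command completes, a run that fails partway leaves no marker and will be retried, which is the property a plain @reboot cron entry would not give you.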
Re: [openstack-dev] [heat]Heat Db Model updates
Excerpts from Manickam, Kanagaraj's message of 2014-07-16 20:48:04 -0700: Event Why are uuid and id both used? The event uuid is the user-facing ID. However, we need to return events to the user in insertion order. So we use an auto-increment primary key, and order by that in 'heat event-list stack_name'. We don't want to expose that integer to the user though, because knowing the rate at which these integers increase would reveal a lot about the goings-on inside Heat. Resource_action is being used in both the event and resource tables, so it should be moved to a common table If we're joining to resource already, OK, but it is worth noting that there is a desire to not use a SQL table for event storage. Maintaining those events on a large, busy stack will be expensive. The simpler solution is to just write batches of event files into swift. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
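The uuid-vs-id split described above can be illustrated with a minimal schema (a sketch, not Heat's actual table definition):

```python
import sqlite3
import uuid

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE event ('
             '  id INTEGER PRIMARY KEY AUTOINCREMENT,'  # internal insertion order
             '  uuid TEXT NOT NULL,'                    # opaque, user-facing ID
             '  name TEXT)')
for name in ('CREATE_IN_PROGRESS', 'CREATE_COMPLETE'):
    conn.execute('INSERT INTO event (uuid, name) VALUES (?, ?)',
                 (str(uuid.uuid4()), name))

# Order by the hidden integer key, but only ever hand the uuid to users,
# so nothing about insertion rates leaks out of the API.
rows = conn.execute('SELECT uuid, name FROM event ORDER BY id').fetchall()
```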
Re: [openstack-dev] [heat] health maintenance in autoscaling groups
Excerpts from Mike Spreitzer's message of 2014-07-16 10:50:42 -0700: Clint Byrum cl...@fewbar.com wrote on 07/02/2014 01:54:49 PM: Excerpts from Qiming Teng's message of 2014-07-02 00:02:14 -0700: Just some random thoughts below ... On Tue, Jul 01, 2014 at 03:47:03PM -0400, Mike Spreitzer wrote: In AWS, an autoscaling group includes health maintenance functionality --- both an ability to detect basic forms of failures and an ability to react properly to failures detected by itself or by a load balancer. What is the thinking about how to get this functionality in OpenStack? We are prototyping a solution to this problem at IBM Research - China lab. The idea is to leverage oslo.messaging and ceilometer events for instance (and possibly other resources such as port, securitygroup ...) failure detection and handling. Hm.. perhaps you should be contributing some reviews here as you may have some real insight: https://review.openstack.org/#/c/100012/ This sounds a lot like what we're working on for continuous convergence. I noticed that health checking in AWS goes beyond convergence. In AWS an ELB can be configured with a URL to ping, for application-level health checking. And an ASG can simply be *told* the health of a member by a user's own external health system. I think we should have analogous functionality in OpenStack. Does that make sense to you? If so, do you have any opinion on the right way to integrate, so that we do not have three completely independent health maintenance systems? The check URL is already a part of Neutron LBaaS IIRC. What may not be a part is notifications for when all members are reporting down (which might be something to trigger scale-up). If we don't have push checks in our auto scaling implementation then we don't have a proper auto scaling implementation. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Cinder coverage
Excerpts from Dan Prince's message of 2014-07-16 09:50:51 -0700: Hi TripleO! It would appear that we have no coverage in devtest which ensures that Cinder consistently works in the overcloud. As such the TripleO Cinder elements are often broken (as of today I can't fully use lio or tgt w/ upstream TripleO elements). How do people feel about swapping out our single 'nova boot' command for one that boots from a volume? Something like this: https://review.openstack.org/#/c/107437 There is a bit of a tradeoff here in that the conversion will take a bit of time (qemu-img has to run). Also our boot code path won't be exactly the same as booting from an image. Long term we want to run Tempest but due to resource constraints we can't do that today. Until then this sort of deep systems test (running a command that exercises more code) might serve us well and give us the Cinder coverage we need. Thoughts? Tempest is a stretch goal. Given our long test times, until we get them down, I don't know if we can even flirt with tempest other than the most basic smoke tests. So yes, I like the idea of having our one smoke test be as wide as possible. Later on we can add Heat coverage by putting said smoke test into a Heat template. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Victor Stinner's message of 2014-07-10 05:57:38 -0700: Le jeudi 10 juillet 2014, 14:48:04 Yuriy Taraday a écrit : I'm not suggesting that taskflow is useless and asyncio is better (apples vs. oranges). I'm saying that using coroutines (asyncio) can improve the ways we can use taskflow and provide a clearer method of developing these flows. This was mostly a response to the "this is impossible with coroutines". I say it is possible and it can even be better. It would be nice to modify taskflow to support trollius coroutines. Coroutines support asynchronous operations and have a better syntax than callbacks. You mean like this: https://review.openstack.org/#/c/90881/1/taskflow/engines/action_engine/executor.py Abandoned, but I think Josh is looking at it. :) For Mark's spec, add a new greenio executor to Oslo Messaging: I don't see the direct link to taskflow. taskflow can use Oslo Messaging to call RPC, but I don't see how to use taskflow internally to read a socket (driver), wait for the completion of the callback and then send back the result to the socket (driver). So oslo and the other low-level bits are going to need to be modified to support coroutines. That is definitely something that will make them more generally useful anyway. I don't think Josh or I meant to get in the way of that. However, having this available is a step toward removing eventlet and doing the painful work to switch to asyncio. Josh's original email was in essence a reminder that we should consider a layer on top of asyncio and eventlet alike, so that the large-scale code changes only happen once. I see trollius as a low-level tool to handle simple asynchronous operations, whereas taskflow is more high-level, to chain more complex operations together correctly. _yes_, trollius and taskflow must not be exclusive options; they should cooperate, as we plan to support trollius coroutines in Oslo Messaging. In fact they are emphatically not exclusive. 
However, considering the order of adoption should produce a little less chaos for the project. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700: On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: I think Clint's response was likely better than what I can write here, but I'll add on a few things. How do you write such code using taskflow?

@asyncio.coroutine
def foo(self):
    result = yield from some_async_op(...)
    return do_stuff(result)

The idea (at a very high level) is that users don't write this; what users do write is a workflow, maybe the following (pseudocode):

# Define the pieces of your workflow.
TaskA():
    def execute():
        # Do whatever some_async_op did here.
    def revert():
        # If execute had any side-effects, undo them here.

TaskFoo():
    ...

# Compose them together
flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"), TaskFoo("my-foo"))

I wouldn't consider this composition very user-friendly. I find it extremely user-friendly when I consider that it gives you clear lines of delineation between "the way it should work" and "what to do when it breaks."

# Submit the workflow to an engine, let the engine do the work to execute it (and transfer any state between tasks as needed).

The idea here is that when things like this are declaratively specified, the only thing that matters is that the engine respects that declaration; not whether it uses asyncio, eventlet, pigeons, threads, remote workers[1]. It also adds some things that are not (IMHO) possible with coroutines (in part since they are at such a low level), like stopping the engine after 'my-task-a' runs and shutting off the software, upgrading it, restarting it and then picking back up at 'my-foo'. It's absolutely possible with coroutines and might provide an even clearer view of what's going on. Like this:

@asyncio.coroutine
def my_workflow(ctx, ...):
    project = yield from ctx.run_task(create_project())
    # Hey, we don't want to be linear. How about parallel tasks?
    volume, network = yield from asyncio.gather(
        ctx.run_task(create_volume(project)),
        ctx.run_task(create_network(project)),
    )
    # We can put anything here - why not branch a bit?
    if create_one_vm:
        yield from ctx.run_task(create_vm(project, network))
    else:
        # Or even loops - why not?
        for i in range(network.num_ips()):
            yield from ctx.run_task(create_vm(project, network))

Sorry, but the code above is nothing like the code that Josh shared. When create_network(project) fails, how do we revert its side effects? If we want to resume this flow after a reboot, how does that work? I understand that there is a desire to write everything in beautiful python yields, try's, finally's, and excepts. But the reality is that Python's stack is lost the moment the process segfaults, the power goes out on that PDU, or the admin rolls out a new kernel. We're not saying "asyncio vs. taskflow" - I've seen that mistake twice already in this thread. Josh and I are suggesting that if there is a movement to think about coroutines, there should also be some time spent thinking at a high level: how do we resume tasks, revert side effects, and control flow? If we embed taskflow deep in the code, we get those things, and we can treat tasks as coroutines and let taskflow's event loop be asyncio just the same. If we embed asyncio deep into the code, we don't get any of the high-level functions and we get just as much code churn. There's no limit to coroutine usage. The only problem is the library that would bind everything together. In my example run_task will have to be really smart, keeping track of all started tasks and the results of all finished ones, skipping all tasks that have already been done (and substituting the already-generated results). But all of this is doable. And I find this way of declaring workflows way more understandable than whatever it would look like with Flow.add's. The way the flow is declared is important, as it leads to more isolated code. 
The single place where the flow is declared in Josh's example means that the flow can be imported, the state deserialized and inspected, and resumed by any piece of code: an API call, a daemon start up, an admin command, etc. I may be wrong, but it appears to me that the context that you built in your code example is hard, maybe impossible, to resume after a process restart unless _every_ task is entirely idempotent and thus can just be repeated over and over. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
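The resume-after-restart property being argued for here can be shown with a toy checkpointing runner - a stdlib sketch of the idea only, not taskflow itself (taskflow's engines do this with proper persistence backends):

```python
import json
import os
import tempfile

def run_with_checkpoints(tasks, state_file):
    """Run (name, fn) pairs in order, persisting each result; on a
    re-run after a crash, finished tasks are skipped instead of
    re-executed, so tasks need not be idempotent."""
    done = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = json.load(f)
    for name, fn in tasks:
        if name in done:
            continue  # completed before the restart
        done[name] = fn()
        with open(state_file, 'w') as f:
            json.dump(done, f)  # checkpoint after every task
    return done

state = os.path.join(tempfile.mkdtemp(), 'flow.json')
calls = []
tasks = [('create_project', lambda: calls.append('p') or 'proj-1'),
         ('create_network', lambda: calls.append('n') or 'net-1')]
first = run_with_checkpoints(tasks, state)
second = run_with_checkpoints(tasks, state)  # simulated restart: no re-runs
```

Because the flow and its state live outside the Python stack, any process - an API call handler, a daemon at startup, an admin command - can load the state file and carry on.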
[openstack-dev] [TripleO] Proposal to add Jon Paul Sullivan and Alexis Lee to core review team
Hello! I've been looking at the statistics, and doing a bit of review of the reviewers, and I think we have an opportunity to expand the core reviewer team in TripleO. We absolutely need the help, and I think these two individuals are well positioned to do that. I would like to draw your attention to this page: http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt Specifically these two lines:

|     Reviewer     | Reviews  -2  -1  +1  +2  +A   +/- % | Disagreements* |
| jonpaul-sullivan |     188   0  43 145   0   0   77.1% |   28 ( 14.9%)  |
|      lxsli       |     186   0  23 163   0   0   87.6% |   27 ( 14.5%)  |

Note that they are right at the level we expect, 3 per work day. And I've looked through their reviews and code contributions: it is clear that they understand what we're trying to do in TripleO, and how it all works. I am a little dismayed at the slightly high disagreement rate, but looking through the disagreements, most of them were jp and lxsli being more demanding of submitters, so I am less dismayed. So, I propose that we add jonpaul-sullivan and lxsli to the TripleO core reviewer team. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] stevedore plugins (and wait conditions)
Excerpts from Randall Burt's message of 2014-07-09 15:33:26 -0700: On Jul 9, 2014, at 4:38 PM, Zane Bitter zbit...@redhat.com wrote: On 08/07/14 17:17, Steven Hardy wrote: Regarding forcing deployers to make a one-time decision, I have a question re cost (money and performance) of the Swift approach vs just hitting the Heat API - if folks use the Swift resource and it stores data associated with the signal in Swift, does that incur a cost to the user in a public cloud scenario? Good question. I believe the way WaitConditions work in AWS is that it sets up a pre-signed URL in a bucket owned by CloudFormation. If we went with that approach we would probably want some sort of quota, I imagine. Just to clarify, you suggest that the swift-based signal mechanism use containers that Heat owns rather than ones owned by the user? +1, don't hide it. The other approach is to set up a new container, owned by the user, every time. In that case, a provider selecting this implementation would need to make it clear to customers if they would be billed for a WaitCondition resource. I'd prefer to avoid this scenario though (regardless of the plug-point). Why? If we won't let the user choose, then why wouldn't we let the provider make this choice? I don't think it's wise of us to make decisions based on what a theoretical operator may theoretically do. If the same theoretical provider were to also charge users to create a trust, would we then be concerned about that implementation as well? What if said provider decides to charge the user per resource in a stack, regardless of what they are? Having Heat own the container(s) as suggested above doesn't preclude that operator from charging the stack owner for those either. This is a nice use case for preview. A user should be able to preview a stack and know what will be consumed. Wait conditions will show a swift container if preview is worth anything. 
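For reference, Swift's TempURL middleware builds its pre-signed URLs from an HMAC-SHA1 over the method, expiry and object path - roughly as follows (the account/container/object path and key are placeholders):

```python
import hmac
import time
from hashlib import sha1

def swift_temp_url(method, path, key, ttl):
    """Pre-signed Swift URL (TempURL middleware): anyone holding the
    URL may hit the object until it expires -- no auth token needed,
    which is what makes it usable as a wait-condition signal endpoint."""
    expires = int(time.time()) + ttl
    body = '%s\n%d\n%s' % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return '%s?temp_url_sig=%s&temp_url_expires=%d' % (path, sig, expires)

url = swift_temp_url('PUT', '/v1/AUTH_test/c/o', 'secret', 600)
```

The full URL handed to the instance would be the cluster endpoint plus this path and query string; the secret key stays with the account that signed it.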
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Victor Stinner's message of 2014-07-08 05:47:36 -0700: Hi Joshua, You asked a lot of questions. I will try to answer. Le lundi 7 juillet 2014, 17:41:34 Joshua Harlow a écrit : * Why focus on a replacement low-level execution model integration instead of higher-level workflow library or service (taskflow, mistral... other) integration? I don't know taskflow, so I cannot answer this question. How do you write such code using taskflow?

@asyncio.coroutine
def foo(self):
    result = yield from some_async_op(...)
    return do_stuff(result)

Victor, this is a low-level piece of code, which highlights the problem that taskflow's higher-level structure is meant to address. In writing OpenStack, we want to accomplish tasks based on a number of events. Users, errors, etc. We don't explicitly want to run coroutines, we want to attach volumes, spawn vms, and store files. See this: http://docs.openstack.org/developer/taskflow/examples.html The result is consumed in the next task in the flow. Meanwhile we get a clear definition of work-flow and very clear methods for resumption, retry, etc. So the expression is not as tightly bound as the code above, but that is the point, because we want to break things up into tasks which are clearly defined and then be able to resume each one individually. So what I think Josh is getting at is that we could add asyncio support into taskflow as an abstraction for tasks that want to be non-blocking, and then we can focus on refactoring the code around high-level work-flow expression rather than low-level asyncio and coroutines. * Was the heat (asyncio-like) execution model[1] examined and learned from before considering moving to asyncio? I looked at Heat coroutines, but it has a design very different from asyncio. In short, asyncio uses an event loop running somewhere in the background, whereas Heat explicitly schedules the execution of some tasks (with TaskRunner), blocks until it gets the result and then stops its event loop completely. 
It's possible to implement that with asyncio; there is for example a run_until_complete() method stopping the event loop when a future is done. But the asyncio event loop is designed to run forever, so various projects can run tasks at the same time, not only a very specific section of the code to run a set of tasks. asyncio is not only designed to schedule callbacks, it's also designed to manage file descriptors (especially sockets). It can also spawn and manage subprocesses. This is not supported by the Heat scheduler. IMO the Heat scheduler is too specific, it cannot be used widely in OpenStack. This is sort of backwards to what Josh was suggesting. Heat can't continue with the current approach, which is coroutine based, because we need the execution stack to not be in RAM on a single engine. We are going to achieve even more concurrency than we have now through an even higher level of task abstraction as part of the move to a convergence model. We will likely use taskflow to express these tasks so that they are more resumable and generally resilient to failure. Along a related question, seeing that openstack needs to support py2.x and py3.x, will this mean that trollius will be required to be used in 3.x (as it is the least common denominator, not new syntax like 'yield from' that won't exist in 2.x)? Does this mean that libraries that will now be required to change will be required to use trollius (the pulsar[6] framework seemed to mesh these two nicely); is this understood by those authors? It *is* possible to write code working on asyncio and trollius: http://trollius.readthedocs.org/#write-code-working-on-trollius-and-tulip There are different options for that. There are already projects supporting asyncio and trollius. Is this the direction we want to go down (if we stay focused on ensuring py3.x compatibility, then why not just jump to py3.x in the first place)? FYI OpenStack does not support Python 3 right now. 
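A minimal illustration of the run_until_complete() pattern being discussed (written with modern async def syntax, which postdates this thread; the names are illustrative):

```python
import asyncio

async def run_set_of_tasks():
    # stand-in for "a very specific section of the code" that Heat runs
    await asyncio.sleep(0)
    return 'done'

# run_until_complete() drives the loop only until the future resolves and
# then stops it -- the Heat-style usage -- whereas asyncio's intended mode
# is a loop that runs forever serving many callers at once.
loop = asyncio.new_event_loop()
result = loop.run_until_complete(run_set_of_tasks())
loop.close()
```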
I'm working on porting OpenStack to Python 3, we made huge progress, but it's not done yet. Anyway, the new RHEL 7 release doesn't provide Python 3.3 in the default system, you have to enable the SCL repository (which provides Python 3.3). And Python 2.7 or even 2.6 is still used in production. I would also prefer to use directly yield from and just drop Python 2 support. But dropping Python 2 support is not going to happen before at least 2 years. Long term porting is important, however, we have immediate needs for improvements in resilience and scalability. We cannot hang _any_ of that on Python 3. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Heat] [Marconi] Heat and concurrent signal processing needs some deep thought
I just noticed this review: https://review.openstack.org/#/c/90325/ And gave it some real thought. This will likely break any large scale usage of signals, and I think breaks the user expectations. Nobody expects to get a failure for a signal. It is one of those things that you fire and forget. I'm done, deal with it. If we start returning errors, or 409's or 503's, I don't think users are writing their in-instance initialization tooling to retry. I think we need to accept it and reliably deliver it. Does anybody have any good ideas for how to go forward with this? I'd much rather borrow a solution from some other project than try to invent something for Heat. I've added Marconi as I suspect there has already been some thought put into how a user-facing set of tools would send messages. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
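If the API can return 409s or 503s, the in-instance tooling would need retry logic along these lines (a hypothetical sketch; post() stands in for whatever HTTP call actually delivers the signal):

```python
import time

def deliver_signal(post, attempts=5, base_delay=0.5):
    """Users treat signals as fire-and-forget, so the tooling must
    absorb transient failures itself: retry with exponential backoff
    rather than surfacing an error from an "I'm done" notification."""
    for i in range(attempts):
        try:
            return post()
        except Exception:
            if i == attempts - 1:
                raise  # out of retries; give up loudly
            time.sleep(base_delay * (2 ** i))

# Simulated endpoint that fails twice with a 503 before accepting:
calls = {'n': 0}
def flaky_post():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('503 Service Unavailable')
    return 'accepted'

result = deliver_signal(flaky_post, attempts=5, base_delay=0)
```

The point of the thread stands, though: this only helps if every piece of in-instance tooling is written this way, which today it is not.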
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Excerpts from Joshua Harlow's message of 2014-07-07 10:41:34 -0700: So I've been thinking how to respond to this email, and here goes (shields up!), First things first; thanks mark and victor for the detailed plan and making it visible to all. It's very nicely put together and the amount of thought put into it is great to see. I always welcome an effort to move toward a new structured explicit programming model (which asyncio clearly helps make possible and strongly encourages/requires). I too appreciate the level of detail in the proposal. I think I understand where it wants to go. So now to some questions that I've been thinking about how to address/raise/ask (if any of these appear as FUD, they were not meant to be): * Why focus on a replacement low level execution model integration instead of higher level workflow library or service (taskflow, mistral... other) integration? Since pretty much all of openstack is focused around workflows that get triggered by some API activated by some user/entity having a new execution model (asyncio) IMHO doesn't seem to be shifting the needle in the direction that improves the scalability, robustness and crash-tolerance of those workflows (and the associated projects those workflows are currently defined reside in). I *mostly* understand why we want to move to asyncio (py3, getting rid of eventlet, better performance? new awesomeness...) but it doesn't feel that important to actually accomplish seeing the big holes that openstack has right now with scalability, robustness... 
Let's imagine a different view on this: if all openstack projects declaratively define the workflows their APIs trigger (nova is working on task APIs, cinder is getting there too...), and in the future the projects are *only* responsible for composing those workflows and handling the API inputs and responses, then the need for asyncio or other technology can move out from the individual projects and into something else (possibly something that is being built and used as we speak). With this kind of approach the execution model can be an internal implementation detail of the workflow 'engine/processor' (it will also be responsible for fault-tolerant, robust and scalable execution). If this seems reasonable, then why not focus on integrating said thing into openstack and move the projects to a model that is independent of eventlet, asyncio (or the next greatest thing) instead? This seems to push the needle in the right direction and IMHO (and hopefully in others' opinions) has a much bigger potential to improve the various projects than just switching to a new underlying execution model. * Was the heat (asyncio-like) execution model[1] examined and learned from before considering moving to asyncio? I will try not to put words into the heat developers' mouths (I can't do it justice anyway; hopefully they can chime in here) but I believe that heat has a system that is very similar to asyncio and coroutines right now, and they are actively moving to a different model due to problems in part caused by using that coroutine model in heat. So if they are moving somewhat away from that model (to a more declarative workflow model that can be interrupted and converged upon [2]), why would it be beneficial for other projects to move toward the model they are moving away from (instead of repeating the issues the heat team had with coroutines, e.g. visibility into stack/coroutine state, scale limitations, interruptibility...)? 
I'd like to hear Zane's opinions as he developed the rather light weight code that we use. It has been quite a learning curve for me but I do understand how to use the task scheduler we have in Heat now. Heat's model is similar to asyncio, but is entirely limited in scope. I think it has stayed relatively manageable because it is really only used for a few explicit tasks where a high degree of concurrency makes a lot of sense. We are not using it for I/O concurrency (eventlet still does that) but rather for request concurrency. So we tell nova to boot 100 servers with 100 coroutines that have 100 other coroutines to block further execution until those servers are active. We are by no means using it as a general purpose concurrency programming model. That said, as somebody working on the specification to move toward a more taskflow-like (perhaps even entirely taskflow-based) model in Heat, I think that is the way to go. The fact that we already have an event loop that doesn't need to be explicit except at the very lowest levels makes me want to keep that model. And we clearly need help with how to define workflows, which something like taskflow will do nicely. * A side-question, how do asyncio and/or trollius support debugging, do they support tracing individual co-routines? What about introspecting the state a coroutine has associated with it? Eventlet at least has http://eventlet.net/doc/modules/debug.html (which is
Re: [openstack-dev] [Heat] Upwards-compatibility for HOT
Excerpts from Zane Bitter's message of 2014-07-07 14:25:50 -0700: With the Icehouse release we announced that there would be no further backwards-incompatible changes to HOT without a revision bump. However, I notice that we've already made an upward-incompatible change in Juno: https://review.openstack.org/#/c/102718/ So a user will be able to create a valid template for a Juno (or later) version of Heat with the version heat_template_version: 2013-05-23, but the same template may break on an Icehouse installation of Heat with the stable HOT parser. IMO this is almost equally as bad as breaking backwards compatibility, since a user moving between clouds will generally have no idea whether they are going forward or backward in version terms. Sounds like a bug in Juno that we need to fix. I agree, this is a new template version. (Note: AWS doesn't use the version field this way, because there is only one AWS and therefore in theory they don't have this problem. This implies that we might need a more sophisticated versioning system.) A good manual with a "this was introduced in version X" and "this was changed in version Y" would, IMO, be enough to help users not go crazy and help us know whether something is a bug or not. We can probably achieve this entirely in the in-code template guide. I'd like to propose a policy that we bump the revision of HOT whenever we make a change from the previous stable version, and that we declare the new version stable at the end of each release cycle. Maybe we can post-date it to indicate the policy more clearly. (I'd also like to propose that the Juno version drops cfn-style function support.) Agreed. I'm also curious if we're going to reject a template with version 2013-05-23 that includes list_join. If we don't reject it, we probably need to look at how to show the user warnings about version/feature skew. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][specs] Please stop doing specs for any changes in projects
Excerpts from Dolph Mathews's message of 2014-07-01 10:02:13 -0700: The argument has been made in the past that small features will require correspondingly small specs. If there's a counter-argument to this example (a small feature requiring a relatively large amount of spec effort), I'd love to have links to both the spec and the resulting implementation so we can discuss exactly why the spec was an unnecessary additional effort. Indeed. The line to be drawn isn't around the size, IMO, but around communication. Nobody has the bandwidth to watch all of the git logs. Nobody has the bandwidth to poll all of the developers what has changed in the interfaces available. So the line for me is whether or not users and operators will need to know something is under way and may want to comment _before_ a change to an interface is made. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from Michael Kerrin's message of 2014-06-30 02:16:07 -0700: I am trying to finish off https://review.openstack.org/#/c/90134 - percona xtradb cluster for debian-based systems. I have read into this thread that I can error out on Redhat systems when trying to install percona and tell them to use mariadb instead - percona isn't supported here. Is this correct? Probably. But if CI for Fedora breaks as a result you'll need a solution first. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][specs] Please stop doing specs for any changes in projects
Excerpts from Boris Pavlovic's message of 2014-06-30 14:11:08 -0700: Hi all, Specs are an interesting idea that may be really useful when you need to discuss large topics: 1) work on APIs 2) large refactorings 3) large features 4) performance, scale, HA, and security issues that require big changes And I really dislike the idea of adding a spec for every patch. Especially when changes (features) are small, don't affect too much, and are optional. It really kills OpenStack. And it will drastically slow down the process of contribution and reduce the number of contributors. Who says there needs to be a spec for every patch? I agree with your items above. Any other change is likely just fixing a bug. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-27 00:14:49 -0700: Hi Clint, -Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: 26 June 2014 20:21 To: openstack-dev Subject: Re: [openstack-dev] [TripleO] os-refresh-config run frequency So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804 Cool, I'm more than happy for this to be done elsewhere. I'm glad that people are in agreement with me on the concept and that work has already started on this. I'll add some notes to the bug if needed later on today. 2) We don't re-assert any state on a regular basis. So one reason we haven't focused on this is that we have a stretch goal of running with a readonly root partition. It's gotten lost in a lot of the craziness of "just get it working", but with rebuilds blowing away root now, leading to anything not on the state drive (/mnt currently), there's a good chance that this will work relatively well. Now, since people get root, they can always override the readonly root and make changes. <golem>we hates thiss!</golem> I'm open to ideas, however, os-refresh-config is definitely not the place to solve this. It is intended as a non-resident command to be called when it is time to assert state. os-collect-config is intended to gather configurations, and expose them to a command that it runs, and thus should be the mechanism by which os-refresh-config is run. 
I'd like to keep this conversation separate from one in which we discuss more mechanisms to make os-refresh-config robust. There are a bunch of things we can do, but I think we should focus just on "how do we re-assert state?". OK, that's fair enough. Because we're able to say right now that it is only for running when config changes, we can wave our hands and say it's ok that we restart everything on every run. As Jan alluded to, that won't work so well if we run it every 20 minutes. Agreed, and chatting with Jan and a couple of others yesterday we came to the conclusion that whatever we do here, it will require tweaking of a number of elements to safely restart services. So, I wonder if we can introduce a config version into os-collect-config. Basically os-collect-config would keep a version along with its cache. Whenever a new version is detected, os-collect-config would set a value in the environment that informs the command this is a new version of config. From that, scripts can do things like this:

if [ -n "$OS_CONFIG_NEW_VERSION" ]; then
    service X restart
elif ! service X status; then
    service X start
fi

This would lay the groundwork for future abilities to compare old/new so we can take shortcuts by diffing the two config versions. For instance, if we look at old vs. new and we don't see any of the keys we care about changed, we can skip restarting. I like this approach - does this require a new spec? If so, I'll start an etherpad to collect thoughts on it before writing it up for approval. I think this should be a tripleo spec. If you're volunteering to write it, hooray \o/. It will require several work items. Off the top of my head: - Add version awareness to os-collect-config - Add version awareness to all os-refresh-config scripts that do disruptive things - Add a periodic command run to os-collect-config Let's call it 're-assert-system-state'. Sound good? 
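The old-vs-new diffing shortcut mentioned above could look something like this (a hypothetical helper; the key names are purely illustrative):

```python
def needs_restart(old, new, watched_keys):
    """Restart only when a key this service actually consumes changed
    between the cached config version and the newly collected one."""
    return any(old.get(k) != new.get(k) for k in watched_keys)

old = {'db_host': 'a', 'workers': 4, 'unrelated': 1}
new = {'db_host': 'a', 'workers': 8, 'unrelated': 2}
```

A service watching only 'workers' would restart here; one watching only 'db_host' would not, even though the config as a whole changed.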
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from James Slagle's message of 2014-06-27 12:59:36 -0700: Things are a bit confusing right now, especially with what's been proposed. Let me try and clarify (even if just for my own sake). Currently the choices offered are:

1. mysql percona with the percona tarball (Percona XtraDB Cluster, not mysql percona)
2. mariadb galera with mariadb.org packages
3. mariadb galera with rdo packages

And, we're proposing to add:

4. mysql percona with percona packages: https://review.openstack.org/#/c/90134
5. mariadb galera with fedora packages: https://review.openstack.org/#/c/102815/

4 replaces 1, but only for Ubuntu/Debian; it doesn't work on Fedora/RH. 5 replaces 3 (neither of which work on Ubuntu/Debian, obviously). Do we still need 1? Fedora/RH + percona tarball. I personally don't think so. Do we still need 2? Fedora/RH or Ubuntu/Debian with galera packages from mariadb.org. For the Fedora/RH case, I doubt it, people will just use 5. 3 will be gone (replaced by 5). So, yes, I'd like to see 5 as the default for Fedora/RH and 4 as the default for Ubuntu/Debian, and both those tested in CI. And get rid of (or deprecate) 1-3.

I'm actually more confused now than before I read this. The use of numbers is just making my head spin. It can be stated this way I think:

On RPM systems, use MariaDB Galera packages. If packages are in the distro, use distro packages. If packages are not in the distro, use RDO packages.

On DEB systems, use Percona XtraDB Cluster packages. If packages are in the distro, use distro packages. If packages are not in the distro, use upstream packages.

If anything doesn't match those principles, it is a bug. 
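The two principles above fit in a few lines of code. Here is a hypothetical sketch (the function and return labels are made up for illustration, not anything from tripleo-image-elements):

```python
def pick_db_packages(distro_family, in_distro_repos):
    """Choose database packaging per the principles stated above.

    distro_family: "rpm" (Fedora/RH) or "deb" (Ubuntu/Debian).
    in_distro_repos: True if the distro itself ships the packages.
    """
    if distro_family == "rpm":
        # RPM systems: MariaDB Galera, preferring distro packages over RDO.
        return "mariadb-galera (distro)" if in_distro_repos else "mariadb-galera (RDO)"
    if distro_family == "deb":
        # DEB systems: Percona XtraDB Cluster, preferring distro over upstream.
        return ("percona-xtradb-cluster (distro)" if in_distro_repos
                else "percona-xtradb-cluster (upstream)")
    raise ValueError("unknown distro family: %r" % distro_family)
```

Anything the elements do that disagrees with this decision table would, per the message above, be a bug.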
On Thu, Jun 26, 2014 at 5:30 PM, Giulio Fidente gfide...@redhat.com wrote: On 06/26/2014 11:11 AM, Jan Provaznik wrote: On 06/25/2014 06:58 PM, Giulio Fidente wrote: On 06/16/2014 11:14 PM, Clint Byrum wrote: Excerpts from Gregory Haynes's message of 2014-06-16 14:04:19 -0700: Excerpts from Jan Provazník's message of 2014-06-16 20:28:29 +: Hi, MariaDB is now included in Fedora repositories, this makes it easier to install and a more stable option for Fedora installations. Currently MariaDB can be used by including the mariadb (use mariadb.org pkgs) or mariadb-rdo (use redhat RDO pkgs) element when building an image. What do you think about using MariaDB as the default option for Fedora when running devtest scripts? (first, I believe Jan means that MariaDB _Galera_ is now in Fedora) I think so too. I'd like to give this a try. This does start to change us from being a deployment of openstack to being a deployment per distro but IMO that's a reasonable position. I'd also like to propose that if we decide against doing this then these elements should not live in tripleo-image-elements. I'm not so sure I agree. We have lio and tgt because lio is on RHEL but everywhere else is still using tgt IIRC. However, I also am not so sure that it is actually a good idea for people to ship on MariaDB since it is not in the gate. As it diverges from MySQL (starting in earnest with 10.x), there will undoubtedly be subtle issues that arise. So I'd say having MariaDB get tested along with Fedora will actually improve those users' test coverage, which is a good thing. I am favourable to the idea of switching to mariadb for fedora based distros. Currently the default mysql element seems to be switching [1], yet for ubuntu/debian only, from the percona provided binary tarball of mysql to the percona provided packaged version of mysql. 
In theory we could further update it to use percona provided packages of mysql on fedora too, but I'm not sure there is much interest in using that combination where people get mariadb and galera from the official repos. IIRC fedora packages for percona xtradb cluster are not provided (unless something has changed recently). I see, so on fedora it will be definitely easier and safer to just use the mariadb/galera packages provided in the official repo ... and this further reinforces my idea that it is the best option to use that by default for fedora -- Giulio Fidente GPG KEY: 08D733BA ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] os-refresh-config run frequency
Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700: Hi all, I've been working more and more with TripleO recently and whilst it does seem to solve a number of problems well, I have found a couple of idiosyncrasies that I feel would be easy to address. My primary concern lies in the fact that os-refresh-config does not run on every boot/reboot of a system. Surely a reboot *is* a configuration change and therefore we should ensure that the box has come up in the expected state with the correct config? This is easily fixed through the addition of an @reboot entry in /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run as a service. My secondary concern is that through not running os-refresh-config on a regular basis by default (i.e. every 15 minutes or something in the same style as chef/cfengine/puppet), we leave ourselves exposed to someone trying to make a quick fix to a production node and taking that node offline the next time it reboots because the config was still left as broken owing to a lack of updates to HEAT (I'm thinking a quick change to allow root access via SSH during a major incident that is then left unchanged for months because no-one updated HEAT). There are a number of options to fix this including Modifying os-collect-config to auto-run os-refresh-config on a regular basis or setting os-refresh-config to be its own service running via upstart or similar that triggers every 15 minutes I'm sure there are other solutions to these problems, however I know from experience that claiming this is solved through education of users or (more severely!) via HR is not a sensible approach to take as by the time you realise that your configuration has been changed for the last 24 hours it's often too late! So I see two problems highlighted above. 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right, and we've been talking about it for a while. 
The right thing to do is have os-collect-config re-run its command on boot. I don't think a cron job is the right way to go, we should just have a file in /var/run that is placed there only on a successful run of the command. If that file does not exist, then we run the command. I've just opened this bug in response: https://bugs.launchpad.net/os-collect-config/+bug/1334804

2) We don't re-assert any state on a regular basis. So one reason we haven't focused on this, is that we have a stretch goal of running with a readonly root partition. It's gotten lost in a lot of the craziness of "just get it working", but with rebuilds blowing away root now, leading to anything not on the state drive (/mnt currently), there's a good chance that this will work relatively well. Now, since people get root, they can always override the readonly root and make changes. <golem>we hates thiss!</golem>

I'm open to ideas, however, os-refresh-config is definitely not the place to solve this. It is intended as a non-resident command to be called when it is time to assert state. os-collect-config is intended to gather configurations, and expose them to a command that it runs, and thus should be the mechanism by which os-refresh-config is run. I'd like to keep this conversation separate from one in which we discuss more mechanisms to make os-refresh-config robust. There are a bunch of things we can do, but I think we should focus just on "how do we re-assert state?". Because we're able to say right now that it is only for running when config changes, we can wave our hands and say it's ok that we restart everything on every run. As Jan alluded to, that won't work so well if we run it every 20 minutes. So, I wonder if we can introduce a config version into os-collect-config. Basically os-collect-config would keep a version along with its cache. Whenever a new version is detected, os-collect-config would set a value in the environment that informs the command this is a new version of config. 
From that, scripts can do things like this:

  if [ -n "$OS_CONFIG_NEW_VERSION" ] ; then
    service X restart
  elif ! service X status ; then
    service X start
  fi

This would lay the groundwork for future abilities to compare old/new so we can take shortcuts by diffing the two config versions. For instance if we look at old vs. new and we don't see any of the keys we care about changed, we can skip restarting. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
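The version-tracking idea above can be sketched in a few lines of Python. This is a hypothetical illustration of the cache-plus-version mechanism, not the real os-collect-config implementation; the cache file format and the OS_CONFIG_CHANGED_KEYS variable are assumptions:

```python
import json
import os


def changed_keys(old, new):
    """Return the top-level keys whose values differ between two config dicts."""
    keys = set(old) | set(new)
    return {k for k in keys if old.get(k) != new.get(k)}


def build_command_env(cache_path, new_config, command_env=None):
    """Compare new_config against the cached copy; when the version changed,
    mark the environment that would be handed to the refresh command and
    update the cache. new_config is {"version": N, "config": {...}}."""
    env = dict(os.environ if command_env is None else command_env)
    try:
        with open(cache_path) as f:
            cached = json.load(f)
    except (IOError, OSError, ValueError):
        cached = {"version": None, "config": {}}
    if new_config["version"] != cached["version"]:
        env["OS_CONFIG_NEW_VERSION"] = str(new_config["version"])
        # Expose the diff so scripts can skip restarts for irrelevant changes.
        env["OS_CONFIG_CHANGED_KEYS"] = ",".join(
            sorted(changed_keys(cached["config"], new_config["config"])))
        with open(cache_path, "w") as f:
            json.dump(new_config, f)
    return env
```

A script run with this environment could then restart a service only when OS_CONFIG_CHANGED_KEYS mentions a key it cares about, which is exactly the "diff the two config versions" shortcut described above.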
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Mark McLoughlin's message of 2014-06-24 12:49:52 -0700: On Tue, 2014-06-24 at 09:51 -0700, Clint Byrum wrote: Excerpts from Monty Taylor's message of 2014-06-24 06:48:06 -0700: On 06/22/2014 02:49 PM, Duncan Thomas wrote: On 22 June 2014 14:41, Amrith Kumar amr...@tesora.com wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. -1. There are some /really/ bad commit messages out there, and some of us try to use the commit messages to usefully sort through the changes (i.e. I often -1 in cinder a change only affects one driver and that isn't clear from the summary). If the perceived problem is grammatical, I'm a bit more on board with it not a reason to rev a patch, but core reviewers can +2/A over the top of a -1 anyway... 100% agree. Spelling and grammar are rude to review on - especially since we have (and want) a LOT of non-native English speakers. It's not our job to teach people better grammar. Heck - we have people from different English backgrounds with differing disagreements on what good grammar _IS_ We shouldn't quibble over _anything_ grammatical in a commit message. If there is a disagreement about it, the comments should be ignored. There are definitely a few grammar rules that are loose and those should be largely ignored. However, we should correct grammar when there is a clear solution, as those same people who do not speak English as their first language are likely to be confused by poor grammar. We're not doing it to teach grammar. We're doing it to ensure readability. The importance of clear English varies with context, but commit messages are a place where we should try hard to just let it go, particularly with those who do not speak English as their first language. 
Commit messages stick around forever and it's important that they are useful, but they will be read by a small number of people who are going to be in a position to spend a small amount of time getting over whatever dissonance is caused by a typo or imperfect grammar. The times that one is reading git messages are often the most stressful, such as when a regression has occurred in production. Given that, I believe it is entirely worth it to me that the commit messages on my patches are accurate and understandable. I embrace all feedback which leads to them being more clear. I will of course stand back from grammar correcting and not block patches if there are many who disagree. I think specs are pretty similar and don't warrant much additional grammar nitpicking. Sure, they're longer pieces of text and slightly more people will rely on them for information, but they're not intended to be complete documentation. Disagree. I will only state this one more time as I think everyone knows how I feel: if we are going to grow beyond the english-as-a-first-language world we simply cannot assume that those reading specs will be native speakers. Good spelling and grammar helps us grow. Bad spelling and grammar holds us back. Where grammar is so poor that readers would be easily misled in important ways, then sure that should be fixed. But there comes a point when we're no longer working to avoid confusion and instead just being pedants. Taking issue[1] with "whatever scaling mechanism Heat and we end up going with" because it has a dangling preposition is an example of going way beyond the point of productive pedantry IMHO :-) I actually agree that it would not at all be a reason to block a patch. However, there is some ambiguity in that sentence that may not be clear to a native speaker. It is not 100% clear if we are going with Heat, or with the scaling mechanism. That is the only reason for the dangling preposition debate. 
However, there is a debate, and thus I would _never_ block a patch based on this rule. It was feedback.. just as sometimes there is feedback in commit messages that isn't taken and doesn't lead to a -1. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Mark McLoughlin's message of 2014-06-22 00:39:29 -0700: The main point is that this is something worth addressing as a wider community rather than in individual reviews with a limited audience. And that doing it with a bit of humor might help take the sting out of it. Yes, a private message saying Hey fellow earthling, do we really care whether httplib is grouped with os or eventlet? is a productive thing indeed! However, turning that into something that we can all publicly laugh about requires some real skill, and trust between us all. Given our digital communication mediums, it is not likely we can do it all that regularly. I think at best we could gather them all into a couple of keynote slides with the authors' blessings (oh please, somebody do this!) However, if we can all look inward and just do it with self deprecation I'm all for it. The main point I am making is that the less grey areas we have, the less of this we ever have to worry about. It is worth it, to me, to look into keeping this rule alive so that we never ever have to discuss import grouping. (BTW, how many release cycles does one have to deprecate themselves before they remove themselves?) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Christopher Yeoh's message of 2014-06-22 18:46:59 -0700: On Mon, Jun 23, 2014 at 4:43 AM, Jay Pipes jaypi...@gmail.com wrote: On 06/22/2014 09:41 AM, Amrith Kumar wrote: In addition to making changes to the hacking rules, why don't we mandate also that perceived problems in the commit message shall not be an acceptable reason to -1 a change. Would this improve the situation? I actually *do* think a very poor commit message for a substantial patch deserves a -1. The git commit message is our history for the patch, and it is important in its own right. Now, nits like a single misspelled word or the commit summary being 60 characters instead of 50 are not what I'm talking about, of course. I'm speaking only about when a commit message blatantly disregards the best practices of commit message writing [1] and doesn't offer anything of value to the reviewer. +1. Minor typos and grammatical errors I don't care about (but will put in suggested fixes if the patch needs to be updated anyway). However, commit messages are very important for future debugging. One or two line vague commit messages can make life a lot harder for others in the future when writing a short description is not what I'd consider an excessive burden. And there should be no assumption that the person reading the commit message will have easy access to the bug database. We've had this discussion already, but just remember that not everybody reading those commit messages will be a native English speaker. The more incorrect the grammar and punctuation is, the more confusing it will be to somebody who is already struggling with those concepts. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ceilometer] [Heat] Ceilometer aware people, please advise us on processing notifications..
Hello! I would like to turn your attention to this specification draft that I've written: https://review.openstack.org/#/c/100012/1/specs/convergence-continuous-observer.rst Angus has suggested that perhaps Ceilometer is a better place to handle this. Can you please comment on the review, or can we have a brief mailing list discussion about how best to filter notifications? Basically in Heat when a user boots an instance, we would like to act as soon as it is active, and not have to poll the nova API to know when that is. Angus has suggested that perhaps we can just tell ceilometer to hit Heat with a web hook when that happens. Thanks! ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Sean Dague's message of 2014-06-21 05:08:01 -0700: On 06/20/2014 09:26 PM, Clint Byrum wrote: Excerpts from Sean Dague's message of 2014-06-20 11:07:39 -0700: H803 - First line of a commit message must *not* end in a period. This was mostly a response to an unreasonable core reviewer that was -1ing people for not having periods. I think any core reviewer that -1s for this either way should be thrown off the island, or at least made fun of, a lot. Again, the clarity of a commit message is not made or lost by the lack or existence of a period at the end of the first line. Perhaps we can make a bot that writes disparaging remarks on any -1's that mention period in the line after the short commit message. :) [For the reader: read the comments below, and then come back to this] Note that I'm not at all unaware of the irony I created by making the statement above and then the statements below. I feel like I'm a Fox News reporter being called out on the daily show actually. ;) H305 - Enforcement of libraries fitting correctly into stdlib, 3rdparty, our tree. The biggest issue here is it's built in a world where there was only 1 viable python version, 2.7. Python's stdlib is actually pretty dynamic and grows over time. As we embrace more python 3, and as distros start to make python3 be front and center, what does this even mean? The current enforcement can't pass on both python2 and python3 at the same time in many cases because of that. I think we should find a way to make this work. Like it or not, this will garner -1's by people for stylistic reasons and I'd rather it be the bots than the humans do it. The algorithm is something like this pseudo python:

  for block in import_blocks:
      if is_this_set_in_a_known_lib_collection(block):
          continue
      if is_this_set_entirely_local(block):
          continue
      if is_this_set_entirely_installed_libs(block):
          continue
      raise AnError(block)

And just make the python2 and python3 stdlibs both be a match. 
Basically I'm saying, let's just be more forgiving but keep the check so we can avoid most of the -1 please group libs and stdlibs separately patches. You can avoid that by yelling at reviewers if that's the *only* feedback they are giving. I totally agree we can do that. Pedantic reviewers that are reviewing for this kind of thing only should be scorned. I realistically like the idea markmc came up with - https://twitter.com/markmc_/status/480073387600269312 I also agree it is really fun to think about shaming those annoying actions. It is also not fun _at all_ to be publicly shamed. In fact I'd say it is at least an order of magnitude less fun. There is an old saying, praise in public, punish in private. It is one reason the -1 comments I give always include praise for whatever is right for new contributors. Not everyone is a grizzled veteran. It is far more interesting to me to solve the grouping problem in a way that works for us long term (python 2 and 3) than it is to develop a culture that builds any of its core activities on negative emotional feedback. That's not to say we can't say hey you're doing it wrong. I mean to say that direct feedback like that belongs in private IRC messages or email, not in public everyone can see that reviews. Give people a chance to save face. Meanwhile, the less we have to have one on one negative feedback, the easier the job of reviewers is. The last thing we want to do is have more reasons for people to NOT do reviews. I no longer buy the theory that something like this is saving time. What it's actually doing is training a new generation of reviewers that the right thing to do it review for nits. That's not actually what we want, we want people reviewing for how this could go wrong. I'm not sure how hacking is training reviewers. I feel like hacking is training developers. Reviewers don't even need to look at it until the pep8 tox job passes. 
It's really instructive to realize that we've definitely gone beyond shared culture with what's in hacking. Look at how much of it is turned off in projects. It's pretty high. If this project is going to remain useful at all it really needs to prune back to what's actually shared culture. I think having things turned off at the project level is o-k. The more strict a project's automated style rules, the less they have to quibble and train new reviewers on the fact that we don't do that here. However, I don't think rules being turned off is evidence that rules are unhelpful. It most likely means that those rules didn't exist when the code base was created and they turned them off because of incubation or a new set of rules arrived and they didn't have time to land the new patches. That is a per-project choice and should remain so, but I don't think that choice means that those rules wouldn't have a long term positive effect of stopping
Re: [openstack-dev] 答复: [Heat] fine grained quotas
I started to type the same response as Duncan last night, and I do have the same concern. The fine grained quotas in nova, for instance, can be used to measure potential use of the whole system _exactly_. You can give a bit more to one tenant while you're building out your infrastructure for more tenants to come on board at the lower quotas and know that the one more demanding tenant will still be happy. But how much RAM does it cost to have 1000 stacks creating all at once? How much CPU does it cost? Those are not really 1:1 correlated, and so I also question whether one can really use these quotas to do such fine grained planning. Excerpts from Duncan Thomas's message of 2014-06-20 05:12:44 -0700: There's a maintenance and testing cost to the added complexity, and as far as I can tell, no solid use-case. Under what circumstance would a cloud provider want different limits for different tenants? What concrete problem does it solve? On 20 June 2014 04:35, Huangtianhua huangtian...@huawei.com wrote: Hi, Clint, Thank you for your comments on my BP and code! The BP I proposed is all about putting dynamic, admin-configurable limitations on stack number per tenant and stack complexity. Therefore, you can consider my BP as an extension to your config file-based limitation mechanism. If the admin does not want to configure fine-grained, tenant-specific limits, the values in config become the default values of those limits. And just like only an Admin can configure the limit items in the config file, the limit update and delete APIs I proposed are also Admin-only. Therefore, users can not set those values by themselves to break the anti-DoS capability you mentioned. 
The reason I want to introduce the APIs and the dynamic configurable capability to those limits mainly lies in that, since various tenants have various underlying resource quotas, and even various template/stack complexity requirements, I think a global, statically-configured limitation mechanism could be refined to echo user requirements better. Your idea? By the way, I do think that the DoS problem is interesting in Heat. Can we have more discussion on that? Thanks again! -Original Message- From: Clint Byrum [mailto:cl...@fewbar.com] Sent: 20 June 2014 6:33 To: openstack-dev Subject: Re: [openstack-dev] [Heat] fine grained quotas Excerpts from Randall Burt's message of 2014-06-19 15:21:14 -0700: On Jun 19, 2014, at 4:17 PM, Clint Byrum cl...@fewbar.com wrote: I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. What's more, I don't think this is something we should expose via API other than to perhaps query what those quota values are. It is possible that some provider would want to bill on number of stacks, etc (I personally agree with Clint here), it seems that is something that could/should be handled external to Heat itself. Far be it from any of us to dictate a single business model. However, Heat is a tool which encourages consumption of billable resources by making it easier to tie them together. 
This is why FedEx gives away envelopes and will come pick up your packages for free. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] CI status update for this week
Excerpts from Charles Crouch's message of 2014-06-20 13:51:49 -0700: - Original Message - Not a great week for TripleO CI. We had 3 different failures related to: Nova [1]: we were using a deprecated config option Heat [2]: missing heat data obtained from the Heat CFN API Neutron [3]: a broken GRE overlay network setup The last two are bugs, but is there anything tripleo can do about avoiding the first one in the future?: e.g. reviewing a list of deprecated options and seeing when they will be removed. do the integrated projects have a protocol for when an option is deprecated and at what point it can be removed? e.g. if I make something deprecated in icehouse I can remove it in juno, but if I make something deprecated at the start of juno I can't remove it at the end of juno? Was this being logged as deprecated for a while? I think we probably should aspire to fail CI if something starts printing out deprecation warnings. We have a few more sprinkled here and there that I see in logs; those are just ticking time bombs. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
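The "fail CI if something starts printing deprecation warnings" idea could be as small as a log scan at the end of the job. A minimal sketch, assuming collected service logs land under a logs/ directory (the path, pattern, and script name are assumptions; a real gate would also want a whitelist for known warnings that are not yet fixed):

```shell
#!/bin/sh
# check-deprecations.sh: scan collected logs and fail the run if any
# service printed a deprecation warning. Assumes logs under ./logs/.
set -u
if grep -ri "deprecated" logs/ > deprecations.txt 2>/dev/null; then
    echo "Deprecation warnings found; failing the run:" >&2
    cat deprecations.txt >&2
    exit 1
fi
echo "No deprecation warnings found."
```

Run as a final CI step, this would have caught the removed Nova config option while it was still only a warning, instead of after its removal broke the job.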
Re: [openstack-dev] [hacking] rules for removal
Excerpts from Sean Dague's message of 2014-06-20 11:07:39 -0700: After seeing a bunch of code changes to enforce new hacking rules, I'd like to propose dropping some of the rules we have. The overall patch series is here - https://review.openstack.org/#/q/status:open+project:openstack-dev/hacking+branch:master+topic:be_less_silly,n,z H402 - 1 line doc strings should end in punctuation. The real statement is this should be a summary sentence. A sentence is not just a set of words that end in a period. "Squirrel fast bob." It's something deeper. This rule thus isn't really semantically useful, especially when you are talking about at 69 character maximum (79 - 4 space indent - 6 quote characters). Yes. I despise this one. H803 - First line of a commit message must *not* end in a period. This was mostly a response to an unreasonable core reviewer that was -1ing people for not having periods. I think any core reviewer that -1s for this either way should be thrown off the island, or at least made fun of, a lot. Again, the clarity of a commit message is not made or lost by the lack or existence of a period at the end of the first line. Perhaps we can make a bot that writes disparaging remarks on any -1's that mention period in the line after the short commit message. :) H305 - Enforcement of libraries fitting correctly into stdlib, 3rdparty, our tree. The biggest issue here is it's built in a world where there was only 1 viable python version, 2.7. Python's stdlib is actually pretty dynamic and grows over time. As we embrace more python 3, and as distros start to make python3 be front and center, what does this even mean? The current enforcement can't pass on both python2 and python3 at the same time in many cases because of that. I think we should find a way to make this work. Like it or not, this will garner -1's by people for stylistic reasons and I'd rather it be the bots than the humans do it. 
The algorithm is something like this pseudo python:

  for block in import_blocks:
      if is_this_set_in_a_known_lib_collection(block):
          continue
      if is_this_set_entirely_local(block):
          continue
      if is_this_set_entirely_installed_libs(block):
          continue
      raise AnError(block)

And just make the python2 and python3 stdlibs both be a match. Basically I'm saying, let's just be more forgiving but keep the check so we can avoid most of the "please group libs and stdlibs separately" patches. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
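The pseudocode above could be fleshed out roughly as follows. The key point is the "both stdlibs match" rule: an import block passes if any single category (python2 stdlib, python3 stdlib, local tree, or installed third-party libs) covers every import in it. The module sets below are tiny illustrative stand-ins, not real inventories:

```python
# Hypothetical, heavily-abridged stand-ins for the real module inventories.
PY2_STDLIB = {"os", "sys", "httplib", "urllib2", "ConfigParser"}
PY3_STDLIB = {"os", "sys", "http", "urllib", "configparser"}
LOCAL_TOP_LEVEL = "mytree"  # the project's own package name


def categories(module, installed_libs):
    """Return every category the top-level package of `module` could belong to."""
    top = module.split(".")[0]
    result = set()
    if top in PY2_STDLIB or top in PY3_STDLIB:
        result.add("stdlib")  # either interpreter's stdlib counts
    if top == LOCAL_TOP_LEVEL:
        result.add("local")
    if top in installed_libs:
        result.add("thirdparty")
    return result


def block_ok(block, installed_libs):
    """A block passes if one category covers all of its imports."""
    possible = categories(block[0], installed_libs)
    for module in block[1:]:
        possible &= categories(module, installed_libs)
    return bool(possible)
```

Under this scheme a block mixing `httplib` (python2-only stdlib) with `os` still passes, which is the forgiveness the message asks for, while a block mixing `os` with a third-party library is still flagged.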
[openstack-dev] [Heat] fine grained quotas
I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] fine grained quotas
Excerpts from Randall Burt's message of 2014-06-19 15:21:14 -0700: On Jun 19, 2014, at 4:17 PM, Clint Byrum cl...@fewbar.com wrote: I was made aware of the following blueprint today: http://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat http://review.openstack.org/#/c/96696/14 Before this goes much further.. I want to suggest that this work be cancelled, even though the code looks excellent. The reason those limits are in the config file is that these are not billable items and they have a _tiny_ footprint in comparison to the physical resources they will allocate in Nova/Cinder/Neutron/etc. IMO we don't need fine grained quotas in Heat because everything the user will create with these templates will cost them and have its own quota system. The limits (which I added) are entirely to prevent a DoS of the engine. What's more, I don't think this is something we should expose via API other than to perhaps query what those quota values are. It is possible that some provider would want to bill on number of stacks, etc (I personally agree with Clint here), it seems that is something that could/should be handled external to Heat itself. Far be it from any of us to dictate a single business model. However, Heat is a tool which encourages consumption of billable resources by making it easier to tie them together. This is why FedEx gives away envelopes and will come pick up your packages for free. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Distributed locking
Excerpts from Matthew Booth's message of 2014-06-17 01:36:11 -0700: On 17/06/14 00:28, Joshua Harlow wrote: So this is a reader/writer lock then? I have seen https://github.com/python-zk/kazoo/pull/141 come up in the kazoo (zookeeper python library) but there was a lack of a maintainer for that 'recipe', perhaps if we really find this needed we can help get that pull request 'sponsored' so that it can be used for this purpose? As far as resiliency, the thing I was thinking about was how correct do you want this lock to be? If you go with memcached and a locking mechanism using it, this will not be correct, but it might work well enough under normal usage. So that's why I was wondering what level of correctness you want and what you want to happen if a server that is maintaining the lock record dies. In memcached's case this will literally be 1 server, even if sharding is being used, since a key hashes to one server. So if that one server goes down (or a network split happens) then it is possible for two entities to believe they own the same lock (and if the network split recovers this gets even weirder); so that's what I was wondering about when mentioning resiliency and how much incorrectness you are willing to tolerate. From my POV, the most important things are: * 2 nodes must never believe they hold the same lock * A node must eventually get the lock If these are musts, then memcache is a no-go for locking. memcached is likely to delete anything it is storing in its RAM, at any time. Also if you have several memcache servers, a momentary network blip could lead to acquiring the lock erroneously. The only thing it is useful for is coalescing, where a broken lock just means wasted resources, erroneous errors, etc. If consistency is needed, then you need a consistent backend.
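The failure mode described above can be sketched in a few lines. This is a toy stand-in for memcached (names invented for the example), showing why a store that may silently drop its lock record cannot satisfy "2 nodes must never believe they hold the same lock":

```python
# Toy illustration -- NOT real memcached. EvictingStore simulates a
# memcached server where add() succeeds only if the key is absent, and
# where any entry may be evicted at any time without notifying holders.

class EvictingStore:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """memcached-style add: acquire the lock iff nobody holds it."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def evict(self, key):
        # memcached may drop entries under memory pressure, on restart,
        # or during a network partition -- the lock holder never knows.
        self._data.pop(key, None)


store = EvictingStore()
assert store.add("lock/image-123", "node-a")      # node A acquires
assert not store.add("lock/image-123", "node-b")  # node B correctly blocked

store.evict("lock/image-123")                     # record silently lost
assert store.add("lock/image-123", "node-b")      # node B "acquires" too --
# node A still believes it holds the lock: the unsafe double-holder case.
```

A ZooKeeper-style backend avoids this because the lock record is replicated with consensus and is tied to a session that the holder actively maintains.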
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Tomas Sedovic's message of 2014-06-17 04:56:24 -0700: On 16/06/14 18:51, Clint Byrum wrote: Excerpts from Tomas Sedovic's message of 2014-06-16 09:19:40 -0700: All, After having proposed some changes[1][2] to tripleo-heat-templates[3], reviewers suggested adding a deprecation period for the merge.py script. While TripleO is an official OpenStack program, none of the projects under its umbrella (including tripleo-heat-templates) have gone through incubation and integration nor have they been shipped with Icehouse. So there is no implicit compatibility guarantee and I have not found anything about maintaining backwards compatibility on either the TripleO wiki page[4], tripleo-heat-templates' readme[5], or tripleo-incubator's readme[6]. The Release Management wiki page[7] suggests that we follow Semantic Versioning[8], under which prior to 1.0.0 (t-h-t is ) anything goes. According to that wiki, we are using a stronger guarantee where we do promise to bump the minor version on incompatible changes -- but this again suggests that we do not promise to maintain backwards compatibility -- just that we document whenever we break it. I think there are no guarantees, and no promises. I also think that we've kept tripleo_heat_merge pretty narrow in surface area since making it into a module, so I'm not concerned that it will be incredibly difficult to keep those features alive for a while. According to Robert, there are now downstreams that have shipped things (with the implication that they don't expect things to change without a deprecation period) so there's clearly a disconnect here. I think it is more of a "we will cause them extra work" thing. If we can make a best effort and deprecate for a few releases (as in, a few releases of t-h-t, not OpenStack), they'll likely appreciate that. If we can't do it without a lot of effort, we shouldn't bother. Oh. I did assume we were talking about OpenStack releases, not t-h-t, sorry.
I have nothing against making a new t-h-t release that deprecates the features we're no longer using and dropping them for good in a later release. What do you suggest would be a reasonable waiting period? Say a month or so? I think it would be good if we could remove all the deprecated stuff before we start porting our templates to HOT. If we do promise backwards compatibility, we should document it somewhere and if we don't we should probably make that more visible, too, so people know what to expect. I prefer the latter, because it will make the merge.py cleanup easier and every published bit of information I could find suggests that's our current stance anyway. This is more about good will than promising. If it is easy enough to just keep the code around and have it complain to us if we accidentally resurrect a feature, that should be enough. We could even introduce a switch to the CLI like --strict that we can run in our gate and that won't allow us to keep using deprecated features. So I'd like to see us deprecate not because we have to, but because we can do it with only a small amount of effort. Right, that's fair enough. I've thought about adding a strict switch, too, but I'd like to start removing code from merge.py, not adding more :-). Let's just leave the capability forever. We're not adding things to merge.py or taking it in any new directions. Keeping the code does not cost us anything. Some day merge.py won't be used, and then it will be like we deleted the whole thing.
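The deprecate-but-keep approach discussed above is cheap to implement. A minimal sketch, assuming nothing about merge.py's actual internals (the function and feature names here are invented): the feature keeps working, complains when used, and a --strict mode (a boolean here) turns the complaint into a hard failure for the gate:

```python
# Hypothetical sketch of "keep the code, complain on use, fail in --strict".
# use_deprecated_feature and the feature name are illustrative only.
import warnings


def use_deprecated_feature(name, strict=False):
    msg = "%s is deprecated and will be removed from merge.py" % name
    if strict:
        # The gate runs with --strict so we can't resurrect old features.
        raise RuntimeError(msg)
    warnings.warn(msg, DeprecationWarning)
    # ... feature continues to work for downstream users ...


# Normal (downstream) usage: works, but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    use_deprecated_feature("SomeMergeFeature")
assert any(issubclass(w.category, DeprecationWarning) for w in caught)

# Gate usage: fails fast.
try:
    use_deprecated_feature("SomeMergeFeature", strict=True)
    raised = False
except RuntimeError:
    raised = True
assert raised
```

This matches the "costs us nothing" position: the deprecated path stays intact, and the only ongoing maintenance is the warning itself.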
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Tomas Sedovic's message of 2014-06-16 09:19:40 -0700: All, After having proposed some changes[1][2] to tripleo-heat-templates[3], reviewers suggested adding a deprecation period for the merge.py script. While TripleO is an official OpenStack program, none of the projects under its umbrella (including tripleo-heat-templates) have gone through incubation and integration nor have they been shipped with Icehouse. So there is no implicit compatibility guarantee and I have not found anything about maintaining backwards compatibility on either the TripleO wiki page[4], tripleo-heat-templates' readme[5], or tripleo-incubator's readme[6]. The Release Management wiki page[7] suggests that we follow Semantic Versioning[8], under which prior to 1.0.0 (t-h-t is ) anything goes. According to that wiki, we are using a stronger guarantee where we do promise to bump the minor version on incompatible changes -- but this again suggests that we do not promise to maintain backwards compatibility -- just that we document whenever we break it. I think there are no guarantees, and no promises. I also think that we've kept tripleo_heat_merge pretty narrow in surface area since making it into a module, so I'm not concerned that it will be incredibly difficult to keep those features alive for a while. According to Robert, there are now downstreams that have shipped things (with the implication that they don't expect things to change without a deprecation period) so there's clearly a disconnect here. I think it is more of a "we will cause them extra work" thing. If we can make a best effort and deprecate for a few releases (as in, a few releases of t-h-t, not OpenStack), they'll likely appreciate that. If we can't do it without a lot of effort, we shouldn't bother. If we do promise backwards compatibility, we should document it somewhere and if we don't we should probably make that more visible, too, so people know what to expect.
I prefer the latter, because it will make the merge.py cleanup easier and every published bit of information I could find suggests that's our current stance anyway. This is more about good will than promising. If it is easy enough to just keep the code around and have it complain to us if we accidentally resurrect a feature, that should be enough. We could even introduce a switch to the CLI like --strict that we can run in our gate and that won't allow us to keep using deprecated features. So I'd like to see us deprecate not because we have to, but because we can do it with only a small amount of effort.
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Duncan Thomas's message of 2014-06-16 09:41:49 -0700: On 16 June 2014 17:30, Jason Rist jr...@redhat.com wrote: I'm going to have to agree with Tomas here. There doesn't seem to be any reasonable expectation of backwards compatibility for the reasons he outlined, despite some downstream releases that may be impacted. Backward compatibility is a hard habit to get into, and easy to put off. If you're not making any guarantees now, when are you going to start making them? How much breakage can users expect? Without wanting to look entirely like a troll, should TripleO be dropped as an official program until it can start making such guarantees? I think every other official OpenStack project has a stable API policy of some kind, even if they don't entirely match... I actually agree with the sentiment of your statement, which is that backward compatibility matters. However, there is one thing that is inaccurate in your statements: TripleO is not a project, it is a program. These tools are products of that program's mission, which is to deploy OpenStack using itself as much as possible. Where there are holes, we fill them with existing tools or we write minimal tools such as the tripleo_heat_merge Heat template pre-processor. This particular tool is marked for death as soon as Heat grows the appropriate capabilities to allow that. This tool never wants to be integrated into the release. So it is a little hard to justify bending over backwards for BC. But I don't think that is what anybody is requesting. We're not looking for this tool to remain super agile and grow, thus making any existing code and interfaces a burden. So I think it is pretty easy to just start marking features as deprecated and raising deprecation warnings when they're used.
Re: [openstack-dev] revert hacking to 0.8 series
Excerpts from Sean Dague's message of 2014-06-16 05:15:54 -0700: Hacking 0.9 series was released pretty late for Juno. The entire check queue was flooded this morning with requirements proposals failing pep8 because of it (so at 6am EST we were waiting 1.5 hrs for a check node). The previous soft policy with pep8 updates was that we set a pep8 version basically release week, and changes stopped being done for style after first milestone. I think in the spirit of that we should revert the hacking requirements update back to the 0.8 series for Juno. We're past milestone 1, so shouldn't be working on style only fixes at this point. Proposed review here - https://review.openstack.org/#/c/100231/ I also think in future hacking major releases need to happen within one week of release, or not at all for that series. +1. Hacking is supposed to help us avoid redundant nit-picking in reviews. If it places any large burden on developers, whether by merge conflicting or backing up CI, it is a failure IMO.
Re: [openstack-dev] [TripleO] Backwards compatibility policy for our projects
Excerpts from Duncan Thomas's message of 2014-06-16 10:46:12 -0700: Hi Clint This looks like a special pleading here - all OpenStack projects (or 'program' if you prefer - I'm honestly not seeing a difference) have bits that they've written quickly and would rather not have to maintain, but in order to allow people to make use of them downstream have to do that work. Ask the cinder team about how much I try to stay on top of any back-compat issues. I don't just prefer program. It is an entirely different thing: https://wiki.openstack.org/wiki/Programs https://wiki.openstack.org/wiki/Governance/NewProjects If TripleO is not ready to take up that burden, then IMO it shouldn't be an official project. If the bits that make it up are too immature to actually be maintained with reasonable guarantees that they won't just pull the rug out from any consumers, then their use needs to be re-thought. Currently, tripleO enjoys huge benefits from its official status, but isn't delivering to that standard. No other project has a hope of coming in as an official deployment tool while tripleO holds that niche. Despite this, tripleO is barely usable, and doesn't seem to be maturing towards taking up the responsibilities that other projects have had forced upon them. If it isn't ready for that, should it go back to incubation and give some other team or technology a fair chance to step up to the plate? TripleO _isn't_ an official project. It is a program to make OpenStack deploy itself. This is the same as the infra program, which has a mission to support development. We're not calling for Zuul to be integrated into the release, we are just expecting it to keep supporting the goals of the infra program and OpenStack in general. What is the official deployment tool you mention? There isn't one. The tool we've been debating is something that enables OpenStack to be deployed using its own component, Heat, but that is sort of like oslo-incubator.. 
it is driving a proof of concept for inclusion into an official project. Ironic was spun out very early on because it was clear there was a need for an integrated project to manage baremetal. This is an example where pieces used for TripleO have been pushed into the integrated release. However, Heat already exists, and that is where the responsibility lies to orchestrate applications. We are driving quite a bit into Heat right now, with a massive refactor of the core to be more resilient to the types of challenges a datacenter environment will present. The features we get from the tripleo_heat_merge pre-processor that is in question will be the next thing to go into Heat. Expecting us to commit resources to both of those efforts doesn't make much sense. The program is driving its mission, and the tools will be incubated and integrated when that makes sense. Meanwhile, it turns out OpenStack _is not_ currently able to deploy itself. Users have to bolt things on, whether it is our tools, or salt/puppet/chef/ansible artifacts, users cannot use just what is in OpenStack to deploy OpenStack. But we need to be able to test from one end to the other while we get things landed in OpenStack.. and so, we use the pre-release model while we get to a releasable thing. I don't want to look like I'm specifically beating on tripleO here, but it is the first openstack component I've worked with that seems to have this little concern for downstream users *and* no apparent plans to fix it. Which component specifically are you referring to? Our plan, nay, our mission, is to fix it by pushing the necessary features into the relevant projects. Also, we actually take on a _higher_ burden of backward compatibility with some of our tools that we do want to release. They're not integrated, and we intend to keep them working with all releases of OpenStack because we intend to keep their interfaces stable for as long as those interfaces are relevant. 
diskimage-builder, os-apply-config, os-collect-config, os-refresh-config, are all stable, and don't need to be integrated into the OpenStack release because they're not even OpenStack specific. That's without going into all of the other difficulties myself and fellow developers have had trying to get involved with tripleO, which I'll go into at some other point. I would be quite interested in any feedback you can give us on how hard it might be to join the effort. It is a large effort, and I know new contributors can often get lost in a sea of possibilities if we, the long time contributors, aren't careful to get them bootstrapped. It is possible there are other places with similar problems, but this is the first I've run into - I'll call out any others I run into, since I think it is important, and discussing it publicly keeps everyone honest. If I've got the wrong expectations, I'd at least like to have the correction on record. I do think that there is a misunderstanding that TripleO is some kind of tool.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Doug Wiegley's message of 2014-06-10 14:41:29 -0700: Of what use is a database that randomly deletes rows? That is, in effect, what you’re allowing. The secrets are only useful when paired with a service. And unless I’m mistaken, there’s no undo. So you’re letting users shoot themselves in the foot, for what reason, exactly? How do you expect OpenStack to rely on a data store that is fundamentally random at the whim of users? Every single service that uses Barbican will now have to hack in a defense mechanism of some kind, because they can’t trust that the secret they rely on will still be there later. Which defeats the purpose of this mission statement: “Barbican is a ReST API designed for the secure storage, provisioning and management of secrets.” (And I don’t think anyone is suggesting that blind refcounts are the answer. At least, I hope not.) Anyway, I hear this has already been decided, so, so be it. Sounds like we’ll hack around it. Doug, nobody is calling Barbican a database. It is a place to store secrets. The idea is to loosely couple things, and if you need more assurances, use something like Heat to manage the relationships.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Doug Wiegley's message of 2014-06-16 13:22:26 -0700: nobody is calling Barbican a database. It is a place to store … did you at least feel a heavy sense of irony as you typed those two statements? “It’s not a database, it just stores things!” :-) Not at all, though I understand that, clipped as so, it may look a bit ironic. I was using shorthand of database to mean a general purpose database. I should have qualified it to avoid any confusion. It is a narrow purpose storage service with strong access controls. We can call that a database if you like, but I think it has one very tiny role, and that is to audit and control access to secrets. The real irony here is that in this rather firm stand of keeping the user in control of their secrets, you are actually making the user LESS in control of their secrets. Copies of secrets will have to be made, whether stored under another tenant, or shadow copied somewhere. And the user will have no way to delete them, or even know that they exist. Why would you need to make copies outside of the in-RAM copy that is kept while the service runs? You're trying to do too much instead of operating in a nice loosely coupled fashion. The force flag would eliminate the common mistake cases enough that I’d wager lbaas and most others would cease to worry, not duplicate, and just reference barbican ids and nothing else. (Not including backends that will already make a copy of the secret, but things like servicevm will not need to dup it.) The earlier assertion that we have to deal with the missing secrets case even with a force flag is, I think, false, because once the common errors have been eliminated, the potential window of accidental pain is reduced to those that really ask for it. The accidental pain thing makes no sense to me. I'm a user and I take responsibility for my data.
If I don't want to have that responsibility, I will use less privileged users and delegate the higher amount of privilege to a system that does manage those relationships for me. Do we have mandatory file locking in Unix? No, we don't. Why? Because some users want the power to remove files _no matter what_. We build in the expectation that things may disappear no matter what you do to prevent it. I think your LBaaS should be written with the same assumption. It will be more resilient and useful to more people if they do not have to play complicated games to remove a secret. Anyway, nobody has answered this. What user would indiscriminately delete their own data and expect that things depending on that data will continue to work indefinitely?
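The "write your service assuming the secret may vanish" stance above can be sketched concretely. This is a toy, not Barbican's or Neutron LBaaS's actual API — the store is a plain dict standing in for Barbican and all class and method names are invented: the service keeps an in-RAM copy while running and, on restart, degrades explicitly instead of crashing or making a hidden copy.

```python
# Hypothetical sketch: a load-balancer listener that tolerates secret
# deletion. The dict `store` stands in for Barbican; SecretGone and
# Listener are invented names for illustration.

class SecretGone(Exception):
    """Raised when the referenced secret no longer exists."""


class Listener:
    def __init__(self, store, secret_id):
        self._store = store
        self._secret_id = secret_id
        self._tls_key = None  # in-RAM copy, fetched on (re)start only

    def start(self):
        try:
            self._tls_key = self._store[self._secret_id]
        except KeyError:
            # The owner deleted their secret: surface a clear error
            # state instead of serving with a stale key or keeping a
            # shadow copy the user cannot see or delete.
            raise SecretGone(self._secret_id)


store = {"s-1": b"-----BEGIN PRIVATE KEY-----..."}
lb = Listener(store, "s-1")
lb.start()                    # normal path: key cached in RAM

del store["s-1"]              # user exercises their right to delete
status = "ACTIVE"
try:
    Listener(store, "s-1").start()   # e.g. restart after power failure
except SecretGone:
    status = "DEGRADED"       # operator-visible, recoverable state
assert status == "DEGRADED"
```

The design point is that the failure is explicit and attributable to the user's own action, which is exactly the Unix-file-deletion analogy in the message above.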
Re: [openstack-dev] [TripleO] Use MariaDB by default on Fedora
Excerpts from Gregory Haynes's message of 2014-06-16 14:04:19 -0700: Excerpts from Jan Provazník's message of 2014-06-16 20:28:29 +: Hi, MariaDB is now included in Fedora repositories, which makes it an easier to install and more stable option for Fedora installations. Currently MariaDB can be used by including the mariadb (use mariadb.org pkgs) or mariadb-rdo (use Red Hat RDO pkgs) element when building an image. What do you think about using MariaDB as the default option for Fedora when running devtest scripts? (first, I believe Jan means that MariaDB _Galera_ is now in Fedora) I'd like to give this a try. This does start to change us from being a deployment of OpenStack to being a deployment per distro, but IMO that's a reasonable position. I'd also like to propose that if we decide against doing this then these elements should not live in tripleo-image-elements. I'm not so sure I agree. We have lio and tgt because lio is on RHEL but everywhere else is still using tgt IIRC. However, I also am not so sure that it is actually a good idea for people to ship on MariaDB since it is not in the gate. As it diverges from MySQL (starting in earnest with 10.x), there will undoubtedly be subtle issues that arise. So I'd say having MariaDB get tested along with Fedora will actually improve those users' test coverage, which is a good thing.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Carlos Garza's message of 2014-06-16 16:25:10 -0700: On Jun 16, 2014, at 4:06 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Doug Wiegley's message of 2014-06-16 13:22:26 -0700: nobody is calling Barbican a database. It is a place to store … did you at least feel a heavy sense of irony as you typed those two statements? “It’s not a database, it just stores things!” :-) Not at all, though I understand that, clipped as so, it may look a bit ironic. I was using shorthand of database to mean a general purpose database. I should have qualified it to avoid any confusion. It is a narrow purpose storage service with strong access controls. We can call that a database if you like, but I think it has one very tiny role, and that is to audit and control access to secrets. The real irony here is that in this rather firm stand of keeping the user in control of their secrets, you are actually making the user LESS in control of their secrets. Copies of secrets will have to be made, whether stored under another tenant, or shadow copied somewhere. And the user will have no way to delete them, or even know that they exist. Why would you need to make copies outside of the in-RAM copy that is kept while the service runs? You're trying to do too much instead of operating in a nice loosely coupled fashion. Because the service may be restarted? The force flag would eliminate the common mistake cases enough that I’d wager lbaas and most others would cease to worry, not duplicate, and just reference barbican ids and nothing else. (Not including backends that will already make a copy of the secret, but things like servicevm will not need to dup it.) The earlier assertion that we have to deal with the missing secrets case even with a force flag is, I think, false, because once the common errors have been eliminated, the potential window of accidental pain is reduced to those that really ask for it. The accidental pain thing makes no sense to me.
I'm a user and I take responsibility for my data. If I don't want to have that responsibility, I will use less privileged users and delegate the higher amount of privilege to a system that does manage those relationships for me. Do we have mandatory file locking in Unix? No, we don't. Why? Because some users want the power to remove files _no matter what_. We build in the expectation that things may disappear no matter what you do to prevent it. I think your LBaaS should be written with the same assumption. It will be more resilient and useful to more people if they do not have to play complicated games to remove a secret. Anyway, nobody has answered this. What user would indiscriminately delete their own data and expect that things depending on that data will continue to work indefinitely? Users that are expecting barbican operations to only occur during the initial loadbalancer provisioning. I.e., users that don't realize their LB configs don't natively store the private keys and would be retrieving keys on the fly during every migration, HA spin-up, service restart (from power failure), etc. But I agree we shouldn't do force flag locking as the barbican team has already dismissed the possibility of adding policy enforcement on behalf of other services. Shadow copying (into an lbaas-owned account on Barbican) was just so that our lbaas backend can access the keys outside of the user's control if need be. :| I'm not sure what that means, but perhaps this is a nice use case for trusts, which would let the user hand LBaaS a revokable secret that gives LBaaS rights to impersonate the user for a specific keystone role.
Re: [openstack-dev] [heat] How to avoid property revalidation?
Excerpts from Steven Hardy's message of 2014-06-15 02:40:14 -0700: Hi all, So, I stumbled accross an issue while fixing up some tests, which is that AFAICS since Icehouse we continually revalidate every property every time they are accessed: https://github.com/openstack/heat/blob/stable/havana/heat/engine/properties.py#L716 This means that, for example, we revalidate every property every time an event is created: https://github.com/openstack/heat/blob/stable/havana/heat/engine/event.py#L44 And obviously also every time the property is accessed in the code implementing whatever action we're handling, and potentially also before the action (e.g the explicit validate before create/update). This repeated revalidation seems like it could get very expensive - for example there are several resources (Instance/Server resources in particular) which validate against glance via a custom constraint, so we're probably doing at least 6 calls to glance validating the image every create. My suspicion is this is one of the reasons for the performance regression observed in bug #1324102. I've been experimenting with some code which implements local caching of the validated properties, but according to the tests this introduces some problems where the cached value doesn't always match what is expected, still investigating why but I guess it's updates where we need to re-resolve what is cached during the update. Does anyone (and in particular Zane and Thomas who I know have deep experience in this area) have any ideas on what strategy we might employ to reduce this revalidation overhead? tl;dr: I think we should only validate structure in validate, and leave runtime validation to preview. I've been wondering about what we want to achieve with validation recently. It seems to me that the goal is to assist template authors in finding obvious issues in structure and content before they cause a runtime failure. 
But the error messages are so unhelpful we basically get this: http://cdn.memegenerator.net/instances/500x/50964597.jpg What holds us back from improving that is the complexity of doing runtime validation. To me, runtime is more of a 'preview' problem than a validate problem. A template that validates once should continue to validate on any version that supports the template format. But a preview will actually want to measure runtime things and use parameters, and thus is where runtime concerns belong. I wonder if we could move validation out of any runtime context, and remove any attempts to validate runtime things like image names/ids and such. That would allow us to remove all but the pre-action validation calls.
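The proposed split above can be sketched as two independent passes. This is a toy, not Heat's actual Properties API — the schema format and the glance lookup are invented stand-ins — but it shows why the structural pass can run cheaply and repeatedly while the runtime pass is confined to preview/pre-action:

```python
# Hypothetical sketch of "validate structure" vs "runtime (preview) checks".
# The schema dict and image_exists callable are invented for illustration.

def validate_structure(props, schema):
    """Cheap and deterministic: a template that passes once always passes."""
    errors = []
    for name, spec in schema.items():
        if spec.get("required") and name not in props:
            errors.append("missing required property: %s" % name)
        elif name in props and not isinstance(props[name], spec["type"]):
            errors.append("%s: expected %s" % (name, spec["type"].__name__))
    return errors


def preview_runtime(props, image_exists):
    """Expensive and environment-dependent: consult glance once, at
    preview or immediately before the action -- not on every property
    access or event creation."""
    if not image_exists(props["image"]):
        return ["image %r not found in glance" % props["image"]]
    return []


schema = {"image": {"type": str, "required": True},
          "flavor": {"type": str, "required": True}}
props = {"image": "fedora-20", "flavor": "m1.small"}

assert validate_structure(props, schema) == []
assert validate_structure({"image": "x"}, schema) == \
    ["missing required property: flavor"]
# the service round-trip happens exactly once, in the preview pass:
assert preview_runtime(props, lambda name: name == "fedora-20") == []
```

With this split, repeated property access (events, action handlers) would only ever re-run the cheap structural pass, addressing the repeated-glance-call cost described in the message.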
Re: [openstack-dev] [nova] Distributed locking
Excerpts from Matthew Booth's message of 2014-06-13 01:40:30 -0700: On 12/06/14 21:38, Joshua Harlow wrote: So just a few thoughts before going too far down this path, Can we make sure we really really understand the use-case where we think this is needed. I think it's fine that this use-case exists, but I just want to make it very clear to others why it's needed and why distributed locking is the only *correct* way. An example use of this would be side-loading an image from another node's image cache rather than fetching it from glance, which would have very significant performance benefits in the VMware driver, and possibly other places. The copier must take a read lock on the image to prevent the owner from ageing it during the copy. Holding a read lock would also assure the copier that the image it is copying is complete. Really? Usually in the unix-inspired world we just open a file and it stays around until we close it.
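The "just open a file" point relies on POSIX unlink semantics: removing a file's name does not remove its data while a descriptor is still open, so an open descriptor acts like an implicit read lock against the owner ageing the image out. A small demonstration (POSIX-only; this will not hold on Windows):

```python
# Demonstrates POSIX open-after-unlink semantics: the "copier" keeps
# reading the image even after the "owner" deletes it; the blocks are
# freed only when the last descriptor is closed.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"image-bytes")

with open(path, "rb") as reader:       # copier opens the image
    os.unlink(path)                    # owner ages the image out
    assert not os.path.exists(path)    # the name is gone...
    assert reader.read() == b"image-bytes"  # ...but the data is intact

os.close(fd)
```

Note this only covers the ageing race on a local filesystem; it does not by itself tell the copier that the image is *complete*, nor does it work across nodes, which is why the thread is discussing distributed locks at all.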
Re: [openstack-dev] [Neutron] Starting contributing to project
Excerpts from Sławek Kapłoński's message of 2014-06-15 13:10:56 -0700: Hello, I want to start contributing to the neutron project. I found a bug which I want to try to fix: https://bugs.launchpad.net/neutron/+bug/1204956 and I have a question about the workflow in such a case. Should I clone the neutron repository and base my changes on the master branch, or should I start my changes from some other branch? And once I have a patch for this bug, what should I do next? Thanks in advance for any help and explanation This should explain everything you need to know: https://wiki.openstack.org/wiki/Gerrit_Workflow
Re: [openstack-dev] Fwd: Fwd: Debian people don't like bash8 as a project name (Bug#748383: ITP: bash8 -- bash script style guide checker)
Excerpts from Thomas Goirand's message of 2014-06-13 03:04:07 -0700: On 06/13/2014 06:53 AM, Morgan Fainberg wrote: Hi Thomas, I felt a couple sentences here were reasonable to add (more than “don’t care” from before). I understand your concerns here, and I totally get what you’re driving at, but in the packaging world wouldn’t it make sense to call it python-bash8? Yes, this is what will happen. Now the binary, I can agree (for reasons outlined) should probably not be named ‘bash8’, but the name of the “command” could be separate from the packaging / project name. If upstream chooses /usr/bin/bash8, I'll have to follow. I don't want to carry patches which I'd have to maintain. Beyond a relatively minor change to the resulting “binary” name [sure bash-tidy, or whatever we come up with], is there something more that really is awful (rather than just silly) about the naming? Renaming python-bash8 into something else is not possible, because the Debian standard is to use, as the Debian name, what is used for the import. So if we have import xyz, then the package will be python-xyz. For Python _libraries_, yes. But for a utility which happens to import that library, naming the package after what upstream calls it is a de facto standard.
Re: [openstack-dev] [Heat]Heat template parameters encryption
I tend to agree with you Keith, securing Heat is Heat's problem. Securing Nova is nova's problem. And I too would expect that those with admin access to Heat, would not have admin access to Nova. That is why we split these things up with API's. I still prefer that users encrypt secrets on the client side, and store said secrets in Barbican, passing only a temporary handle into templates for consumption. But until we have that, just encrypting hidden parameters would be simple to do and I wouldn't even mind it being on by default in devstack because only a small percentage of parameters are hidden. My initial reluctance to the plan was in encrypting everything, as that makes verifying things a lot harder. But just encrypting the passwords.. I think that's a decent plan. A couple of ideas: * Provide a utility to change the key (must update the entire database). * Allow multiple decryption keys (to enable tool above to work slowly). Excerpts from Keith Bray's message of 2014-06-11 22:29:13 -0700: On 6/11/14 2:43 AM, Steven Hardy sha...@redhat.com wrote: IMO, when a template author marks a parameter as hidden/secret, it seems incorrect to store that information in plain text. Well I'd still question why we're doing this, as my previous questions have not been answered: - AFAIK nova user-data is not encrypted, so surely you're just shifting the attack vector from one DB to another in nearly all cases? Having one system (e.g. Nova) not as secure as it could be isn't a reason to not secure another system as best we can. For every attack vector you close, you have another one to chase. I'm concerned that the merit of the feature is being debated, so let me see if I can address that: We want to use Heat to launch customer facing stacks. In a UI, we would prompt customers for Template inputs, including for example: Desired Wordpress Admin Password, Desired MySQL password, etc. The UI then makes an API call to Heat to orchestrate instantiation of the stack. 
With Heat as it is today, these customer specified credentials (as template parameters) would be stored in Heat's database in plain text. As a Heat Service Administrator, I do not need nor do I want the customer's Wordpress application password to be accessible to me. The application belongs to the customer, not to the infrastructure provider. Sure, I could blow the customer's entire instance away as the service provider. But, if I get fired or leave the company, I could no longer blow away their instance... If I leave the company, however, I could have taken a copy of the Heat DB with me, or have looked that info up in the Heat DB before my exit, and I could then externally attack the customer's Wordpress instance. It makes no sense for us to store user specified creds unencrypted unless we are administering the customer's Wordpress instance for them, which we are not. We are administering the infrastructure only. I realize the encryption key could also be stolen, but in a production system the encryption key access gets locked down to a VERY small set of folks and not all the people that administer Heat (that's part of good security practices and makes auditing of a leaked encryption key much easier). - Is there any known way for heat to leak sensitive user data, other than a cloud operator with admin access to the DB stealing it? Surely cloud operators can trivially access all your resources anyway, including instances and the nova DB/API so they have this data anyway. Encrypting the data in the DB also helps in case a leak of arbitrary DB data does surface in Heat. We are not aware of any issues with Heat today that could leak that data... But, we never know what vulnerabilities will be introduced or discovered in the future. At Rackspace, individual cloud operators can not trivially access all customer cloud resources. When operating a large cloud at scale, service administrators' operations and capabilities are limited to the systems they work on.
While I could impersonate a user via Heat and do lots of bad things across many of their resources, each of the other systems (Nova, Databases, Auth, etc.) audits who is doing what on behalf of which customer, so I can't do something malicious to a customer's Nova instance without the Auth System Administrators ensuring that HR knows I would be the person to blame. Similarly, a Nova system administrator can't delete a customer's Heat stack without our Heat administrators knowing who is to blame. We have checks and balances across our systems and purposefully segment our possible attack vectors. Leaving sensitive customer data unencrypted at rest provides many more options for that data to get in the wrong hands or be taken outside the company. It is quick and easy to do a MySQL dump if the DB linux system is compromised, which has nothing to do with Heat having a vulnerability. Our ask is to
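The two ideas floated earlier in the thread (a utility to change the key, plus accepting multiple decryption keys so that utility can work slowly) can be sketched with MultiFernet from the third-party `cryptography` package. This is a sketch only: the parameter layout and names here are illustrative, not Heat's actual schema.

```python
# Sketch, assuming the `cryptography` package is available; not Heat code.
import json

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()


def encrypt_params(params, crypt):
    # Encrypt all the hidden parameters together as a single JSON blob,
    # per the suggestion above, rather than one ciphertext per parameter.
    return crypt.encrypt(json.dumps(params).encode())


def decrypt_params(blob, crypt):
    return json.loads(crypt.decrypt(blob).decode())


# A row written before the key change, under the old key only:
blob = encrypt_params({"wordpress_password": "s3cret"},
                      MultiFernet([Fernet(old_key)]))

# After rotation: encrypt with the new key first, but keep accepting the
# old key, so a background utility can re-encrypt the database slowly.
crypt = MultiFernet([Fernet(new_key), Fernet(old_key)])
assert decrypt_params(blob, crypt) == {"wordpress_password": "s3cret"}
blob = encrypt_params(decrypt_params(blob, crypt), crypt)  # re-key one row
```

MultiFernet encrypts with the first key in the list and tries each key in turn on decrypt, which is exactly the "allow multiple decryption keys" behavior the re-keying tool needs.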
Re: [openstack-dev] Gate proposal - drop Postgresql configurations in the gate
Excerpts from Matt Riedemann's message of 2014-06-12 08:15:46 -0700: On 6/12/2014 9:38 AM, Mike Bayer wrote: On 6/12/14, 8:26 AM, Julien Danjou wrote: On Thu, Jun 12 2014, Sean Dague wrote: That's not catchable in unit or functional tests? Not in an accurate manner, no. Keeping jobs alive based on the theory that they might one day be useful is something we just don't have the liberty to do any more. We've not seen an idle node in zuul in 2 days... and we're only at j-1. j-3 will be at least +50% of this load. Sure, I'm not saying we don't have a problem. I'm just saying it's not a good solution to fix that problem IMHO. Just my 2c without having a full understanding of all of OpenStack's CI environment, Postgresql is definitely different enough that MySQL strict mode could still allow issues to slip through quite easily, and also as far as capacity issues, this might be longer term but I'm hoping to get database-related tests to be lots faster if we can move to a model that spends much less time creating databases and schemas. Is there some organization out there that uses PostgreSQL in production that could stand up 3rd party CI with it? I know that at least for the DB2 support we're adding across the projects we're doing 3rd party CI for that. Granted it's a proprietary DB unlike PG but if we're talking about spending resources on testing for something that's not widely used, but there is a niche set of users that rely on it, we could/should move that to 3rd party CI. I'd much rather see us spend our test resources on getting multi-node testing running in the gate so we can test migrations in Nova. I think this is really the answer. To paraphrase the wise and well experienced engineer, Beyoncé: If you like it then you shoulda put CI on it.
The project will succumb to a tragedy of the commons if it bends over backwards for every deployment variation available. But 3rd parties who care can always contribute resources and (if they play nice...) votes. I think there are a tiny number of things that will cause corner case bugs that could creep in, but as Sean says, we haven't actually seen these.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Adam Harwell's message of 2014-06-10 12:04:41 -0700: So, it looks like any sort of validation on Deletes in Barbican is going to be a no-go. I'd like to propose a third option, which might be the safest route to take for LBaaS while still providing some of the convenience of using Barbican as a central certificate store. Here is a diagram of the interaction sequence to create a loadbalancer: http://bit.ly/1pgAC7G Summary: Pass the Barbican TLS Container ID to the LBaaS create call, get the container from Barbican, and store a shadow-copy of the container again in Barbican, this time on the LBaaS service account. The secret will now be duplicated (it still exists on the original tenant, but also exists on the LBaaS tenant), but we're not talking about a huge amount of data here -- just a few kilobytes. With this approach, we retain most of the advantages we wanted to get from using Barbican -- we don't need to worry about taking secret data through the LBaaS API (we still just take a barbicanID from the user), and the user can still use a single barbicanID (the original one they created -- the copies are invisible to them) when passing their TLS info to other services. We gain the additional advantage that it no longer matters what happens to the original TLS container -- it could be deleted and it would not impact our service. What do you guys think of that option? A user hands LBaaS an ID, and then deletes it, and expects that LBaaS can continue working indefinitely? How is that user's reckless action LBaaS's problem? Do one thing: Be a good load balancer. Let users orchestrate your APIs according to their use case and tools.
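The shadow-copy flow Adam describes can be sketched in a few lines, with a stub standing in for the real python-barbicanclient calls against the two accounts (every name here is illustrative, not a real API):

```python
# Stub standing in for a per-tenant Barbican endpoint; a real implementation
# would use python-barbicanclient scoped to each account.
class StubBarbican(object):
    def __init__(self):
        self._store = {}
        self._seq = 0

    def store(self, payload):
        self._seq += 1
        ref = "container-%d" % self._seq
        self._store[ref] = payload
        return ref

    def get(self, ref):
        return self._store[ref]

    def delete(self, ref):
        del self._store[ref]


def create_listener(user_ref, user_barbican, lbaas_barbican):
    # Fetch the user's TLS container once, then keep a private copy on the
    # LBaaS service account so deleting the original cannot break the LB.
    tls_payload = user_barbican.get(user_ref)
    shadow_ref = lbaas_barbican.store(tls_payload)
    return {"user_ref": user_ref, "shadow_ref": shadow_ref}


user_b, lbaas_b = StubBarbican(), StubBarbican()
ref = user_b.store({"certificate": "CERT", "private_key": "KEY"})
listener = create_listener(ref, user_b, lbaas_b)
user_b.delete(ref)  # the user's "reckless" delete...
assert lbaas_b.get(listener["shadow_ref"])["certificate"] == "CERT"  # ...LB unaffected
```

The trade-off Clint objects to is visible here too: the service silently pins a copy of state the user believes they deleted.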
Re: [openstack-dev] [marconi] Reconsidering the unified API model
Excerpts from Janczuk, Tomasz's message of 2014-06-11 10:05:54 -0700: On 6/11/14, 2:43 AM, Gordon Sim g...@redhat.com wrote: On 06/10/2014 09:57 PM, Janczuk, Tomasz wrote: Using processes to isolate tenants is certainly possible. There is a range of isolation mechanisms that can be used, from VM level isolation (basically a separate deployment of the broker per-tenant), to process level isolation, to sub-process isolation. The higher the density the lower the overall marginal cost of adding a tenant to the system, and overall cost of operating it. From the cost perspective it is therefore desired to provide sub-process multi-tenancy mechanism; at the same time this is the most challenging approach. Where does the increased cost for process level isolation come from? Is it simply the extra process required (implying an eventual limit for a given VM)? With sub-process isolation you have to consider the fairness of scheduling between operations for different tenants, i.e. potentially limiting the processing done on behalf of any given tenant in a given period. You would also need to limit the memory used on behalf of any given tenant. Wouldn't you end up reinventing much of what the operating system does? Process level isolation is more costly than sub-process level isolation primarily due to larger memory consumption. For example, CGI has worse cost characteristics than FastCGI when scaled out. But the example closer to Marconi's use case is database systems: I can't put my finger on a single one that would isolate queries executed by its users using processes. There's at least one, and it is fairly popular: http://www.postgresql.org/docs/9.3/static/tutorial-arch.html The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (forks) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process.
Re: [openstack-dev] [TripleO] [Ironic] [Heat] Mid-cycle collaborative meetup
Excerpts from Jaromir Coufal's message of 2014-06-08 16:44:58 -0700: Hi, it looks like there is no more activity on the survey for mid-cycle dates, so I went forward to evaluate it. I created a table view in the etherpad [0] and the results are as follows: * option1 (Jul 28 - Aug 1): 27 attendees - collides with Nova/Ironic * option2 (Jul 21-25) : 27 attendees * option3 (Jul 25-29) : 17 attendees - collides with Nova/Ironic * option4 (Aug 11-15) : 13 attendees I think that we can remove options 3 and 4 from consideration, because there are a lot of people who can't make it. So we have option1 and option2 left. Since Robert and Devananda (PTLs on the projects) can't make option1, which also conflicts with the Nova/Ironic meetup, I think it is pretty straightforward. Based on the survey the winning date for the mid-cycle meetup is option2: July 21st - 25th. Does anybody have a very strong reason why we shouldn't fix the date for option2 and proceed forward with the organization of the meetup? July 21-25 is also the shortest notice. I will not be able to attend as plans have already been made for the summer and I've already been travelling quite a bit recently; after all, we were all just at the summit a few weeks ago. I question the reasoning that being close to FF is a bad thing, and suggest adding much later dates. But I understand since the chosen dates are so close, there is a need to make a decision immediately. Alternatively, I suggest that we split Heat out of this, and aim at later dates in August.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Vijay Venkatachalam's message of 2014-06-09 21:48:43 -0700: My vote is for option #2 (without the registration). It is simpler to start with this approach. How is delete handled though? Ex. What is the expectation when user attempts to delete a certificate/container which is referred by an entity like LBaaS listener? 1. Will there be validation in Barbican to prevent this? *OR* 2. LBaaS listener will have a dangling reference/pointer to certificate? Dangling reference. To avoid that, one should update all references before deleting.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Douglas Mendizabal's message of 2014-06-09 16:08:02 -0700: Hi all, I’m strongly in favor of having immutable TLS-typed containers, and very much opposed to storing every revision of changes done to a container. I think that storing versioned containers would add too much complexity to Barbican, where immutable containers would work well. Agree completely. Create a new one for new values. Keep the old ones while they're still active. I’m still not sold on the idea of registering services with Barbican, even though (or maybe especially because) Barbican would not be using this data for anything. I understand the problem that we’re trying to solve by associating different resources across projects, but I don’t feel like Barbican is the right place to do this. Agreed also, this is simply not Barbican or Neutron's role. Be a REST API for secrets and networking, not all-dancing, all-singing nannies that prevent any possibly dangerous behavior with said APIs. It seems we’re leaning towards option #2, but I would argue that orchestration of services is outside the scope of Barbican’s role as a secret-store. I think this is a problem that may need to be solved at a higher level. Maybe an openstack-wide registry of dependent entities across services? An optional openstack-wide registry of dependent entities is called Heat.
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
Excerpts from Eichberger, German's message of 2014-06-06 15:52:54 -0700: Jorge + John, I am most concerned with a user changing his secret in barbican and then the LB trying to update and causing downtime. Some users like to control when the downtime occurs. Couldn't you allow a user to have multiple credentials, the way basically every key-based user access system works (for an example, see SSH)? Users changing their credentials would create new ones, reference them in the appropriate consuming service, and dereference old ones when they are believed to be out of service. I see both specified options as overly complicated attempts to work around what would be solved gracefully with a many-to-one relationship of keys to users. For #1 it was suggested that once the event is delivered it would be up to a user to enable an auto-update flag. In the case of #2 I am a bit worried about error cases: e.g. uploading the certificates succeeds but registering the loadbalancer(s) fails. So using the barbican system for those warnings might not be as foolproof as we are hoping. One thing I like about #2 over #1 is that it pushes a lot of the information to Barbican. I think a user would expect when he uploads a new certificate to Barbican that the system warns him right away about load balancers using the old cert. With #1 he might get an e-mail from LBaaS telling him things changed (and we helpfully updated all affected load balancers) -- which isn't as immediate as #2. If we implement an auto-update flag for #1 we can have both. Users who like #2 just hit the flag. Then the discussion changes to what we should implement first and I agree with Jorge + John that this should likely be #2. IMO you're doing way too much and tending toward tight coupling which will make the system brittle. If you want to give the user orchestration, there is Heat. A template will manage the sort of things that you want, such as automatic replacement and dereferencing/deleting of older credentials.
But not if your service doesn't support having n+1 active credentials at one time.
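The many-to-one model argued for above can be sketched as a toy listener that tolerates n+1 active credentials during rotation (all names here are illustrative, not LBaaS API):

```python
class Listener(object):
    """Toy LBaaS listener that holds several credential references at once,
    so rotation never has a zero-credential (downtime) window."""

    def __init__(self):
        self.credential_refs = []

    def add_credential(self, ref):
        # Rotation step 1: reference the new credential alongside the old.
        self.credential_refs.append(ref)

    def retire_credential(self, ref):
        # Rotation step 2: dereference the old credential only once the
        # user believes it is out of service.
        self.credential_refs.remove(ref)


listener = Listener()
listener.add_credential("cert-v1")
listener.add_credential("cert-v2")     # both active: no downtime window
listener.retire_credential("cert-v1")  # user chooses when to cut over
assert listener.credential_refs == ["cert-v2"]
```

This is the SSH-authorized-keys pattern the mail alludes to: the user, not the service, controls when the cutover happens.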
Re: [openstack-dev] [Nova] Mid cycle meetup
Excerpts from Devananda van der Veen's message of 2014-06-06 12:04:08 -0700: I have just announced the Ironic mid-cycle in Beaverton, co-located with Nova. That's the main one for Ironic. However, there are many folks working on both TripleO and Ironic, so I wouldn't be surprised if there is a (small?) group at the TripleO sprint hacking on Ironic, even if there's nothing official, and even if the dates overlap (which I really hope they don't). I'm going to try to attend the TripleO sprint if at all possible, as that project remains one of the largest users of Ironic that I'm aware of. Yes, we desperately need expertise as our intention is to push forward on scale testing, and we'll need experts on Ironic's internals to push optimizations where they're needed. I hope that the Ironic team is large enough that there can be some at the Nova sprint, and some at the TripleO sprint if they happen to be concurrent. I believe we would like for the TripleO sprint to be a bit later in the cycle though, and I'm seeing dates proposed that would reflect that.
Re: [openstack-dev] [Heat]Heat template parameters encryption
Excerpts from Steven Hardy's message of 2014-06-05 02:23:40 -0700: On Thu, Jun 05, 2014 at 12:17:07AM +, Randall Burt wrote: On Jun 4, 2014, at 7:05 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Zane Bitter's message of 2014-06-04 16:19:05 -0700: On 04/06/14 15:58, Vijendar Komalla wrote: Hi Devs, I have submitted a WIP review (https://review.openstack.org/#/c/97900/) for the Heat parameters encryption blueprint https://blueprints.launchpad.net/heat/+spec/encrypt-hidden-parameters This quick and dirty implementation encrypts all the parameters on Stack 'store' and decrypts on Stack 'load'. Following are a couple of improvements I am thinking about; 1. Instead of encrypting individual parameters, on Stack 'store' encrypt all the parameters together as a dictionary [something like crypt.encrypt(json.dumps(param_dictionary))] Yeah, definitely don't encrypt them individually. 2. Just encrypt parameters that were marked as 'hidden', instead of encrypting all parameters I would like to hear your feedback/suggestions. Just as a heads-up, we will soon need to store the properties of resources too, at which point parameters become the least of our problems. (In fact, in theory we wouldn't even need to store parameters... and probably by the time convergence is completely implemented, we won't.) Which is to say that there's almost certainly no point in discriminating between hidden and non-hidden parameters. I'll refrain from commenting on whether the extra security this affords is worth the giant pain it causes in debugging, except to say that IMO there should be a config option to disable the feature (and if it's enabled by default, it should probably be disabled by default in e.g. devstack). Storing secrets seems like a job for Barbican. That handles the giant pain problem because in devstack you can just tell Barbican to have an open read policy. I'd rather see good hooks for Barbican than blanket encryption.
I've worked with a few things like this and they are despised and worked around universally because of the reason Zane has expressed concern about: debugging gets ridiculous. How about this: parameters: secrets: type: sensitive resources: sensitive_deployment: type: OS::Heat::StructuredDeployment properties: config: weverConfig server: myserver input_values: secret_handle: { get_param: secrets } The sensitive type would, on the client side, store the value in Barbican, never in Heat. Instead it would just pass in a handle which the user can then build policy around. Obviously this implies the user would set up Barbican's in-instance tools to access the secrets value. But the idea is, let Heat worry about being high performing and introspectable, and then let Barbican worry about sensitive things. While certainly ideal, it doesn't solve the current problem since we can't yet guarantee Barbican will even be available in a given release of OpenStack. In the meantime, Heat continues to store sensitive user information unencrypted in its database. Once Barbican is integrated, I'd be all for changing this implementation, but until then, we do need an interim solution. Sure, debugging is a pain and as developers we can certainly grumble, but leaking sensitive user information because we were too fussed to protect data at rest seems worse IMO. Additionally, the solution as described sounds like we're imposing a pretty awkward process on a user to save ourselves from having to decrypt some data in the cases where we can't access the stack information directly from the API or via debugging running Heat code (where the data isn't encrypted anymore). Under what circumstances are we leaking sensitive user information? Are you just trying to mitigate a potential attack vector, in the event of a bug which leaks data from the DB? If so, is the user-data encrypted in the nova DB? 
It seems to me that this will only be a worthwhile exercise if the sensitive stuff is encrypted everywhere, and many/most use-cases I can think of which require sensitive data involve that data ending up in nova user|meta-data? I tend to agree Steve. The strategy to move things into a system with strong policy controls like Barbican will mitigate these risks, as even compromise of the given secret access information may not yield access to the actual secrets. Basically, let's help facilitate end-to-end encryption and access control, not just mitigate one attack vector because the end-to-end one is hard. Until then, our DBs will have sensitive information, and such is life. (Of course, this also reminds me that I think we should probably add a one-time-pad type of access method that we can use to prevent compromise of our credentials
Re: [openstack-dev] [ironic bare metal installation issue]
Excerpts from 严超's message of 2014-06-03 21:23:25 -0700: Hi, All: I've deployed my ironic following this link: http://ma.ttwagner.com/bare-metal-deploys-with-devstack-and-ironic/ , and all steps are completed. Now node-show for one of my nodes reports provision_state as active. But why is this node still in the installation state, as follows? [image: inline image 1] Ironic has done all that it can for the machine. That is, it booted the kernel and ramdisk from the image, and Ironic has no real way to check that this deploy succeeds. It is on the same level as checking to see if your VM actually boots after kvm has been spawned.
Re: [openstack-dev] [Spam] [heat] Resource action API
Excerpts from yang zhang's message of 2014-06-04 00:01:41 -0700: Hi all, Now heat only supports suspending/resuming a whole stack; all the resources of the stack will be suspended/resumed, but sometimes we just want to suspend or resume only a part of the resources in the stack, so I think adding a resource-action API for heat is necessary. This API will be helpful in solving 2 problems: - If we want to suspend/resume the resources of the stack, you need to get the phy_id first and then call the API of other services, and this won't update the status of the resource in heat, which often causes some unexpected problems. - This API could offer a turn on/off function for some native resources, e.g., we can turn on/off the autoscaling group or a single policy with the API; this is like the suspend/resume services feature[1] in AWS. I registered a bp for it, and you are welcome to discuss it. https://blueprints.launchpad.net/heat/+spec/resource-action-api [1] http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html Regards! Zhang Yang Hi zhang. I'd rather we model the intended states of each resource, and ensure that Heat can assert them. Actions are tricky things to model. So if you want your nova server to be stopped, how about resources: server1: type: OS::Nova::Server properties: flavor: superbig image: TheBestOS state: STOPPED We don't really need to model actions then, just the APIs we have available.
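Asserting a declared state instead of modeling an action could look roughly like this on the engine side. This is a toy sketch under stated assumptions: the `state` property and the stub client are illustrative, not Heat's real resource API.

```python
class StubNovaServer(object):
    """Stands in for a nova client scoped to one server."""

    def __init__(self, status):
        self.status = status

    def stop(self):
        self.status = "STOPPED"

    def start(self):
        self.status = "ACTIVE"


def assert_server_state(desired, server):
    # Compare the declared state with the observed state and issue the
    # minimal API call needed; a no-op when they already match.
    if server.status == desired:
        return
    if desired == "STOPPED":
        server.stop()
    elif desired == "ACTIVE":
        server.start()


server = StubNovaServer("ACTIVE")
assert_server_state("STOPPED", server)  # template declares state: STOPPED
assert server.status == "STOPPED"
assert_server_state("STOPPED", server)  # idempotent: nothing left to do
```

The appeal of this shape is that a stack update is just a new declaration; the engine converges toward it without a separate action API.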
Re: [openstack-dev] [Spam] Re: [Spam] [heat] Resource action API
Excerpts from yang zhang's message of 2014-06-04 01:14:37 -0700: From: cl...@fewbar.com To: openstack-dev@lists.openstack.org Date: Wed, 4 Jun 2014 00:09:39 -0700 Subject: Re: [openstack-dev] [Spam] [heat] Resource action API Excerpts from yang zhang's message of 2014-06-04 00:01:41 -0700: Hi all, Now heat only supports suspending/resuming a whole stack; all the resources of the stack will be suspended/resumed, but sometimes we just want to suspend or resume only a part of the resources in the stack, so I think adding a resource-action API for heat is necessary. This API will be helpful in solving 2 problems: - If we want to suspend/resume the resources of the stack, you need to get the phy_id first and then call the API of other services, and this won't update the status of the resource in heat, which often causes some unexpected problems. - This API could offer a turn on/off function for some native resources, e.g., we can turn on/off the autoscaling group or a single policy with the API; this is like the suspend/resume services feature[1] in AWS. I registered a bp for it, and you are welcome to discuss it. https://blueprints.launchpad.net/heat/+spec/resource-action-api [1] http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SuspendResume.html Regards! Zhang Yang Hi zhang. I'd rather we model the intended states of each resource, and ensure that Heat can assert them. Actions are tricky things to model. So if you want your nova server to be stopped, how about resources: server1: type: OS::Nova::Server properties: flavor: superbig image: TheBestOS state: STOPPED We don't really need to model actions then, just the APIs we have available.
At first, I wanted to do it like this, using a resource parameter, but that requires updating the stack in order to suspend the resource. It means we can't stop another resource while one resource is stopping, but that seems not a big deal; stopping a resource is usually quick. Compared to a new API, using a resource parameter is easy to implement thanks to the mature stack-update code, so we could finish it in a short period. Does anyone else have good ideas? It's a bit far off, but the eventual goal of the convergence effort is to make it so you _can_ update two things concurrently, since updates will just be recording intended state in the db, not waiting for all of that to complete.
Re: [openstack-dev] [Heat]Heat template parameters encryption
Excerpts from Zane Bitter's message of 2014-06-04 16:19:05 -0700: On 04/06/14 15:58, Vijendar Komalla wrote: Hi Devs, I have submitted a WIP review (https://review.openstack.org/#/c/97900/) for the Heat parameters encryption blueprint https://blueprints.launchpad.net/heat/+spec/encrypt-hidden-parameters This quick and dirty implementation encrypts all the parameters on Stack 'store' and decrypts on Stack 'load'. Following are a couple of improvements I am thinking about; 1. Instead of encrypting individual parameters, on Stack 'store' encrypt all the parameters together as a dictionary [something like crypt.encrypt(json.dumps(param_dictionary))] Yeah, definitely don't encrypt them individually. 2. Just encrypt parameters that were marked as 'hidden', instead of encrypting all parameters I would like to hear your feedback/suggestions. Just as a heads-up, we will soon need to store the properties of resources too, at which point parameters become the least of our problems. (In fact, in theory we wouldn't even need to store parameters... and probably by the time convergence is completely implemented, we won't.) Which is to say that there's almost certainly no point in discriminating between hidden and non-hidden parameters. I'll refrain from commenting on whether the extra security this affords is worth the giant pain it causes in debugging, except to say that IMO there should be a config option to disable the feature (and if it's enabled by default, it should probably be disabled by default in e.g. devstack). Storing secrets seems like a job for Barbican. That handles the giant pain problem because in devstack you can just tell Barbican to have an open read policy. I'd rather see good hooks for Barbican than blanket encryption. I've worked with a few things like this and they are despised and worked around universally because of the reason Zane has expressed concern about: debugging gets ridiculous.
How about this: parameters: secrets: type: sensitive resources: sensitive_deployment: type: OS::Heat::StructuredDeployment properties: config: weverConfig server: myserver input_values: secret_handle: { get_param: secrets } The sensitive type would, on the client side, store the value in Barbican, never in Heat. Instead it would just pass in a handle which the user can then build policy around. Obviously this implies the user would set up Barbican's in-instance tools to access the secrets value. But the idea is, let Heat worry about being high performing and introspectable, and then let Barbican worry about sensitive things.
Re: [openstack-dev] [ironic workflow question]
Excerpts from 严超's message of 2014-06-04 20:34:01 -0700: BTW, if I run sudo ./bin/disk-image-create -a amd64 ubuntu deploy-ironic -o /tmp/deploy-ramdisk-ubuntu, what is the username/password for the image deploy-ramdisk-ubuntu? There isn't one. You can write an element if you want to include a backdoor user. Otherwise, just use nova's SSH keypair capability when you deploy your image onto boxes.
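For the element route, diskimage-builder ships a devuser element that bakes in such a user; a hedged sketch of the invocation follows (not runnable here since it needs diskimage-builder and network access, and you should confirm your diskimage-builder version includes devuser; the variable values are examples):

```shell
# Example only: requires diskimage-builder; DIB_DEV_USER_* values are
# placeholders you would choose yourself.
export DIB_DEV_USER_USERNAME=debug
export DIB_DEV_USER_PASSWORD=secret
export DIB_DEV_USER_PWDLESS_SUDO=yes
sudo -E ./bin/disk-image-create -a amd64 ubuntu deploy-ironic devuser \
    -o /tmp/deploy-ramdisk-ubuntu
```

As the reply notes, the keypair route is preferable for anything beyond debugging, since a baked-in password ships in every image built from the element.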
Re: [openstack-dev] [Heat] Short term scaling strategies for large Heat stacks
Excerpts from Steve Baker's message of 2014-06-02 14:37:25 -0700:

On 31/05/14 07:01, Zane Bitter wrote: On 29/05/14 19:52, Clint Byrum wrote:

update-failure-recovery
===

This is a blueprint I believe Zane is working on to land in Juno. It will allow us to retry a failed create or update action. Combined with the separate controller/compute node strategy, this may be our best option, but it is unclear whether that code will be available soon or not. The chunking is definitely required, because with 500 compute nodes, if node #250 fails, the remaining 249 nodes that are IN_PROGRESS will be cancelled, which makes the impact of a transient failure quite extreme. Also, without chunking we'll suffer from some of the performance problems we've seen where a single engine process has to do all of the work to bring up a stack.

Pros:
* Uses the blessed strategy

Cons:
* Implementation is not complete
* Still suffers from the heavy impact of failure
* Requires chunking to be feasible

I've already started working on this and I'm expecting to have it ready some time between the j-1 and j-2 milestones. I think these two strategies combined could probably get you a long way in the short term, though obviously they are not a replacement for the convergence strategy in the long term.

BTW, you missed another strategy that we have discussed in the past, and which I think Steve Baker might(?) be working on: retrying failed calls at the client level.

As part of the client-plugins blueprint I'm planning on implementing retry policies on API calls. So where currently we call:

    self.nova().servers.create(**kwargs)

This will soon be:

    self.client().servers.create(**kwargs)

And with a retry policy (assuming the default unique-ish server name is used):

    self.client_plugin().call_with_retry_policy(
        'cleanup_yr_mess_and_try_again',
        self.client().servers.create, **kwargs)

This should be suitable for handling transient errors on API calls such as 500s, response timeouts or token expiration.
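The retry-policy idea above can be sketched as a generic wrapper: retry the call on transient exceptions, run an optional cleanup hook between attempts, and re-raise once the attempts are exhausted. Everything here (`call_with_retry` and its parameters) is a hypothetical illustration of the concept, not the actual client-plugin API:

```python
import time


def call_with_retry(fn, *args, retries=3, delay=0.01,
                    retry_on=(Exception,), cleanup=None, **kwargs):
    """Call fn, retrying on transient errors, with an optional cleanup hook
    run between attempts (e.g. deleting a half-created server)."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            if cleanup is not None:
                cleanup()
            time.sleep(delay)  # real policies would back off exponentially


# Simulate an API that returns a 500 twice, then succeeds.
calls = []

def flaky_create():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient 500")
    return "server-id"

assert call_with_retry(flaky_create, retries=5, retry_on=(IOError,)) == "server-id"
assert len(calls) == 3  # two failures absorbed, third attempt succeeded
```

The cleanup hook is the interesting part for servers: a failed `servers.create` can leave a half-built instance behind, so the policy has to delete it before retrying, which is presumably what the 'cleanup_yr_mess_and_try_again' name refers to.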
It shouldn't be used for resources which later come up in an ERROR state; convergence or update-failure-recovery would be better for that.

Steve, this is fantastic work and sorely needed. Thank you for working on it. Unfortunately, resources ending up in an ERROR state are the majority of our problem. IPMI and PXE can be unreliable in some environments, and sometimes machines are broken in subtle ways. Also, the odd bug in Neutron, Nova, or Ironic will cause this. Convergence is not available to us in the short term, and update-failure-recovery is really some time off too, so unfortunately we need more solutions.