Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells – summit recap and move forward
On Fri, Dec 12, 2014 at 6:50 AM, Russell Bryant rbry...@redhat.com wrote:

On 12/11/2014 12:55 PM, Andrew Laski wrote:

Cells can handle a single API on top of globally distributed DCs. I have spoken with a group that is doing exactly that. But it requires that the API is a trusted part of the OpenStack deployments in those distributed DCs. And the way the rest of the components fit into that scenario is far from clear to me.

Do you consider this more of a "if you can make it work, good for you", or something we should aim to be more generally supported over time? Personally, I see the globally distributed OpenStack under a single API case as much more complex, and worth considering out of scope for the short to medium term, at least.

For me, this discussion boils down to ...

1) Do we consider these use cases in scope at all?
2) If we consider it in scope, is it enough of a priority to warrant a cross-OpenStack push in the near term to work on it?
3) If yes to #2, how would we do it? Cascading, or something built around cells?

I haven't worried about #3 much, because I consider #2 or maybe even #1 to be a show stopper here.

Agreed

--
Russell Bryant

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [NFV][Telco] pxe-boot
On Fri, Dec 12, 2014 at 1:48 AM, Pasquale Porreca pasquale.porr...@dektech.com.au wrote:

From my point of view it is not advisable to base some functionality of the instances on direct calls to the OpenStack API. This for 2 main reasons; the first one: if the OpenStack code changes (and we know OpenStack code does change) it will be required to change the code of the software running in the instance too; the second one: if in the future one wants to move to another cloud infrastructure it will be more difficult to achieve.

Thoughts on your two reasons:

1) What happens if OpenStack code changes? While OpenStack is under very active development, we have stable APIs, especially around something like booting an instance. So the API call to boot an instance with a specific image *should not* change as you upgrade OpenStack (unless we deprecate an API, but that will be a slow, multi-year process).

2) "if in the future one wants to pass to another cloud infrastructure it will be more difficult to achieve it." Why not use something like Apache jclouds to make this easier? https://jclouds.apache.org/

On 12/12/14 01:20, Joe Gordon wrote:

On Wed, Dec 10, 2014 at 7:42 AM, Pasquale Porreca pasquale.porr...@dektech.com.au wrote:

Well, one of the main reasons to choose an open source product is to avoid vendor lock-in. I think it is not advisable to embed in the software running in an instance a call to OpenStack-specific services.

I'm sorry, I don't follow the logic here; can you elaborate?

--
Pasquale Porreca
DEK Technologies
Via dei Castelli Romani, 22
00040 Pomezia (Roma)
Mobile +39 3394823805
Skype paskporr

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
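To make the portability point concrete, here is a minimal sketch (all names are illustrative, not a real library) of the kind of thin driver interface a toolkit like jclouds provides: the application codes against a small abstraction, and only the driver knows about any one cloud's API.

```python
# Hypothetical sketch: insulating instance-side software from any one
# cloud's API behind a tiny driver interface. The fake driver stands in
# for a real one that would call the Nova boot API.

class CloudDriver:
    """Minimal interface the application codes against."""
    def boot(self, name, image):
        raise NotImplementedError


class FakeOpenStackDriver(CloudDriver):
    """Stand-in for a driver that would POST to Nova's /servers."""
    def __init__(self):
        self.booted = []

    def boot(self, name, image):
        # A real driver would issue the API call here; we record it.
        self.booted.append((name, image))
        return "instance-%d" % len(self.booted)


def deploy_payload(driver, name, image):
    # Application logic stays cloud-neutral: switching clouds means
    # swapping the driver, not rewriting the SC's control logic.
    return driver.boot(name, image)


driver = FakeOpenStackDriver()
instance_id = deploy_payload(driver, "pl-1", "payload-image")
print(instance_id)  # instance-1
```

This keeps reason 2 (lock-in) contained in one module, and reason 1 (API drift) becomes a driver maintenance problem rather than an application one.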
Re: [openstack-dev] [nova] Kilo specs review day
On Thu, Dec 11, 2014 at 7:30 AM, Andrew Laski andrew.la...@rackspace.com wrote:

On 12/10/2014 04:41 PM, Michael Still wrote:

Hi, at the design summit we said that we would not approve specifications after the kilo-1 deadline, which is 18 December. Unfortunately, we've had a lot of specifications proposed this cycle (166 by my count), and haven't kept up with the review workload. Therefore, I propose that Friday this week be a specs review day. We need to burn down the queue of specs needing review, as well as abandon those which aren't getting regular updates based on our review comments.

I'd appreciate nova-specs-core doing reviews on Friday, but it's always super helpful when non-cores review as well. A +1 from a developer or operator gives nova-specs-core a good signal of what might be ready to approve, and that helps us optimize our review time.

For reference, the specs to review may be found at: https://review.openstack.org/#/q/project:openstack/nova-specs+status:open,n,z

Thanks heaps, Michael

It will be nice to have a good push before we hit the deadline. I would like to remind priority owners to update their list of any outstanding specs at https://etherpad.openstack.org/p/kilo-nova-priorities-tracking so they can be targeted during the review day.
In preparation, I put together a nova-specs dashboard: https://review.openstack.org/141137

https://review.openstack.org/#/dashboard/?foreach=project%3A%5Eopenstack%2Fnova-specs+status%3Aopen+NOT+owner%3Aself+NOT+label%3AWorkflow%3C%3D-1+label%3AVerified%3E%3D1%252cjenkins+NOT+label%3ACode-Review%3E%3D-2%252cself+branch%3Amastertitle=Nova+SpecsYour+are+a+reviewer%2C+but+haven%27t+voted+in+the+current+revision=reviewer%3AselfNeeds+final+%2B2=label%3ACode-Review%3E%3D2+NOT%28reviewerin%3Anova-specs-core+label%3ACode-Review%3C%3D-1%29+limit%3A100Passed+Jenkins%2C+Positive+Nova-Core+Feedback=NOT+label%3ACode-Review%3E%3D2+%28reviewerin%3Anova-core+label%3ACode-Review%3E%3D1%29+NOT%28reviewerin%3Anova-core+label%3ACode-Review%3C%3D-1%29+limit%3A100Passed+Jenkins%2C+No+Positive+Nova-Core+Feedback%2C+No+Negative+Feedback=NOT+label%3ACode-Review%3C%3D-1+NOT+label%3ACode-Review%3E%3D2+NOT%28reviewerin%3Anova-core+label%3ACode-Review%3E%3D1%29+limit%3A100Wayward+Changes+%28Changes+with+no+code+review+in+the+last+7+days%29=NOT+label%3ACode-Review%3C%3D2+age%3A7dSome+negative+feedback%2C+might+still+be+worth+commenting=label%3ACode-Review%3D-1+NOT+label%3ACode-Review%3D-2+limit%3A100Dead+Specs=label%3ACode-Review%3C%3D-2

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit recap and move forward
On Thu, Dec 11, 2014 at 1:02 AM, joehuang joehu...@huawei.com wrote:

Hello, Russell,

Many thanks for your reply. See inline comments.

-----Original Message-----
From: Russell Bryant [mailto:rbry...@redhat.com]
Sent: Thursday, December 11, 2014 5:22 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit recap and move forward

On Fri, Dec 5, 2014 at 8:23 AM, joehuang joehu...@huawei.com wrote:

Dear all TC PTL, in the 40-minute cross-project summit session "Approaches for scaling out"[1], almost 100 people attended the meeting, and the conclusion was that cells cannot cover the use cases and requirements which the OpenStack cascading solution[2] aims to address; the background, including use cases and requirements, is also described in the mail.

I must admit that this was not the reaction I came away from the discussion with. There was a lot of confusion, and as we started looking closer, many (or perhaps most) people speaking up in the room did not agree that the requirements being stated are things we want to try to satisfy.

[joehuang] Could you pls. confirm your opinion: 1) cells cannot cover the use cases and requirements which the OpenStack cascading solution aims to address. 2) Need further discussion whether to satisfy the use cases and requirements.

On 12/05/2014 06:47 PM, joehuang wrote:

Hello, Davanum, thanks for your reply. Cells can't meet the demand for the use cases and requirements described in the mail.

You're right that cells doesn't solve all of the requirements you're discussing. Cells addresses scale in a region. My impression from the summit session and other discussions is that the scale issues addressed by cells are considered a priority, while the global API bits are not.

[joehuang] Agree cells is the first-class priority.

1. Use cases

a).
Vodafone use case[4] (OpenStack summit speech video from 9'02 to 12'30), establishing globally addressable tenants which result in efficient services deployment.

Keystone has been working on federated identity. That part makes sense, and is already well under way.

[joehuang] The major challenge for the VDF use case is cross-OpenStack networking for tenants. A tenant's VMs/volumes may be allocated in different data centers geographically, but a virtual network (L2/L3/FW/VPN/LB) should be built for each tenant automatically and isolated between tenants. Keystone federation can help automate authorization, but the cross-OpenStack network automation challenge is still there. Using a proprietary orchestration layer can solve the automation issue, but VDF doesn't want a proprietary API in the north-bound, because no ecosystem is available. And other issues, for example how to distribute images, also cannot be solved by Keystone federation.

b). Telefonica use case[5], create a virtual DC (data center) across multiple physical DCs with seamless experience.

If we're talking about multiple DCs that are effectively local to each other with high bandwidth and low latency, that's one conversation. My impression is that you want to provide a single OpenStack API on top of globally distributed DCs. I honestly don't see that as a problem we should be trying to tackle. I'd rather continue to focus on making OpenStack work *really* well split into regions. I think some people are trying to use cells in a geographically distributed way as well. I'm not sure that's a well understood or supported thing, though. Perhaps the folks working on the new version of cells can comment further.

[joehuang] 1) The split-region approach cannot provide cross-OpenStack networking automation for tenants. 2) Exactly; the motivation for cascading is a single OpenStack API on top of globally distributed DCs. Of course, cascading can also be used for DCs close to each other with high bandwidth and low latency.
3) Comments from the cells folks are welcome.

c). ETSI NFV use cases[6], especially use cases #1, #2, #3, #5, #6, #8. For an NFV cloud, it is in its nature that the cloud will be distributed but inter-connected across many data centers.

I'm afraid I don't understand this one. In many conversations about NFV, I haven't heard this before.

[joehuang] This is the ETSI requirements and use cases specification for NFV. ETSI is the home of the Industry Specification Group for NFV. In Figure 14 (virtualization of EPC) of this document, you can see that the operator's cloud includes many data centers to provide connection service to end users via inter-connected VNFs. The requirements listed in (https://wiki.openstack.org/wiki/TelcoWorkingGroup) are mainly about the requirements for specific VNFs (like IMS, SBC, MME, HSS, S/P-GW etc.) to run over cloud, e.g. migrating traditional telco apps from proprietary hardware to cloud. Not all NFV requirements have been covered yet. Forgive me, there are so many telco terms here.
Re: [openstack-dev] [NFV][Telco] pxe-boot
On Wed, Dec 10, 2014 at 7:42 AM, Pasquale Porreca pasquale.porr...@dektech.com.au wrote:

Well, one of the main reasons to choose an open source product is to avoid vendor lock-in. I think it is not advisable to embed in the software running in an instance a call to OpenStack-specific services.

I'm sorry, I don't follow the logic here; can you elaborate?

On 12/10/14 00:20, Joe Gordon wrote:

On Wed, Dec 3, 2014 at 1:16 AM, Pasquale Porreca pasquale.porr...@dektech.com.au wrote:

The use case we were thinking about is a Network Function (e.g. IMS Nodes) implementation in which the high availability is based on OpenSAF. In this scenario there is an Active/Standby cluster of 2 System Controllers (SC) plus several Payloads (PL) that boot from network, controlled by the SC. The logic of which service to deploy on each payload is inside the SC. In OpenStack both SCs and PLs will be instances running in the cloud; anyway, the PLs should still boot from network under the control of the SC. In fact, to use Glance to store the image for the PLs and keep control of the PLs in the SC, the SC should trigger the boot of the PLs with requests to Nova/Glance, but an application running inside an instance should not directly interact with a cloud infrastructure service like Glance or Nova.

Why not? This is a fairly common practice.

--
Pasquale Porreca
DEK Technologies
Via dei Castelli Romani, 22
00040 Pomezia (Roma)
Mobile +39 3394823805
Skype paskporr

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit recap and move forward
On Thu, Dec 11, 2014 at 6:25 PM, joehuang joehu...@huawei.com wrote:

Hello, Joe,

Thank you for your good question.

Question: How would something like flavors work across multiple vendors? The OpenStack API doesn't have any hard-coded names and sizes for flavors, so a flavor such as m1.tiny may actually be very different vendor to vendor.

Answer: The flavor is defined by the cloud operator in the cascading OpenStack, and Nova-proxy (which is the driver for "Nova as hypervisor") will sync the flavor to a cascaded OpenStack when it is first used there. If a flavor is changed before a new VM is booted, the changed flavor will also be updated in the cascaded OpenStack just before the new VM boot request. Through this synchronization mechanism, all flavors used in the multi-vendor cascaded OpenStacks are kept the same as those used at the cascading level, providing a consistent view of flavors.

I don't think this is sufficient. If the underlying hardware between multiple vendors is different, setting the same values for a flavor will result in different performance characteristics. For example, nova allows for setting VCPUs, but nova doesn't provide an easy way to define how powerful a VCPU is. Also, flavors are commonly hardware dependent; take what Rackspace offers: http://www.rackspace.com/cloud/public-pricing#cloud-servers

Rackspace has I/O Optimized flavors:

* High-performance, RAID 10-protected SSD storage
* Option of booting from Cloud Block Storage (additional charges apply for Cloud Block Storage)
* Redundant 10-Gigabit networking
* Disk I/O scales with the number of data disks up to ~80,000 4K random read IOPS and ~70,000 4K random write IOPS.*

How would cascading support something like this?
Best Regards
Chaoyi Huang ( joehuang )

[snip: quoted text from earlier in the thread]
Re: [openstack-dev] [nova] Kilo specs review day
On Wed, Dec 10, 2014 at 1:41 PM, Michael Still mi...@stillhq.com wrote:

Hi, at the design summit we said that we would not approve specifications after the kilo-1 deadline, which is 18 December. Unfortunately, we've had a lot of specifications proposed this cycle (166 by my count), and haven't kept up with the review workload. Therefore, I propose that Friday this week be a specs review day. We need to burn down the queue of specs needing review, as well as abandon those which aren't getting regular updates based on our review comments.

I'd appreciate nova-specs-core doing reviews on Friday, but it's always super helpful when non-cores review as well. A +1 from a developer or operator gives nova-specs-core a good signal of what might be ready to approve, and that helps us optimize our review time.

For reference, the specs to review may be found at: https://review.openstack.org/#/q/project:openstack/nova-specs+status:open,n,z

++, count me in!

Thanks heaps, Michael

--
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [hacking] hacking package upgrade dependency with setuptools
On Tue, Dec 9, 2014 at 11:05 AM, Surojit Pathak suro.p...@gmail.com wrote:

Hi all,

On a RHEL system, as I upgrade the hacking package from 0.8.0 to 0.9.5, I see flake8 stop working. Upgrading setuptools resolves the issue, but I do not see a change in version for pep8 or setuptools after the upgrade. Any issue in packaging? Any explanation of this behavior?

Snippet -

    [suro@poweredsoured ~]$ pip list | grep hacking
    hacking (0.8.0)
    [suro@poweredsoured app]$ sudo pip install hacking==0.9.5
    ... Successful installation
    [suro@poweredsoured app]$ flake8 neutron/
    ...
      File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
        raise DistributionNotFound(req)
    pkg_resources.DistributionNotFound: pep8>=1.4.6
    [suro@poweredsoured app]$ pip list | grep pep8
    pep8 (1.5.6)
    [suro@poweredsoured app]$ pip list | grep setuptools
    setuptools (0.6c11)
    [suro@poweredsoured app]$ sudo pip install -U setuptools
    ... Successfully installed setuptools
    Cleaning up...
    [suro@poweredsoured app]$ pip list | grep pep8
    pep8 (1.5.6)
    [suro@poweredsoured app]$ pip list | grep setuptools
    setuptools (0.6c11)
    [suro@poweredsoured app]$ flake8 neutron/
    [suro@poweredsoured app]$

Could this be pbr related?

    -pbr>=0.5.21,<1.0
    +pbr>=0.6,!=0.7,<1.0

--
Regards,
Surojit Pathak

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
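The DistributionNotFound traceback above is pkg_resources failing to resolve flake8's pep8>=1.4.6 requirement against stale installed metadata. A quick way to reproduce that class of failure in isolation (using setuptools itself as the target package, since it is always installed):

```python
# Sketch: the kind of check flake8's console entry point performs at
# startup. pkg_resources.require() resolves a requirement string against
# installed distribution metadata and raises when the (possibly stale)
# metadata cannot satisfy it.
import pkg_resources

def requirement_ok(req):
    try:
        pkg_resources.require(req)
        return True
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict):
        return False

# setuptools itself is always installed, so this resolves:
print(requirement_ok("setuptools"))          # True
# an impossible pin fails, much like the flake8 traceback above:
print(requirement_ok("setuptools>=9999.0"))  # False
```

An ancient setuptools (0.6c11 here) can leave metadata that require() cannot resolve even though pip sees the package, which is consistent with the upgrade fixing flake8 without any version numbers appearing to change.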
Re: [openstack-dev] [qa] How to delete a VM which is in ERROR state?
On Sat, Dec 6, 2014 at 5:08 PM, Danny Choi (dannchoi) dannc...@cisco.com wrote:

Hi,

I have a VM which is in ERROR state.

    +--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
    | ID                                   | Name                                         | Status | Task State | Power State | Networks |
    +--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
    | 1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | cirros--1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | ERROR  | -          | NOSTATE     |          |

I tried both the CLI "nova delete" and Horizon "terminate instance". Both accepted the delete command without any error. However, the VM never got deleted. Is there a way to remove the VM?

What version of nova are you using? This is definitely a serious bug; you should be able to delete an instance in ERROR state. Can you file a bug that includes steps on how to reproduce it, along with all relevant logs? bugs.launchpad.net/nova

Thanks, Danny

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [keystone][all] Max Complexity Check Considered Harmful
On Mon, Dec 8, 2014 at 5:03 PM, Brant Knudson b...@acm.org wrote:

Not too long ago projects added a maximum complexity check to tox.ini; for example, keystone has max-complexity=24. Seemed like a good idea at the time, but in a recent attempt to lower the maximum complexity check in keystone[1][2], I found that the maximum complexity check can actually lead to less understandable code. This is because the check includes an embedded function's complexity in the function that it's in.

This behavior is expected. Nested functions cannot be unit tested on their own. Part of the issue is that nested functions can access variables scoped to the outer function, so the following function is valid:

    def outer():
        var = 'outer'

        def inner():
            print var

        inner()

Because nested functions cannot easily be unit tested, and can be harder to reason about since they can access variables that are part of the outer function, I don't think they are easier to understand (there are still cases where a nested function makes sense, though).

The way I would have lowered the complexity of the function in keystone is to extract the complex part into a new function. This can make the existing function much easier to understand, for all the reasons that one defines a function for code. Since this new function is obviously only called from the function it's currently in, it makes sense to keep the new function inside the existing function. It's simpler to think about an embedded function because then you know it's only called from one place. The problem is, because of the existing complexity check behavior, this doesn't lower the complexity according to the complexity check, so you wind up putting the function at the top level, and now a reader has to assume that the function could be called from anywhere and has to be much more cautious about changes to the function.
Since the complexity check can lead to code that's harder to understand, it must be considered harmful and should be removed, at least until the incorrect behavior is corrected.

Why do you think the max complexity check is harmful? Because it prevents large amounts of nested functions?

[1] https://review.openstack.org/#/c/139835/
[2] https://review.openstack.org/#/c/139836/
[3] https://review.openstack.org/#/c/140188/

- Brant

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][docker][containers][qa] nova-docker CI failing a lot on unrelated nova patches
On Fri, Dec 5, 2014 at 11:43 AM, Ian Main im...@redhat.com wrote:

Sean Dague wrote:

On 12/04/2014 05:38 PM, Matt Riedemann wrote:

On 12/4/2014 4:06 PM, Michael Still wrote: +Eric and Ian

On Fri, Dec 5, 2014 at 8:31 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote:

This came up in the nova meeting today; I've opened a bug [1] for it. Since this isn't maintained by infra we don't have log indexing, so I can't use logstash to see how pervasive it is, but multiple people are reporting the same thing in IRC. Who is maintaining the nova-docker CI and can look at this? It also looks like the log format for the nova-docker CI is a bit weird; can that be cleaned up to be more consistent with other CI log results?

[1] https://bugs.launchpad.net/nova-docker/+bug/1399443

--
Thanks, Matt Riedemann

Also, according to the 3rd party CI requirements [1] I should see nova-docker CI in the third party wiki page [2] so I can get details on who to contact when this fails, but that's not done.

[1] http://ci.openstack.org/third_party.html#requirements
[2] https://wiki.openstack.org/wiki/ThirdPartySystems

It's not the 3rd party CI job we are talking about; it's the one in the check queue which is run by infra. But, more importantly, jobs in those queues need shepherds that will fix them. Otherwise they will get deleted. Clarkb provided the fix for the log structure right now - https://review.openstack.org/#/c/139237/1 so at least it will look vaguely sane on failures.

-Sean

This is one of the reasons we might like to have this in nova core. Otherwise we will just keep addressing issues as they come up. We would likely be involved doing this if it were part of nova core anyway.

While gating on nova-docker will prevent patches that cause nova-docker to break 100% from landing, it won't do a lot to prevent transient failures.
To fix those we need people dedicated to making sure nova-docker is working.

Ian

--
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [NFV][Telco] pxe-boot
On Wed, Dec 3, 2014 at 1:16 AM, Pasquale Porreca pasquale.porr...@dektech.com.au wrote:

The use case we were thinking about is a Network Function (e.g. IMS Nodes) implementation in which the high availability is based on OpenSAF. In this scenario there is an Active/Standby cluster of 2 System Controllers (SC) plus several Payloads (PL) that boot from network, controlled by the SC. The logic of which service to deploy on each payload is inside the SC. In OpenStack both SCs and PLs will be instances running in the cloud; anyway, the PLs should still boot from network under the control of the SC. In fact, to use Glance to store the image for the PLs and keep control of the PLs in the SC, the SC should trigger the boot of the PLs with requests to Nova/Glance, but an application running inside an instance should not directly interact with a cloud infrastructure service like Glance or Nova.

Why not? This is a fairly common practice.

We know that it is still possible to achieve network booting in OpenStack using an image stored in Glance that acts like a PXE client; anyway, this workaround has some drawbacks, mainly due to the fact that it won't be possible to choose the specific virtual NIC on which the network boot will happen, causing DHCP requests to flow on networks where they don't belong and possible delays in the boot of the instances.

On 11/27/14 00:32, Steve Gordon wrote:

- Original Message -
From: Angelo Matarazzo angelo.matara...@dektech.com.au
To: OpenStack Development Mailing openstack-dev@lists.openstack.org, openstack-operat...@lists.openstack.org

Hi all, my team and I are working on a pxe-boot feature very similar to the "Discless VM" one in the Active blueprint list[1]. The blueprint [2] is no longer active and we created a new spec [3][4]. Nova core reviewers commented on our spec, and the first and most important objection is that there is not a compelling reason to provide this kind of feature: booting from network.
Aside from the specific implementation, I think that some members of the TelcoWorkingGroup could be interested in this and provide a use case. I would also like to add this item to the agenda of the next meeting. Any thoughts?

We did discuss this today, and granted it is listed as a blueprint someone in the group had expressed interest in at a point in time - though I don't believe any further work was done. The general feeling was that there isn't anything really NFV- or Telco-specific about this over and above the more generic use case of legacy applications. Are you able to further elaborate on the reason it's NFV- or Telco-specific, other than because of who is requesting it in this instance?

Thanks!
-Steve

--
Pasquale Porreca
DEK Technologies
Via dei Castelli Romani, 22
00040 Pomezia (Roma)
Mobile +39 3394823805
Skype paskporr

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][docker][containers][qa] nova-docker CI failing a lot on unrelated nova patches
On Tue, Dec 9, 2014 at 3:18 PM, Eric Windisch e...@windisch.us wrote:

While gating on nova-docker will prevent patches that cause nova-docker to break 100% from landing, it won't do a lot to prevent transient failures. To fix those we need people dedicated to making sure nova-docker is working.

What would be helpful for me is a way to know that our tests are breaking without manually checking Kibana, such as an email.

There is also graphite [0], but since the docker job is running in the check queue the data we are producing is very dirty, since check jobs often run on broken patches.

[0] http://graphite.openstack.org/render/?from=-10daysheight=500until=nowwidth=1200bgcolor=fffgcolor=00yMax=100yMin=0target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-tempest-dsvm-docker.FAILURE,sum(stats.zuul.pipeline.check.job.check-tempest-dsvm-docker.{SUCCESS,FAILURE})),%2736hours%27),%20%27check-tempest-dsvm-docker%27),%27orange%27)title=Docker%20Failure%20Rates%20(10%20days)_t=0.3702208176255226

Regards,
Eric Windisch

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.
On Tue, Nov 25, 2014 at 2:01 PM, Steve Gordon sgor...@redhat.com wrote:

- Original Message -
From: Daniel P. Berrange berra...@redhat.com
To: Dan Smith d...@danplanet.com

On Thu, Nov 13, 2014 at 05:43:14PM +, Daniel P. Berrange wrote:

On Thu, Nov 13, 2014 at 09:36:18AM -0800, Dan Smith wrote:

That sounds like something worth exploring at least; I didn't know about that kernel build option until now :-) It sounds like it ought to be enough to let us test the NUMA topology handling, CPU pinning and probably huge pages too.

Okay. I've been vaguely referring to this as a potential test vector, but only just now looked up the details. That's my bad :)

The main gap I'd see is NUMA-aware PCI device assignment, since the PCI to NUMA node mapping data comes from the BIOS and it does not look like this is fakeable as-is.

Yeah, although I'd expect that the data is parsed and returned by a library or utility that may be a hook for fakeification. However, it may very well be more trouble than it's worth. I still feel like we should be able to test generic PCI in a similar way (passing something like a USB controller through to the guest, etc). However, I'm willing to believe that the intersection of PCI and NUMA is a higher order complication :)

Oh, I forgot to mention: with PCI device assignment (as well as having a bunch of PCI devices available[1]), the key requirement is an IOMMU. AFAIK, neither Xen nor KVM provide any IOMMU emulation, so I think we're out of luck for even basic PCI assignment testing inside VMs.

Ok, turns out that wasn't entirely accurate in general. KVM *can* emulate an IOMMU, but it requires that the guest be booted with the q35 machine type, instead of the ancient PIIX4 machine type, and also QEMU must be launched with -machine iommu=on.
We can't do this in Nova, so although it is theoretically possible, it is not doable for us in reality :-( Regards, Daniel Is it still worth pursuing virtual testing of the NUMA awareness work you, nikola, and others have been doing? It seems to me it would still be preferable to do this virtually (and ideally in the gate) wherever possible. The more we can test in the gate and without real hardware the better. Thanks, Steve ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] policy on old / virtually abandoned patches
On Dec 5, 2014 7:07 AM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Nov 18, 2014 at 07:06:59AM -0500, Sean Dague wrote: Nova currently has 197 patches that have seen no activity in the last 4 weeks (project:openstack/nova age:4weeks status:open). On a somewhat related note, nova-specs currently has 17 specs open against specs/juno, most with -2 votes. I think we should just mass-abandon anything still touching the specs/juno directory. If people still cared about them they would have submitted them for specs/kilo. So any objection to killing everything in the list below: +1, makes sense to me.

+-------------------------------------+-------------------------------------------------------+----------+-------+---------+----------+
| URL                                 | Subject                                               | Created  | Tests | Reviews | Workflow |
+-------------------------------------+-------------------------------------------------------+----------+-------+---------+----------+
| https://review.openstack.org/86938  | Add tasks to the v3 API                               | 237 days |   1   |   -2    |          |
| https://review.openstack.org/88334  | Add support for USB controller                        | 231 days |   1   |   -2    |          |
| https://review.openstack.org/89766  | Add useful metrics into utilization based scheduli... | 226 days |   1   |   -2    |          |
| https://review.openstack.org/90239  | Blueprint for Cinder Multi attach volumes             | 224 days |   1   |   -2    |          |
| https://review.openstack.org/90647  | Add utilization based weighers on top of MetricsWe... | 221 days |   1   |   -2    |          |
| https://review.openstack.org/96543  | Smart Scheduler (Solver Scheduler) - Constraint ba... | 189 days |   1   |   -2    |          |
| https://review.openstack.org/97441  | Add nova spec for bp/isnot-operator                   | 185 days |   1   |   -2    |          |
| https://review.openstack.org/99476  | Dedicate aggregates for specific tenants              | 176 days |   1   |   -2    |          |
| https://review.openstack.org/99576  | Add client token to CreateServer                      | 176 days |   1   |   -2    |          |
| https://review.openstack.org/101921 | Spec for Neutron migration feature                    | 164 days |   1   |   -2    |          |
| https://review.openstack.org/103617 | Support Identity V3 API                               | 157 days |   1   |   -1    |          |
| https://review.openstack.org/105385 | Leverage the features of IBM GPFS to store cached ... | 150 days |   1   |   -2    |          |
| https://review.openstack.org/108582 | Add ironic boot mode filters                          | 136 days |   1   |   -2    |          |
| https://review.openstack.org/110639 | Blueprint for the implementation of Nested Quota D... | 127 days |   1   |   -2    |          |
| https://review.openstack.org/111308 | Added VirtProperties object blueprint                 | 125 days |   1   |   -2    |          |
| https://review.openstack.org/111745 | Improve instance boot time                            | 122 days |   1   |   -2    |          |
| https://review.openstack.org/116280 | Add a new filter to implement project isolation fe... | 104 days |   1   |   -2    |          |
+-------------------------------------+-------------------------------------------------------+----------+-------+---------+----------+

Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Spring cleaning nova-core
On Fri, Dec 5, 2014 at 4:39 PM, Russell Bryant rbry...@redhat.com wrote: On 12/05/2014 08:41 AM, Daniel P. Berrange wrote: On Fri, Dec 05, 2014 at 11:05:28AM +1100, Michael Still wrote: One of the things that happens over time is that some of our core reviewers move on to other projects. This is a normal and healthy thing, especially as nova continues to spin out projects into other parts of OpenStack. However, it is important that our core reviewers be active, as it keeps them up to date with the current ways we approach development in Nova. I am therefore removing some no longer sufficiently active cores from the nova-core group. I’d like to thank the following people for their contributions over the years: * cbehrens: Chris Behrens * vishvananda: Vishvananda Ishaya * dan-prince: Dan Prince * belliott: Brian Elliott * p-draigbrady: Padraig Brady I’d love to see any of these cores return if they find their available time for code reviews increases. What stats did you use to decide whether to cull these reviewers? Looking at the stats over a 6-month period, I think Padraig Brady is still having a significant positive impact on Nova, on a par with both cerberus and alaski, whom you're not proposing to cut.
I think we should keep Padraig on the team, but would suggest cutting markmc instead: http://russellbryant.net/openstack-stats/nova-reviewers-180.txt

+------------------+---------------------------------------------+----------------+
| Reviewer         | Reviews  -2   -1   +1    +2   +A   +/- %    | Disagreements* |
+------------------+---------------------------------------------+----------------+
| berrange **      |    1766  26  435   12  1293  357   73.9%    |  157 ( 8.9%)   |
| jaypipes **      |    1359  11  378  436   534  133   71.4%    |  109 ( 8.0%)   |
| jogo **          |    1053 131  326    7   589  353   56.6%    |   47 ( 4.5%)   |
| danms **         |     921  67  381   23   450  167   51.4%    |   32 ( 3.5%)   |
| oomichi **       |     889   4  306   55   524  182   65.1%    |   40 ( 4.5%)   |
| johngarbutt **   |     808 319  227   10   252  145   32.4%    |   37 ( 4.6%)   |
| mriedem **       |     642  27  279   25   311  136   52.3%    |   17 ( 2.6%)   |
| klmitch **       |     606   1   90    2   513   70   85.0%    |   67 (11.1%)   |
| ndipanov **      |     588  19  179   10   380  113   66.3%    |   62 (10.5%)   |
| mikalstill **    |     564  31   34    3   496  207   88.5%    |   20 ( 3.5%)   |
| cyeoh-0 **       |     546  12  207   30   297  103   59.9%    |   35 ( 6.4%)   |
| sdague **        |     511  23   89    6   393  229   78.1%    |   25 ( 4.9%)   |
| russellb **      |     465   6   83    0   376  158   80.9%    |   23 ( 4.9%)   |
| alaski **        |     415   1   65   21   328  149   84.1%    |   24 ( 5.8%)   |
| cerberus **      |     405   6   25   48   326  102   92.3%    |   33 ( 8.1%)   |
| p-draigbrady **  |     376   2   40    9   325   64   88.8%    |   49 (13.0%)   |
| markmc **        |     243   2   54    3   184   69   77.0%    |   14 ( 5.8%)   |
| belliott **      |     231   1   68    5   157   35   70.1%    |   19 ( 8.2%)   |
| dan-prince **    |     178   2   48    9   119   29   71.9%    |   11 ( 6.2%)   |
| cbehrens **      |     132   2   49    2    79   19   61.4%    |    6 ( 4.5%)   |
| vishvananda **   |      54   0    5    3    46   15   90.7%    |    5 ( 9.3%)   |
+------------------+---------------------------------------------+----------------+

Yeah, I was pretty surprised to see pbrady on this list, as well. The above was 6 months, but even if you drop it to the most recent 3 months, he's still active ... As you are more than aware, our policy for removing people from core is to leave that up to the PTL (I believe you wrote that) [0]. And I don't think numbers alone are a good metric for sorting out who to remove. That being said, no matter what happens, with our fast-track policy, if pbrady is dropped it shouldn't be hard to re-add him.
[0] https://wiki.openstack.org/wiki/Nova/CoreTeam#Adding_or_Removing_Members

Reviews for the last 90 days in nova (** -- nova-core team member):

+------------------+---------------------------------------------+----------------+
| Reviewer         | Reviews  -2   -1   +1    +2   +A   +/- %    | Disagreements* |
+------------------+---------------------------------------------+----------------+
| berrange **      |     708  13  145    1   549  200   77.7%    |   47 ( 6.6%)   |
| jogo **          |     594  40  218    4   332  174   56.6%    |   27 ( 4.5%)   |
| jaypipes **      |     509  10  180   17   302   77   62.7%    |   33 ( 6.5%)   |
| oomichi **       |     392   1  136
Re: [openstack-dev] [Nova] Spring cleaning nova-core
On Dec 5, 2014 11:39 AM, Russell Bryant rbry...@redhat.com wrote: On 12/05/2014 11:23 AM, Joe Gordon wrote: As you are more than aware, our policy for removing people from core is to leave that up to the PTL (I believe you wrote that) [0]. And I don't think numbers alone are a good metric for sorting out who to remove. That being said, no matter what happens, with our fast-track policy, if pbrady is dropped it shouldn't be hard to re-add him. Yes, I'm aware of and not questioning the policy. Usually drops are pretty obvious. This one wasn't. It seems reasonable to discuss. Maybe we don't have a common set of expectations. Anyway, I'll follow up in private. Agreed -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Cinder] Operations: adding new nodes in disabled state, allowed for test tenant only
On Wed, Dec 3, 2014 at 3:31 PM, Mike Scherbakov mscherba...@mirantis.com wrote: Hi all, enable_new_services in nova.conf seems to allow adding new compute nodes in disabled state: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L507-L508, so it would allow an administrator to check everything first, before allowing production workloads to be hosted on the node. I've filed a bug to Fuel to use this by default when we scale up the env (add more computes) [1]. A few questions: 1. Can we somehow enable the compute service for a test tenant first? So the cloud administrator would be able to run test VMs on the node, and after ensuring that everything is fine, enable the service for all tenants. Although there may be more than one way to set this up in nova, this can definitely be done via nova host aggregates. Put new compute services into an aggregate that only specific tenants can access (controlled via a scheduler filter). 2. What about Cinder? Is there a similar option / ability? 3. What about other OpenStack projects? What is your opinion, how should we approach the problem (if there is a problem)? [1] https://bugs.launchpad.net/fuel/+bug/1398817 -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
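The aggregate-based isolation Joe suggests can be illustrated with a small stand-alone sketch of the filtering logic, modeled loosely on Nova's AggregateMultiTenancyIsolation scheduler filter and its filter_tenant_id metadata key; the function and data layout here are illustrative, not Nova's actual code:

```python
# Minimal sketch of tenant-isolation filtering via aggregate metadata,
# in the spirit of Nova's AggregateMultiTenancyIsolation filter.
# This is an illustration of the idea, not Nova's implementation.

def host_passes(host_aggregates, request_tenant_id):
    """Return True if the tenant may schedule onto this host.

    host_aggregates: list of metadata dicts, one per aggregate the host
    belongs to. A host in an aggregate carrying 'filter_tenant_id'
    metadata only accepts the listed tenants; hosts in no such
    aggregate accept everyone.
    """
    restricted = False
    for metadata in host_aggregates:
        tenant_ids = metadata.get('filter_tenant_id')
        if tenant_ids is None:
            continue
        restricted = True
        if request_tenant_id in tenant_ids.split(','):
            return True
    return not restricted

# A newly added compute node: only the test tenant may land VMs on it.
new_node = [{'filter_tenant_id': 'test-tenant-id'}]
print(host_passes(new_node, 'test-tenant-id'))  # test tenant allowed
print(host_passes(new_node, 'prod-tenant-id'))  # other tenants filtered out
```

Once the node has been verified, removing it from the restricted aggregate (or dropping the metadata key) opens it to all tenants, which matches the "enable for all tenants" second step Mike describes.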
[openstack-dev] [nova] Kilo Project Priorities
Hi all, Note: cross-posting with operators. After a long double-slot summit session, the nova team has come up with its list of efforts to prioritize for Kilo: http://specs.openstack.org/openstack/nova-specs/priorities/kilo-priorities.html What does this mean? * This is a list of items we think are important to accomplish for Kilo. * We are trying to prioritize work that fits under those categories. * If you would like to help with one of those priorities, please contact the owner. Thoughts, comments, and feedback are appreciated. Best, Joe Gordon Summit etherpad: https://etherpad.openstack.org/p/kilo-nova-priorities ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Can the Kilo nova controller conduct the Juno compute nodes
On Wed, Dec 3, 2014 at 11:09 AM, Li Junhong lijh.h...@gmail.com wrote: Hi All, Is it possible for a Kilo nova controller to control Juno compute nodes? Is this scenario supported naturally by the nova mechanism at the design and code level? Yes, we gate on making sure we can run Kilo nova with Juno compute nodes. -- Best Regards! Junhong, Li ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Can the Kilo nova controller conduct the Juno compute nodes
On Wed, Dec 3, 2014 at 11:53 AM, Li Junhong lijh.h...@gmail.com wrote: Hi Joe, Just want to confirm one more question: in the gate testing, are neutron/cinder/glance Kilo or Juno? Or in other words, is the controller in gate testing an all-in-one controller? Great question. In our current test neutron/cinder/glance are Kilo. But we do want to support the case where neutron/cinder/glance are Juno, as you should be able to upgrade each service independently. While we don't test it, we design around that goal, so with some testing and bug fixing it should work. On Wed, Dec 3, 2014 at 5:49 PM, Li Junhong lijh.h...@gmail.com wrote: Hi Joe, Thank you for your confirmative answer and the wonderful gate testing pipeline. On Wed, Dec 3, 2014 at 5:38 PM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Dec 3, 2014 at 11:09 AM, Li Junhong lijh.h...@gmail.com wrote: Hi All, Is it possible for a Kilo nova controller to control Juno compute nodes? Is this scenario supported naturally by the nova mechanism at the design and code level? Yes, we gate on making sure we can run Kilo nova with Juno compute nodes. -- Best Regards! Junhong, Li ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [python-novaclient] Status of novaclient V3
On Tue, Dec 2, 2014 at 5:21 PM, Andrey Kurilin akuri...@mirantis.com wrote: Hi! While working on fixing a wrong import in the novaclient v3 shell, I have found that a lot of commands which are listed in the V3 shell (novaclient.v3.shell) are broken, because the appropriate managers are missing from the V3 client (novaclient.v3.client.Client). The error template is ERROR (AttributeError): 'Client' object has no attribute 'attr', where attr can be floating_ip_pools, floating_ip, security_groups, dns_entries, etc. I know that novaclient V3 is not finished yet, and I guess it never will be finished. So the main question is: what should we do with the implemented code of novaclient V3? Should it be ported to novaclient V2.1, or can it be removed? I think it can be removed, as we are not going forward with the V3 API. But I will defer to Christopher Yeoh/Ken’ichi Ohmichi for the details. -- Best regards, Andrey Kurilin. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo][qa] Moving the hacking project under the QA program
On Tue, Dec 2, 2014 at 9:48 PM, Doug Hellmann d...@doughellmann.com wrote: The hacking project is currently managed by the Oslo team. Normally Joe Gordon handles all of the release work, and so I haven’t bothered to look at how the Launchpad project is set up or how the branches are managed before today. However, today’s issue with oslo.concurrency resulted in a need to hurry a release, and in the process of doing that I realized that it’s not set up at all like the other Oslo projects. When I started thinking about how to get that done, I also realized that maybe it’s a better fit for the QA program anyway, since it has to do with code quality and isn’t really a “library”. One of the outcomes of this whole incident is that we just grew the hacking-release team to include qa-release as well, so in case another issue like this arises we just need one of three people to be available to tag a release. I talked to Matt and Joe and they agreed that the QA program would be willing to take over managing hacking. Matt posted a governance change, and this email thread is mostly so we’ll have the thought process behind the move on the record [1]. I don’t really expect any significant changes to the way hacking is managed. From my perspective, the change is more about standardizing the Oslo library management further and less about hacking. Joe is happy to have the core review team largely stay the same, although we should ask members of oslo-core if they want to still be on hacking-core rather than just assuming (talk to jogo on IRC to make sure you’re on the list if you want to be). Yup, if you are currently oslo-core and would like to continue being hacking-core, just find me on IRC.
Doug [1] https://review.openstack.org/138499 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] deleting the pylint test job
On Mon, Nov 24, 2014 at 9:52 AM, Sean Dague s...@dague.net wrote: The pylint test job has been broken for weeks, and no one seemed to care. While waiting for other tests to return today I looked into it and figured out the fix. However, because of nova objects, pylint is progressively less and less useful. So the fact that no one else looked at it means that people didn't seem to care that it was provably broken. I think it's better that we just delete the jobs and save a node on every nova patch instead. +1 Project config change proposed here: https://review.openstack.org/#/c/136846/ If you -1 that, you own fixing it, and making nova objects patches sensible in pylint. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Proposal new hacking rules
On Fri, Nov 21, 2014 at 8:57 AM, Sahid Orentino Ferdjaoui sahid.ferdja...@redhat.com wrote: On Thu, Nov 20, 2014 at 02:00:11PM -0800, Joe Gordon wrote: On Thu, Nov 20, 2014 at 9:49 AM, Sahid Orentino Ferdjaoui sahid.ferdja...@redhat.com wrote: This is something we can call nitpicking or low priority. This all seems like nitpicking for very little value. I think there are better things we can be focusing on instead of thinking of new ways to nitpick. So I am -1 on all of these. Yes, as written this is low priority, but it is something necessary for a project like Nova. Why do you think this is necessary? Considering that, I feel bad for taking your time. Can I suggest you take no notice of this and let other developers working on Nova do this job? As the maintainer of openstack-dev/hacking and as a nova core, I don't think this is worth doing at all. Nova already has enough on its plate and doesn't need extra code to review. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Scale out bug-triage by making it easier for people to contribute
On Tue, Nov 18, 2014 at 11:48 PM, Flavio Percoco fla...@redhat.com wrote: On 18/11/14 14:45 -0800, Joe Gordon wrote: On Tue, Nov 18, 2014 at 10:58 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Flavio Percoco's message of 2014-11-17 08:46:19 -0800: Greetings, Regardless of how big/small the bug backlog is for each project, I believe this is a common, annoying and difficult problem. At the oslo meeting today, we were talking about how to address our bug triage process, and I proposed something that I've seen done in other communities (rust-language [0]) that I consider useful and a good option for OpenStack too. The process consists of a bot that sends an email to every *volunteer* with 10 bugs to review/triage for the week. Each volunteer follows the triage standards, applies tags and provides information on whether the bug is still valid or not. The volunteer doesn't have to fix the bug, just triage it. In openstack, we could have a job that does this and then have people from each team volunteer to help with triage. The benefits I see are: * Interested folks don't have to go through the list and filter the bugs they want to triage. The bot should be smart enough to pick the oldest, most critical, etc. * It's a totally opt-in process and volunteers can obviously ignore emails if they don't have time that week. * It helps scale out the triage process without poking people around and without having to do a call for volunteers every meeting/cycle/etc. The above doesn't solve the problem completely, but just like reviews, it'd be an optional, completely opt-in process that people can sign up for. My experience in Ubuntu, where we encouraged non-developers to triage bugs, was that non-developers often ask the wrong questions and sometimes even harm the process by putting something in the wrong priority or state because of a lack of deep understanding. Triage in a hospital is done by experienced nurses and doctors working together, not triagers. This is because it may not always be obvious to somebody just how important a problem is. We have the same set of problems. The most important thing is that developers see it as an important task and take part. New volunteers should be getting involved at every level, not just bug triage. ++, nice analogy. Another problem I have seen is that we need to constantly re-triage bugs, as just because a bug was marked as confirmed 6 months ago doesn't mean it is still valid. Ideally, the script will take care of this. Bugs that haven't been updated for more than N months will fall into the to-triage pool for re-triage. I am willing to sign up and give this a try. Flavio I think the best approach to this, like reviews, is to have a place where users can go to drive the triage workload to 0. For instance, the ubuntu server team had this report for triage: http://reqorts.qa.ubuntu.com/reports/ubuntu-server/triage-report.html Sadly, it looks like they're overwhelmed or have abandoned the effort (I hope this doesn't say something about Ubuntu server itself..), but the basic process was to move bugs off these lists. I'm sure if we ask nicely the author of that code will share it with us and we could adapt it for OpenStack projects. -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Proposal new hacking rules
On Thu, Nov 20, 2014 at 9:49 AM, Sahid Orentino Ferdjaoui sahid.ferdja...@redhat.com wrote: This is something we can call nitpicking or low priority. This all seems like nitpicking for very little value. I think there are better things we can be focusing on instead of thinking of new ways to nitpick. So I am -1 on all of these. I would like us to introduce 3 new hacking rules to enforce cohesion and consistency in the code base. Using boolean assertions: some tests are written with equality assertions to validate boolean conditions, which is not clean. assertFalse([]) asserts an empty list; assertEqual(False, []) asserts that an empty list is equal to the boolean value False, which is not correct. Some changes have been started here but still need to be reviewed by the community: * https://review.openstack.org/#/c/133441/ * https://review.openstack.org/#/c/119366/ Using the same order of arguments in equality assertions: most of the code is written with assertEqual(Expected, Observed), but some parts still use the opposite. Even if it provides no real optimisation, using the same convention helps reviewing and keeps better consistency in the code. assertEqual(Expected, Observed) OK; assertEqual(Observed, Expected) KO. A change has been started here but still needs to be reviewed by the community: * https://review.openstack.org/#/c/119366/ Using LOG.warn instead of LOG.warning: we have seen reviewers -1 a patch many times to ask the developer to use 'warn' instead of 'warning'. This will provide no optimisation, but let's finally have something clear about what we have to use. LOG.warning: 74, LOG.warn: 319. We probably want to use 'warn'. Nothing has been started, from what I know. Thanks, s. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
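For context on what adopting such a rule would involve: checks in the hacking project are plain functions that take a logical line and yield (offset, message) pairs. A rule like the LOG.warn one could be sketched as follows (the rule number N999 and the message text are made up for illustration, and this is not an actual hacking check):

```python
import re

# Sketch of a hacking-style check, in the shape used by the
# openstack-dev/hacking project: a function over a logical line that
# yields (offset, message) for each violation. The rule number N999
# and the preference for LOG.warn over LOG.warning are illustrative
# assumptions, not an adopted rule.

RE_LOG_WARNING = re.compile(r"\bLOG\.warning\(")

def hacking_log_warn(logical_line):
    """N999 - Use LOG.warn instead of LOG.warning for consistency."""
    match = RE_LOG_WARNING.search(logical_line)
    if match:
        yield (match.start(), "N999: use LOG.warn, not LOG.warning")

# A violating line yields one result; a conforming line yields none.
print(list(hacking_log_warn("LOG.warning('disk is full')")))
print(list(hacking_log_warn("LOG.warn('disk is full')")))
```

The cost Joe is weighing is exactly this: each such rule is more code to write, review, and maintain, for a purely stylistic payoff.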
Re: [openstack-dev] [nova] policy on old / virtually abandoned patches
On Tue, Nov 18, 2014 at 6:17 AM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Nov 18, 2014 at 07:33:54AM -0500, Sean Dague wrote: On 11/18/2014 07:29 AM, Daniel P. Berrange wrote: On Tue, Nov 18, 2014 at 07:06:59AM -0500, Sean Dague wrote: Nova currently has 197 patches that have seen no activity in the last 4 weeks (project:openstack/nova age:4weeks status:open). Of these * 108 are currently Jenkins -1 (project:openstack/nova age:4weeks status:open label:Verified=-1,jenkins) * 60 are -2 by a core team member (project:openstack/nova age:4weeks status:open label:Code-Review=-2) (note, those 2 groups sometimes overlap) Regardless, the fact that Nova currently has 792 open reviews, and 1/4 of them seem dead, seems like a cleanup thing we could do. I'd like to propose that we implement our own auto-abandon mechanism based on reviews that are either held by a -2, or Jenkins -1, after 4 weeks time. I can write a quick script to abandon with a friendly message about why we are doing it, and to restore it if work is continuing. Yep, purging anything that's older than 4 weeks with negative karma seems like a good idea. It'll make it easier for us to identify those patches which are still maintained and target them for review. That said, there are some edge cases; for example, I've got some patches up for review that have a -2 on them, because we're waiting for blueprint approval. IIRC, previously we would post a warning about pending auto-abandon a week before, and thus give the author the chance to add a comment to prevent auto-abandon taking place. It would be necessary to have this ability to deal with the case where we're just temporarily blocked on other work. Also, sometimes when you have a large patch series, you might have some patches later in the series which (temporarily) fail the Jenkins jobs. It often isn't worth fixing those failures until you have dealt with review earlier in the patch series.
So I think we should not auto-expire patches which are in the middle of a patch series, unless the preceding patches in the series are to be expired too. Yes, this isn't something you can figure out with a single gerrit query; you'd have to query gerrit for patches and then look at the parent change references. Or just abandon and let people restore. I think handling the logic / policy for the edge cases isn't worth it when the author can very easily hit the restore button to get their patch back (and fresh for another 4w). If it was a large patch series, this wouldn't happen anyway, because every rebase would make it fresh. 4w is really 4w of nothing changing. Ok, that makes sense and is workable, I reckon. ++ for bringing back auto-abandon in this model. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
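The selection rule Sean proposes (abandon anything with a Jenkins -1 or a core -2 and no activity for 4 weeks) can be sketched offline; the dict layout below is a simplified stand-in for what a Gerrit query would return, not Gerrit's real schema:

```python
from datetime import datetime, timedelta

# Sketch of the proposed auto-abandon selection: flag reviews whose
# last update is older than four weeks AND which carry either a
# Jenkins -1 (verified) or a core -2 (code_review). The change dicts
# are simplified placeholders, not Gerrit's actual data model.

FOUR_WEEKS = timedelta(weeks=4)

def should_abandon(change, now):
    stale = now - change['last_updated'] > FOUR_WEEKS
    negative = change['verified'] == -1 or change['code_review'] == -2
    return stale and negative

changes = [
    {'id': 1, 'last_updated': datetime(2014, 9, 1), 'verified': -1, 'code_review': 0},
    {'id': 2, 'last_updated': datetime(2014, 11, 15), 'verified': -1, 'code_review': 0},
    {'id': 3, 'last_updated': datetime(2014, 9, 1), 'verified': 1, 'code_review': 0},
]
now = datetime(2014, 11, 18)
to_abandon = [c['id'] for c in changes if should_abandon(c, now)]
print(to_abandon)
```

Note how this matches the discussion: change 2 is negative but recently updated, and change 3 is stale but has no negative vote, so only change 1 qualifies; a rebase anywhere in a patch series resets last_updated and keeps the whole series out of the stale set.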
Re: [openstack-dev] [all] Scale out bug-triage by making it easier for people to contribute
On Tue, Nov 18, 2014 at 10:58 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Flavio Percoco's message of 2014-11-17 08:46:19 -0800: Greetings, Regardless of how big/small the bug backlog is for each project, I believe this is a common, annoying and difficult problem. At the oslo meeting today, we were talking about how to address our bug triage process, and I proposed something that I've seen done in other communities (rust-language [0]) that I consider useful and a good option for OpenStack too. The process consists of a bot that sends an email to every *volunteer* with 10 bugs to review/triage for the week. Each volunteer follows the triage standards, applies tags and provides information on whether the bug is still valid or not. The volunteer doesn't have to fix the bug, just triage it. In openstack, we could have a job that does this and then have people from each team volunteer to help with triage. The benefits I see are: * Interested folks don't have to go through the list and filter the bugs they want to triage. The bot should be smart enough to pick the oldest, most critical, etc. * It's a totally opt-in process and volunteers can obviously ignore emails if they don't have time that week. * It helps scale out the triage process without poking people around and without having to do a call for volunteers every meeting/cycle/etc. The above doesn't solve the problem completely, but just like reviews, it'd be an optional, completely opt-in process that people can sign up for. My experience in Ubuntu, where we encouraged non-developers to triage bugs, was that non-developers often ask the wrong questions and sometimes even harm the process by putting something in the wrong priority or state because of a lack of deep understanding. Triage in a hospital is done by experienced nurses and doctors working together, not triagers. This is because it may not always be obvious to somebody just how important a problem is. We have the same set of problems.
The most important thing is that developers see it as an important task and take part. New volunteers should be getting involved at every level, not just bug triage. ++, nice analogy. Another problem I have seen is that we need to constantly re-triage bugs, as just because a bug was marked as confirmed 6 months ago doesn't mean it is still valid. I think the best approach to this, like reviews, is to have a place where users can go to drive the triage workload to 0. For instance, the ubuntu server team had this report for triage: http://reqorts.qa.ubuntu.com/reports/ubuntu-server/triage-report.html Sadly, it looks like they're overwhelmed or have abandoned the effort (I hope this doesn't say something about Ubuntu server itself..), but the basic process was to move bugs off these lists. I'm sure if we ask nicely the author of that code will share it with us and we could adapt it for OpenStack projects. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
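The assignment step of the bot Flavio proposes (oldest bugs first, 10 per volunteer) can be sketched with plain Python; fetching bugs from Launchpad and sending the emails are omitted here, and the bug records are made-up placeholders:

```python
# Sketch of the proposed triage bot's assignment step: hand each
# volunteer up to 10 of the oldest untriaged bugs for the week.
# Querying Launchpad and emailing volunteers are left out; the bug
# dicts below are fabricated placeholders for illustration only.

BUGS_PER_VOLUNTEER = 10

def assign_bugs(bugs, volunteers, per_volunteer=BUGS_PER_VOLUNTEER):
    """Return {volunteer: [bugs]}, handing out the oldest bugs first."""
    queue = sorted(bugs, key=lambda bug: bug['age_days'], reverse=True)
    assignments = {volunteer: [] for volunteer in volunteers}
    for volunteer in volunteers:
        for _ in range(per_volunteer):
            if not queue:
                return assignments
            assignments[volunteer].append(queue.pop(0))
    return assignments

# 25 fake bugs, one week older per id; two volunteers opted in.
bugs = [{'id': n, 'age_days': n * 7} for n in range(1, 26)]
result = assign_bugs(bugs, ['alice', 'bob'])
print(len(result['alice']), len(result['bob']))
```

Re-triage falls out of the same loop: bugs whose last-updated timestamp exceeds the chosen threshold simply re-enter the queue of candidates.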
Re: [openstack-dev] Dynamic VM Consolidation Agent as part of Nova
On Tue, Nov 18, 2014 at 7:40 AM, Mehdi Sheikhalishahi mehdi.alish...@gmail.com wrote: Hi, I would like to bring a Dynamic VM Consolidation capability into Nova. That is, I would like to check compute node status periodically (let's say every 15 minutes) and consolidate VMs if there is any opportunity to turn off some compute nodes. Any hints on how to get into this development process as part of nova? While I like the idea of having dynamic VM consolidation capabilities somewhere in OpenStack, it doesn't belong in Nova. This service should live outside of Nova and just consume Nova's REST APIs. If there is some piece of information that this service would need that isn't made available via the REST API, we can fix that. Thanks, Mehdi ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.
On Mon, Nov 17, 2014 at 1:29 PM, Ian Wells ijw.ubu...@cack.org.uk wrote: On 12 November 2014 11:11, Steve Gordon sgor...@redhat.com wrote: NUMA We still need to identify some hardware to run third party CI for the NUMA-related work, and no doubt other things that will come up. It's expected that this will be an interim solution until OPNFV resources can be used (note cdub jokingly replied 1-2 years when asked for a rough estimate - I mention this because based on a later discussion some people took this as a serious estimate). Ian did you have any luck kicking this off? Russell and I are also endeavouring to see what we can do on our side w.r.t. this short term approach - in particular if you find hardware we still need to find an owner to actually setup and manage it as discussed. In theory to get started we need a physical multi-socket box and a virtual machine somewhere on the same network to handle job control etc. I believe the tests themselves can be run in VMs (just not those exposed by existing public clouds) assuming a recent Libvirt and an appropriately crafted Libvirt XML that ensures the VM gets a multi-socket topology etc. (we can assist with this). With apologies for the late reply, but I was off last week. And because I was off last week I've not done anything about this so far. I'm assuming that we'll just set up one physical multisocket box and ensure that we can do a cleanup-deploy cycle so that we can run whatever x86-dependent but otherwise relatively hardware agnostic tests we might need. Seems easier than worrying about what libvirt and KVM do and don't support at a given moment in time. I'll go nag our lab people for the machines. I'm thinking for the cleanup-deploy that I might just try booting the physical machine into a RAM root disk and then running a devstack setup, as it's probably faster than a clean install, but I'm open to options. (There's quite a lot of memory in the servers we have so this is likely to work fine.) 
That aside, where are the tests going to live? Great question, I am thinking these tests are a good candidate for functional (devstack) based tests that live in the nova tree. -- Ian. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.
On Sun, Nov 16, 2014 at 5:31 AM, Irena Berezovsky ire...@mellanox.com wrote: Hi Steve, Regarding SR-IOV testing, at Mellanox we have a CI job running on a bare metal node with a Mellanox SR-IOV NIC. This job is reporting on neutron patches. Currently API tests are executed. The contact person for the SR-IOV CI job is listed at driverlog: https://github.com/stackforge/driverlog/blob/master/etc/default_data.json#L1439 The following items are in progress: - SR-IOV functional testing Where do you envision these tests living? - Reporting CI job on nova patches Looking forward to it. I assume you will be working with the other people trying to set up assorted CI systems in this space. - Multi-node setup It's worth mentioning that we want to start the collaboration on the SR-IOV testing effort as part of the pci pass-through subteam activity. Please join the weekly meeting if you want to collaborate or have some input: https://wiki.openstack.org/wiki/Meetings/Passthrough BR, Irena -Original Message- From: Steve Gordon [mailto:sgor...@redhat.com] Sent: Wednesday, November 12, 2014 9:11 PM To: itai mendelsohn; Adrian Hoban; Russell Bryant; Ian Wells (iawells); Irena Berezovsky; ba...@cisco.com Cc: Nikola Đipanov; Russell Bryant; OpenStack Development Mailing List (not for usage questions) Subject: [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other features that can't be tested on current infra. Hi all, We had some discussions last week - particularly in the Nova NFV design session [1] - on the subject of ensuring that telecommunications and NFV-related functionality has adequate continuous integration testing.
In particular the focus here is on functionality that can't easily be tested on the public clouds that back the gate, including: - NUMA (vCPU pinning, vCPU layout, vRAM layout, huge pages, I/O device locality) - SR-IOV with Intel, Cisco, and Mellanox devices (possibly others) In each case we need to confirm where we are at, and the plan going forward, with regards to having: 1) Hardware to run the CI on. 2) Tests that actively exercise the functionality (if not already in existence). 3) Point person for each setup to maintain it and report into the third-party meeting [2]. 4) Getting the jobs operational and reporting [3][4][5][6]. In the Nova session we discussed a goal of having the hardware by K-1 (Dec 18) and having it reporting at least periodically by K-2 (Feb 5). I'm not sure if similar discussions occurred on the Neutron side of the design summit. SR-IOV: Adrian and Irena mentioned they were already in the process of getting up to speed with third party CI for their respective SR-IOV configurations. Robert are you attempting similar with regards to Cisco devices? What is the status of each of these efforts versus the four items I listed above and what do you need assistance with? NUMA: We still need to identify some hardware to run third party CI for the NUMA-related work, and no doubt other things that will come up. It's expected that this will be an interim solution until OPNFV resources can be used (note cdub jokingly replied 1-2 years when asked for a rough estimate - I mention this because based on a later discussion some people took this as a serious estimate). Ian did you have any luck kicking this off? Russell and I are also endeavouring to see what we can do on our side w.r.t. this short term approach - in particular if you find hardware we still need to find an owner to actually setup and manage it as discussed.
In theory to get started we need a physical multi-socket box and a virtual machine somewhere on the same network to handle job control etc. I believe the tests themselves can be run in VMs (just not those exposed by existing public clouds) assuming a recent Libvirt and an appropriately crafted Libvirt XML that ensures the VM gets a multi-socket topology etc. (we can assist with this). Thanks, Steve [1] https://etherpad.openstack.org/p/kilo-nova-nfv [2] https://wiki.openstack.org/wiki/Meetings/ThirdParty [3] http://ci.openstack.org/third_party.html [4] http://www.joinfu.com/2014/01/understanding-the-openstack-ci-system/ [5] http://www.joinfu.com/2014/02/setting-up-an-external-openstack-testing-system/ [6] http://www.joinfu.com/2014/02/setting-up-an-openstack-external-testing-system-part-2/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
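The "appropriately crafted Libvirt XML" mentioned above essentially means giving the test VM a multi-socket CPU topology. A small sketch of building that fragment (the socket/core/thread counts are illustrative, not a recommended configuration):

```python
import xml.etree.ElementTree as ET

def cpu_topology_xml(sockets=2, cores=4, threads=1):
    """Build the <cpu><topology/></cpu> fragment that gives a libvirt guest
    a multi-socket topology, so NUMA-aware code paths can be exercised
    inside the VM."""
    cpu = ET.Element("cpu")
    ET.SubElement(cpu, "topology", {
        "sockets": str(sockets),
        "cores": str(cores),
        "threads": str(threads),
    })
    return ET.tostring(cpu).decode()

xml = cpu_topology_xml()
# This fragment would be spliced into the guest's full domain XML; with a
# recent libvirt, <numa>/<cell> elements can pin guest memory per node too.
```

This is only the guest-visible half; host-side pinning still needs real multi-socket hardware, which is why the thread is hunting for physical boxes.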
[openstack-dev] [nova] Wednesday Bug Day
Hi All, At the kilo nova meetup in Paris, we had a lengthy discussion on better bug management. One of the action items was to make every Wednesday a nova bug day in #openstack-nova [0]. So tomorrow will be our first attempt at Bug Wednesday. One of the issues we found with previous bug meeting efforts was holding them off in a meeting room away from most of the nova developers. Instead we are trying to set aside all of Wednesday as the day where nova developers get together and discuss potential bugs and bug fixes in #openstack-nova. So if you have found a bug or are working on a bug fix and would like feedback (or a review), join #openstack-nova and we, the nova team, will try to help out. best, Joe P.S. We are not sure if this will work, but at the meetup we agreed it was at least worth a try. [0] https://etherpad.openstack.org/p/kilo-nova-meetup ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Multi-node testing (for redis and others...)
On Mon, Nov 17, 2014 at 1:06 PM, Joshua Harlow harlo...@outlook.com wrote: Hi guys, A recent question came up about how we can test better with redis for tooz. I think this question is also relevant for ceilometer (and other users of redis) and in general applies to the whole of openstack as the larger system is what people run (I hope not everyone just runs devstack on a single-node and that's where they stop, ha). https://review.openstack.org/#/c/106043/23 The basic question is that redis (or zookeeper) have ways to be set up (and typically are set up) with multi-node instances (for example with redis + sentinel or zookeeper in multi-node configurations, or the newly released redis clustering...). It seems though that our testing infrastructure is set up to do the basics of testing (which isn't bad, but does have its limits), and this got me thinking on what would be needed to actually test these multi-node configurations of things like redis (configured in sentinel mode, or redis in clustering mode) in a realistic manner that tests 'common' failure patterns (net splits for example). I guess we can split it up into 3 or 4 (or more) questions. 1. How do we get a multi-node configuration (of say redis) set up in the first place, configured so that all nodes are running and sentinel (for example) is running as expected? 2. How do we then inject failures into this setup to ensure that the applications and clients built on top of those systems reliably handle these types of injected failures (something like https://github.com/aphyr/jepsen or similar?). 3. How do we analyze those results (for when #2 doesn't turn out to work as expected) in a meaningful manner, so that we can then turn those experiments into more reliable software? Anyone else have any interesting ideas for this?
-Josh ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
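The core invariant a jepsen-style net-split test probes (question #2 above) is majority quorum: after a partition, only the side that can still see a strict majority of the cluster may keep serving, otherwise you get split-brain. A toy sketch of that check, purely illustrative of the reasoning rather than any redis/zookeeper internals:

```python
def has_quorum(visible_nodes, cluster_size):
    """True if this partition can still see a strict majority of the cluster.

    Sentinel- and zookeeper-style systems only allow failover/writes on the
    majority side, which is what a partition-injecting test should verify.
    """
    return visible_nodes > cluster_size // 2

# A 5-node ensemble split 3/2 by an injected network partition:
majority_side = has_quorum(3, 5)   # may elect a master / accept writes
minority_side = has_quorum(2, 5)   # must refuse, or we get split-brain
```

A harness would inject the partition (e.g. with iptables rules), then assert that writes accepted on the minority side are zero, and that the majority side converges on exactly one master.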
Re: [openstack-dev] [cinder] [barbican] Stable check of openstack/cinder failed
The release of the new barbicanclient caused another bug as well: https://bugs.launchpad.net/cinder/+bug/1388414, this one is causing all grenade jobs on master to fail. It looks like we have a hole in the gating logic somewhere. On Sat, Nov 1, 2014 at 3:42 PM, Alan Pevec ape...@gmail.com wrote: Hi, cinder juno tests are failing after new barbicanclient release - periodic-cinder-python26-juno http://logs.openstack.org/periodic-stable/periodic-cinder-python26-juno/d660c21 : FAILURE in 11m 37s - periodic-cinder-python27-juno http://logs.openstack.org/periodic-stable/periodic-cinder-python27-juno/d9bf4cb : FAILURE in 9m 04s I've filed https://bugs.launchpad.net/cinder/+bug/1388461 AFAICT this affects master too. Cheers, Alan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [All] Finalizing cross-project design summit track
On Thu, Oct 30, 2014 at 4:01 AM, Thierry Carrez thie...@openstack.org wrote: Jay Pipes wrote: On 10/29/2014 09:07 PM, Russell Bryant wrote: On 10/29/2014 06:46 PM, Rochelle Grober wrote: Any chance we could use the opening to move either the Refstack session or the logging session from their current joint (and conflicting) time (15:40)? QA really would be appreciated at both. And I'd really like to be at both. I'd say the Refstack one would go better in the debug slot, as the API stuff is sort of related to the logging. Switching with one of the 14:50 sessions might also work. Just hoping. I really want great participation at all of these sessions. The gate debugging session is most likely going to be dropped at this point. I don't see a big problem with moving the refstack one to that slot (the first time). Anyone else have a strong opinion on this? Sounds good to me. Sounds good. With the gate debugging session being dropped due to being the wrong format to be productive, we now need a new session. After looking over the etherpad of proposed cross project sessions I think there is one glaring omission: the SDK. In the Kilo Cycle Goals Exercise thread [0] having a real SDK was one of the top answers. Many folks had great responses that clearly explained the issues end users are having [1]. As for who could lead a session like this I have two ideas: Monty Taylor, who had one of the most colorful explanations of why this is so critical, or Dean Troyer, one of the few people actually working on this right now. I think it would be embarrassing if we had no cross project session on SDKs, since there appears to be a consensus that making life easier for the end user is a high priority. The current catch is, the free slot is now at 15:40, so it would compete with 'How to Tackle Technical Debt in Kilo,' a session which I expect to be very popular with the same people who would be interested in attending an SDK session.
[0] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044766.html [1] https://etherpad.openstack.org/p/6cWQG9oNsr -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [All] Finalizing cross-project design summit track
On Thu, Oct 30, 2014 at 9:20 AM, Anne Gentle a...@openstack.org wrote: On Thu, Oct 30, 2014 at 10:53 AM, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Oct 30, 2014 at 4:01 AM, Thierry Carrez thie...@openstack.org wrote: Jay Pipes wrote: On 10/29/2014 09:07 PM, Russell Bryant wrote: On 10/29/2014 06:46 PM, Rochelle Grober wrote: Any chance we could use the opening to move either the Refstack session or the logging session from their current joint (and conflicting) time (15:40)? QA really would be appreciated at both. And I'd really like to be at both. I'd say the Refstack one would go better in the debug slot, as the API stuff is sort of related to the logging. Switching with one of the 14:50 sessions might also work. Just hoping. I really want great participation at all of these sessions. The gate debugging session is most likely going to be dropped at this point. I don't see a big problem with moving the refstack one to that slot (the first time). Anyone else have a strong opinion on this? Sounds good to me. Sounds good. With the gate debugging session being dropped due to being the wrong format to be productive, we now need a new session. After looking over the etherpad of proposed cross project sessions I think there is one glaring omission: the SDK. In the Kilo Cycle Goals Exercise thread [0] having a real SDK was one of the top answers. Many folks had great responses that clearly explained the issues end users are having [1]. As for who could lead a session like this I have two ideas: Monty Taylor, who had one of the most colorful explanations to why this is so critical, or Dean Troyer, one of the few people actually working on this right now. I think it would be embarrassing if we had no cross project session on SDKs, since there appears to be a consensus that the making life easier for the end user is a high priority. There are many discussion sessions related to SDKs, they just aren't all in the cross-project slots. 
Plus these don't require an ATC badge (something users may not have). If we want to make sure the end user has a more uniform experience, having the individual python-*client discussions isn't sufficient. Also, the issue is not lack of user feedback, the issue here is more of a lack of people implementing the feedback. Application Ecosystem Working Group https://wiki.openstack.org/wiki/Application_Ecosystem_Working_Group Monday 2:30 (Degas) Thursday 1:40 (Hyatt) These sessions have pretty broad scopes, and I don't think a discussion on SDKs here is enough, since the issue isn't a lack of feedback. I think we can talk about the real SDK at one of these. There's also: Getting Started with the OpenStack Python SDK Monday 4:20 (Room 242AB) This isn't a design summit session, so it doesn't really make sense to do future design work here. Anne The current catch is, the free slot is now at 15:40, so it would compete with 'How to Tackle Technical Debt in Kilo,' a session which I expect to be very popular with the same people who would be interested in attending an SDK session. [0] http://lists.openstack.org/pipermail/openstack-dev/2014-September/044766.html [1] https://etherpad.openstack.org/p/6cWQG9oNsr -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][CI] nova-networking or neutron networking for CI
On Wed, Oct 29, 2014 at 2:23 AM, Andreas Scheuring scheu...@linux.vnet.ibm.com wrote: Thanks for the feedback. OK, got it, nova networking is not a requirement for a CI. Then I see not a single reason to support it. We will investigate the neutron way for our CI and also for production. I wouldn't go so far as to say there is no reason to support nova-networking in your CI system; while we are working on deprecating it, it hasn't been deprecated yet. Ideally you could test both neutron and nova-network, but if you had to choose one it should be neutron. Now coming back to the Hypervisorsupportmatrix ( https://wiki.openstack.org/wiki/HypervisorSupportMatrix ). I guess the scope of this matrix is only nova and not neutron, cinder, .. isn't it? So in this matrix I have to tick the networking lines (vlan, routing,..) as NOT supported, right? (as scope is neutron-networking, although we would support it with neutron). Correct, if you don't support nova-network at all, then you should have an 'X' in those boxes. Thanks, Andreas -- Andreas (irc: scheuran) On Tue, 2014-10-28 at 11:06 -0700, Joe Gordon wrote: On Tue, Oct 28, 2014 at 6:44 AM, Dan Smith d...@danplanet.com wrote: Are current nova CI platforms configured with nova-networking or with neutron networking? Or is networking in general not even a part of the nova CI approach? I think we have several that only run on Neutron, so I think it's fine to just do that. Agreed, neutron should be considered required for all of the reasons listed above.
--Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] TC election by the numbers
On Wed, Oct 29, 2014 at 6:27 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Eoghan Glynn's message of 2014-10-29 16:37:42 -0700: On Oct 29, 2014, at 3:32 PM, Eoghan Glynn egl...@redhat.com wrote: Folks, I haven't seen the customary number-crunching on the recent TC election, so I quickly ran the numbers myself. Voter Turnout = The turnout rate continues to decline, in this case from 29.7% to 26.7%. Here's how the participation rates have shaped up since the first TC2.0 election:

Election | Electorate | Voted | Turnout | Change
10/2013  |       1106 |   342 |   30.9% |  -8.0%
04/2014  |       1510 |   448 |   29.7% |  -4.1%
10/2014  |       1892 |   506 |   26.7% |  -9.9%

Overall percentage of the electorate voting is declining, but absolute numbers of voters have increased. And in fact, the electorate has grown more than the turnout has declined. True that, but AFAIK the generally accepted metric on participation rates in elections is turnout as opposed to absolute voter numbers. IIRC, there is no method for removing foundation members. So there are likely a number of people listed who have moved on to other activities and are no longer involved with OpenStack. I'd actually be quite interested to see the turnout numbers with voters who missed the last two elections prior to this one filtered out. Sounds like you need to freshen up on your bylaws ;). There are methods to remove foundation members: Bylaws Appendix 1 Section 3 [0]. Also you have to be an ATC, not just an Individual Member, to vote, Appendix 4 Section 2 [1] [0] http://www.openstack.org/legal/individual-member-policy/ [1] http://www.openstack.org/legal/technical-committee-member-policy/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
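The turnout figures above can be reproduced directly from the electorate/voted columns, which also shows where the "Change" column comes from (relative change in turnout rate between consecutive elections):

```python
# (election, electorate, voted) straight from the table in the thread.
elections = [("10/2013", 1106, 342), ("04/2014", 1510, 448), ("10/2014", 1892, 506)]

def turnout(electorate, voted):
    """Turnout as a percentage of the eligible electorate."""
    return 100.0 * voted / electorate

rates = {name: round(turnout(e, v), 1) for name, e, v in elections}

# Relative change between the last two elections, matching the -9.9% figure.
change = round((turnout(1892, 506) - turnout(1510, 448))
               / turnout(1510, 448) * 100, 1)
```

Running this gives 30.9%, 29.7%, and 26.7%, confirming the numbers quoted in the thread.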
Re: [openstack-dev] [nova][CI] nova-networking or neutron networking for CI
On Tue, Oct 28, 2014 at 6:44 AM, Dan Smith d...@danplanet.com wrote: Are current nova CI platforms configured with nova-networking or with neutron networking? Or is networking in general not even a part of the nova CI approach? I think we have several that only run on Neutron, so I think it's fine to just do that. Agreed, neutron should be considered required for all of the reasons listed above. --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Summit] Coordination between OpenStack lower layer virt stack (libvirt, QEMU/KVM)
On Oct 21, 2014 4:10 AM, Daniel P. Berrange berra...@redhat.com wrote: On Tue, Oct 21, 2014 at 12:58:48PM +0200, Kashyap Chamarthy wrote: I was discussing $subject on #openstack-nova, Nikola Dipanov suggested Sounds like a great idea. it's worthwhile to bring this up on the list. I was looking at http://kilodesignsummit.sched.org/ and noticed there's no specific session (correct me if I'm wrong) that's targeted at coordination between OpenStack - libvirt - QEMU/KVM. At previous summits, Nova has given each virt driver a dedicated session in its track. Those sessions have pretty much just been a walkthrough of the various features each virt team was planning. We always have far more topics to discuss than we have time available, and for this summit we want to change direction to maximise the value extracted from face-to-face meetings. As such any session which is just duplicating stuff that could easily be dealt with over email or irc is being cut, to make room for topics where we really need to have the f2f discussions. So the virt driver general sessions from previous summits are not likely to be on the schedule this time around. Agreed, this mailing list is a great place to kick off the closer libvirt QEMU/KVM discussions. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [novaclient] E12* rules
On Fri, Oct 17, 2014 at 6:40 AM, Andrey Kurilin akuri...@mirantis.com wrote: Hi everyone! I'm working on enabling E12* PEP8 rules in novaclient (status of my work listed below). IMO, PEP8 rules should be ignored only in extreme cases/for important reasons and we should decrease the number of ignored rules. This helps to keep code in a more strict, readable form, which is very important when working in a community. While working on rule E126, we started a discussion with Joe Gordon about the demand for these rules. I have no idea about the reasons why they should be ignored, so I want to know: - Why should these rules be ignored? - What do you think about enabling these rules? I found the source of my confusion. See my inline comments in https://review.openstack.org/#/c/122888/10/tox.ini Hopefully this patch should clarify things: https://review.openstack.org/129677 Please, leave your opinion about E12* rules. Already enabled rules: E121,E125 - https://review.openstack.org/#/c/122888/ E122 - https://review.openstack.org/#/c/123830/ E123 - https://review.openstack.org/#/c/123831/ Abandoned rule: E124 - https://review.openstack.org/#/c/123832/ Pending review: E126 - https://review.openstack.org/#/c/123850/ E127 - https://review.openstack.org/#/c/123851/ E128 - https://review.openstack.org/#/c/127559/ E129 - https://review.openstack.org/#/c/123852/ -- Best regards, Andrey Kurilin. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
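For readers unfamiliar with the E12x family: these rules only govern continuation-line indentation, not behavior. A hedged illustration (toy code, not novaclient's) of the two layouts the rules distinguish, which are semantically identical:

```python
def connect(host, port, timeout):
    # Toy stand-in for any multi-argument call that needs wrapping.
    return (host, port, timeout)

# Visual indent: continuation lines aligned with the opening delimiter
# (the style E127/E128 check for over/under indentation).
aligned = connect("localhost",
                  8774,
                  timeout=30)

# Hanging indent: nothing after the open paren, body indented one level
# (the style E121/E126 check).
hanging = connect(
    "localhost",
    8774,
    timeout=30,
)
```

Enabling the rules simply forces every wrapped call in the tree into one of these consistent shapes, which is the readability argument made above.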
Re: [openstack-dev] FreeBSD host support
On Sat, Oct 18, 2014 at 10:04 AM, Roman Bogorodskiy rbogorods...@mirantis.com wrote: Hi, In discussion of this spec proposal: https://review.openstack.org/#/c/127827/ it was suggested by Joe Gordon to start a discussion on the mailing list. So I'll share my thoughts and a long term plan on adding FreeBSD host support for OpenStack. An ultimate goal is to allow using libvirt/bhyve as a compute driver. However, I think it would be reasonable to start with libvirt/qemu support first as it will allow us to prepare the ground. Before diving into the technical details below, I have one question. Why? What is the benefit of this, besides the obvious 'we now support FreeBSD'? Adding support for a new kernel introduces yet another column in our support matrix, and will require a long term commitment to testing and maintaining OpenStack on FreeBSD. High level overview of what needs to be done: - Nova * linux_net needs to be re-factored to allow plugging in FreeBSD support (that's what the spec linked above is about) * nova.virt.disk.mount needs to be extended to support FreeBSD's mdconfig(8) in a similar way to Linux's losetup - Glance and Keystone These components are fairly free of system specifics. Most likely they will require some small fixes like e.g. I made for Glance https://review.openstack.org/#/c/94100/ - Cinder I didn't look closely at Cinder from a porting perspective, tbh. Obviously, it'll need some backend driver that would work on FreeBSD, e.g. ZFS. I've seen some patches floating around for ZFS though. Also, I think it'll need an implementation of the iSCSI stack on FreeBSD, because it has its own stack, not stgt. On the other hand, Cinder is not required for a minimal installation and that could be done after adding support for the other components. What about neutron? We are in the process of trying to deprecate nova-network, so any new thing needs to support neutron.
Also, it's worth to mention that a discussion on this topic already happened on this maillist: http://lists.openstack.org/pipermail/openstack-dev/2014-March/031431.html Some of the limitations were resolved since then, specifically, libvirt/bhyve has no limitation on count of disk and ethernet devices anymore. Roman Bogorodskiy ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] request_id deprecation strategy question
On Mon, Oct 20, 2014 at 11:12 AM, gordon chung g...@live.ca wrote: The issue I'm highlighting is that those projects using the code now have to update their api-paste.ini files to import from the new location, presumably while giving some warning to operators about the impending removal of the old code. This was the issue I ran into when trying to switch projects to oslo.middleware where I couldn't get jenkins to pass -- grenade tests successfully did their job. We had a discussion on openstack-qa and it was suggested to add an upgrade script to grenade to handle the new reference and document the switch. [1] If there's any issue with this solution, feel free to let us know. Going down this route means every deployment that wishes to upgrade now has an extra step, and should be avoided whenever possible. Why not just have a wrapper in project.openstack.common pointing to the new oslo.middleware library? If that is not a viable solution, we should give operators one full cycle where the oslo-incubator version is deprecated and they can migrate to the new copy outside of the upgrade process itself. Since there is no deprecation warning in Juno [0], we can deprecate the oslo-incubator copy in Kilo and remove in L. [0] first email in this thread [1] http://eavesdrop.openstack.org/irclogs/%23openstack-qa/%23openstack-qa.2014-10-10.log (search for gordc) cheers, *gord* ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
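The wrapper idea suggested above could look roughly like this: the old paste factory location keeps working but emits a deprecation warning, so operators get a full cycle to update api-paste.ini before the shim is removed. Class and function names here are illustrative stand-ins, not the actual oslo.middleware API:

```python
import warnings

class RequestIdMiddleware(object):
    """Stand-in for the middleware at its new (oslo.middleware-style) home."""
    def __init__(self, app):
        self.app = app

def deprecated_filter_factory(global_conf, **local_conf):
    """Old paste factory location: forwards to the new class, with a warning.

    Paste filter factories take the global config plus local options and
    return a callable that wraps the WSGI app.
    """
    warnings.warn(
        "the in-tree request_id middleware is deprecated; "
        "update api-paste.ini to point at oslo.middleware instead",
        DeprecationWarning)
    def _filter(app):
        return RequestIdMiddleware(app)
    return _filter
```

With a shim like this the upgrade needs no edit to api-paste.ini at upgrade time; the log warning tells operators to migrate at their leisure.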
Re: [openstack-dev] [TripleO] Removing nova-bm support within os-cloud-config
On Mon, Oct 20, 2014 at 8:00 PM, Steve Kowalik ste...@wedontsleep.org wrote: With the move to removing nova-baremetal, I'm concerned that portions of os-cloud-config will break once python-novaclient has released with the bits of the nova-baremetal gone -- import errors, and such like. Nova won't be removing nova-baremetal support in the client until Juno is end-of-lifed, as clients aren't part of the integrated release and need to work with all supported versions. I'm also concerned about backward compatibility -- in that we can't really remove the functionality, because it will break that compatibility. A further concern is that because nova-baremetal is no longer checked in CI, code paths may bitrot. Should we pony up and remove support for talking to nova-baremetal in os-cloud-config? Or any other suggestions? -- Steve If it (dieting) was like a real time strategy game, I'd have loaded a save game from ten years ago. - Greg, Columbia Internet ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
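One way to keep os-cloud-config importable even after the baremetal bits disappear from python-novaclient is a guarded import, so the nova-bm path degrades to "unavailable" instead of an ImportError at module load time. The module path below is illustrative of the old extension location, not guaranteed to match any particular novaclient release:

```python
try:
    # Old nova-baremetal client extension; absent in newer novaclient
    # releases once the deprecation lands.
    from novaclient.v1_1.contrib import baremetal  # noqa: F401
    HAVE_NOVA_BM = True
except ImportError:
    baremetal = None
    HAVE_NOVA_BM = False

def register_nodes_via_nova_bm(nodes):
    """Hypothetical entry point guarding the legacy code path."""
    if not HAVE_NOVA_BM:
        raise RuntimeError(
            "nova-baremetal support is not available in this novaclient; "
            "use the Ironic path instead")
    # ... the existing nova-bm registration logic would go here ...
```

This keeps backward compatibility for deployments still on the old client, while newer installs get a clear error pointing at Ironic rather than a traceback from a missing import.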
Re: [openstack-dev] [all] add cyclomatic complexity check to pep8 target
On Thu, Oct 16, 2014 at 8:11 PM, Angus Salkeld asalk...@mirantis.com wrote: Hi all I came across some tools [1] [2] that we could use to make sure we don't increase our code complexity. Has anyone had any experience with these or other tools? Flake8 (and thus hacking) has built in McCabe Complexity checking. flake8 --select=C --max-complexity 10 https://github.com/flintwork/mccabe http://flake8.readthedocs.org/en/latest/warnings.html Example on heat: http://paste.openstack.org/show/121561 Example in nova (max complexity of 20): http://paste.openstack.org/show/121562 radon is the underlying reporting tool and xenon is a monitor - meaning it will fail if a threshold is reached. To save you the time: radon cc -nd heat heat/engine/stack.py M 809:4 Stack.delete - E M 701:4 Stack.update_task - D heat/engine/resources/server.py M 738:4 Server.handle_update - D M 891:4 Server.validate - D heat/openstack/common/jsonutils.py F 71:0 to_primitive - D heat/openstack/common/config/generator.py F 252:0 _print_opt - D heat/tests/v1_1/fakes.py M 240:4 FakeHTTPClient.post_servers_1234_action - F It ranks the complexity from A (best) upwards, the command above (-nd) says only show D or worse. If you look at these methods they are getting out of hand and are becoming difficult to understand. I like the idea of having a threshold that says we are not going to just keep adding to the complexity of these methods. This can be enforced with: xenon --max-absolute E heat ERROR:xenon:block heat/tests/v1_1/fakes.py:240 post_servers_1234_action has a rank of F [1] https://pypi.python.org/pypi/radon [2] https://pypi.python.org/pypi/xenon If people are open to this, I'd like to add these to the test-requirements and trial this in Heat (as part of the pep8 tox target). I think the idea of gating on complexity is a great idea and would like to see nova adopt this as well. But why not just use flake8's built in stuff? 
Regards Angus ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
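For readers unfamiliar with the metric discussed in this thread: McCabe-style cyclomatic complexity is roughly one plus the number of decision points in a function, which is what flake8's C901 check enforces via --max-complexity. The sketch below is an illustrative stdlib-only approximation, not the flow-graph algorithm the mccabe package actually implements; the `approx_complexity` helper and the sample `handle` function are invented for this example.

```python
import ast

# Decision-point node types for a rough McCabe-style count. This is an
# approximation: the real mccabe package builds a control-flow graph,
# so its numbers can differ from this sketch's.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                   ast.With, ast.Assert, ast.BoolOp)

def approx_complexity(source, func_name):
    """Return an approximate cyclomatic complexity for one function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            # 1 (single path) + one per decision point inside the body.
            return 1 + sum(isinstance(n, _DECISION_NODES)
                           for n in ast.walk(node))
    raise ValueError("function not found: %s" % func_name)

SRC = '''
def handle(req):
    if req is None:
        return None
    for item in req:
        if item:
            return item
    return []
'''
print(approx_complexity(SRC, "handle"))  # if + for + if -> 4
```

With a cap of 10 (the flake8 example above), `handle` passes easily; a method like LibvirtDriver._get_guest_config at 67 would fail loudly.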
Re: [openstack-dev] [all] add cyclomatic complexity check to pep8 target
On Thu, Oct 16, 2014 at 8:53 PM, Morgan Fainberg morgan.fainb...@gmail.com wrote: I agree we should use flake8 built-in if at all possible. I think complexity checking will definitely help us in the long run keeping code maintainable. Well this is scary: ./nova/virt/libvirt/driver.py:3736:1: C901 'LibvirtDriver._get_guest_config' is too complex (67) http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py#n3736 to http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py#n4113 First step in fixing this, put a cap on it: https://review.openstack.org/129125 +1 from me. — Morgan Fainberg On October 16, 2014 at 20:45:35, Joe Gordon (joe.gord...@gmail.com) wrote: On Thu, Oct 16, 2014 at 8:11 PM, Angus Salkeld wrote: Hi all I came across some tools [1] [2] that we could use to make sure we don't increase our code complexity. Has anyone had any experience with these or other tools? Flake8 (and thus hacking) has built in McCabe Complexity checking. flake8 --select=C --max-complexity 10 https://github.com/flintwork/mccabe http://flake8.readthedocs.org/en/latest/warnings.html Example on heat: http://paste.openstack.org/show/121561 Example in nova (max complexity of 20): http://paste.openstack.org/show/121562 radon is the underlying reporting tool and xenon is a monitor - meaning it will fail if a threshold is reached.
To save you the time: radon cc -nd heat heat/engine/stack.py M 809:4 Stack.delete - E M 701:4 Stack.update_task - D heat/engine/resources/server.py M 738:4 Server.handle_update - D M 891:4 Server.validate - D heat/openstack/common/jsonutils.py F 71:0 to_primitive - D heat/openstack/common/config/generator.py F 252:0 _print_opt - D heat/tests/v1_1/fakes.py M 240:4 FakeHTTPClient.post_servers_1234_action - F It ranks the complexity from A (best) upwards, the command above (-nd) says only show D or worse. If you look at these methods they are getting out of hand and are becoming difficult to understand. I like the idea of having a threshold that says we are not going to just keep adding to the complexity of these methods. This can be enforced with: xenon --max-absolute E heat ERROR:xenon:block heat/tests/v1_1/fakes.py:240 post_servers_1234_action has a rank of F [1] https://pypi.python.org/pypi/radon [2] https://pypi.python.org/pypi/xenon If people are open to this, I'd like to add these to the test-requirements and trial this in Heat (as part of the pep8 tox target). I think the idea of gating on complexity is a great idea and would like to see nova adopt this as well. But why not just use flake8's built in stuff? Regards Angus ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Automatic evacuate
On Mon, Oct 13, 2014 at 1:32 PM, Adam Lawson alaw...@aqorn.com wrote: Looks like this was proposed and denied to be part of Nova for some reason last year. Thoughts on why and is the reasoning (whatever it was) still applicable? Link? *Adam Lawson* AQORN, Inc. 427 North Tatnall Street Ste. 58461 Wilmington, Delaware 19801-2230 Toll-free: (844) 4-AQORN-NOW ext. 101 International: +1 302-387-4660 Direct: +1 916-246-2072 On Mon, Oct 13, 2014 at 1:26 PM, Adam Lawson alaw...@aqorn.com wrote: [switching to openstack-dev] Has anyone automated nova evacuate so that VM's on a failed compute host using shared storage are automatically moved onto a new host or is manually entering *nova compute instance host* required in all cases? If it's manual only or requires custom Heat/Ceilometer templates, how hard would it be to enable automatic evacuation within Nova? i.e. (within /etc/nova/nova.conf) auto_evac = true Or is this possible now and I've simply not run across it? *Adam Lawson* AQORN, Inc. 427 North Tatnall Street Ste. 58461 Wilmington, Delaware 19801-2230 Toll-free: (844) 4-AQORN-NOW ext. 101 International: +1 302-387-4660 Direct: +1 916-246-2072 On Sat, Sep 27, 2014 at 12:32 AM, Clint Byrum cl...@fewbar.com wrote: So, what you're looking for is basically the same old IT, but with an API. I get that. For me, the point of this cloud thing is so that server operators can make _reasonable_ guarantees, and application operators can make use of them in an automated fashion. If you start guaranteeing 4 and 5 nines for single VM's, you're right back in the boat of spending a lot on server infrastructure even if your users could live without it sometimes. Compute hosts are going to go down. Networks are going to partition. It is not actually expensive to deal with that at the application layer. In fact when you know your business rules, you'll do a better job at doing this efficiently than some blanket replicate all the things layer might.
I know, some clouds are just new ways to chop up these fancy 40 core megaservers that everyone is shipping. I'm sure OpenStack can do it, but I'm saying, I don't think OpenStack _should_ do it. Excerpts from Adam Lawson's message of 2014-09-26 20:30:29 -0700: Generally speaking that's true when you have full control over how you deploy applications as a consumer. As a provider however, cloud resiliency is king and it's generally frowned upon to associate instances directly to the underlying physical hardware for any reason. It's good when instances can come and go as needed, but in a production context, a failed compute host shouldn't take down every instance hosted on it. Otherwise there is no real abstraction going on and the cloud loses immense value. On Sep 26, 2014 4:15 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Adam Lawson's message of 2014-09-26 14:43:40 -0700: Hello fellow stackers. I'm looking for discussions/plans re VM continuity. I.e. Protection for instances using ephemeral storage against host failures or auto-failover capability for instances on hosts where the host suffers from an attitude problem? I know fail-overs are supported and I'm quite certain auto-fail-overs are possible in the event of a host failure (hosting instances not using shared storage). I just can't find where this has been addressed/discussed. Someone help a brother out? ; ) I'm sure some of that is possible, but it's a cloud, so why not do things the cloud way? Spin up redundant bits in disparate availability zones. Replicate only what must be replicated. Use volumes for DR only when replication would be too expensive. Instances are cattle, not pets. Keep them alive just long enough to make your profit. 
___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openst...@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [openstack-qa] Post job failures
What about using graphite + logstash to power a post-job /nightly-job/post-merge-periodic (the new thing we talked about in Germany) dashboard? There are a few different use cases for a dashboard for jobs that don't report on gerrit changes. * Track the success and failure rates over time * If I am maintaining a job that doesn't vote anywhere, I will check this daily * If I am part of the core team of a project where one feature is tested post-merge, I want to periodically check this to see if that feature is being maintained. * Provide links to logs for failed jobs so the cause of the failure can be investigated We can do all this with graphite and logstash. Graphite for tracking the trends (something like http://jogo.github.io/gate/) and logstash to find the logs for failed jobs (we can get around the 10 day logstash window by saving the results instead of overwriting them every time we regenerate the list of log links) And if we really want some sort of alerts, there are a lot of graphite tools (http://graphite.readthedocs.org/en/latest/tools.html) that can give us alerts on metrics (alert me if the last X runs of job-foo-bar failed). On Wed, Oct 1, 2014 at 9:46 AM, Jeremy Stanley fu...@yuggoth.org wrote: On 2014-10-01 10:39:40 -0400 (-0400), Matthew Treinish wrote: [...] So I actually think as a first pass this would be the best way to handle it. You can leave comments on closed gerrit changes, [...] Not so easy as it sounds. Jobs in post are running on an arbitrary Git commit (more often than not, a merge commit), and mapping that back to a change in Gerrit is nontrivial. -- Jeremy Stanley ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] gerrit based statistics
Recently there has been a lot of discussion around the development growing pains in nova. Instead of guessing about how bad some of the issues are, I tried to answer a few questions that may help us better understand the issues. Q: How many revisions does it take to merge a patch? Average: 6.76 revisions median: 4.0 revisions Q: How many rechecks/verifies does it take to merge a patch (ignoring rechecks where the same job failed before and after)? Average: 0.749 rechecks per patch revision median: 0.4285 rechecks per patch revision For comparison, here are the same results for tempest, which has a lot more gating tests: Average: 1.01591525738 median: 0.6 Q: How long does it take for a patch to get approved? Average: 28 days median: 11 days Q: How long does it take for a patch to get approved that touches 'nova/virt/'? Average: 34 days median: 18 days When looking at these numbers two things stick out: * We successfully use recheck an awful lot. More than I expected * Patches that touch 'nova/virt' take about 20% more time to land, or about 6 days. While that is definitely a difference, it's smaller than I expected Dataset: last 800 patches in nova Code: https://github.com/jogo/gerrit-fun ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
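For readers who want to reproduce this kind of summary, the average/median figures above need nothing beyond the stdlib once the per-patch counts have been pulled out of Gerrit (the gerrit-fun repo linked above does the extraction). The revision counts below are invented for illustration, not the real nova dataset:

```python
import statistics

# Hypothetical per-patch data: how many revisions each of a handful of
# patches took to merge (invented numbers, not the real Gerrit dataset).
revisions = [2, 3, 4, 4, 5, 7, 9, 20]

avg = sum(revisions) / len(revisions)
med = statistics.median(revisions)
print("Average: %.2f revisions" % avg)   # Average: 6.75 revisions
print("median: %s revisions" % med)      # median: 4.5 revisions
```

Note how a single long-tail patch (20 revisions here) drags the average well above the median, which is why the original post reports both.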
Re: [openstack-dev] [nova] gerrit based statistics
On Wed, Oct 8, 2014 at 4:30 PM, Joe Gordon joe.gord...@gmail.com wrote: Recently there has been a lot of discussion around the development growing pains in nova. Instead of guessing about how bad some of the issues are, I tried to answer a few questions that may help us better understand the issues. Q: How many revisions does it take to merge a patch? Average: 6.76 revisions median: 4.0 revisions Q: How many rechecks/verifies does it take to merge a patch (ignoring rechecks where the same job failed before and after)? Average: 0.749 rechecks per patch revision median: 0.4285 rechecks per patch revision For comparison, here are the same results for tempest, which has a lot more gating tests: Average: 1.01591525738 median: 0.6 Q: How long does it take for a patch to get approved? Average: 28 days median: 11 days Q: How long does it take for a patch to get approved that touches 'nova/virt/'? Average: 34 days median: 18 days To expand on these numbers, same results for last 6 months of commits: all of nova (1723 patches): Average: 28.8 median: 11.0 nova/virt (476 patches): Average: 34.5 When looking at these numbers two things stick out: * We successfully use recheck an awful lot. More than I expected * Patches that touch 'nova/virt' take about 20% more time to land, or about 6 days. While that is definitely a difference, it's smaller than I expected Dataset: last 800 patches in nova Code: https://github.com/jogo/gerrit-fun ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] gerrit based statistics
On Wed, Oct 8, 2014 at 6:12 PM, Michael Still mi...@stillhq.com wrote: On Thu, Oct 9, 2014 at 11:58 AM, Joe Gordon joe.gord...@gmail.com wrote: On Wed, Oct 8, 2014 at 4:30 PM, Joe Gordon joe.gord...@gmail.com wrote: Recently there has been a lot of discussion around the development growing pains in nova. Instead of guessing about how bad some of the issues are, I tried to answer a few questions that may help us better understand the issues. Q: How many revisions does it take to merge a patch? Average: 6.76 revisions median: 4.0 revisions Q: How many rechecks/verifies does it take to merge a patch (ignoring rechecks where the same job failed before and after)? Average: 0.749 rechecks per patch revision median: 0.4285 rechecks per patch revision For comparison here are the same results for tempest, which has a lot more gating tests: Average: 1.01591525738 median: 0.6 Q: How long does it take for a patch to get approved? Average: 28 days median: 11 days Q: How long does it take for a patch to get approved that touches 'nova/virt/'? Average: 34 days median: 18 days To expand on these numbers, same results for last 6 months of commits: all of nova (1723 patches): Average: 28.8 median: 11.0 nova/virt (476 patches): Average: 34.5 I think it would be interesting to break this up by driver directory... Are there drivers which take longer to land code for than others? Like this? subtree: None (1724 patches): Average: 28.7 median: 11.0 subtree: nova/virt/ (476 patches): Average: 34.5 median: 18.0 subtree: nova/virt/hyperv/ (38 patches): Average: 46.8 median: 33.0 subtree: nova/virt/libvirt/ (224 patches): Average: 35.9 median: 18.0 subtree: nova/virt/xenapi/ (72 patches): Average: 39.5 median: 20.0 subtree: nova/virt/vmwareapi/ (134 patches): Average: 38.7 median: 26.0 When looking at these numbers two things stick out out: * We successfully use recheck an awful lot. More then I expected * Patches that touch 'nova/virt' take about 20% more time to land or about 6 days. 
While that is definitely a difference, it's smaller than I expected Dataset: last 800 patches in nova Code: https://github.com/jogo/gerrit-fun ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
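The per-driver breakdown Michael asked for boils down to grouping patches by subtree before computing the same average/median statistics. A minimal sketch of that grouping step, with invented sample data standing in for what the gerrit-fun script extracts from Gerrit:

```python
import statistics
from collections import defaultdict

# Invented sample: (subtree, days-to-approve) pairs per merged patch.
# The real analysis maps each patch's changed files to a subtree first.
patches = [
    ("nova/virt/libvirt/", 18), ("nova/virt/libvirt/", 40),
    ("nova/virt/hyperv/", 33), ("nova/virt/hyperv/", 60),
    ("nova/compute/", 7), ("nova/compute/", 11),
]

by_subtree = defaultdict(list)
for subtree, days in patches:
    by_subtree[subtree].append(days)

# Same report format as the thread: per-subtree patch count, mean, median.
for subtree, days in sorted(by_subtree.items()):
    print("subtree: %s (%d patches): Average: %.1f median: %.1f"
          % (subtree, len(days), statistics.mean(days),
             statistics.median(days)))
```

With the real dataset this immediately surfaces the pattern in the thread: hyperv and vmwareapi patches wait noticeably longer than the nova-wide median.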
Re: [openstack-dev] [nova] Resource tracker
On Tue, Oct 7, 2014 at 10:56 AM, Vishvananda Ishaya vishvana...@gmail.com wrote: On Oct 7, 2014, at 6:21 AM, Daniel P. Berrange berra...@redhat.com wrote: On Mon, Oct 06, 2014 at 02:55:20PM -0700, Joe Gordon wrote: On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton gkot...@vmware.com wrote: Hi, At the moment the resource tracker in Nova ignores the statistics that are returned by the hypervisor and it calculates the values on its own. Not only is this highly error-prone but it is also very costly – all of the resources on the host are read from the database. Not only is the fact that we are doing something very costly troubling, the fact that we are over-calculating resources used by the hypervisor is also an issue. In my opinion this leads us to not fully utilize hosts at our disposal. I have a number of concerns with this approach and would like to know why we are not using the actual resources reported by the hypervisor. The reason for asking this is that I have added a patch which uses the actual hypervisor resources returned and it led to a discussion on the particular review (https://review.openstack.org/126237). So it sounds like you have mentioned two concerns here: 1. The current method to calculate hypervisor usage is expensive in terms of database access. 2. Nova ignores the statistics that are returned by the hypervisor and uses its own calculations. To #1, maybe we can do something better, optimize the query, cache the result etc. As for #2, nova intentionally doesn't use the hypervisor statistics for a few reasons: * Make scheduling more deterministic, make it easier to reproduce issues etc. * Things like memory ballooning and thin provisioning in general, mean that the hypervisor is not reporting how much of the resources can be allocated but rather how much are currently in use (This behavior can vary from hypervisor to hypervisor today AFAIK -- which makes things confusing).
So if I don't want to oversubscribe RAM, and the hypervisor is using memory ballooning, the hypervisor statistics are mostly useless. I am sure there are more complex schemes that we can come up with that allow us to factor in the properties of thin provisioning, but is the extra complexity worth it? That is just an example of problems with the way Nova virt drivers /currently/ report usage to the scheduler. It is easily within the realm of possibility for the virt drivers to be changed so that they report stats which take into account things like ballooning and thin provisioning so that we don't oversubscribe. Ignoring the hypervisor stats entirely and re-doing the calculations in the resource tracker code is just a crude workaround really. It is just swapping one set of problems for a new set of problems. I agree, let's make reported hypervisor stats actually useful for scheduling. This would mean we can have fewer config options (currently the operator has to set aside resources for the underlying OS via a config option). +1 let's make the hypervisors report detailed enough information that we can do it without having to recalculate. Do we have any idea of how expensive recalculating this information is? Vish That being said I am fine with discussing in a spec the idea of adding an option to use the hypervisor reported statistics, as long as it is off by default. I'm against the idea of adding config options to switch between multiple codepaths because it is just punting the problem to the admins who are in an even worse position to decide what is best. It is saying would you rather your cloud have bug A or have bug B. We should be fixing the data the hypervisors report so that the resource tracker doesn't have to ignore them, and give the admins something which just works and avoid having to choose between 2 differently broken options.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
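The two accounting styles argued over in this thread can be shown in a few lines of arithmetic. All numbers below are assumed for illustration; ram_allocation_ratio echoes the name of nova's real overcommit config option, but the rest of the names are invented:

```python
# Resource-tracker style: capacity is derived from the flavors the
# scheduler has placed, scaled by a configured overcommit ratio.
# Deterministic: ballooning inside guests cannot change the answer.
total_ram_mb = 64 * 1024
ram_allocation_ratio = 1.5            # config-driven RAM overcommit
placed_flavors_mb = [4096, 8192, 8192]  # RAM claimed by scheduled guests

capacity_mb = total_ram_mb * ram_allocation_ratio
claimed_mb = sum(placed_flavors_mb)
free_for_scheduling_mb = capacity_mb - claimed_mb

# Hypervisor-reported style: a ballooning guest may report far less RAM
# "in use" than its flavor claims, so scheduling against this number
# risks oversubscribing once the guests balloon back up.
hypervisor_reported_used_mb = 9000    # assumed ballooned-down figure

print(free_for_scheduling_mb)                      # 77824.0
print(total_ram_mb - hypervisor_reported_used_mb)  # 56536
```

The gap between the two "free" figures (here 20480 MB of claimed-but-not-resident RAM) is exactly the ambiguity Joe describes: the hypervisor reports what is currently in use, not what can safely be allocated.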
Re: [openstack-dev] Quota management and enforcement across projects
On Fri, Oct 3, 2014 at 10:47 AM, Morgan Fainberg morgan.fainb...@gmail.com wrote: Keeping the enforcement local (same way policy works today) helps limit the fragility, big +1 there. I also agree with Vish, we need a uniform way to talk about quota enforcement similar to how we have a uniform policy language / enforcement model (yes I know it's not perfect, but it's far closer to uniform than quota management is). It sounds like maybe we should have an oslo library for quotas? Somewhere where we can share the code, but keep the operations local to each service. If there is still interest in placing quota in keystone, let's talk about how that will work and what will be needed from Keystone. The previous attempt didn't get much traction and stalled out early in implementation. If we want to revisit this, let's make sure we have the resources needed and spec(s) in progress / info on etherpads (similar to how the multitenancy stuff was handled at the last summit) as early as possible. Why not centralize quota management via python-openstackclient? What is the benefit of getting keystone involved? Cheers, Morgan Sent via mobile On Friday, October 3, 2014, Salvatore Orlando sorla...@nicira.com wrote: Thanks Vish, this seems a very reasonable first step as well - and since most projects would be enforcing quotas in the same way, the shared library would be the logical next step. After all this is quite the same thing we do with authZ. Duncan is expressing valid concerns which in my opinion can be addressed with an appropriate design - and a decent implementation. Salvatore On 3 October 2014 18:25, Vishvananda Ishaya vishvana...@gmail.com wrote: The proposal in the past was to keep quota enforcement local, but to put the resource limits into keystone. This seems like an obvious first step to me. Then a shared library for enforcing quotas with decent performance should be next.
The quota calls in nova are extremely inefficient right now and it will only get worse when we try to add hierarchical projects and quotas. Vish On Oct 3, 2014, at 7:53 AM, Duncan Thomas duncan.tho...@gmail.com wrote: Taking quota out of the service / adding remote calls for quota management is going to make things fragile - you've somehow got to deal with the cases where your quota manager is slow, goes away, hiccups, drops connections etc. You'll also need some way of reconciling actual usage against quota usage periodically, to detect problems. On 3 October 2014 15:03, Salvatore Orlando sorla...@nicira.com wrote: Hi, Quota management is currently one of those things where every openstack project does its own thing. While quotas are obviously managed in a similar way for each project, there are subtle differences which ultimately result in lack of usability. I recall that in the past there have been several calls for unifying quota management. The blueprint [1] for instance, hints at the possibility of storing quotas in keystone. On the other hand, the blazar project [2, 3] seems to aim at solving this problem for good by enabling resource reservation and therefore potentially freeing openstack projects from managing and enforcing quotas. While Blazar is definitely a good thing to have, I'm not entirely sure we want to make it a required component for every deployment. Perhaps single projects should still be able to enforce quota. On the other hand, at least on paper, the idea of making Keystone THE endpoint for managing quotas, and then letting the various projects enforce them, sounds promising - is there any reason for which this blueprint is stalled to the point that it seems forgotten now? I'm coming to the mailing list with these random questions about quota management, for two reasons: 1) despite developing and using openstack on a daily basis I'm still confused by quotas 2) I've found a race condition in neutron quotas and the fix is not trivial.
So, rather than start coding right away, it might probably make more sense to ask the community if there is already a known better approach to quota management - and obviously enforcement. Thanks in advance, Salvatore [1] https://blueprints.launchpad.net/keystone/+spec/service-metadata [2] https://wiki.openstack.org/wiki/Blazar [3] https://review.openstack.org/#/q/project:stackforge/blazar,n,z ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Duncan Thomas ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
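The "keep enforcement local, share the code" proposal in this thread can be sketched as a reserve/commit pattern, loosely modelled on how nova's quota code works today. Every name here is invented; this is not the proposed oslo library, just a minimal shape it might take, with limits that could eventually be fetched from keystone instead of hardcoded:

```python
class QuotaExceeded(Exception):
    """Raised when a reservation would push usage past the limit."""

class LocalQuotaEngine:
    """In-process quota enforcement: no remote calls on the hot path."""

    def __init__(self, limits):
        self.limits = dict(limits)                 # e.g. {"instances": 10}
        self.usage = {k: 0 for k in limits}        # committed consumption
        self.reserved = {k: 0 for k in limits}     # in-flight reservations

    def reserve(self, resource, delta):
        # Count both committed usage and outstanding reservations, so two
        # concurrent requests cannot both squeeze under the limit.
        if (self.usage[resource] + self.reserved[resource] + delta
                > self.limits[resource]):
            raise QuotaExceeded(resource)
        self.reserved[resource] += delta

    def commit(self, resource, delta):
        # Resource creation succeeded: move reservation into usage.
        self.reserved[resource] -= delta
        self.usage[resource] += delta

engine = LocalQuotaEngine({"instances": 2})
engine.reserve("instances", 1)
engine.commit("instances", 1)      # first instance created
engine.reserve("instances", 1)     # second instance in flight
try:
    engine.reserve("instances", 1)  # third would exceed the limit of 2
except QuotaExceeded:
    print("quota exceeded")
```

Duncan's reconciliation concern still applies even in this local form: `usage` can drift from reality if a commit is lost, so a periodic audit against actual resources is needed regardless of where the limits live.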
Re: [openstack-dev] [nova] Resource tracker
On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton gkot...@vmware.com wrote: Hi, At the moment the resource tracker in Nova ignores the statistics that are returned by the hypervisor and it calculates the values on its own. Not only is this highly error-prone but it is also very costly – all of the resources on the host are read from the database. Not only is the fact that we are doing something very costly troubling, the fact that we are over-calculating resources used by the hypervisor is also an issue. In my opinion this leads us to not fully utilize hosts at our disposal. I have a number of concerns with this approach and would like to know why we are not using the actual resources reported by the hypervisor. The reason for asking this is that I have added a patch which uses the actual hypervisor resources returned and it led to a discussion on the particular review (https://review.openstack.org/126237). So it sounds like you have mentioned two concerns here: 1. The current method to calculate hypervisor usage is expensive in terms of database access. 2. Nova ignores the statistics that are returned by the hypervisor and uses its own calculations. To #1, maybe we can do something better, optimize the query, cache the result etc. As for #2, nova intentionally doesn't use the hypervisor statistics for a few reasons: * Make scheduling more deterministic, make it easier to reproduce issues etc. * Things like memory ballooning and thin provisioning in general, mean that the hypervisor is not reporting how much of the resources can be allocated but rather how much are currently in use (This behavior can vary from hypervisor to hypervisor today AFAIK -- which makes things confusing). So if I don't want to oversubscribe RAM, and the hypervisor is using memory ballooning, the hypervisor statistics are mostly useless.
I am sure there are more complex schemes that we can come up with that allow us to factor in the properties of thin provisioning, but is the extra complexity worth it? That being said I am fine with discussing in a spec the idea of adding an option to use the hypervisor reported statistics, as long as it is off by default. Thanks Gary ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][tc] governance changes for big tent model
On Fri, Oct 3, 2014 at 6:07 AM, Doug Hellmann d...@doughellmann.com wrote: On Oct 3, 2014, at 12:46 AM, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Oct 2, 2014 at 4:16 PM, Devananda van der Veen devananda@gmail.com wrote: On Thu, Oct 2, 2014 at 2:16 PM, Doug Hellmann d...@doughellmann.com wrote: As promised at this week’s TC meeting, I have applied the various blog posts and mailing list threads related to changing our governance model to a series of patches against the openstack/governance repository [1]. I have tried to include all of the inputs, as well as my own opinions, and look at how each proposal needs to be reflected in our current policies so we do not drop commitments we want to retain along with the processes we are shedding [2]. I am sure we need more discussion, so I have staged the changes as a series rather than one big patch. Please consider the patches together when commenting. There are many related changes, and some incremental steps won’t make sense without the changes that come after (hey, just like code!). Doug [1] https://review.openstack.org/#/q/status:open+project:openstack/governance+branch:master+topic:big-tent,n,z [2] https://etherpad.openstack.org/p/big-tent-notes I've summed up a lot of my current thinking on this etherpad as well (I should really blog, but hey ...) https://etherpad.openstack.org/p/in-pursuit-of-a-new-taxonomy After seeing Jay's idea of making a yaml file modeling things and talking to devananda about this I went ahead and tried to graph the relationships out. repo: https://github.com/jogo/graphing-openstack preliminary YAML file: https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml sample graph: http://i.imgur.com/LwlkE73.png It turns out its really hard to figure out what the relationships are without digging deep into the code for each project, so I am sure I got a few things wrong (along with missing a lot of projects). 
The relationships are very important for setting up an optimal gate structure. I’m less convinced they are important for setting up the governance structure, and I do not think we want a specific gate configuration embedded in the governance structure at all. That’s why I’ve tried to describe general relationships (“optional inter-project dependences” vs. “strict co-dependent project groups” [1]) up until the very last patch in the series [2], which redefines the integrated release in terms of those other relationships and a base set of projects. I agree the relationships are very important for gate structure and less so for governance. I thought it would be nice to codify the relationships in a machine readable format so we can do things with it, like try making different rules and see how they would work. For example we can already make two groups of things that may be useful for testing: * services that nothing depends on * services that don't depend on other services Latest graph: http://i.imgur.com/y8zmNIM.png Doug [1] https://review.openstack.org/#/c/125785/2/reference/project-testing-policies.rst [2] https://review.openstack.org/#/c/125789/ -Deva ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
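The two testing groups mentioned above fall straight out of a machine-readable dependency map. A toy version of that analysis, using an abridged, invented service-to-dependencies mapping rather than the full openstack.yaml from the graphing-openstack repo:

```python
# Abridged, illustrative map: service -> services whose APIs it calls.
# See openstack.yaml in https://github.com/jogo/graphing-openstack for
# the real (work-in-progress) data.
depends_on = {
    "keystone": [],
    "glance": ["keystone"],
    "neutron": ["keystone"],
    "nova": ["keystone", "glance", "neutron"],
    "horizon": ["keystone", "nova", "glance", "neutron"],
}

all_services = set(depends_on)
depended_upon = {dep for deps in depends_on.values() for dep in deps}

# Group 1: services that nothing depends on (can be tested last / alone).
leaf_services = sorted(all_services - depended_upon)

# Group 2: services that don't depend on other services (can be tested
# in isolation, with no other service running).
base_services = sorted(s for s, deps in depends_on.items() if not deps)

print(leaf_services)   # ['horizon']
print(base_services)   # ['keystone']
```

Trying out a different gating rule then becomes a matter of writing another query over the same mapping rather than rereading every project's code.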
Re: [openstack-dev] [all][tc] governance changes for big tent model
On Fri, Oct 3, 2014 at 9:42 AM, Eoghan Glynn egl...@redhat.com wrote: - Original Message - On Fri, Oct 3, 2014 at 6:07 AM, Doug Hellmann d...@doughellmann.com wrote: On Oct 3, 2014, at 12:46 AM, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Oct 2, 2014 at 4:16 PM, Devananda van der Veen devananda@gmail.com wrote: On Thu, Oct 2, 2014 at 2:16 PM, Doug Hellmann d...@doughellmann.com wrote: As promised at this week’s TC meeting, I have applied the various blog posts and mailing list threads related to changing our governance model to a series of patches against the openstack/governance repository [1]. I have tried to include all of the inputs, as well as my own opinions, and look at how each proposal needs to be reflected in our current policies so we do not drop commitments we want to retain along with the processes we are shedding [2]. I am sure we need more discussion, so I have staged the changes as a series rather than one big patch. Please consider the patches together when commenting. There are many related changes, and some incremental steps won’t make sense without the changes that come after (hey, just like code!). Doug [1] https://review.openstack.org/#/q/status:open+project:openstack/governance+branch:master+topic:big-tent,n,z [2] https://etherpad.openstack.org/p/big-tent-notes I've summed up a lot of my current thinking on this etherpad as well (I should really blog, but hey ...) https://etherpad.openstack.org/p/in-pursuit-of-a-new-taxonomy After seeing Jay's idea of making a yaml file modeling things and talking to devananda about this I went ahead and tried to graph the relationships out. 
repo: https://github.com/jogo/graphing-openstack preliminary YAML file: https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml sample graph: http://i.imgur.com/LwlkE73.png It turns out it's really hard to figure out what the relationships are without digging deep into the code for each project, so I am sure I got a few things wrong (along with missing a lot of projects).

The relationships are very important for setting up an optimal gate structure. I’m less convinced they are important for setting up the governance structure, and I do not think we want a specific gate configuration embedded in the governance structure at all. That’s why I’ve tried to describe general relationships (“optional inter-project dependencies” vs. “strict co-dependent project groups” [1]) up until the very last patch in the series [2], which redefines the integrated release in terms of those other relationships and a base set of projects.

I agree the relationships are very important for gate structure and less so for governance. I thought it would be nice to codify the relationships in a machine-readable format so we can do things with it, like try making different rules and see how they would work. For example we can already make two groups of things that may be useful for testing:
* services that nothing depends on
* services that don't depend on other services

Latest graph: http://i.imgur.com/y8zmNIM.png

This diagram is missing any relationships for ceilometer.

It sure is, the graph is very much a work in progress. Here is the yaml that generates it: https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml want to update that to include ceilometer's relationships?
Ceilometer calls APIs provided by:
* keystone
* nova
* glance
* neutron
* swift

Ceilometer consumes notifications from:
* keystone
* nova
* glance
* neutron
* cinder
* ironic
* heat
* sahara

Ceilometer serves incoming API calls from:
* heat
* horizon

Cheers, Eoghan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
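The relationships listed above are exactly the kind of thing a machine-readable format can capture. A minimal sketch in plain Python — the relation names and structure here are invented for illustration and are not the actual openstack.yaml schema from the graphing-openstack repo:

```python
# Illustrative encoding of the ceilometer relationships listed above.
# The relation names (calls-api, consumes-notifications, serves-api-calls-from)
# are made up for this sketch; the real schema lives in jogo's repo.
ceilometer = {
    "calls-api": ["keystone", "nova", "glance", "neutron", "swift"],
    "consumes-notifications": [
        "keystone", "nova", "glance", "neutron",
        "cinder", "ironic", "heat", "sahara",
    ],
    "serves-api-calls-from": ["heat", "horizon"],
}

# Every service ceilometer touches in any direction:
touched = sorted(set().union(*ceilometer.values()))
print(touched)
```

Once the data is in a structure like this, tools can derive graphs or rule checks from it instead of someone re-reading each project's code.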
Re: [openstack-dev] What's a dependency (was Re: [all][tc] governance changes for big tent model)
On Fri, Oct 3, 2014 at 9:51 AM, Chris Dent chd...@redhat.com wrote: On Fri, 3 Oct 2014, Joe Gordon wrote:
* services that nothing depends on
* services that don't depend on other services

Latest graph: http://i.imgur.com/y8zmNIM.png

I'm hesitant to open this can but it's just lying there waiting, wiggling like good bait, so: How are you defining dependency in that picture?

data is coming from here: https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml and the key is here: https://github.com/jogo/graphing-openstack Note ceilometer has no relationships because I wasn't sure what exactly they were (which were required and which are optional etc.), not because there are none. It turns out it's not easy to find this information in an easily digestible format.

For example: Many of those services expect[1] to be able to send notifications (or be polled by) ceilometer[2]. We've got an ongoing thread about the need to contractualize notifications. Are those contracts (or the desire for them) a form of dependency? Should they be?

So in the case of notifications, I think that is a "Ceilometer CAN-USE Nova THROUGH notifications" relationship.

[1] It's not that it is a strict requirement but lots of people involved with the other projects contribute code to ceilometer or make changes in their own[3] project specifically to send info to ceilometer. [2] I'm not trying to defend ceilometer from slings here, just point out a good example, since it has _no_ arrows. [3] their own, that's hateful, let's have less of that. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
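Given any such dependency mapping, the two testing groups Joe mentions fall out of one pass over the edges. A toy sketch with invented dependency data (not the real openstack.yaml contents):

```python
# Map each service to the services it depends on (illustrative data only).
deps = {
    "nova":     {"keystone", "glance", "neutron"},
    "glance":   {"keystone"},
    "neutron":  {"keystone"},
    "keystone": set(),
    "horizon":  {"keystone", "nova", "glance", "neutron"},
}

# Group 1: services that nothing depends on (no inbound edges).
depended_on = set().union(*deps.values())
leaves = {svc for svc in deps if svc not in depended_on}

# Group 2: services that don't depend on other services (no outbound edges).
roots = {svc for svc in deps if not deps[svc]}

print(leaves)  # with this toy data: horizon
print(roots)   # with this toy data: keystone
```

Either group can be gated in isolation more cheaply than a full co-gate, which is presumably why they are "useful for testing".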
Re: [openstack-dev] [nova] Create an instance with a custom uuid
On Wed, Oct 1, 2014 at 8:29 AM, Solly Ross sr...@redhat.com wrote: (response inline) - Original Message - From: Pasquale Porreca pasquale.porr...@dektech.com.au To: openstack-dev@lists.openstack.org Sent: Wednesday, October 1, 2014 11:08:50 AM Subject: Re: [openstack-dev] [nova] Create an instance with a custom uuid

Thank you for the answers. I understood the concerns about having the UUID completely user defined, and I also understand Nova has no interest in supporting a customized algorithm to generate UUIDs. Anyway, I may have found a solution that will cover my use case and respect the standard for UUIDs (RFC 4122, http://www.ietf.org/rfc/rfc4122.txt). The generation of the UUID in Nova makes use of the function uuid4() from the module uuid.py to get a (pseudo)random UUID, according to version 4 described in RFC 4122. However, this is not the only algorithm supported by the standard (and already implemented in uuid.py). In particular I focused my attention on UUID version 1 and the method uuid1(node=None, clock_seq=None), which allows passing part of the UUID as a parameter (node is the field containing the last 12 hexadecimal digits of the UUID). So my idea was to give the user the chance to set the UUID version (1 or 4, with the latter as default) when creating a new instance, and in case of version 1 to optionally pass a value for the node parameter.

I would think that we could just have a node parameter here, and automatically use version 1 if that parameter is passed (if we decided to go the route of changing the current UUID behavior).

From what I gather this requested API change is based on your blueprint https://blueprints.launchpad.net/nova/+spec/pxe-boot-instance. Since your blueprint is not approved yet, discussing further work to improve it is a bit premature. Any thoughts?
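The stdlib behavior being discussed can be demonstrated directly with the uuid module: passing node= to uuid1() pins the last 12 hex digits of a version-1 UUID, while uuid4() is fully (pseudo)random. The node value below is an arbitrary example, not anything Nova does today:

```python
import uuid

# Version 4: (pseudo)random; this is what Nova generates today.
u4 = uuid.uuid4()
print(u4, u4.version)

# Version 1: timestamp + clock sequence + node. Passing node= pins the
# last 12 hex digits of the UUID (0x001122334455 is an arbitrary value).
u1 = uuid.uuid1(node=0x001122334455)
print(u1, u1.version)

# The node field is exactly the trailing 12 hex digits of the string form.
assert str(u1).split("-")[-1] == "001122334455"
```

This is the property the blueprint relies on: a deployer-chosen 48-bit value survives verbatim into the UUID that iPXE later sends in its Bootstrap Protocol messages.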
On 09/30/14 14:07, Andrew Laski wrote: On 09/30/2014 06:53 AM, Pasquale Porreca wrote: Going back to my original question, I would like to know: 1) Is it acceptable to have the UUID passed from the client side? In my opinion, no. This opens a door to issues we currently don't need to deal with, and use cases I don't think Nova should support. Another possibility, which I don't like either, would be to pass in some data which could influence the generation of the UUID to satisfy requirements. But there was a suggestion to look into addressing your use case on the QEMU mailing list, which I think would be a better approach. 2) What is the correct way to do it? I started to implement this feature, simply passing it as metadata with key uuid, but I feel that this feature should have a reserved option rather than use metadata.

On 09/25/14 17:26, Daniel P. Berrange wrote: On Thu, Sep 25, 2014 at 05:23:22PM +0200, Pasquale Porreca wrote: This is correct Daniel, except that it is done by the virtual firmware/BIOS of the virtual machine and not by the OS (not yet installed at that time). This is the reason we thought about UUID: it is already used by the iPXE client to be included in Bootstrap Protocol messages, it is taken from the uuid field in the libvirt template, and the uuid in libvirt is set by OpenStack; the only missing piece is the chance to set the UUID in OpenStack instead of having it randomly generated. Having another user-defined tag in libvirt won't help for our issue, since it won't be included in Bootstrap Protocol messages, not without changes in the virtual BIOS/firmware (as you stated too), and honestly my team doesn't have interest in this (nor the competence). I don't think the configdrive or metadata service would help either: the OS on the instance is not yet installed at that time (the target of the network boot is exactly to install the OS on the instance!), so it won't be able to mount it.
Ok, yes, if we're considering the DHCP client inside the iPXE BIOS blob, then I don't see any currently viable options besides UUID. There's no mechanism for passing any other data into iPXE that I am aware of, though if there is a desire to do that it could be raised on the QEMU mailing list for discussion. Regards, Daniel

-- Pasquale Porreca DEK Technologies Via dei Castelli Romani, 22 00040 Pomezia (Roma) Mobile +39 3394823805 Skype paskporr

Best Regards, Solly Ross ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tc][cross-project-work] What about adding cross-project-spec repo?
On Mon, Sep 29, 2014 at 11:58 AM, Doug Hellmann d...@doughellmann.com wrote: On Sep 29, 2014, at 5:51 AM, Thierry Carrez thie...@openstack.org wrote: Boris Pavlovic wrote: it goes without saying that working on cross-project stuff in OpenStack is quite a hard task, because it's always hard to align something between a lot of people from different projects. And when a topic starts getting too HOT the discussion goes in the wrong direction, the attempt to make the cross-project change fails, and as a result a maybe not *ideal* but *good enough* change in OpenStack will be abandoned. Another issue we have is specs. Projects ask for a spec for a change in their project, so in the case of cross-project work you need to make N similar specs (one for every project). That is really hard to manage, and as a result you have N different specs describing similar stuff. To make this process more formal, clear and simple, let's reuse the spec process but do it in one repo: /openstack/cross-project-specs. It means that every cross-project topic (unification of python clients, unification of logging, profiling, debugging API, and a bunch of others) will be discussed in one single place.

I think it's a good idea, as long as we truly limit it to cross-project specs, that is, to concepts that may apply to every project. The examples you mention are good ones. As a counterexample, if we have to sketch a plan to solve communication between Nova and Neutron, I don't think it would belong to that repository (it should live in whatever project would have the most work to do).

Process description of cross-project-specs:
* PTL - person that manages the core team member list and puts workflow +1 on accepted specs
* Every project has 1 core position (stackforge projects are included)
* Cores are chosen by the project team; their task is to advocate the project team's opinion
* No more veto, and no -2 votes
* If 75% of cores +1 a spec, it's accepted. It means that all projects have to accept this change.
* Accepted specs get high priority blueprints in all projects So I'm not sure you can force all projects to accept the change. Ideally, projects should see the benefits of alignment and adopt the common spec. In our recent discussions we are also going towards more freedom to projects, rather than less: imposing common specs on stackforge projects sounds like a step backwards there. Finally, I see some overlap with Oslo, which generally ends up implementing most of the common policy into libraries it encourages usage of. Therefore I'm not sure having a cross-project PTL makes sense, as he would be stuck between the Oslo PTL and the Technical Committee.

There is some overlap with Oslo, and we would want to be involved in the discussions — especially if the plan includes any code to land in an Oslo library. I have so far been resisting the idea that oslo-specs is the best home for this, mostly because I didn’t want us to assume everything related to cross-project work is also related to Oslo work. That said, our approval process looks for consensus among all of the participants on the review, in addition to Oslo cores, so we can use oslo-specs and continue incorporating the +1/-1 votes from everyone. One of the key challenges we’ve had is signaling buy-in for cross-project work, so having some sort of broader review process would be good, especially to help ensure that all interested parties have a chance to participate in the review. OTOH, a special repo with different voting permission settings also makes sense. I don’t have any good suggestions for who would decide when the voting on a proposal had reached consensus, or what to do if no consensus emerges. Having the TC manage that seems logical, but impractical. Maybe a person designated by the TC would oversee it?
Here is a governance patch to propose an openstack-specs repo: https://review.openstack.org/125509 With such simple rules we will simplify cross-project work: 1) Fair rules for all projects, as every project has 1 core that has 1 vote. A project is hardly a metric for fairness. Some projects are 50 times bigger than others. What is a project in your mind? A code repository? Or more like a program (a collection of code repositories being worked on by the same team)? So in summary, yes we need a place to discuss truly cross-project specs, but I think it can't force decisions on all projects (especially stackforge ones), and it can live within a larger-scope Oslo effort and/or the Technical Committee. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
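The proposed acceptance rule (one core vote per project, at least 75% +1, no veto) is simple enough to state precisely. A toy sketch, where the +1/0/-1 vote encoding is an assumption for illustration:

```python
def spec_accepted(votes, threshold=0.75):
    """votes: mapping of project name -> that project's core vote (+1, 0, -1).

    Under the proposed rule there is no veto: a -1 simply isn't a +1,
    and the spec passes once at least 75% of project cores voted +1.
    """
    if not votes:
        return False
    plus_ones = sum(1 for v in votes.values() if v == +1)
    return plus_ones / len(votes) >= threshold

# 3 of 4 projects +1 -> exactly 75%: accepted even with a -1 present,
# which is precisely the "no more veto" property being debated.
print(spec_accepted({"nova": 1, "neutron": 1, "cinder": 1, "glance": -1}))
```

Writing it out this way makes Thierry's objection concrete: the rule weighs a tiny stackforge repo exactly the same as Nova, because each contributes one key to the mapping.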
Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading
On Tue, Sep 30, 2014 at 6:04 AM, joehuang joehu...@huawei.com wrote: Hello, Dear TC and all, Large cloud operators prefer to deploy multiple OpenStack instances (as different zones), rather than a single monolithic OpenStack instance, for these reasons:
1) Multiple data centers distributed geographically;
2) Multi-vendor business policy;
3) Server nodes scale up modularly from 00's up to a million;
4) Fault and maintenance isolation between zones (only REST interface);

At the same time, they also want to integrate these OpenStack instances into one cloud. Instead of a proprietary orchestration layer, they want to use the standard OpenStack framework for Northbound API compatibility with HEAT/Horizon or other 3rd-party ecosystem apps. We call this pattern OpenStack Cascading, with the proposal described in [1][2]. A PoC live demo video can be found at [3][4]. Nova, Cinder, Neutron, Ceilometer and Glance (optional) are involved in the OpenStack cascading. We kindly ask for a cross-program design summit session to discuss OpenStack cascading and the contribution to Kilo.

Cross-program design summit sessions should be used for things that we are unable to make progress on via this mailing list, and not as a way to begin new conversations. With that in mind, I think this thread is a good place to get initial feedback on the idea and possibly make a plan for how to tackle this.

We kindly invite those who are interested in OpenStack cascading to work together and contribute it to OpenStack.
(I applied for “other projects” track [5], but it would be better to have a discussion as a formal cross-program session, because many core programs are involved.) [1] wiki: https://wiki.openstack.org/wiki/OpenStack_cascading_solution [2] PoC source code: https://github.com/stackforge/tricircle [3] Live demo video at YouTube: https://www.youtube.com/watch?v=OSU6PYRz5qY [4] Live demo video at Youku (low quality, for those who can't access YouTube): http://v.youku.com/v_show/id_XNzkzNDQ3MDg4.html [5] http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg36395.html Best Regards Chaoyi Huang ( Joe Huang ) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Kilo Blueprints and Specs
On Mon, Sep 29, 2014 at 5:23 AM, Gary Kotton gkot...@vmware.com wrote: Hi, Is the process documented anywhere? That is, if say for example I had a spec approved in J and its code did not land, how do we go about kicking the tires for K on that spec. Specs will need to be re-submitted once we open up the specs repo for Kilo. The Kilo template will be changing a little bit, so specs will need a little bit of reworking. But I expect the process to approve previously approved specs to be quicker. Thanks Gary On 9/29/14, 1:07 PM, John Garbutt j...@johngarbutt.com wrote: On 27 September 2014 00:31, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt j...@johngarbutt.com wrote: On 25 September 2014 14:10, Daniel P. Berrange berra...@redhat.com wrote: The proposal is to keep kilo-1, kilo-2 much the same as juno. Except, we work harder on getting people to buy into the priorities that are set, and actively provoke more debate on their correctness, and we reduce the bar for what needs a blueprint. We can't have 50 high priority blueprints, it doesn't mean anything, right? We need to trim the list down to a manageable number, based on the agreed project priorities. That's all I mean by slots / runway at this point. I would suggest we don't try to rank high/medium/low as that is too coarse, but rather just an ordered priority list. Then you would not be in the situation of having 50 high blueprints. We would instead naturally just start at the highest priority and work downwards. OK. I guess I was fixating on fitting things into launchpad. I guess having both might be what happens. The runways idea is just going to make me less efficient at reviewing. So I'm very much against it as an idea. This proposal is different to the runways idea, although it certainly borrows aspects of it. I just don't understand how this proposal has all the same issues?
The key to the kilo-3 proposal is about getting better at saying no, this blueprint isn't very likely to make kilo. If we focus on a smaller number of blueprints to review, we should be able to get a greater percentage of those fully completed. I am just using slots/runway-like ideas to help pick the high priority blueprints we should concentrate on, during that final milestone. Rather than keeping the distraction of 15 or so low priority blueprints, with those poor submitters jamming up the check queue, and constantly rebasing, and having to deal with the odd stray review comment they might get lucky enough to get. Maybe you think this bit is overkill, and that's fine. But I still think we need a way to stop wasting so much of people's time on things that will not make it. The high priority blueprints are going to end up being mostly the big scope changes which take a lot of time to review and probably go through many iterations. The low priority blueprints are going to end up being the small things that don't consume significant resource to review and are easy to deal with in the time we're waiting for the big items to go through rebases or whatever. So what I don't like about the runways slots idea is that it removes the ability to be agile and take the initiative to review/approve the low priority stuff that would otherwise never make it through. The idea is more around concentrating on the *same* list of things. Certainly we need to avoid the priority inversion of concentrating only on the big things. It's also why I suggested that for kilo-1 and kilo-2, we allow any blueprint to merge, and only restrict it to a specific list in kilo-3, the idea being to maximise the number of things that get completed, rather than merging some half blueprints, but not getting to the good bits. Do we have to decide this now, or can we see how project priorities go and reevaluate half way through Kilo-2? What we need to decide is not to use the runway idea for kilo-1 and kilo-2.
At this point, I guess we have (passively) decided that now. I like the idea of waiting till mid kilo-2. That's around Spec freeze, which is handy. Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] What's holding nova development back?
On Wed, Sep 17, 2014 at 8:03 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/16/2014 1:01 PM, Joe Gordon wrote: On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote: On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think we could do a couple of things: make the categorization more obvious (a UI thing, really), and do some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, comparing it to another group of patches that we know *do* introduce code changes to Nova, and then see if there is a set of E-R issues that consistently appears in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? We use launchpad's affected projects listings on the elastic recheck page to say what may be causing the bug. Tagging projects to bugs is a manual process, but one that works pretty well. UI: The elastic recheck UI definitely could use some improvements. I am very poor at writing UIs, so patches welcome! OK, so perhaps it's not the most scientific or well-thought-out plan, but hey, it's a spark for thought...
;) Best, -jay

I'm not great with UIs either, but would a dropdown of the affected projects be helpful, so people can filter on their favorite project while the page stays sorted by top offenders as we have today? There are times when the top bugs are infra issues (pip timeouts for example) so you have to scroll a ways before finding something for your project (nova isn't the only one). I think that would be helpful. -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
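Jay's control-group comparison amounts to set operations over elastic-recheck hits. A sketch of the idea; the bug identifiers below are hypothetical, not real Launchpad bugs:

```python
# Hypothetical elastic-recheck bug hits for two groups of patches.
# Control group: patches that change no code (e.g. doc/comment-only).
control_hits = {"bug/1111111", "bug/2222222", "bug/3333333"}
# Test group: patches that do change Nova code.
code_change_hits = {"bug/1111111", "bug/2222222", "bug/4444444"}

# Bugs seen in *both* groups are more likely infra/transient, not Nova.
likely_not_nova = control_hits & code_change_hits
# Bugs seen only alongside code changes deserve a closer look in Nova.
likely_nova = code_change_hits - control_hits

print(sorted(likely_not_nova))
print(sorted(likely_nova))
```

This is the "higher likelihood of not being due to Nova" signal: failures that hit patches which could not possibly have caused them point at the CI platform or shared infrastructure.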
Re: [openstack-dev] [nova] Kilo Blueprints and Specs
On Mon, Sep 29, 2014 at 4:46 PM, Christopher Yeoh cbky...@gmail.com wrote: On Mon, 29 Sep 2014 13:32:57 -0700 Joe Gordon joe.gord...@gmail.com wrote: On Mon, Sep 29, 2014 at 5:23 AM, Gary Kotton gkot...@vmware.com wrote: Hi, Is the process documented anywhere? That is, if say for example I had a spec approved in J and its code did not land, how do we go about kicking the tires for K on that spec. Specs will need to be re-submitted once we open up the specs repo for Kilo. The Kilo template will be changing a little bit, so specs will need a little bit of reworking. But I expect the process to approve previously approved specs to be quicker. Am biased given I have a spec approved for Juno which we didn't quite fully merge which we want to finish off early in Kilo (most of the patches are very close already to being ready to merge), but I think we should give priority to reviewing specs already approved in Juno and perhaps only require one +2 for re-approval. I like the idea of prioritizing specs that were previously approved and only requiring a single +2 for re-approval if there are no major changes to them. Otherwise we'll end up wasting weeks of development time just when there is lots of review bandwidth available and the CI system is lightly loaded. Honestly, ideally I'd like to just start merging as soon as Kilo opens. Nothing has changed between Juno FF and Kilo opening so there's really no reason that an approved Juno spec should not be reapproved. Chris Thanks Gary On 9/29/14, 1:07 PM, John Garbutt j...@johngarbutt.com wrote: On 27 September 2014 00:31, Joe Gordon joe.gord...@gmail.com wrote: On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt j...@johngarbutt.com wrote: On 25 September 2014 14:10, Daniel P. Berrange berra...@redhat.com wrote: The proposal is to keep kilo-1, kilo-2 much the same as juno.
Except, we work harder on getting people to buy into the priorities that are set, and actively provoke more debate on their correctness, and we reduce the bar for what needs a blueprint. We can't have 50 high priority blueprints, it doesn't mean anything, right? We need to trim the list down to a manageable number, based on the agreed project priorities. That's all I mean by slots / runway at this point. I would suggest we don't try to rank high/medium/low as that is too coarse, but rather just an ordered priority list. Then you would not be in the situation of having 50 high blueprints. We would instead naturally just start at the highest priority and work downwards. OK. I guess I was fixating on fitting things into launchpad. I guess having both might be what happens. The runways idea is just going to make me less efficient at reviewing. So I'm very much against it as an idea. This proposal is different to the runways idea, although it certainly borrows aspects of it. I just don't understand how this proposal has all the same issues? The key to the kilo-3 proposal is about getting better at saying no, this blueprint isn't very likely to make kilo. If we focus on a smaller number of blueprints to review, we should be able to get a greater percentage of those fully completed. I am just using slots/runway-like ideas to help pick the high priority blueprints we should concentrate on, during that final milestone. Rather than keeping the distraction of 15 or so low priority blueprints, with those poor submitters jamming up the check queue, and constantly rebasing, and having to deal with the odd stray review comment they might get lucky enough to get. Maybe you think this bit is overkill, and that's fine. But I still think we need a way to stop wasting so much of people's time on things that will not make it. The high priority blueprints are going to end up being mostly the big scope changes which take a lot of time to review and probably go through many iterations.
The low priority blueprints are going to end up being the small things that don't consume significant resource to review and are easy to deal with in the time we're waiting for the big items to go through rebases or whatever. So what I don't like about the runways slots idea is that it removes the ability to be agile and take the initiative to review/approve the low priority stuff that would otherwise never make it through. The idea is more around concentrating on the *same* list of things. Certainly we need to avoid the priority inversion of concentrating only on the big things. It's also why I suggested that for kilo-1 and kilo-2, we allow any blueprint to merge, and only
Re: [openstack-dev] [nova] Kilo Blueprints and Specs
On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt j...@johngarbutt.com wrote: On 25 September 2014 14:10, Daniel P. Berrange berra...@redhat.com wrote: The proposal is to keep kilo-1, kilo-2 much the same as juno. Except, we work harder on getting people to buy into the priorities that are set, and actively provoke more debate on their correctness, and we reduce the bar for what needs a blueprint. We can't have 50 high priority blueprints, it doesn't mean anything, right? We need to trim the list down to a manageable number, based on the agreed project priorities. That's all I mean by slots / runway at this point. I would suggest we don't try to rank high/medium/low as that is too coarse, but rather just an ordered priority list. Then you would not be in the situation of having 50 high blueprints. We would instead naturally just start at the highest priority and work downwards. OK. I guess I was fixating on fitting things into launchpad. I guess having both might be what happens. The runways idea is just going to make me less efficient at reviewing. So I'm very much against it as an idea. This proposal is different to the runways idea, although it certainly borrows aspects of it. I just don't understand how this proposal has all the same issues? The key to the kilo-3 proposal is about getting better at saying no, this blueprint isn't very likely to make kilo. If we focus on a smaller number of blueprints to review, we should be able to get a greater percentage of those fully completed. I am just using slots/runway-like ideas to help pick the high priority blueprints we should concentrate on, during that final milestone. Rather than keeping the distraction of 15 or so low priority blueprints, with those poor submitters jamming up the check queue, and constantly rebasing, and having to deal with the odd stray review comment they might get lucky enough to get. Maybe you think this bit is overkill, and that's fine.
But I still think we need a way to stop wasting so much of people's time on things that will not make it. The high priority blueprints are going to end up being mostly the big scope changes, which take a lot of time to review and probably go through many iterations. The low priority blueprints are going to end up being the small things that don't consume significant resource to review and are easy to deal with in the time we're waiting for the big items to go through rebases or whatever. So what I don't like about the runways slots idea is that it removes the ability to be agile and take the initiative to review and approve the low priority stuff that would otherwise never make it through. The idea is more around concentrating on the *same* list of things. Certainly we need to avoid the priority inversion of concentrating only on the big things. It's also why I suggested that for kilo-1 and kilo-2, we allow any blueprint to merge, and only restrict it to a specific list in kilo-3, the idea being to maximise the number of things that get completed, rather than merging some half-finished blueprints, but not getting to the good bits. Do we have to decide this now, or can we see how project priorities go and reevaluate half way through Kilo-2? Anyways, it seems like this doesn't hit a middle ground that would gain pre-summit approval. Or at least needs some online chat time to work out something. Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Choice of Series goal of a blueprint
On Thu, Sep 25, 2014 at 7:22 AM, Angelo Matarazzo angelo.matara...@dektech.com.au wrote: Hi all, Can I create a blueprint and choose a previous Series goal (e.g. Icehouse)? I think that it can be possible, but no reviewer or driver will be interested in it. Right? I am not sure what the 'why' is here, but Icehouse is under stable maintenance mode, so it is not accepting new features. Best regards, Angelo
Re: [openstack-dev] Thoughts on OpenStack Layers and a Big Tent model
On Tue, Sep 23, 2014 at 9:50 AM, Vishvananda Ishaya vishvana...@gmail.com wrote: On Sep 23, 2014, at 8:40 AM, Doug Hellmann d...@doughellmann.com wrote: If we are no longer incubating *programs*, which are the teams of people who we would like to ensure are involved in OpenStack governance, then how do we make that decision? From a practical standpoint, how do we make a list of eligible voters for a TC election? Today we pull a list of committers from the git history from the projects associated with "official programs", but if we are dropping "official programs" we need some other way to build the list. Joe Gordon mentioned an interesting idea to address this (which I am probably totally butchering), which is that we make incubation more similar to the ASF Incubator. In other words, make it more lightweight with no promise of governance or infrastructure support. You only slightly butchered it :). From what I gather, the Apache Software Foundation's primary goals are to:

* provide a foundation for open, collaborative software development projects by supplying hardware, communication, and business infrastructure
* create an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit
* provide a means for individual volunteers to be sheltered from legal suits directed at the Foundation's projects
* protect the 'Apache' brand, as applied to its software products, from being abused by other organizations [0]

This roughly translates into: JIRA, SVN, Bugzilla and Confluence etc. for infrastructure resources. So ASF provides infrastructure, legal support, a trademark and some basic oversight.
The [Apache] incubator is responsible for:

* filtering the proposals about the creation of a new project or sub-project
* helping the creation of the project and the infrastructure that it needs to operate
* supervising and mentoring the incubated community in order for them to reach an open meritocratic environment
* evaluating the maturity of the incubated project, either promoting it to official project/sub-project status or retiring it, in case of failure.

It must be noted that the incubator (just like the board) does not perform filtering on the basis of technical issues. This is because the foundation respects and suggests variety of technical approaches. It doesn't fear innovation or even internal confrontation between projects which overlap in functionality. [1] So my idea, which is very similar to Monty's, is to move all the non-layer-1 projects into something closer to an ASF model where there is still incubation and graduation. But the only things a project receives out of this process are:

* Legal support
* A trademark
* Mentorship
* Infrastructure to use
* Basic oversight via the incubation/graduation process with respect to the health of the community.

They do not get:

* Required co-gating or integration with any other projects
* People to write their docs for them, etc.
* Technical review/oversight
* Technical requirements
* Evaluation of how the project fits into a bigger picture
* Language requirements
* etc.

Note: this is just an idea, not a fully formed proposal. [0] http://www.apache.org/foundation/how-it-works.html#what [1] http://www.apache.org/foundation/how-it-works.html#incubator It is also interesting to consider that we may not need much governance for things outside of layer 1. Of course, this may be dancing around the actual problem to some extent, because there are a bunch of projects that are not layer 1 that are already a part of the community, and we need a solution that includes them somehow.
Vish
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Tue, Sep 23, 2014 at 9:13 AM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 22:04, Joe Gordon wrote: To me this is less about valid or invalid choices. The Zaqar team is comparing Zaqar to SQS, but after digging into the two of them, Zaqar barely looks like SQS. Zaqar doesn't guarantee what IMHO is the most important part of SQS: the message will be delivered and will never be lost. I agree that this is the most important feature. Happily, Flavio has clarified this in his other thread[1]: *Zaqar's vision is to provide a cross-cloud interoperable, fully-reliable messaging service at scale that is both easy and not invasive, for deployers and users.* ... Zaqar aims to be a fully-reliable service, therefore messages should never be lost under any circumstances except for when the message's expiration time (ttl) is reached. So Zaqar _will_ guarantee reliable delivery. Zaqar doesn't have the same scaling properties as SQS. This is true. (That's not to say it won't scale, but it doesn't scale in exactly the same way that SQS does because it has a different architecture.) It appears that the main reason for this is the ordering guarantee, which was introduced in response to feedback from users. So this is clearly a different design choice: SQS chose reliability plus effectively infinite scalability, while Zaqar chose reliability plus FIFO. It's not feasible to satisfy all three simultaneously, so the options are: 1) Implement two separate modes and allow the user to decide 2) Continue to choose FIFO over infinite scalability 3) Drop FIFO and choose infinite scalability instead This is one of the key points on which we need to get buy-in from the community on selecting one of these as the long-term strategy. Zaqar is aiming for low latency per message; SQS doesn't appear to be. I've seen no evidence that Zaqar is actually aiming for that.
There are waaay lower-latency ways to implement messaging if you don't care about durability (you wouldn't do store-and-forward, for a start). If you see a lot of talk about low latency, it's probably because for a long time people insisted on comparing Zaqar to RabbitMQ instead of SQS. I thought this was why Zaqar uses Falcon and not Pecan/WSME? For an application like Marconi where throughput and latency is of paramount importance, I recommend Falcon over Pecan. https://wiki.openstack.org/wiki/Zaqar/pecan-evaluation#Recommendation Yes, that statement mentions throughput, but it mentions latency as well. (Let's also be careful not to talk about low latency as if it were a virtue in itself; it's simply something we would happily trade off for other properties. Zaqar _is_ making that trade-off.) So if Zaqar isn't SQS, what is Zaqar and why should I use it? If you are a small-to-medium user of an SQS-like service, Zaqar is like SQS but better, because not only does it never lose your messages but they always arrive in order, and you have the option to fan them out to multiple subscribers. If you are a very large user along one particular dimension (I believe it's the number of messages delivered from a single queue, but probably Gordon will correct me :D) then Zaqar may not _yet_ have a good story for you. cheers, Zane. [1] http://lists.openstack.org/pipermail/openstack-dev/2014-September/046809.html
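Zane's summary ("never does it lose your messages") is the at-least-once semantic SQS implements with visibility timeouts: a claimed message is only removed by an explicit delete, and an expired claim makes the message deliverable again. A minimal in-memory sketch of that semantic (purely illustrative; this is neither Zaqar's nor SQS's actual implementation, and all names are made up):

```python
import time
import uuid

class ClaimQueue:
    """Toy at-least-once queue: a claimed message that is never deleted
    (acked) becomes claimable again once its claim expires, so it is not
    lost if a consumer crashes mid-processing."""

    def __init__(self, claim_ttl=1.0):
        self.claim_ttl = claim_ttl
        self.messages = {}                    # msg_id -> (body, claimed_until)

    def post(self, body):
        msg_id = str(uuid.uuid4())
        self.messages[msg_id] = (body, 0.0)   # 0.0 means "unclaimed"
        return msg_id

    def claim(self):
        now = time.monotonic()
        for msg_id, (body, claimed_until) in self.messages.items():
            if claimed_until <= now:          # unclaimed, or claim expired
                self.messages[msg_id] = (body, now + self.claim_ttl)
                return msg_id, body
        return None

    def delete(self, msg_id):                 # the "ack": message is done
        self.messages.pop(msg_id, None)

q = ClaimQueue(claim_ttl=0.05)
q.post("resize instance 42")

msg_id, body = q.claim()       # consumer A claims the message...
assert q.claim() is None       # ...so nobody else can, for now
time.sleep(0.06)               # consumer A crashes; the claim expires
msg_id2, body2 = q.claim()     # the message is redelivered, not lost
q.delete(msg_id2)              # only an explicit delete removes it
```

The trade-off visible even in this toy is the one discussed in the thread: redelivery means a consumer can see the same message twice, so "never lost" comes bundled with "at least once", not "exactly once".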
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Tue, Sep 23, 2014 at 2:40 AM, Flavio Percoco fla...@redhat.com wrote: On 09/23/2014 05:13 AM, Clint Byrum wrote: Excerpts from Joe Gordon's message of 2014-09-22 19:04:03 -0700: [snip] To me this is less about valid or invalid choices. The Zaqar team is comparing Zaqar to SQS, but after digging into the two of them, Zaqar barely looks like SQS. Zaqar doesn't guarantee what IMHO is the most important part of SQS: the message will be delivered and will never be lost. Zaqar doesn't have the same scaling properties as SQS. Zaqar is aiming for low latency per message; SQS doesn't appear to be. So if Zaqar isn't SQS, what is Zaqar and why should I use it? I have to agree. I'd like to see a simple, non-ordered, high latency, high scale messaging service that can be used cheaply by cloud operators and users. What I see instead is a very powerful, ordered, low latency, medium scale messaging service that will likely cost a lot to scale out to the thousands of users level. I don't fully agree :D Let me break the above down into several points: * Zaqar team is comparing Zaqar to SQS: True, we're comparing to the *type* of service SQS is, but not *all* the guarantees it gives. We're not working on an exact copy of the service but on a service capable of addressing the same use cases. * Zaqar is not guaranteeing reliability: This is not true. Yes, the current default write concern for the mongodb driver is `acknowledge` but that's a bug, not a feature [0] ;) * Zaqar doesn't have the same scaling properties as SQS: What are SQS scaling properties? We know they have a big user base, we know they have lots of connections, queues and what not, but we don't have numbers to compare ourselves with. Here is *a* number: 30k messages per second on a single queue: http://java.dzone.com/articles/benchmarking-sqs * Zaqar is aiming for low latency per message: This is not true and I'd be curious to know where this came from.
A couple of things to consider: - First and foremost, low latency is a very relative measure and it depends on each use case. - The benchmarks Kurt did were purely informative. I believe it's good to do them every once in a while, but this doesn't mean the team is mainly focused on that. - Not being focused on 'low latency' does not mean the team will overlook performance. * Zaqar has FIFO and SQS doesn't: FIFO won't hurt *your use case* if ordering is not a requirement, but the lack of it does when ordering is a must. * Scaling out Zaqar will cost a lot: In terms of what? I'm pretty sure it's not for free, but I'd like to understand this point better and figure out a way to improve it, if possible. * If Zaqar isn't SQS then what is it? Why should I use it?: I don't believe Zaqar is SQS, as I don't believe Nova is EC2. Do they share similar features and provide similar services? Yes. Does that mean you can address similar use cases, hence similar users? Yes. In addition to the above, I believe Zaqar is a simple service, easy to install and to interact with. From a user perspective the semantics are few and the concepts are neither new nor difficult to grasp. From an operator's perspective, I don't believe it adds tons of complexity. It does require the operator to deploy a replicated storage environment, but I believe all services require that. Cheers, Flavio P.S: Sorry for my late answer or lack of it. I lost *all* my emails yesterday and I'm working on recovering them. [0] https://bugs.launchpad.net/zaqar/+bug/1372335 -- @flaper87 Flavio Percoco
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 10:11, Gordon Sim wrote: On 09/19/2014 09:13 PM, Zane Bitter wrote: SQS offers very, very limited guarantees, and it's clear that the reason for that is to make it massively, massively scalable in the way that e.g. S3 is scalable while also remaining comparably durable (S3 is supposedly designed for 11 nines, BTW). Zaqar, meanwhile, seems to be promising the world in terms of guarantees. (And then taking it away in the fine print, where it says that the operator can disregard many of them, potentially without the user's knowledge.) On the other hand, IIUC Zaqar does in fact have a sharding feature (Pools) which is its answer to the massive scaling question. There are different dimensions to the scaling problem. Many thanks for this analysis, Gordon. This is really helpful stuff. As I understand it, pools don't help scaling a given queue since all the messages for that queue must be in the same pool. At present traffic through different Zaqar queues are essentially entirely orthogonal streams. Pooling can help scale the number of such orthogonal streams, but to be honest, that's the easier part of the problem. But I think it's also the important part of the problem. When I talk about scaling, I mean 1 million clients sending 10 messages per second each, not 10 clients sending 1 million messages per second each. When a user gets to the point that individual queues have massive throughput, it's unlikely that a one-size-fits-all cloud offering like Zaqar or SQS is _ever_ going to meet their needs. Those users will want to spin up and configure their own messaging systems on Nova servers, and at that kind of size they'll be able to afford to. (In fact, they may not be able to afford _not_ to, assuming per-message-based pricing.) Running a message queue that has a high guarantee of not losing a message is hard, and SQS promises exactly that: it *will* deliver your message.
If a use case can handle occasionally dropping messages then running your own MQ makes more sense. SQS is designed to handle massive queues as well; while I haven't found any examples of queues that have 1 million messages/second being sent or received, 30k to 100k messages/second is not unheard of [0][1][2]. [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s [1] http://java.dzone.com/articles/benchmarking-sqs [2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182 There is also the possibility of using the sharding capabilities of the underlying storage. But the pattern of use will determine how effective that can be. So for example, on the ordering question, if order is defined by a single sequence number held in the database and atomically incremented for every message published, that is not likely to be something where the database's sharding is going to help in scaling the number of concurrent publications. Though sharding would allow scaling the total number of messages on the queue (by distributing them over multiple shards), the total ordering of those messages reduces its effectiveness in scaling the number of concurrent getters (e.g. the concurrent subscribers in pub-sub) since they will all be getting the messages in exactly the same order. Strict ordering impacts the competing consumers case also (and is in my opinion of limited value as a guarantee anyway). At any given time, the head of the queue is in one shard, and all concurrent claim requests will contend for messages in that same shard. Though the unsuccessful claimants may then move to another shard as the head moves, they will all again try to access the messages in the same order.
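Gordon's point about a single atomically incremented sequence number can be made concrete: however many shards hold the message bodies, every publisher must pass through the one counter that defines the total order, so publication serializes at that point. A toy sketch in plain Python (the lock-protected counter stands in for the database-side atomic increment; all names are illustrative):

```python
import itertools
import threading

# One global counter defines the total order -- this is the choke point.
_seq = itertools.count()
_seq_lock = threading.Lock()

NUM_SHARDS = 4
shards = [[] for _ in range(NUM_SHARDS)]  # bodies can be spread across shards...

def publish(body):
    with _seq_lock:                       # ...but every publish serializes here
        n = next(_seq)
    shards[n % NUM_SHARDS].append((n, body))
    return n

# 8 concurrent publishers, 100 messages each.
threads = [
    threading.Thread(target=lambda i=i: [publish(f"t{i}-m{j}") for j in range(100)])
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Total order holds: sequence numbers are dense and unique across all shards.
all_msgs = sorted(m for shard in shards for m in shard)
seqs = [n for n, _ in all_msgs]
assert seqs == list(range(800))
```

The assertion shows what the guarantee buys (a gapless global order) and the lock shows what it costs: adding shards spreads storage, but not the contention on the counter.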
So if Zaqar's goal is to scale the number of orthogonal queues, and the number of messages held at any time within these, the pooling facility and any sharding capability in the underlying store for a pool would likely be effective even with the strict ordering guarantee. IMHO this is (or should be) the goal - support enormous numbers of small-to-moderate sized queues. If 50,000 messages per second doesn't count as small-to-moderate then Zaqar does not fulfill a major SQS use case. If scaling the number of communicants on a given communication channel is a goal however, then strict ordering may hamper that. If it does, it seems to me that this is not just a policy tweak on the underlying datastore to choose the desired balance between ordering and scale, but a more fundamental question on the internal structure of the queue implementation built on top of the datastore. I agree with your analysis, but I don't think this should be a goal. Note that the user can still implement this themselves using application-level sharding - if you know that in-order delivery is not important to you, then randomly assign clients to a queue
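Zane's closing suggestion, application-level sharding when global ordering isn't needed, amounts to deterministically assigning each client to one of several independent queues: per-client FIFO is preserved within a queue, while the queues themselves scale as orthogonal streams. A hedged sketch (the helper names and in-memory queue layout are hypothetical, not a Zaqar API):

```python
import hashlib
from collections import defaultdict

NUM_QUEUES = 8
queues = defaultdict(list)   # queue name -> messages; stands in for real queues

def queue_for(client_id):
    """Deterministically map a client to one of N orthogonal queues."""
    h = int(hashlib.sha256(client_id.encode()).hexdigest(), 16)
    return f"work-queue-{h % NUM_QUEUES}"

def post(client_id, body):
    queues[queue_for(client_id)].append((client_id, body))

# Each client's own messages stay FIFO within its queue; there is simply
# no ordering *across* clients, which is fine when global order isn't needed.
for i in range(3):
    post("client-a", f"a{i}")
    post("client-b", f"b{i}")

qa = [b for c, b in queues[queue_for("client-a")] if c == "client-a"]
assert qa == ["a0", "a1", "a2"]
```

Because the mapping is a pure function of the client id, no coordination is needed between publishers, which is exactly the property the single global sequence number lacks.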
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 17:06, Joe Gordon wrote: On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 10:11, Gordon Sim wrote: On 09/19/2014 09:13 PM, Zane Bitter wrote: SQS offers very, very limited guarantees, and it's clear that the reason for that is to make it massively, massively scalable in the way that e.g. S3 is scalable while also remaining comparably durable (S3 is supposedly designed for 11 nines, BTW). Zaqar, meanwhile, seems to be promising the world in terms of guarantees. (And then taking it away in the fine print, where it says that the operator can disregard many of them, potentially without the user's knowledge.) On the other hand, IIUC Zaqar does in fact have a sharding feature (Pools) which is its answer to the massive scaling question. There are different dimensions to the scaling problem. Many thanks for this analysis, Gordon. This is really helpful stuff. As I understand it, pools don't help scaling a given queue since all the messages for that queue must be in the same pool. At present traffic through different Zaqar queues are essentially entirely orthogonal streams. Pooling can help scale the number of such orthogonal streams, but to be honest, that's the easier part of the problem. But I think it's also the important part of the problem. When I talk about scaling, I mean 1 million clients sending 10 messages per second each, not 10 clients sending 1 million messages per second each. When a user gets to the point that individual queues have massive throughput, it's unlikely that a one-size-fits-all cloud offering like Zaqar or SQS is _ever_ going to meet their needs. Those users will want to spin up and configure their own messaging systems on Nova servers, and at that kind of size they'll be able to afford to. (In fact, they may not be able to afford _not_ to, assuming per-message-based pricing.) 
Running a message queue that has a high guarantee of not losing a message is hard, and SQS promises exactly that: it *will* deliver your message. If a use case can handle occasionally dropping messages then running your own MQ makes more sense. SQS is designed to handle massive queues as well; while I haven't found any examples of queues that have 1 million messages/second being sent or received, 30k to 100k messages/second is not unheard of [0][1][2]. [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s [1] http://java.dzone.com/articles/benchmarking-sqs [2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182 Thanks for digging those up, that's really helpful input. I think number [1] kind of summed up part of what I'm arguing here though: But once your requirements get above 35k messages per second, chances are you need custom solutions anyway; not to mention that while SQS is cheap, it may become expensive with such loads. If you don't require the reliability guarantees that SQS provides then perhaps. But I would be surprised to hear that a user can set up something with this level of uptime for less: Amazon SQS runs within Amazon’s high-availability data centers, so queues will be available whenever applications need them. To prevent messages from being lost or becoming unavailable, all messages are stored redundantly across multiple servers and data centers. [1] There is also the possibility of using the sharding capabilities of the underlying storage. But the pattern of use will determine how effective that can be. So for example, on the ordering question, if order is defined by a single sequence number held in the database and atomically incremented for every message published, that is not likely to be something where the database's sharding is going to help in scaling the number of concurrent publications.
Though sharding would allow scaling the total number of messages on the queue (by distributing them over multiple shards), the total ordering of those messages reduces its effectiveness in scaling the number of concurrent getters (e.g. the concurrent subscribers in pub-sub) since they will all be getting the messages in exactly the same order. Strict ordering impacts the competing consumers case also (and is in my opinion of limited value as a guarantee anyway). At any given time, the head of the queue is in one shard, and all concurrent claim requests will contend for messages in that same shard. Though the unsuccessful claimants may then move to another shard as the head moves, they will all again try to access the messages in the same order. So if Zaqar's goal is to scale the number of orthogonal queues, and the number of messages held at any time within these, the pooling facility and any sharding capability in the underlying store for a pool would likely be effective even
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Mon, Sep 22, 2014 at 7:04 PM, Joe Gordon joe.gord...@gmail.com wrote: On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 17:06, Joe Gordon wrote: On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 10:11, Gordon Sim wrote: On 09/19/2014 09:13 PM, Zane Bitter wrote: SQS offers very, very limited guarantees, and it's clear that the reason for that is to make it massively, massively scalable in the way that e.g. S3 is scalable while also remaining comparably durable (S3 is supposedly designed for 11 nines, BTW). Zaqar, meanwhile, seems to be promising the world in terms of guarantees. (And then taking it away in the fine print, where it says that the operator can disregard many of them, potentially without the user's knowledge.) On the other hand, IIUC Zaqar does in fact have a sharding feature (Pools) which is its answer to the massive scaling question. There are different dimensions to the scaling problem. Many thanks for this analysis, Gordon. This is really helpful stuff. As I understand it, pools don't help scaling a given queue since all the messages for that queue must be in the same pool. At present traffic through different Zaqar queues are essentially entirely orthogonal streams. Pooling can help scale the number of such orthogonal streams, but to be honest, that's the easier part of the problem. But I think it's also the important part of the problem. When I talk about scaling, I mean 1 million clients sending 10 messages per second each, not 10 clients sending 1 million messages per second each. When a user gets to the point that individual queues have massive throughput, it's unlikely that a one-size-fits-all cloud offering like Zaqar or SQS is _ever_ going to meet their needs. Those users will want to spin up and configure their own messaging systems on Nova servers, and at that kind of size they'll be able to afford to. 
(In fact, they may not be able to afford _not_ to, assuming per-message-based pricing.) Running a message queue that has a high guarantee of not losing a message is hard, and SQS promises exactly that: it *will* deliver your message. If a use case can handle occasionally dropping messages then running your own MQ makes more sense. SQS is designed to handle massive queues as well; while I haven't found any examples of queues that have 1 million messages/second being sent or received, 30k to 100k messages/second is not unheard of [0][1][2]. [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s [1] http://java.dzone.com/articles/benchmarking-sqs [2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182 Thanks for digging those up, that's really helpful input. I think number [1] kind of summed up part of what I'm arguing here though: But once your requirements get above 35k messages per second, chances are you need custom solutions anyway; not to mention that while SQS is cheap, it may become expensive with such loads. If you don't require the reliability guarantees that SQS provides then perhaps. But I would be surprised to hear that a user can set up something with this level of uptime for less: Amazon SQS runs within Amazon’s high-availability data centers, so queues will be available whenever applications need them. To prevent messages from being lost or becoming unavailable, all messages are stored redundantly across multiple servers and data centers. [1] There is also the possibility of using the sharding capabilities of the underlying storage. But the pattern of use will determine how effective that can be.
So for example, on the ordering question, if order is defined by a single sequence number held in the database and atomically incremented for every message published, that is not likely to be something where the database's sharding is going to help in scaling the number of concurrent publications. Though sharding would allow scaling the total number of messages on the queue (by distributing them over multiple shards), the total ordering of those messages reduces its effectiveness in scaling the number of concurrent getters (e.g. the concurrent subscribers in pub-sub) since they will all be getting the messages in exactly the same order. Strict ordering impacts the competing consumers case also (and is in my opinion of limited value as a guarantee anyway). At any given time, the head of the queue is in one shard, and all concurrent claim requests will contend for messages in that same shard. Though the unsuccessful claimants may then move to another shard as the head moves, they will all again try to access the messages in the same order. So if Zaqar's goal is to scale the number of orthogonal queues, and the number of messages held at any time within these, the pooling facility and any
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Mon, Sep 22, 2014 at 8:13 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Joe Gordon's message of 2014-09-22 19:04:03 -0700: On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 17:06, Joe Gordon wrote: On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote: On 22/09/14 10:11, Gordon Sim wrote: On 09/19/2014 09:13 PM, Zane Bitter wrote: SQS offers very, very limited guarantees, and it's clear that the reason for that is to make it massively, massively scalable in the way that e.g. S3 is scalable while also remaining comparably durable (S3 is supposedly designed for 11 nines, BTW). Zaqar, meanwhile, seems to be promising the world in terms of guarantees. (And then taking it away in the fine print, where it says that the operator can disregard many of them, potentially without the user's knowledge.) On the other hand, IIUC Zaqar does in fact have a sharding feature (Pools) which is its answer to the massive scaling question. There are different dimensions to the scaling problem. Many thanks for this analysis, Gordon. This is really helpful stuff. As I understand it, pools don't help scaling a given queue since all the messages for that queue must be in the same pool. At present traffic through different Zaqar queues are essentially entirely orthogonal streams. Pooling can help scale the number of such orthogonal streams, but to be honest, that's the easier part of the problem. But I think it's also the important part of the problem. When I talk about scaling, I mean 1 million clients sending 10 messages per second each, not 10 clients sending 1 million messages per second each. When a user gets to the point that individual queues have massive throughput, it's unlikely that a one-size-fits-all cloud offering like Zaqar or SQS is _ever_ going to meet their needs. Those users will want to spin up and configure their own messaging systems on Nova servers, and at that kind of size they'll be able to afford to. 
(In fact, they may not be able to afford _not_ to, assuming per-message-based pricing.) Running a message queue that has a high guarantee of not losing a message is hard, and SQS promises exactly that: it *will* deliver your message. If a use case can handle occasionally dropping messages then running your own MQ makes more sense. SQS is designed to handle massive queues as well; while I haven't found any examples of queues that have 1 million messages/second being sent or received, 30k to 100k messages/second is not unheard of [0][1][2]. [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s [1] http://java.dzone.com/articles/benchmarking-sqs [2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182 Thanks for digging those up, that's really helpful input. I think number [1] kind of summed up part of what I'm arguing here though: But once your requirements get above 35k messages per second, chances are you need custom solutions anyway; not to mention that while SQS is cheap, it may become expensive with such loads. If you don't require the reliability guarantees that SQS provides then perhaps. But I would be surprised to hear that a user can set up something with this level of uptime for less: Amazon SQS runs within Amazon’s high-availability data centers, so queues will be available whenever applications need them. To prevent messages from being lost or becoming unavailable, all messages are stored redundantly across multiple servers and data centers. [1] This is pretty easily doable with gearman or even just using Redis directly. But it is still ops for end users. The AWS users I've talked to who use SQS do so because they like that they can use RDS, SQS, and ELB, and have only one type of thing to operate: their app. There is also the possibility of using the sharding capabilities of the underlying storage. But the pattern of use will determine how effective that can be.
So for example, on the ordering question, if order is defined by a single sequence number held in the database and atomically incremented for every message published, that is not likely to be something where the database's sharding is going to help in scaling the number of concurrent publications. Though sharding would allow scaling the total number of messages on the queue (by distributing them over multiple shards), the total ordering of those messages reduces its effectiveness in scaling the number of concurrent getters (e.g. the concurrent subscribers in pub-sub), since they will all be getting the messages in exactly the same order.
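To make the serialization point concrete, here is a toy in-memory sketch (not Zaqar's or SQS's actual code) of a queue whose total ordering comes from one atomically incremented sequence number. However the message bodies are sharded, every publisher must pass through this single counter, which is exactly the bottleneck described above:

```python
import threading

class TotallyOrderedQueue:
    """Toy model: FIFO is guaranteed by a single per-queue sequence
    counter. Message bodies could live on any shard, but every publish
    still serializes on this one atomic increment."""

    def __init__(self):
        self._marker = 0
        self._lock = threading.Lock()
        self.messages = []

    def post(self, body):
        with self._lock:  # all concurrent publishers contend here
            self._marker += 1
            self.messages.append((self._marker, body))
            return self._marker

q = TotallyOrderedQueue()
threads = [threading.Thread(target=q.post, args=("msg-%d" % i,))
           for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every consumer sees one global order: markers are exactly 1..100.
markers = [m for m, _ in q.messages]
print(markers == list(range(1, 101)))  # True
```

Adding more storage shards does nothing to relieve the lock; only relaxing the ordering guarantee (as SQS does) removes the contention point.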
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Thu, Sep 18, 2014 at 9:02 AM, Devananda van der Veen devananda@gmail.com wrote: On Thu, Sep 18, 2014 at 7:55 AM, Flavio Percoco fla...@redhat.com wrote: On 09/18/2014 04:24 PM, Clint Byrum wrote: Great job highlighting what our friends over at Amazon are doing. It's clear from these snippets, and a few other pieces of documentation for SQS I've read, that the Amazon team approached SQS from a _massive_ scaling perspective. I think what may be forcing a lot of this frustration with Zaqar is that it was designed with a much smaller scale in mind. I think as long as that is the case, the design will remain in question. I'd be comfortable saying that the use cases I've been thinking about are entirely fine with the limitations SQS has. I think these are pretty strong comments with not enough arguments to defend them. Please see my prior email. I agree with Clint's assertions here. Saying that Zaqar was designed with a smaller scale in mind without actually saying why you think so is not fair, besides not being true. So please, do share why you think Zaqar was not designed for big scales and provide comments that will help the project to grow and improve. - Is it because of the storage technologies that have been chosen? - Is it because of the API? - Is it because of the programming language/framework? It is not because of the storage technology or because of the programming language. So far, we've just discussed the API semantics and not zaqar's scalability, which makes your comments even more surprising. - guaranteed message order - not distributing work across a configurable number of back ends These are scale-limiting design choices which are reflected in the API's characteristics.
I agree with Clint and Devananda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco fla...@redhat.com wrote: On 09/18/2014 04:09 PM, Gordon Sim wrote: On 09/18/2014 12:31 PM, Flavio Percoco wrote: On 09/17/2014 10:36 PM, Joe Gordon wrote: My understanding of Zaqar is that it's like SQS. SQS uses distributed queues, which have a few unusual properties [0]: Message Order Amazon SQS makes a best effort to preserve order in messages, but due to the distributed nature of the queue, we cannot guarantee you will receive messages in the exact order you sent them. If your system requires that order be preserved, we recommend you place sequencing information in each message so you can reorder the messages upon receipt. Zaqar guarantees FIFO. To be more precise, it does that relying on the storage backend ability to do so as well. Depending on the storage used, guaranteeing FIFO may have some performance penalties. Would it be accurate to say that at present Zaqar does not use distributed queues, but holds all queue data in a storage mechanism of some form which may internally distribute that data among servers but provides Zaqar with a consistent data model of some form? I think this is accurate. The queue's distribution depends on the storage ability to do so and deployers will be able to choose what storage works best for them based on this as well. I'm not sure how useful this separation is from a user perspective but I do see the relevance when it comes to implementation details and deployments. [...] As of now, Zaqar fully relies on the storage replication/clustering capabilities to provide data consistency, availability and fault tolerance. Is the replication synchronous or asynchronous with respect to client calls? E.g. will the response to a post of messages be returned only once the replication of those messages is confirmed? Likewise when deleting a message, is the response only returned when replicas of the message are deleted? It depends on the driver implementation and/or storage configuration. 
For example, in the mongodb driver, we use the default write concern called acknowledged. This means that as soon as the message gets to the master node (note it's not written on disk yet nor replicated) zaqar will receive a confirmation and then send the response back to the client. This is also configurable by the deployer by changing the default write concern in the mongodb uri using `?w=SOME_WRITE_CONCERN`[0]. This means that by default Zaqar cannot guarantee a message will be delivered at all. A message can be acknowledged and then the 'master node' crashes and the message is lost. Zaqar's ability to guarantee delivery is limited by the reliability of a single node, while something like swift can only lose a piece of data if 3 machines crash at the same time. [0] http://docs.mongodb.org/manual/reference/connection-string/#uri.w However, as far as consuming messages is concerned, it can guarantee once-and-only-once and/or at-least-once delivery depending on the message pattern used to consume messages. Using pop or claims guarantees the former whereas streaming messages out of Zaqar guarantees the latter. From what I can see, pop provides unreliable delivery (i.e. it's similar to no-ack). If the delete call using pop fails while sending back the response, the messages are removed but didn't get to the client. Correct, pop works like no-ack. If you want to have pop+ack, it is possible to claim just 1 message and then delete it. What do you mean by 'streaming messages'? I'm sorry, that went out wrong. I had the browsability term in my head and went with something even worse. By streaming messages I meant polling messages without claiming them. In other words, at-least-once is guaranteed by default, whereas once-and-only-once is guaranteed just if claims are used. [...] Based on our short conversation on IRC last night, I understand you're concerned that FIFO may result in performance issues.
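As an aside on the `?w=SOME_WRITE_CONCERN` point above: a small stdlib-only sketch (a hypothetical helper, not Zaqar's parsing code, and no live MongoDB required) of how the `w` option in the connection URI selects the durability level the deployer gets:

```python
from urllib.parse import urlparse, parse_qs

def write_concern_from_uri(uri):
    """Return the MongoDB 'w' (write concern) option from a connection
    URI. The default, w=1 ('acknowledged'), means the primary has the
    write in memory -- not journaled, not replicated -- when the client
    gets its confirmation."""
    options = parse_qs(urlparse(uri).query)
    return options.get("w", ["1"])[0]

# Default: a primary crash right after the ack can lose the message.
print(write_concern_from_uri("mongodb://db0,db1,db2/zaqar"))
# -> 1

# Deployer trades latency for durability by requiring replica acks.
print(write_concern_from_uri("mongodb://db0,db1,db2/zaqar?w=majority"))
# -> majority
```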
That's a valid concern and I think the right answer is that it depends on the storage. If the storage has a built-in FIFO guarantee then there's nothing Zaqar needs to do there. On the other hand, if the storage does not have built-in support for FIFO, Zaqar will cover it in the driver implementation. In the mongodb driver, each message has a marker that is used to guarantee FIFO. That marker is a sequence number of some kind that is used to provide ordering to queries? Is it generated by the database itself? It's a sequence number to provide ordering to queries, correct. Depending on the driver, it may be generated by Zaqar or the database. In mongodb's case it's generated by Zaqar[0]. [0] https://github.com/openstack/zaqar/blob/master/zaqar/queues/storage/mongodb/queues.py#L103-L185 -- @flaper87 Flavio Percoco
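The marker scheme described above can be sketched in a few lines. This is an in-memory simplification for illustration, not the linked driver code -- the real driver keeps the counter per queue in MongoDB and increments it atomically -- but it shows how stamping each message with the next marker gives consumers FIFO listing and paging:

```python
import threading

class MarkerQueue:
    """Each queue owns a monotonically increasing marker; every message
    is stamped with the next value, and listing filters/sorts on it, so
    consumers get FIFO order and can page by remembering the last
    marker they saw."""

    def __init__(self):
        self._lock = threading.Lock()
        self._marker = 0
        self._messages = []

    def post(self, body):
        with self._lock:  # stands in for the driver's atomic increment
            self._marker += 1
            self._messages.append({"marker": self._marker, "body": body})
            return self._marker

    def list(self, after_marker=0, limit=10):
        # Morally: find({marker: {$gt: after_marker}}).sort(marker).limit(n)
        page = [m for m in self._messages if m["marker"] > after_marker]
        return page[:limit]

q = MarkerQueue()
for i in range(25):
    q.post("event-%d" % i)

# A consumer pages through the queue, resuming from the last marker.
seen, cursor = [], 0
while True:
    page = q.list(after_marker=cursor)
    if not page:
        break
    seen.extend(m["body"] for m in page)
    cursor = page[-1]["marker"]

print(seen == ["event-%d" % i for i in range(25)])  # True
```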
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 16, 2014 at 8:02 AM, Kurt Griffiths kurt.griffi...@rackspace.com wrote: Right, graphing those sorts of variables has always been part of our test plan. What I’ve done so far was just some pilot tests, and I realize now that I wasn’t very clear on that point. I wanted to get a rough idea of where the Redis driver sat in case there were any obvious bug fixes that needed to be taken care of before performing more extensive testing. As it turns out, I did find one bug that has since been fixed. Regarding latency, saying that it “is not important” is an exaggeration; it is definitely important, just not the *only* thing that is important. I have spoken with a lot of prospective Zaqar users since the inception of the project, and one of the common threads was that latency needed to be reasonable. For the use cases where they see Zaqar delivering a lot of value, requests don't need to be as fast as, say, ZMQ, but they do need something that isn’t horribly *slow*, either. They also want HTTP, multi-tenant, auth, durability, etc. The goal is to find a reasonable amount of latency given our constraints and also, obviously, be able to deliver all that at scale. Can you further quantify what you would consider too slow? Is 100ms too slow? In any case, I’ve continued working through the test plan and will be publishing further test results shortly. graph latency versus number of concurrent active tenants By tenants do you mean in the sense of OpenStack Tenants/Project-IDs or in the sense of “clients/workers”? For the latter case, the pilot tests I’ve done so far used multiple clients (though not graphed), but in the former case only one “project” was used.
multiple Tenant/Project-IDs From: Joe Gordon joe.gord...@gmail.com Reply-To: OpenStack Dev openstack-dev@lists.openstack.org Date: Friday, September 12, 2014 at 1:45 PM To: OpenStack Dev openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2) If zaqar is like amazon SQS, then the latency for a single message and the throughput for a single tenant is not important. I wouldn't expect anyone who has latency sensitive work loads or needs massive throughput to use zaqar, as these people wouldn't use SQS either. The consistency of the latency (shouldn't change under load) and zaqar's ability to scale horizontally matter much more. What would be great instead is to see some other things benchmarked:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* How throughput scales as you scale up the number of assorted zaqar components. If one of the benefits of zaqar is its horizontal scalability, lets see it.
* How does this change with message batching?
[openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues
Hi All, My understanding of Zaqar is that it's like SQS. SQS uses distributed queues, which have a few unusual properties [0]:

Message Order

Amazon SQS makes a best effort to preserve order in messages, but due to the distributed nature of the queue, we cannot guarantee you will receive messages in the exact order you sent them. If your system requires that order be preserved, we recommend you place sequencing information in each message so you can reorder the messages upon receipt.

At-Least-Once Delivery

Amazon SQS stores copies of your messages on multiple servers for redundancy and high availability. On rare occasions, one of the servers storing a copy of a message might be unavailable when you receive or delete the message. If that occurs, the copy of the message will not be deleted on that unavailable server, and you might get that message copy again when you receive messages. Because of this, you must design your application to be idempotent (i.e., it must not be adversely affected if it processes the same message more than once).

Message Sample

The behavior of retrieving messages from the queue depends on whether you are using short (standard) polling, the default behavior, or long polling. For more information about long polling, see Amazon SQS Long Polling http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html . With short polling, when you retrieve messages from the queue, Amazon SQS samples a subset of the servers (based on a weighted random distribution) and returns messages from just those servers. This means that a particular receive request might not return all your messages. Or, if you have a small number of messages in your queue (less than 1000), it means a particular request might not return any of your messages, whereas a subsequent request will. If you keep retrieving from your queues, Amazon SQS will sample all of the servers, and you will receive all of your messages.
The following figure shows short polling behavior of messages being returned after one of your system components makes a receive request. Amazon SQS samples several of the servers (in gray) and returns the messages from those servers (Message A, C, D, and B). Message E is not returned to this particular request, but it would be returned to a subsequent request. Presumably SQS has these properties because they make the system scalable; if so, does Zaqar have the same properties (not just making these same guarantees in the API, but actually having these properties in the backends)? And if not, why? I looked on the wiki [1] for information on this, but couldn't find anything. [0] http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/DistributedQueues.html [1] https://wiki.openstack.org/wiki/Zaqar
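Since at-least-once delivery means the same message can arrive twice, the "design your application to be idempotent" advice quoted above boils down to deduplicating on a message id before doing side-effecting work. A minimal sketch (illustrative only; a real consumer would use a durable store rather than an in-process set):

```python
processed_ids = set()  # production: a durable store, not process memory

def handle(message):
    """Process a message from an at-least-once queue, skipping
    redeliveries so that receiving the same message twice is
    harmless."""
    if message["id"] in processed_ids:
        return "duplicate-skipped"
    # ... side-effecting work goes here (resize an instance, etc.) ...
    processed_ids.add(message["id"])
    return "processed"

msg = {"id": "a1b2c3", "body": "resize-instance-42"}
print(handle(msg))  # -> processed
print(handle(msg))  # -> duplicate-skipped (redelivery is a no-op)
```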
Re: [openstack-dev] PostgreSQL jobs slow in the gate
Postgres is also logging a lot of errors: http://logs.openstack.org/63/122263/1/check/check-tempest-dsvm-postgres-full/2f27252/logs/postgres.txt.gz On Wed, Sep 17, 2014 at 4:49 PM, Clark Boylan cboy...@sapwetik.org wrote: Hello, Recent sampling of test run times shows that our tempest jobs run against clouds using PostgreSQL are significantly slower than jobs run against clouds using MySQL. (check|gate)-tempest-dsvm-full has an average run time of 52.9 minutes (stddev 5.92 minutes) over 516 runs. (check|gate)-tempest-dsvm-postgres-full has an average run time of 73.78 minutes (stddev 11.01 minutes) over 493 runs. I think this is a bug, and an important one to solve prior to release if we want to continue to care for and feed PostgreSQL support. I haven't filed a bug in LP because I am not sure where the slowness is and creating a bug against all the projects is painful. (If there are suggestions for how to do this in a non painful way I will happily go file a proper bug). Is there interest in fixing this? If not we should probably reconsider removing these PostgreSQL jobs from the gate. ++ to getting someone to own and fix this or drop it from the gate. Note, a quick spot check indicates the increase in job time is not related to job setup. Total time before running tempest appears to be just over 18 minutes in the jobs I checked. Thank you, Clark
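For anyone wanting to reproduce this kind of comparison from their own samples of job durations, the computation is just a mean and standard deviation per job type. A sketch with made-up durations (the real figures above came from 516 and 493 gate runs):

```python
import statistics

# Hypothetical per-run durations in minutes for each job type; real
# input would come from scraping gate job logs.
runs = {
    "check-tempest-dsvm-full": [48.2, 55.1, 50.3, 57.9, 52.0],
    "check-tempest-dsvm-postgres-full": [70.4, 85.2, 68.9, 79.1, 65.3],
}

summary = {}
for job, durations in runs.items():
    summary[job] = (statistics.mean(durations),
                    statistics.stdev(durations))

for job, (mean, stdev) in sorted(summary.items()):
    print("%s: avg %.2f min (stddev %.2f) over %d runs"
          % (job, mean, stdev, len(runs[job])))
```

A large gap between the means relative to the standard deviations, over hundreds of runs, is what makes the slowdown a real signal rather than gate noise.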
Re: [openstack-dev] [Nova] Its time to start identifying release critical bugs
On Tue, Sep 16, 2014 at 2:58 AM, Michael Still mi...@stillhq.com wrote: Hi. It's time to start identifying (and working on) release critical bugs in nova before we ship RC1. My initial position is that any critical bug is release critical. There are currently critical bugs not targeted to rc1, but that should change in the next day or so. If we're not interested in fixing a critical bug in rc1, I think we should start to question if it is really critical. I'd also like help in deciding what other bugs are critical to be fixed before release. Please use this thread to suggest such things. I went through the top nova induced gate bugs, and made sure they are marked as critical and targeted [0]. As of the writing of this email we have 5 critical bugs, none of which have an assignee. Of those 5, 4 are gate bugs, with the two worst being: * Bug 1323658 - Nova resize/restart results in guest ending up in inconsistent state (top gate bug) * Bug 1357578 - Unit test: nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm timing out in gate [0] https://bugs.launchpad.net/nova/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&assignee_option=any&field.assignee=&field.bug_reporter=&field.bug_commenter=&field.subscriber=&field.structural_subscriber=&field.tag=&field.tags_combinator=ANY&field.has_cve.used=&field.omit_dupes.used=&field.omit_dupes=on&field.affects_me.used=&field.has_patch.used=&field.has_branches.used=&field.has_branches=on&field.has_no_branches.used=&field.has_no_branches=on&field.has_blueprints.used=&field.has_blueprints=on&field.has_no_blueprints.used=&field.has_no_blueprints=on&search=Search Thanks, Michael -- Rackspace Australia
Re: [openstack-dev] [Nova] What's holding nova development back?
On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote: On 09/15/2014 08:07 PM, Jeremy Stanley wrote: On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote: [...] Sometimes it's pretty hard to determine whether something in the E-R check page is due to something in the infra scripts, some transient issue in the upstream CI platform (or part of it), or actually a bug in one or more of the OpenStack projects. [...] Sounds like an NP-complete problem, but if you manage to solve it let me know and I'll turn it into the first line of triage for Infra bugs. ;) LOL, thanks for making me take the last hour reading Wikipedia pages about computational complexity theory! :P No, in all seriousness, I wasn't actually asking anyone to boil the ocean, mathematically. I think a couple of things could help: just making the categorization more obvious (a UI thing, really), and doing some (hopefully simple?) inspection of a control group of patches that we know do not introduce any code changes themselves, comparing against another group of patches that we know *do* introduce code changes to Nova, and then seeing if there is a set of E-R issues that consistently appears in *both* groups. That set of E-R issues has a higher likelihood of not being due to Nova, right? We use launchpad's affected projects listings on the elastic recheck page to say what may be causing the bug. Tagging projects to bugs is a manual process, but one that works pretty well. UI: The elastic recheck UI definitely could use some improvements. I am very poor at writing UIs, so patches welcome! OK, so perhaps it's not the most scientific or well-thought out plan, but hey, it's a spark for thought... ;) Best, -jay
Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths kurt.griffi...@rackspace.com wrote: Hi folks, In this second round of performance testing, I benchmarked the new Redis driver. I used the same setup and tests as in Round 1 to make it easier to compare the two drivers. I did not test Redis in master-slave mode, but that likely would not make a significant difference in the results since Redis replication is asynchronous[1]. As always, the usual benchmarking disclaimers apply (i.e., take these numbers with a grain of salt; they are only intended to provide a ballpark reference; you should perform your own tests, simulating your specific scenarios and using your own hardware; etc.).

## Setup ##

Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to mitigate noisy neighbor when running the performance tests:

* 1x Load Generator
    * Hardware
        * 1x Intel Xeon E5-2680 v2 2.8Ghz
        * 32 GB RAM
        * 10Gbps NIC
        * 32GB SATADOM
    * Software
        * Debian Wheezy
        * Python 2.7.3
        * zaqar-bench
* 1x Web Head
    * Hardware
        * 1x Intel Xeon E5-2680 v2 2.8Ghz
        * 32 GB RAM
        * 10Gbps NIC
        * 32GB SATADOM
    * Software
        * Debian Wheezy
        * Python 2.7.3
        * zaqar server
            * storage=mongodb
            * partitions=4
            * MongoDB URI configured with w=majority
        * uWSGI + gevent
            * config: http://paste.openstack.org/show/100592/
            * app.py: http://paste.openstack.org/show/100593/
* 3x MongoDB Nodes
    * Hardware
        * 2x Intel Xeon E5-2680 v2 2.8Ghz
        * 128 GB RAM
        * 10Gbps NIC
        * 2x LSI Nytro WarpDrive BLP4-1600[2]
    * Software
        * Debian Wheezy
        * mongod 2.6.4
            * Default config, except setting replSet and enabling periodic logging of CPU and I/O
            * Journaling enabled
            * Profiling on message DBs enabled for requests over 10ms
* 1x Redis Node
    * Hardware
        * 2x Intel Xeon E5-2680 v2 2.8Ghz
        * 128 GB RAM
        * 10Gbps NIC
        * 2x LSI Nytro WarpDrive BLP4-1600[2]
    * Software
        * Debian Wheezy
        * Redis 2.4.14
        * Default config (snapshotting and AOF enabled)
        * One process

As in Round 1, Keystone auth is disabled and requests go over HTTP, not HTTPS.
The latency introduced by enabling these is outside the control of Zaqar, but should be quite minimal (speaking anecdotally, I would expect an additional 1-3ms for cached tokens and assuming an optimized TLS termination setup). For generating the load, I again used the zaqar-bench tool. I would like to see the team complete a large-scale Tsung test as well (including a full HA deployment with Keystone and HTTPS enabled), but decided not to wait for that before publishing the results for the Redis driver using zaqar-bench. CPU usage on the Redis node peaked at around 75% for the one process. To better utilize the hardware, a production deployment would need to run multiple Redis processes and use Zaqar's backend pooling feature to distribute queues across the various instances. Several different messaging patterns were tested, taking inspiration from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar) Each test was executed three times and the best time recorded. A ~1K sample message (1398 bytes) was used for all tests.

## Results ##

### Event Broadcasting (Read-Heavy) ###

OK, so let's say you have a somewhat low-volume source, but tons of event observers. In this case, the observers easily outpace the producer, making this a read-heavy workload. Options:

* 1 producer process with 5 gevent workers
    * 1 message posted per request
* 2 observer processes with 25 gevent workers each
    * 5 messages listed per request by the observers
* Load distributed across 4[6] queues
* 10-second duration

10 seconds is way too short

Results:

* Redis
    * Producer: 1.7 ms/req, 585 req/sec
    * Observer: 1.5 ms/req, 1254 req/sec
* Mongo
    * Producer: 2.2 ms/req, 454 req/sec
    * Observer: 1.5 ms/req, 1224 req/sec

If zaqar is like amazon SQS, then the latency for a single message and the throughput for a single tenant is not important. I wouldn't expect anyone who has latency sensitive work loads or needs massive throughput to use zaqar, as these people wouldn't use SQS either.
The consistency of the latency (shouldn't change under load) and zaqar's ability to scale horizontally matter much more. What would be great instead is to see some other things benchmarked:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* How throughput scales as you scale up the number of assorted zaqar components. If one of the benefits of zaqar is its horizontal scalability, lets see it.
* How does this change with message batching?
Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 11, 2014 at 2:18 AM, Daniel P. Berrange berra...@redhat.com wrote: On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote: On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes jaypi...@gmail.com wrote: a) Sorting out the common code is already accounted for in Dan B's original proposal -- it's a prerequisite for the split. It's a big prerequisite though. I think we're talking about a release worth of work to get that right. I don't object to us doing that work, but I think we need to be honest about how long it's going to take. It will also make the core of nova less agile, as we'll find it hard to change the hypervisor driver interface over time. Do we really think it's ready to be stable? Yes, in my proposal I explicitly said we'd need to have Kilo for all the prep work to clean up the virt API, before only doing the split in Lx. The actual nova/virt/driver.py has been more stable over the past few releases than I thought it would be. In terms of APIs we've not really modified existing APIs, mostly added new ones. Where we did modify existing APIs, we could have easily taken the approach of adding a new API in parallel and deprecating the old entry point to maintain compat. The big change which isn't visible directly is the conversion of internal nova code to use objects. Finishing this conversion is clearly a pre-requisite to any such split, since we'd need to make sure all data passed into the nova virt APIs as parameters is stable and well defined. As an alternative approach... What if we pushed most of the code for a driver into a library? Imagine a library which controls the low level operations of a hypervisor -- create a vm, attach a NIC, etc. Then the driver would become a shim around that which was relatively thin, but owned the interface into the nova core. The driver handles the nova specific things like knowing how to create a config drive, or how to orchestrate with cinder, but hands over all the hypervisor operations to the library.
If we found a bug in the library we just pin our dependency on the version we know works whilst we fix things. In fact, the driver inside nova could be a relatively generic library driver, and we could have multiple implementations of the library, one for each hypervisor. I don't think that particularly solves the problem, particularly the ones you are most concerned about above of API stability. The naive impl of any library for the virt driver would pretty much mirror the nova virt API. The virt driver impls would thus have to do the job of taking the Nova objects passed in as parameters and turning them into something stable to pass to the library. Except now instead of us only having to figure out a stable API in one place, every single driver has to reinvent the wheel defining their own stable interface objects. I'd also be concerned that ongoing work on drivers is still going to require a lot of patches to Nova to update the shims all the time, so we're still going to contend on resources fairly highly. b) The conflict Dan is speaking of is around the current situation where we have a limited core review team bandwidth and we have to pick and choose which virt driver-specific features we will review. This leads to bad feelings and conflict. The way this worked in the past is we had cores who were subject matter experts in various parts of the code -- there is a clear set of cores who get xen or libvirt for example and I feel like those drivers get reasonable review times. What's happened though is that we've added a bunch of drivers without adding subject matter experts to core to cover those drivers. Those newer drivers therefore have a harder time getting things reviewed and approved. FYI, for Juno at least I really don't consider that even the libvirt driver got acceptable review times in any sense. The pain of waiting for reviews in libvirt code I've submitted this cycle is what prompted me to start this thread.
All the virt drivers are suffering way more than they should be, but those without core team representation suffer Can't you replace the word 'libvirt code' with 'nova code' and this would still be true? Do you think landing virt driver code is harder than landing non virt driver code? If so do you have any numbers to back this up? If the issue here is 'landing code in nova is too painful', then we should discuss solving that more generalized issue first, and maybe we conclude that pulling out the virt drivers gets us the most bang for our buck. But unless we have that more general discussion, saying the right fix for that is to spend a large amount of time working specifically on virt driver related issues seems premature. to an even greater degree. And this is ignoring the point Jay and I were making about how the use of a single team means that there is always contention for feature approval, so much
Re: [openstack-dev] Kilo Cycle Goals Exercise
On Wed, Sep 3, 2014 at 8:37 AM, Joe Gordon joe.gord...@gmail.com wrote: As you all know, there have recently been several very active discussions around how to improve assorted aspects of our development process. One idea that was brought up is to come up with a list of cycle goals/project priorities for Kilo [0]. To that end, I would like to propose an exercise as discussed in the TC meeting yesterday [1]: Have anyone interested (especially TC members) come up with a list of what they think the project wide Kilo cycle goals should be and post them on this thread by end of day Wednesday, September 10th. After which time we can begin discussing the results. The goal of this exercise is to help us see if our individual world views align with the greater community, and to get the ball rolling on a larger discussion of where as a project we should be focusing more time.

1. Strengthen our northbound APIs

* API micro-versioning
* Improved CLIs and SDKs
* Better capability discovery
* Hide usability issues with client side logic
* Improve reliability

As others have said in this thread, trying to use OpenStack as a user is a very frustrating experience. For a long time now we have focused on southbound APIs such as drivers, configuration options, supported architectures etc. But as a project we have not spent nearly enough time on the end user experience. If our northbound APIs aren't something developers want to use, our southbound API work doesn't matter.

2. 'Fix' our development process

* openstack-specs. Currently we don't have any good way to work on big entire-project efforts; hopefully something like an openstack-specs repo (with liaisons from each core team reviewing it) will help make it possible for us to tackle these issues. I see us addressing the API micro-versioning and capability discovery issues here.
* functional testing and post merge testing. As discussed elsewhere in this thread, our current testing model isn't meeting our current requirements.

3.
Pay down technical debt

This is the one I am actually least sure about, as I can really only speak for nova on this one. In our constant push forward we have accumulated a lot of technical debt. The debt manifests itself as hard-to-maintain code, bugs (nova had over 1000 open bugs until yesterday), performance/scaling issues and missing basic features. I think it's time for us to take inventory of our technical debt and fix some of the biggest issues. best, Joe Gordon [0] http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html [1] http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html
[openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)
Hi All, We have recently started seeing assorted memory issues in the gate including the oom-killer [0] and libvirt throwing memory errors [1]. Luckily we run ps and dstat on every devstack run so we have some insight into why we are running out of memory. Based on the output from a job taken at random [2][3], a typical run consists of:

* 68 openstack api processes alone
* the following services are running 8 processes each (the number of CPUs on the test nodes):
    * nova-api (we actually run 24 of these: 8 compute, 8 EC2, 8 metadata)
    * nova-conductor
    * cinder-api
    * glance-api
    * trove-api
    * glance-registry
    * trove-conductor
* together nova-api, nova-conductor and cinder-api alone take over 45 %MEM (note: some of that memory usage is counted multiple times, as RSS includes shared libraries)
* based on dstat numbers, it looks like we don't use that much memory before tempest runs, and after tempest runs we use a lot of memory.

Based on this information I have two categories of questions:

1) Should we explicitly set the number of workers that services use in devstack? Why have so many workers in a small all-in-one environment? What is the right balance here?

2) Should we be worried that some OpenStack services such as nova-api, nova-conductor and cinder-api take up so much memory? Does their memory usage keep growing over time? Does anyone have any numbers to answer this? Why do these processes take up so much memory?

best, Joe

[0] http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwib29tLWtpbGxlclwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDEwMjExMjA5NzY3fQ== [1] https://bugs.launchpad.net/nova/+bug/1366931 [2] http://paste.openstack.org/show/108458/ [3] http://logs.openstack.org/83/119183/4/check/check-tempest-dsvm-full/ea576e7/logs/screen-dstat.txt.gz
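One way to get at question 2 is to aggregate the captured ps output per service. A sketch (the sample data below is invented; real input would come from the `ps` output logged for each run [2]):

```python
from collections import defaultdict

# Invented sample in `ps -eo comm,%mem`-style form; the real job logs
# capture something similar for every devstack run.
PS_SAMPLE = """\
nova-api 2.1
nova-api 2.0
nova-conductor 1.8
nova-conductor 1.9
cinder-api 1.5
glance-api 0.9
"""

def mem_by_service(ps_output):
    """Sum %MEM per process name. Caveat noted above: RSS counts shared
    libraries once per worker, so these totals overstate real usage
    across a service's many forked workers."""
    totals = defaultdict(float)
    for line in ps_output.strip().splitlines():
        name, mem = line.rsplit(None, 1)
        totals[name] += float(mem)
    return dict(totals)

totals = mem_by_service(PS_SAMPLE)
for name, mem in sorted(totals.items(), key=lambda kv: -kv[1]):
    print("%-16s %.1f %%MEM" % (name, mem))
```

Running this over the before-tempest and after-tempest snapshots would also show which services account for the growth dstat reports.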
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On Fri, Sep 5, 2014 at 4:05 AM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/04/2014 10:25 PM, Solly Ross wrote: Anyway, I think it would be useful to have some sort of page where people could say I'm an SME in X, ask me for reviews and then patch submitters could go and say, oh, I need someone to review my patch about storage backends, let me ask sross. This is a good point - I've been thinking along similar lines that we really could have a huge win in terms of the review experience by building a tool (maybe a social-network-looking one :)) that relates reviews to the people able to do them, visualizes reviewer karma and other things that can help make code submissions and reviews more human friendly. Dan seems to dismiss the idea of improved tooling as something that can get us only thus far, but I am not convinced. However - this will require even more manpower and we are already ridiculously short on that so... I have previously toyed with the idea of making such a tool, and if someone else wants to work on it I would be happy to help. N. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
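[The kind of tool Nikola and Solly describe - matching patches to likely reviewers - could start as simply as ranking reviewers by overlap with the files they have reviewed before. A toy sketch with made-up data follows; a real version would pull this history from Gerrit, and all names and paths here are purely illustrative.]

```python
from collections import Counter


def suggest_reviewers(patch_files, review_history, top_n=3):
    """Rank reviewers by how many of the patch's files they have
    previously reviewed. `review_history` maps reviewer name ->
    iterable of file paths they reviewed (hypothetical data that
    could be mined from Gerrit's review records)."""
    scores = Counter()
    patch = set(patch_files)
    for reviewer, files in review_history.items():
        overlap = patch & set(files)
        if overlap:
            scores[reviewer] += len(overlap)
    # Most relevant reviewers first.
    return [reviewer for reviewer, _ in scores.most_common(top_n)]
```

A real implementation would also need to weight recency and review depth, but even this naive overlap score captures the "I'm an SME in X" idea.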
Re: [openstack-dev] [Nova][FFE] Feature Freeze exception for juno-slaveification
On Fri, Sep 5, 2014 at 9:11 AM, Mike Wilson geekinu...@gmail.com wrote: Hi all, I am requesting an exception for the juno-slaveification blueprint. There is a single outstanding patch [1] which has already been approved before, but needed to be re-spun due to gate failures which then necessitated a rebase. For those not familiar with the work, the spec [2] can shed some more light on the scope of work for Juno. All the other patches from this blueprint have merged; the only remaining patch really just needs a +W as it has been extensively reviewed and already approved previously. This may be an easy candidate since Andrew Laski, Jay Pipes and Dan Smith have reviewed and +2'd this already. I am happy to sponsor this, as this is the last patch needed to finish a BP and the patch was approved once but needed a rebase over some objects changes. The change from the approved version is pretty minor. Thanks, Mike Wilson [1] https://review.openstack.org/#/c/103064/ [2] http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/juno-slaveification.rst ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] [feature freeze exception] Move to oslo.db
On Wed, Sep 3, 2014 at 11:30 PM, Michael Still mi...@stillhq.com wrote: I'm good with this one too, so that makes three if Joe is ok with this. I am ok with this, I hope the move to oslo.db will fix a few bugs for us and the nova patch to review isn't too bad. @Josh -- can you please take a look at the TH failures? Thanks, Michael On Wed, Sep 3, 2014 at 8:10 PM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: On 9/3/2014 5:08 PM, Andrey Kurilin wrote: Hi All! I'd like to ask for a feature freeze exception for porting nova to use oslo.db. This change not only removes 3k LOC, but fixes 4 bugs (see commit message for more details) and provides relevant, stable common db code. Main maintainers of oslo.db (Roman Podoliaka and Victor Sergeyev) are OK with this. Joe Gordon and Matt Riedemann have already signed up, so we need one more vote from a core developer. By the way, a lot of core projects have already been using oslo.db for a while: keystone, cinder, glance, ceilometer, ironic, heat, neutron and sahara. So migration to oslo.db won't produce any unexpected issues. Patch is here: https://review.openstack.org/#/c/101901/ -- Best regards, Andrey Kurilin. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Just re-iterating my agreement to sponsor this. I'm waiting for the latest patch set to pass Jenkins and for Roman to review after his comments from the previous patch set and -1. Otherwise I think this is nearly ready to go. The turbo-hipster failures on the change appear to be infra issues in t-h rather than problems with the code. 
-- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On Thu, Sep 4, 2014 at 2:23 AM, John Garbutt j...@johngarbutt.com wrote: Sorry for another top post, but I like how Nikola has pulled this problem apart, and wanted to respond directly to his response. On 3 September 2014 10:50, Nikola Đipanov ndipa...@redhat.com wrote: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by nature of things easier to convince), but because of any of the following: * Crippling technical debt in some of the key parts of the code +1 We have problems that need solving. One of the ideas behind the slots proposal is to encourage work on the urgent technical debt, before related features are even approved. * that we have not been acknowledging as such for a long time -1 We keep saying that's cool, but we have to fix/finish XXX first. But... we have been very bad at: * remembering that, and recording that * actually fixing those problems * which leads to proposed code being arbitrarily delayed once it makes the glaring flaws in the underlying infra apparent Sometimes we only spot this stuff in code reviews, where you end up reading all the code around the change, and see all the extra complexity being added to a fragile bit of the code, and well, then you really don't want to be the person who clicks approve on that. We need to track this stuff better. Every time it happens, we should try to make a note to go back there and do more tidy-ups. * and that specs process has been completely and utterly useless in helping uncover (not that process itself is useless, it is very useful for other things) Yeah, it hasn't helped for this. 
I don't think we should do this, but I keep thinking about making specs two-step:
* write a general direction doc
* go write the code, maybe upload as WIP
* write the documentation part of the spec
* get docs merged before any code

I am almost positive we can turn this rather dire situation around easily in a matter of months, but we need to start doing it! It will not happen through pinning arbitrary numbers to arbitrary processes. +1 This is ongoing, but there are some major things I feel we should stop and fix in kilo. ...and that will make getting features in much worse for a little while, but it will be much better on the other side. I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds Awesome, please catch up with jogo who was also trying to build this list. I would love to continue to contribute to that too. I am not actually trying to build that list yet, right now I am trying to get consensus on the idea of having project priorities: https://review.openstack.org/#/c/112733/ Once that patch lands I was going to start iterating on a kilo priorities patch so we have something written down (in nova-specs) that we can go off for summit planning. Might be worth moving into here: https://etherpad.openstack.org/p/kilo-nova-summit-topics The idea was/is to use that list to decide what fills up the majority of code slots in Juno. but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! Agreed. In addition, our bug list would suggest our users are seeing the impact of this technical debt. 
My personal feeling is that we also need to tidy up our testing debt:
* document major bits that are NOT tested, so users are clear
* document what combinations and features we actually see tested upstream
* support different levels of testing: on-demand+daily vs every commit
* make it easier for interested parties to own and maintain some testing
* plan for removing the untested code paths in L
* allow for untested code to enter the tree, as experimental, with the expectation it gets removed in the following release if not tested, and architected so that is possible (note this means supporting experimental APIs that can be ripped out at a later date.)

We have started doing some of the above work. But I think we need to hold ALL code to the same standard. It seems it will take time to agree on that standard, but the above is an attempt to compromise between speed of innovation and stability. Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange berra...@redhat.com wrote:

Position statement
==

Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I nonetheless urge people to read the whole mail.

Background information
==

I see many factors coming together to form the crisis:
- Burn out of core team members from overwork
- Difficulty bringing new talent into the core team
- Long delays in getting code reviewed and merged
- Marginalization of code areas which aren't popular
- Increasing size of the nova code base through new drivers
- Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big problem.

Core team burn out
--

Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. 
Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - they just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features, it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team', it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted, at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers and the patch contributor.

New core team talent
--

It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not have the time available to invest in learning the entirety of Nova. With the number of reviews proposed to Nova, the core team should probably be at least double its current size[1]. There is plenty of expertise in the project as a whole but it is typically focused into specific areas of the codebase. There is nowhere we can find 20 more people with broad knowledge of the codebase who could be promoted even over the next year, let alone today. 
This is ignoring that many existing members of core are relatively inactive due to burnout and so need replacing. That means we really need another 25-30 people for core. That's not going to happen.

Code review delays
--

The obvious result of having too much work for too few reviewers is that code contributors face major delays in getting their work reviewed and merged. From personal experience, during Juno, I've probably spent 1 week in aggregate on actual code development vs 8 weeks waiting on code review. You have to constantly be on alert for review comments, because unless you can respond quickly (and repost) while you still have the attention of the reviewer, they may not look again for days/weeks. The length of time it takes to get work merged serves as a demotivator to actually do work in the first place. I've personally avoided doing a lot of code refactoring and cleanup work that would improve the maintainability of the libvirt driver in the long term, because I can't face the battle to get it reviewed
[openstack-dev] Kilo Cycle Goals Exercise
As you all know, there have recently been several very active discussions around how to improve assorted aspects of our development process. One idea that was brought up is to come up with a list of cycle goals/project priorities for Kilo [0]. To that end, I would like to propose an exercise as discussed in the TC meeting yesterday [1]: Have anyone interested (especially TC members) come up with a list of what they think the project-wide Kilo cycle goals should be and post them on this thread by end of day Wednesday, September 10th. After which time we can begin discussing the results. The goal of this exercise is to help us see if our individual world views align with the greater community, and to get the ball rolling on a larger discussion of where as a project we should be focusing more time. best, Joe Gordon [0] http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html [1] http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [infra][qa][neutron] Neutron full job, advanced services, and the integrated gate
On Tue, Aug 26, 2014 at 4:47 PM, Salvatore Orlando sorla...@nicira.com wrote: TL;DR A few folks are proposing to stop running tests for neutron advanced services [ie: (lb|vpn|fw)aas] in the integrated gate, and run them only on the neutron gate. Reason: projects like nova are 100% orthogonal to neutron advanced services. Also, there have been episodes in the past of unreliability of tests for these services, and it would be good to limit the affected projects considering that more api tests and scenarios are being added. - So far the neutron full job runs tests (api and scenarios) for neutron core functionality as well as neutron advanced services, which run as neutron service plugins. It's highly unlikely, if not impossible, that changes in projects such as nova, glance or ceilometer can have an impact on the stability of these services. On the other hand, instability in these services can trigger gate failures in unrelated projects as long as tests for these services are run in the neutron full job in the integrated gate. There have already been several gate-breaking bugs in lbaas scenario tests and firewall api tests. Admittedly, advanced services do not have the same level of coverage as core neutron functionality. Therefore, as more tests are being added, there is an increased possibility of unearthing dormant bugs. I support this split but for slightly different reasons. I am under the impression that neutron advanced services are not ready for prime time. If that is correct I don't think we should be gating on things that aren't ready. For this reason we are proposing to no longer run tests for neutron advanced services in the integrated gate, but keep them running on the neutron gate. This means we will have two neutron jobs: 1) check-tempest-dsvm-neutron-full which will run only core neutron functionality 2) check-tempest-dsvm-neutron-full-ext which will be what the neutron full job is today. 
Using my breakdown, the extended job would include experimental neutron features. The former will be part of the integrated gate, the latter will be part of the neutron gate. Considering that other integrated services should not have an impact on neutron advanced services, this should not make gate testing asymmetric. However, there might be exceptions for:
- orchestration projects like heat, which in the future might leverage capabilities like load balancing
- oslo-* libraries, as changes in them might have an impact on neutron advanced services, since they consume those libraries

Once another service starts consuming an advanced feature I think it makes sense to move it to the main neutron-full job. Especially if we assume that things will only depend on neutron features that are not too experimental. Another good question is whether extended tests should be performed as part of functional or tempest checks. My take on this is that scenario tests should always be part of tempest. On the other hand I reckon API tests should exclusively be part of functional tests, but since tempest currently runs a gazillion API tests, this is probably a discussion for the medium/long term. In order to add this new job there are a few patches under review: [1] and [2] introduce the 'full-ext' job and devstack-gate support for it. [3] are the patches implementing a blueprint which will enable us to specify for which extensions tests should be executed. Finally, one more note about smoketests. Although we're planning to get rid of them soon, we still have failures in the pg job because of [4]. For this reason smoketests are still running for postgres in the integrated gate. As load balancing and firewall API tests are part of it, they should be removed from the smoke tests executed in the integrated gate ([5], [6]). This is a temporary measure until the postgres issue is fixed. 
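[For readers unfamiliar with the blueprint referenced as [3]: selecting which extensions to test presumably reduces to a per-test gate check along these lines. The semantics below - an "all" wildcard plus an explicit allow-list - are illustrative assumptions, not the actual tempest/devstack-gate implementation.]

```python
def is_extension_enabled(extension, enabled_extensions):
    """Return True if tests for `extension` (e.g. 'lbaas', 'fwaas')
    should run. 'all' disables filtering, so a neutron 'full-ext' job
    could keep 'all' while the integrated-gate job lists only core
    extensions and thereby skips advanced-service tests."""
    return "all" in enabled_extensions or extension in enabled_extensions


# Hypothetical allow-list an integrated-gate job might carry:
CORE_ONLY = ["security-group", "router", "ext-gw-mode"]
```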
++ Regards, Salvatore [1] https://review.openstack.org/#/c/114933/ [2] https://review.openstack.org/#/c/114932/ [3] https://review.openstack.org/#/q/status:open+branch:master+topic:bp/branchless-tempest-extensions,n,z [4] https://bugs.launchpad.net/nova/+bug/1305892 [5] https://review.openstack.org/#/c/115022/ [6] https://review.openstack.org/#/c/115023/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] libvirt version_cap, a postmortem
letting situations fester. Thanks, and sorry for being a windbag, Mark. ---

= July 1 =

The starting point is this review: https://review.openstack.org/103923 Dan Smith proposes a policy that the libvirt driver may not use libvirt features until they have been available in Ubuntu or Fedora for at least 30 days. The commit message mentions: broken us in the past when we add a new feature that requires a newer libvirt than we test with, and we discover that it's totally broken when we upgrade in the gate. which AIUI is a reference to the libvirt live snapshot issue the previous week, which is described here: https://review.openstack.org/102643 where upgrading to Ubuntu Trusty meant the libvirt version in use in the gate went from 0.9.8 to 1.2.2, which exercised the live snapshot code paths in Nova for the first time, which appeared to be related to some serious gate instability (although the exact root cause wasn't identified). Some background on the libvirt version upgrade can be seen here: http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#30284

= July 1 - July 8 =

Back and forth debate mostly between Dan Smith and Dan Berrange. Sean votes +2, Dan Berrange votes -2.

= July 14 =

Russell adds his support to Dan Berrange's position, votes -2. Some debate between Dan and Dan continues. Joe Gordon votes +2. Matt Riedemann expresses support-in-principle for Dan Smith's approach.

= July 15 =

Debate continues ...

16:12 - I -2 the patch and attempt to take a step back and think about how we could have prevented (or at least mitigated against) the live snapshot issue, and suggest the idea of adding a new configuration option: [libvirt] version_cap = 1.2.2 which would mean we would not automatically start using new libvirt features in the gate because of a libvirt version upgrade; instead, the new features would only begin to be used when we merge a change to the default value of version_cap. 
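[An aside for readers following the timeline: the mechanics of such a cap are simple to sketch. The helper names below are illustrative, not Nova's actual code; the point is that a feature gated on a minimum libvirt version compares against the lesser of the connected version and the configured cap, so a libvirt upgrade in the gate alone cannot switch on new code paths.]

```python
def version_to_int(ver):
    """Pack a dotted version like '1.2.2' into a single comparable
    integer (1 * 1_000_000 + 2 * 1_000 + 2)."""
    major, minor, micro = (int(x) for x in ver.split("."))
    return major * 1_000_000 + minor * 1_000 + micro


def has_min_version(connected, required, version_cap=None):
    """Decide whether a feature needing libvirt >= `required` may be
    used. `connected` is the packed version of the running libvirt;
    an operator-set `version_cap` (dotted string) clamps it, so new
    features stay off until the cap's default is explicitly raised."""
    effective = connected
    if version_cap is not None:
        effective = min(connected, version_to_int(version_cap))
    return effective >= version_to_int(required)
```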
16:31 - I leave a separate comment addressing the broader debate about our functional test coverage requirements.
16:46 - Dan Berrange likes the version_cap idea
15:37 - Dan Berrange posts an implementation of version_cap: https://review.openstack.org/107119 and links to it from Dan Smith's libvirt testing policy review (#103923)
21:49 - Matt expresses some support for the config option, but worries about the precedent being set.
23:14 - Dan Berrange explains his point of view that a test all the things rule must mean test all the things which can be practically tested by our current CI system.

= July 16 =

08:04 - I +2 the version_cap patch after Dan fixes up some issues I pointed out.
13:44 - 14:28 - Sean and John Garbutt add further thoughts to the libvirt testing policy review without making any comment on the version_cap idea. Sean takes the debate to the mailing list: http://lists.openstack.org/pipermail/openstack-dev/2014-July/040421.html Debate continues in the thread, largely around the mechanics of how to allow a newer version of libvirt to be used in the gate.
15:08 - I mention the version_cap proposal on the thread for the first time: http://lists.openstack.org/pipermail/openstack-dev/2014-July/040436.html and the point I make is that the configuration option makes it easier for operators to run only code paths that are tested by the gate.
16:44 - Johannes notes that multiple issues with code paths not tested in the gate may need to be fixed as part of a future review to increase the default value of version_cap. http://lists.openstack.org/pipermail/openstack-dev/2014-July/040456.html
18:50 - Russell approves the version_cap patch. https://review.openstack.org/107119

= July 17 =

05:38 - The version_cap patch merges.
13:09 - Somewhat related, Dan Berrange and I explain we won't be at the mid-cycle meetup for any test policy discussions. Sean makes a point that the discussion is best had on email/IRC where there is a permanent record. 
14:33 - Johannes expresses concern in gerrit that version_cap got merged too quickly. https://review.openstack.org/107119 15:17 - Dan Berrange responds to Johannes in gerrit, saying that he thinks version_cap is useful irrespective of the broader testing discussion. 15:28 - Johannes disagrees, asks for a response to his concerns on the mailing list. 15:40 - Dan Berrange responds on the mailing list to Johannes version_cap concerns. http://lists.openstack.org/pipermail/openstack-dev/2014-July/040576.html 18:15 - Russell also responds to Johannes. http://lists.openstack.org/pipermail/openstack-dev/2014-July/040597.html 18:31 - Johannes responds to Dan. http://lists.openstack.org/pipermail/openstack-dev/2014-July/040602.html 18:39 - Russell responds to Johannes again. http://lists.openstack.org/pipermail/openstack-dev/2014-July/040604.html 19:13 - Johannes responds again. http://lists.openstack.org
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On Wed, Sep 3, 2014 at 2:50 AM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 09:23 PM, Michael Still wrote: On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com wrote: On 09/02/2014 08:16 PM, Michael Still wrote: Hi. We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception: * your code must already be up for review * your blueprint must have an approved spec * you need three (3) sponsoring cores for an exception to be granted Can core reviewers who have features up for review have this number lowered to two (2) sponsoring cores, as they in reality then need four (4) cores (since they themselves are one (1) core but cannot really vote), making it an order of magnitude more difficult for them to hit this checkbox? That's a lot of numbers in that there paragraph. Let me re-phrase your question... Can a core sponsor an exception they themselves propose? I don't have a problem with someone doing that, but you need to remember that does reduce the number of people who have agreed to review the code for that exception. Michael has correctly picked up on a hint of snark in my email, so let me explain where I was going with that: The reason many features including my own may not make the FF is not because there was not enough buy-in from the core team (let's be completely honest - I have 3+ other core members working for the same company that are by nature of things easier to convince), but because of any of the following: I find the statement about having multiple cores at the same company very concerning. To quote Mark McLoughlin, It is assumed that all core team members are wearing their upstream hat and aren't there merely to represent their employers interests [0]. Your statement appears to be in direct conflict with Mark's idea of what a core reviewer is, an idea that IMHO is one of the basic tenets of OpenStack development. 
[0] http://lists.openstack.org/pipermail/openstack-dev/2013-July/012073.html

* Crippling technical debt in some of the key parts of the code
* that we have not been acknowledging as such for a long time
* which leads to proposed code being arbitrarily delayed once it makes the glaring flaws in the underlying infra apparent
* and that specs process has been completely and utterly useless in helping uncover (not that process itself is useless, it is very useful for other things)

I am almost positive we can turn this rather dire situation around easily in a matter of months, but we need to start doing it! It will not happen through pinning arbitrary numbers to arbitrary processes. Nova is big and complex enough that I don't think any one person is able to identify what we need to work on to make things better. That is one of the reasons why I have the project priorities patch [1] up. I would like to see nova as a team discuss and come up with what we think we need to focus on to get us back on track. [1] https://review.openstack.org/#/c/112733/ I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! Yes, I can agree with you on this part, things in nova land are not good. N. Michael

* exceptions must be granted before midnight, Friday this week (September 5) UTC
* the exception is valid until midnight Friday next week (September 12) UTC when all exceptions expire

For reference, our rc1 drops on approximately 25 September, so the exception period needs to be short to maximise stabilization time. John Garbutt and I will both be granting exceptions, to maximise our timezone coverage. 
We will grant exceptions as they come in and gather the required number of cores, although I have also carved some time out in the nova IRC meeting this week for people to discuss specific exception requests. Michael ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno
On Wed, Sep 3, 2014 at 8:57 AM, Solly Ross sr...@redhat.com wrote: I will follow up with a more detailed email about what I believe we are missing, once the FF settles and I have applied some soothing creme to my burnout wounds, but currently my sentiment is: Contributing features to Nova nowadays SUCKS!!1 (even as a core reviewer) We _have_ to change that! I think this is *very* important. rant For instance, I have/had two patch series up. One is of length 2 and is relatively small. It's basically sitting there with one +2 on each patch. I will now most likely have to apply for a FFE to get it merged, not because there are more changes to be made before it can get merged (there was one small nit posted yesterday) or because it's a huge patch that needs a lot of time to review, but because it just took a while to get reviewed by cores, and it still only appears to have been looked at by one core. For the other patch series (which is admittedly much bigger), it was hard just to get reviews (and it was something where I actually *really* wanted several opinions, because the patch series touched a couple of things in a very significant way). Now, this is not my first contribution to OpenStack, or to Nova, for that matter. I know things don't always get in. It's frustrating, however, when it seems like the reason something didn't get in wasn't because it was fundamentally flawed, but instead because it didn't get reviews until it was too late to actually take that feedback into account, or because it just didn't get much attention review-wise at all. If I were a new contributor to Nova who had successfully gotten a major blueprint approved and then implemented it, only to see it get rejected like this, I might get turned off of Nova, and go to work on one of the other OpenStack projects that seemed to move a bit faster. /rant So, it's silly to rant without actually providing any ideas on how to fix it. 
One suggestion would be, for each approved blueprint, to have one or two cores explicitly marked as being responsible for providing at least some feedback on that patch. This proposal has issues, since we have a lot of blueprints and only twenty cores, who also have their own stuff to work on. However, I think the general idea of having guaranteed reviewers is not unsound by itself. Perhaps we should have a loose tier of reviewers between core and everybody else. These reviewers would be known good reviewers who would follow the implementation of particular blueprints if a core did not have the time. Then, when those reviewers gave the +1 to all the patches in a series, they could ping a core, who could feel more comfortable giving a +2 without doing a deep inspection of the code. That's just one suggestion, though. Whatever the solution may be, this is a problem that we need to fix. While I enjoyed going through the blueprint process this cycle (not sarcastic -- I actually enjoyed the whole structured feedback thing), the follow-up to that was not the most pleasant. One final note: the specs referenced above didn't get approved until Spec Freeze, which seemed to leave me with less time to implement things. In fact, it seemed that a lot of specs didn't get approved until spec freeze. Perhaps if we had more staggered approval of specs, we'd have more staggered submission of patches, and thus less of a sudden influx of patches in the couple weeks before feature proposal freeze. While you raise some good points, albeit not new ones, just long-standing issues that we really need to address, Nikola appears not to be commenting on the shortage of reviews but rather on the amount of technical debt Nova has. 
Best Regards,
Solly Ross

----- Original Message -----
From: Nikola Đipanov ndipa...@redhat.com
To: openstack-dev@lists.openstack.org
Sent: Wednesday, September 3, 2014 5:50:09 AM
Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

On 09/02/2014 09:23 PM, Michael Still wrote:

On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com wrote:

On 09/02/2014 08:16 PM, Michael Still wrote: Hi. We're soon to hit feature freeze, as discussed in Thierry's recent email. I'd like to outline the process for requesting a freeze exception:

* your code must already be up for review
* your blueprint must have an approved spec
* you need three (3) sponsoring cores for an exception to be granted

Can core reviewers who have features up for review have this number lowered to two (2) sponsoring cores, since in reality they then need four (4) cores (they themselves are one (1) core but cannot really vote on their own work), making it an order of magnitude more difficult for them to hit this checkbox?

That's a lot of numbers in that there paragraph. Let me re-phrase your question... Can a core sponsor an exception they
Re: [openstack-dev] [nova] Is the BP approval process broken?
On Aug 29, 2014 10:42 AM, Dugger, Donald D donald.d.dug...@intel.com wrote:

Well, I think that this is a sign of a broken (or at least bent) process, and that's what I'm trying to expose. Especially given the ongoing conversations over Gantt, it seems wrong that the BP was ultimately rejected due to silence. Maybe rejecting the BP was the right decision, but the way the decision was made was just wrong. Note that dealing with silence is *really* difficult. You point out that maybe silence means people don't agree with the BP, but how do I know? Maybe it means no one has time, maybe no one has an opinion, maybe it got lost in the shuffle, maybe I'm being too obnoxious - who knows. A simple -1 with a one-sentence explanation would have helped a lot.

How is this: -1, we already have too many approved blueprints in Juno, and it sounds like there are still concerns about the Gantt split in general. Hopefully after trunk is open for Kilo we can revisit the Gantt idea; I'm thinking yet another ML thread outlining why and how to get there.

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com]
Sent: Friday, August 29, 2014 10:43 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

On 08/29/2014 12:25 PM, Zane Bitter wrote:

On 28/08/14 17:02, Jay Pipes wrote: I understand your frustration about the silence, but the silence from core team members may actually be a loud statement about where their priorities are.

I don't know enough about the Nova review situation to say whether the process is broken or not. But I can say that if passive-aggressively ignoring people is considered a primary communication channel, something is definitely broken.

Nobody is ignoring anyone. There have been ongoing conversations about the scheduler and Gantt, and those conversations haven't resulted in all the decisions that Don would like.
That is unfortunate, but it's not a sign of a broken process.

-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] gate debugging
On Thu, Aug 28, 2014 at 10:17 AM, Sean Dague s...@dague.net wrote:

On 08/28/2014 12:48 PM, Doug Hellmann wrote:

On Aug 27, 2014, at 5:56 PM, Sean Dague s...@dague.net wrote:

On 08/27/2014 05:27 PM, Doug Hellmann wrote:

On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote: Note: thread intentionally broken, this is really a different topic.

On 08/27/2014 02:30 PM, Doug Hellmann wrote:

On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:

On Wed, 27 Aug 2014, Doug Hellmann wrote: I have found it immensely helpful, for example, to have a written set of the steps involved in creating a new library, from importing the git repo all the way through to making it available to other projects. Without those instructions, it would have been much harder to split up the work. The team would have had to train each other by word of mouth, and we would have had constant issues with inconsistent approaches triggering different failures. The time we spent building and verifying the instructions has paid off to the extent that we even had one developer not on the core team handle a graduation for us.

+many more for the relatively simple act of just writing stuff down

"Write it down." is my theme for Kilo.

I definitely get the sentiment. "Write it down" is also hard when you are talking about things that change around quite a bit. OpenStack as a whole sees 250 - 500 changes a week, so the interaction pattern moves around enough that it's really easy to have *very* stale information written down. Stale information is sometimes even more dangerous than no information, as it takes people down very wrong paths. I think we break down on communication when we get into a conversation of "I want to learn gate debugging", because I don't quite know what that means, or where the starting point of understanding is. So those intentions are well meaning, but tend to stall.
The reality is that there was no road map for those of us who dove in; it's just understanding how OpenStack holds together as a whole and where some of the high-risk parts are. And a lot of that comes with days of staring at code and logs until patterns emerge.

Maybe if we can get smaller, more targeted questions, we can help folks better? I'm personally a big fan of answering targeted questions, because then I also know that the time spent exposing that information was directly useful. I'm more than happy to mentor folks. But I just end up finding "I want to learn" at the generic level something that's hard to grasp onto or turn into action. I'd love to hear more ideas from folks about ways we might do that better.

You and a few others have developed an expertise in this important skill. I am so far away from that level of expertise that I don't know the questions to ask. More often than not I start with the console log, find something that looks significant, spend an hour or so tracking it down, and then have someone tell me that it is a red herring and the issue is really some other thing that they figured out very quickly by looking at a file I never got to.

I guess what I'm looking for is some help with the patterns. What made you think to look in one log file versus another? Some of these jobs save a zillion little files; which ones are actually useful? What tools are you using to correlate log entries across all of those files? Are you doing it by hand? Is logstash useful for that, or is that more useful for finding multiple occurrences of the same issue?

I realize there's not a way to write a how-to that will live forever. Maybe one way to deal with that is to write up the research done on bugs soon after they are solved, and publish that to the mailing list. Even the retrospective view is useful, because we can all learn from it without having to live through it.
The mailing list is a fairly ephemeral medium, and something very old in the archives is understood to have a good chance of being out of date, so we don't have to keep adding disclaimers.

Sure. Matt's actually working up a blog post describing the thing he nailed earlier in the week.

Yes, I appreciate that both of you are responding to my questions. :-) I have some more specific questions/comments below. Please take all of this in the spirit of trying to make this process easier by pointing out where I've found it hard, and not just me complaining. I'd like to work on fixing any of these things that can be fixed, by writing or reviewing patches early in Kilo.

Here is my off-the-cuff set of guidelines:

#1 - is it a test failure or a setup failure? This should be pretty easy to figure out. Test failures come at the end of the console log and say that tests failed (after you see a bunch of passing tempest tests). Always start at *the end* of files and work backwards. That's
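Sean's first guideline ("start at the end and work backwards") can be sketched as a couple of shell one-liners. This is only an illustration: the log file names below are made up, and real gate jobs name and lay out their artifacts differently.

```shell
# Create toy stand-ins for a gate job's console log and setup log.
# (These file names are illustrative only, not what real jobs produce.)
printf 'tempest.test_a ... ok\ntempest.test_b ... ok\nFAILED (failures=1)\n' > console.log
printf 'installing packages\nERROR: could not install libfoo\n' > setup.log

# Step 1: test failures appear at the *end* of the console log, after
# the run of passing tempest tests -- so check the tail first.
tail -n 5 console.log | grep -E 'FAILED|Traceback' && echo '-> test failure'

# Step 2: if the tail is clean, suspect a setup failure; scan the
# setup log backwards (tac) for the most recent error.
tac setup.log | grep -m1 -i 'error' && echo '-> setup failure'
```

The point is the order of operations, not the particular grep patterns: triage the end of the console log before digging through the zillion little setup files.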
Re: [openstack-dev] [nova] Is the BP approval process broken?
On Thu, Aug 28, 2014 at 2:40 AM, Daniel P. Berrange berra...@redhat.com wrote:

On Thu, Aug 28, 2014 at 01:04:57AM +0000, Dugger, Donald D wrote:

I'll try not to whine about my pet project, but I do think there is a problem here. For the Gantt project to split out the scheduler there is a crucial BP that needs to be implemented ( https://review.openstack.org/#/c/89893/ ) and, unfortunately, the BP has been rejected and we'll have to try again for Kilo. My question is: did we do something wrong, or is the process broken? Note that we originally proposed the BP on 4/23/14, went through 10 iterations to the final version on 7/25/14, and the final version got three +1s and a +2 by 8/5. Unfortunately, even after reaching out to specific people, we didn't get the second +2, hence the rejection.

I see that it did not even get one +2 at the time of the feature proposal approval freeze. You then successfully requested an exception, and after a couple more minor updates got a +2 from John, but from no one else. I do think this shows a flaw in our (the core team's) handling of the blueprint. When we agreed upon the freeze exception, that should have included a firm commitment from at least 2 core devs to review it. IOW, I think it is reasonable to say that your feature should have ended up either with two +2s and a +A, or with a -1 from another core dev. I don't think it is acceptable that after the exception was approved it only got feedback from one core dev. I actually thought that when approving exceptions we always got 2 cores to agree to review the item to avoid this, so I'm not sure why we failed here.

I understand that reviews are a burden and very hard, but it seems wrong that a BP with multiple positive reviews and no negative reviews is dropped because of what looks like indifference. Given that there is still time to review the actual code patches, it seems like there should be a simpler way to get a BP approved.
Without an approved BP it's difficult to even start the coding process.

So the question "is the BP approval process broken" doesn't have a simple answer. There are definitely things we should change, but in this case I think the process sort of worked. The problem you hit is that we just don't have enough people doing reviews: your blueprint didn't get approved in part because the ratio of reviews needed to reviewers is off. If we don't even have enough bandwidth to approve this spec, we certainly don't have enough bandwidth to review the code associated with the spec.

I see 2 possibilities here:

1) This is an isolated case specific to this BP. If so, there's no need to change the procedures, but I would like to know what we should be doing differently. We got a +2 review on 8/4 and then silence for 3 weeks.

2) This is a process problem that other people encounter. Maybe there are times when silence means assent. Something like a BP with multiple +1s and at least one +2 could automatically be accepted if no one reviews it within 2 weeks of the +2 being given.

My two thoughts are:

- When we approve something for exception, we should actively monitor the progress of the review to ensure it gets the necessary attention to either approve or reject it. It makes no sense to approve an exception and then let it lie silently waiting for weeks with no attention. I'd expect that any time exceptions are approved we should babysit them and actively review their status in the weekly meeting to ensure they are followed up on.

- Core reviewers should prioritize reviews of things which already have a +2 on them. I wrote about this in the context of code reviews last week, but I believe all my points apply equally to spec reviews: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043657.html

Also note that in Kilo the process will be slightly less heavyweight, in that we're going to try to allow some feature changes into the tree without first requiring a spec/blueprint to be written.
I can't say offhand whether this particular feature would have qualified for the lighter process, but in general, by reducing the need for specs for the more trivial items, we'll have more time available for review of the things which do require specs.

Under the proposed changes to the spec/blueprint process, this would still need a spec.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Is the BP approval process broken?
On Thu, Aug 28, 2014 at 2:43 PM, Alan Kavanagh alan.kavan...@ericsson.com wrote:

I share Donald's points here. I believe what would help is to clearly describe in the wiki the process and workflow for BP approval, and to build into that process a way of dealing with discrepancies/disagreements, along with timeframes for each stage and a process of appeal, etc. The current process would benefit from some fine tuning: building in safeguards and time limits/deadlines so folks can expect responses within a reasonable time and not be left waiting in the cold.

This is a resource problem; the nova team simply does not have enough people doing enough reviews to make this possible.

My 2 cents!
/Alan

-----Original Message-----
From: Dugger, Donald D [mailto:donald.d.dug...@intel.com]
Sent: August-28-14 10:43 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

I would contend that that right there is an indication that there's a problem with the process. You submit a BP and then you have no idea what is happening and no way of addressing any issues. If the priority is wrong, I can explain why I think the priority should be higher; getting stonewalled leaves me with no idea what's wrong and no way to address any problems. I think, in general, almost everyone is more than willing to adjust proposals based upon feedback. Tell me what you think is wrong and I'll either explain why the proposal is correct or change it to address the concerns. Trying to deal with silence is really hard and really frustrating. Especially given that we're not supposed to spam the mailing list, it's really hard to know what to do.

I don't know the solution, but we need to do something. More core team members would help; maybe something like an automatic timeout where BPs/patches with no negative scores and no activity for a week get flagged for special handling. I feel we need to change the process somehow.
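Don's automatic-timeout idea could be prototyped as a simple filter over review metadata. The sketch below is purely illustrative: the record fields are made up rather than Gerrit's actual schema, and a real implementation would pull the data from the Gerrit REST API instead of a hard-coded list.

```python
from datetime import datetime, timedelta

# Toy review records. A real tool would fetch these from Gerrit;
# the field names here ("scores", "last_activity") are hypothetical.
reviews = [
    {"id": "89893", "scores": [1, 1, 2], "last_activity": datetime(2014, 8, 4)},
    {"id": "90001", "scores": [1, -1], "last_activity": datetime(2014, 8, 20)},
]

def needs_attention(review, now, idle=timedelta(weeks=1)):
    """Flag reviews with no negative scores that have sat idle for a week."""
    no_negatives = all(score >= 0 for score in review["scores"])
    stale = now - review["last_activity"] > idle
    return no_negatives and stale

now = datetime(2014, 8, 25)
flagged = [r["id"] for r in reviews if needs_attention(r, now)]
print(flagged)  # -> ['89893']: stale for three weeks with only positive scores
```

A -1 anywhere exempts a review from flagging (the author has feedback to act on); everything else that goes quiet gets surfaced for the weekly meeting rather than silently expiring.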
--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Jay Pipes [mailto:jaypi...@gmail.com]
Sent: Thursday, August 28, 2014 1:44 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

On 08/27/2014 09:04 PM, Dugger, Donald D wrote:

I'll try not to whine about my pet project, but I do think there is a problem here. For the Gantt project to split out the scheduler there is a crucial BP that needs to be implemented ( https://review.openstack.org/#/c/89893/ ) and, unfortunately, the BP has been rejected and we'll have to try again for Kilo. My question is: did we do something wrong, or is the process broken? Note that we originally proposed the BP on 4/23/14, went through 10 iterations to the final version on 7/25/14, and the final version got three +1s and a +2 by 8/5. Unfortunately, even after reaching out to specific people, we didn't get the second +2, hence the rejection. I understand that reviews are a burden and very hard, but it seems wrong that a BP with multiple positive reviews and no negative reviews is dropped because of what looks like indifference.

I would posit that this is not actually indifference. The reason there may not have been a +2 from a core team member may very well be that the core team members did not feel the blueprint's priority was high enough to put it before other work, or that they did not have the time to comment on the spec (due to them not feeling the blueprint had the priority to justify the time for a full review). Note that I'm not a core drivers team member.
Best,
-jay
[openstack-dev] [nova] Kilo Specs Schedule
We just finished discussing when to open up Kilo specs at the nova meeting today [0]: Kilo specs will open right after we cut Juno RC1 (around Sept 25th [1]). Additionally, the spec template will most likely be revised. We still have a huge amount of work to do for Juno, and the nova team is mostly focused on the 50 blueprints we have up for review [2] and the 1000 open bugs [3] (186 of which have patches up for review). The RC1 timeframe is the right point at which we can start to shift our focus to upcoming Kilo items.

[0] http://eavesdrop.openstack.org/meetings/nova/2014/nova.2014-08-28-21.01.log.html
[1] https://wiki.openstack.org/wiki/Juno_Release_Schedule
[2] https://blueprints.launchpad.net/nova/juno
[3] http://54.201.139.117/nova-bugs.html

best,
Joe