Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells – summit recap and move forward

2014-12-12 Thread Joe Gordon
On Fri, Dec 12, 2014 at 6:50 AM, Russell Bryant rbry...@redhat.com wrote:

 On 12/11/2014 12:55 PM, Andrew Laski wrote:
  Cells can handle a single API on top of globally distributed DCs.  I
  have spoken with a group that is doing exactly that.  But it requires
  that the API is a trusted part of the OpenStack deployments in those
  distributed DCs.

 And the way the rest of the components fit into that scenario is far
 from clear to me.  Do you consider this more of a "if you can make it
 work, good for you", or something we should aim to have more generally
 supported over time?  Personally, I see the globally distributed
 OpenStack under a single API case as much more complex, and worth
 considering out of scope for the short to medium term, at least.

 For me, this discussion boils down to ...

 1) Do we consider these use cases in scope at all?

 2) If we consider it in scope, is it enough of a priority to warrant a
 cross-OpenStack push in the near term to work on it?

 3) If yes to #2, how would we do it?  Cascading, or something built
 around cells?

 I haven't worried about #3 much, because I consider #2 or maybe even #1
 to be a show stopper here.


Agreed



 --
 Russell Bryant

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [NFV][Telco] pxe-boot

2014-12-12 Thread Joe Gordon
On Fri, Dec 12, 2014 at 1:48 AM, Pasquale Porreca 
pasquale.porr...@dektech.com.au wrote:

 From my point of view it is not advisable to base some functionality
 of the instances on direct calls to the OpenStack API, for two main
 reasons. First, if the OpenStack code changes (and we know OpenStack
 code does change), the code of the software running in the instance
 will have to change too. Second, if in the future one wants to move to
 another cloud infrastructure, it will be more difficult to do so.



Thoughts on your two reasons:

1) What happens if OpenStack code changes?

While OpenStack is under very active development we have stable APIs,
especially around something like booting an instance. So the API call to
boot an instance with a specific image *should not* change as you upgrade
OpenStack (unless we deprecate an API, but that will be a slow, multi-year
process).
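
For illustration, here is a minimal sketch of the kind of call being
discussed, using python-novaclient against the stable v2 compute API. The
credentials, names, and auth URL are placeholders, not details from this
thread:

    # Minimal sketch: boot an instance through the stable Nova v2 API with
    # python-novaclient. Credentials and names below are placeholders.
    from novaclient import client

    nova = client.Client('2', 'demo', 'secret', 'demo',
                         'http://keystone.example.com:5000/v2.0')

    # Look up the image and flavor by name, then boot the instance.
    image = nova.images.find(name='payload-image')
    flavor = nova.flavors.find(name='m1.small')
    server = nova.servers.create(name='payload-01', image=image,
                                 flavor=flavor)
    print(server.id, server.status)

The shape of this call is what is expected to stay stable across upgrades,
which is the point about API stability being made above.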

2) If in the future one wants to move to another cloud infrastructure, it
will be more difficult.

Why not use something like Apache jclouds to make this easier?
https://jclouds.apache.org/






 On 12/12/14 01:20, Joe Gordon wrote:
  On Wed, Dec 10, 2014 at 7:42 AM, Pasquale Porreca 
  pasquale.porr...@dektech.com.au wrote:
 
   Well, one of the main reasons to choose an open source product is to
   avoid vendor lock-in. I think it is not advisable to embed in the
   software running in an instance a call to OpenStack-specific services.
  
  I'm sorry, I don't follow the logic here; can you elaborate?
 
 

 --
 Pasquale Porreca

 DEK Technologies
 Via dei Castelli Romani, 22
 00040 Pomezia (Roma)

 Mobile +39 3394823805
 Skype paskporr

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Kilo specs review day

2014-12-11 Thread Joe Gordon
On Thu, Dec 11, 2014 at 7:30 AM, Andrew Laski andrew.la...@rackspace.com
wrote:


 On 12/10/2014 04:41 PM, Michael Still wrote:

 Hi,

 at the design summit we said that we would not approve specifications
 after the kilo-1 deadline, which is 18 December. Unfortunately, we’ve
 had a lot of specifications proposed this cycle (166 by my count), and
 haven’t kept up with the review workload.

 Therefore, I propose that Friday this week be a specs review day. We
 need to burn down the queue of specs needing review, as well as
 abandoning those which aren’t getting regular updates based on our
 review comments.

 I’d appreciate nova-specs-core doing reviews on Friday, but it’s always
 super helpful when non-cores review as well. A +1 for a developer or
 operator gives nova-specs-core a good signal of what might be ready to
 approve, and that helps us optimize our review time.

 For reference, the specs to review may be found at:

  https://review.openstack.org/#/q/project:openstack/nova-specs+status:open,n,z

 Thanks heaps,
 Michael


 It will be nice to have a good push before we hit the deadline.

 I would like to remind priority owners to update their list of any
 outstanding specs at
 https://etherpad.openstack.org/p/kilo-nova-priorities-tracking
 so they can be targeted during the review day.



In preparation, I put together a nova-specs dashboard:

https://review.openstack.org/141137

https://review.openstack.org/#/dashboard/?foreach=project%3A%5Eopenstack%2Fnova-specs+status%3Aopen+NOT+owner%3Aself+NOT+label%3AWorkflow%3C%3D-1+label%3AVerified%3E%3D1%252cjenkins+NOT+label%3ACode-Review%3E%3D-2%252cself+branch%3Amastertitle=Nova+SpecsYour+are+a+reviewer%2C+but+haven%27t+voted+in+the+current+revision=reviewer%3AselfNeeds+final+%2B2=label%3ACode-Review%3E%3D2+NOT%28reviewerin%3Anova-specs-core+label%3ACode-Review%3C%3D-1%29+limit%3A100Passed+Jenkins%2C+Positive+Nova-Core+Feedback=NOT+label%3ACode-Review%3E%3D2+%28reviewerin%3Anova-core+label%3ACode-Review%3E%3D1%29+NOT%28reviewerin%3Anova-core+label%3ACode-Review%3C%3D-1%29+limit%3A100Passed+Jenkins%2C+No+Positive+Nova-Core+Feedback%2C+No+Negative+Feedback=NOT+label%3ACode-Review%3C%3D-1+NOT+label%3ACode-Review%3E%3D2+NOT%28reviewerin%3Anova-core+label%3ACode-Review%3E%3D1%29+limit%3A100Wayward+Changes+%28Changes+with+no+code+review+in+the+last+7+days%29=NOT+label%3ACode-Review%3C%3D2+age%3A7dSome+negative+feedback%2C+might+still+be+worth+commenting=label%3ACode-Review%3D-1+NOT+label%3ACode-Review%3D-2+limit%3A100Dead+Specs=label%3ACode-Review%3C%3D-2




 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit recap and move forward

2014-12-11 Thread Joe Gordon
On Thu, Dec 11, 2014 at 1:02 AM, joehuang joehu...@huawei.com wrote:

 Hello, Russell,

 Many thanks for your reply. See inline comments.

 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: Thursday, December 11, 2014 5:22 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit
 recap and move forward

  On Fri, Dec 5, 2014 at 8:23 AM, joehuang joehu...@huawei.com wrote:
  Dear all  TC  PTL,
 
  In the 40-minute cross-project summit session "Approaches for
  scaling out"[1], almost 100 people attended the meeting, and the
  conclusion is that cells cannot cover the use cases and
  requirements which the OpenStack cascading solution[2] aims to
  address; the background, including use cases and requirements, is also
  described in the mail.

 I must admit that this was not the reaction I came away from the
 discussion with.
 There was a lot of confusion, and as we started looking closer, many (or
 perhaps most)
 people speaking up in the room did not agree that the requirements being
 stated are
 things we want to try to satisfy.

 [joehuang] Could you pls. confirm your opinion: 1) cells can not cover the
 use cases and requirements which the OpenStack cascading solution aim to
 address. 2) Need further discussion whether to satisfy the use cases and
 requirements.

 On 12/05/2014 06:47 PM, joehuang wrote:
  Hello, Davanum,
 
  Thanks for your reply.
 
  Cells can't meet the demand for the use cases and requirements
 described in the mail.

 You're right that cells doesn't solve all of the requirements you're
 discussing.
 Cells addresses scale in a region.  My impression from the summit session
  and other discussions is that the scale issues addressed by cells are
 considered
  a priority, while the global API bits are not.

 [joehuang] Agree cells is in the first class priority.

  1. Use cases
  a). Vodafone use case[4](OpenStack summit speech video from 9'02
  to 12'30 ), establishing globally addressable tenants which result
  in efficient services deployment.

  Keystone has been working on federated identity.
 That part makes sense, and is already well under way.

 [joehuang] The major challenge for the VDF use case is cross-OpenStack
 networking for tenants. The tenant's VM/Volume may be allocated in
 different data centers geographically, but the virtual network
 (L2/L3/FW/VPN/LB) should be built for each tenant automatically and
 isolated between tenants. Keystone federation can help automate
 authorization, but the cross-OpenStack network automation challenge is
 still there.
 Using a proprietary orchestration layer can solve the automation issue,
 but VDF doesn't want a proprietary API on the north-bound side, because no
 ecosystem is available. And other issues, for example how to distribute
 images, also cannot be solved by Keystone federation.

  b). Telefonica use case[5], create virtual DC( data center) cross
  multiple physical DCs with seamless experience.

 If we're talking about multiple DCs that are effectively local to each
 other
 with high bandwidth and low latency, that's one conversation.
 My impression is that you want to provide a single OpenStack API on top of
 globally distributed DCs.  I honestly don't see that as a problem we
 should
 be trying to tackle.  I'd rather continue to focus on making OpenStack
 work
 *really* well split into regions.
  I think some people are trying to use cells in a geographically
 distributed way,
  as well.  I'm not sure that's a well understood or supported thing,
 though.
  Perhaps the folks working on the new version of cells can comment
 further.

 [joehuang] 1) The split-region approach cannot provide cross-OpenStack
 networking automation for tenants. 2) Exactly, the motivation for cascading
 is a single OpenStack API on top of globally distributed DCs. Of course,
 cascading can also be used for DCs close to each other with high bandwidth
 and low latency. 3) Comments from the cells folks are welcome.

  c). ETSI NFV use cases[6], especially use cases #1, #2, #3, #5, #6,
  #8. For an NFV cloud, it is in its nature that the cloud will be
  distributed but inter-connected across many data centers.

 I'm afraid I don't understand this one.  In many conversations about NFV,
 I haven't heard this before.

 [joehuang] This is the ETSI requirements and use cases specification for
 NFV. ETSI is the home of the Industry Specification Group for NFV. In
 Figure 14 (virtualization of EPC) of this document, you can see that the
 operator's cloud includes many data centers that provide connection
 services to end users through inter-connected VNFs. The list at
 https://wiki.openstack.org/wiki/TelcoWorkingGroup is mainly about the
 requirements for specific VNFs (like IMS, SBC, MME, HSS, S/P GW, etc.) to
 run over cloud, e.g. migrating traditional telco applications from
 proprietary hardware to cloud. Not all NFV requirements have been covered
 yet. Forgive me, there are so many telco terms here.

 
  

Re: [openstack-dev] [NFV][Telco] pxe-boot

2014-12-11 Thread Joe Gordon
On Wed, Dec 10, 2014 at 7:42 AM, Pasquale Porreca 
pasquale.porr...@dektech.com.au wrote:

   Well, one of the main reasons to choose an open source product is to avoid
  vendor lock-in. I think it is not advisable to embed in the software
  running in an instance a call to OpenStack-specific services.


I'm sorry, I don't follow the logic here; can you elaborate?



 On 12/10/14 00:20, Joe Gordon wrote:


 On Wed, Dec 3, 2014 at 1:16 AM, Pasquale Porreca 
 pasquale.porr...@dektech.com.au wrote:

 The use case we were thinking about is a Network Function (e.g. IMS
 Nodes) implementation in which the high availability is based on OpenSAF.
 In this scenario there is an Active/Standby cluster of 2 System Controllers
 (SC) plus several Payloads (PL) that boot from network, controlled by the
 SC. The logic of which service to deploy on each payload is inside the SC.

 In OpenStack both SCs and PLs will be instances running in the cloud,
 anyway the PLs should still boot from network under the control of the SC.
 In fact to use Glance to store the image for the PLs and keep the control
 of the PLs in the SC, the SC should trigger the boot of the PLs with
 requests to Nova/Glance, but an application running inside an instance
 should not directly interact with a cloud infrastructure service like
 Glance or Nova.


  Why not? This is a fairly common practice.


 --
 Pasquale Porreca

 DEK Technologies
 Via dei Castelli Romani, 22
 00040 Pomezia (Roma)

 Mobile +39 3394823805
 Skype paskporr


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells - summit recap and move forward

2014-12-11 Thread Joe Gordon
On Thu, Dec 11, 2014 at 6:25 PM, joehuang joehu...@huawei.com wrote:

  Hello, Joe



 Thank you for your good question.



 Question:

 How would something like flavors work across multiple vendors? The
 OpenStack API doesn't have any hard-coded names and sizes for flavors, so a
 flavor such as m1.tiny may actually be very different from vendor to vendor.



 Answer:

 The flavor is defined by the cloud operator in the cascading OpenStack.
 Nova-proxy (which is the driver for “Nova as hypervisor”) will sync the
 flavor to the cascaded OpenStack when it is first used in the cascaded
 OpenStack. If the flavor was changed before a new VM is booted, the changed
 flavor will also be updated in the cascaded OpenStack just before the new
 VM boot request. Through this synchronization mechanism, all flavors used
 in the multi-vendor cascaded OpenStacks will be kept the same as those used
 at the cascading level, providing a consistent view of flavors.


I don't think this is sufficient. If the underlying hardware differs
between vendors, setting the same values for a flavor will result in
different performance characteristics.  For example, nova allows setting
VCPUs, but nova doesn't provide an easy way to define how powerful a VCPU
is.  Also, flavors are commonly hardware dependent; take what Rackspace
offers:

http://www.rackspace.com/cloud/public-pricing#cloud-servers

Rackspace has I/O Optimized flavors

* High-performance, RAID 10-protected SSD storage
* Option of booting from Cloud Block Storage (additional charges apply for
Cloud Block Storage)
* Redundant 10-Gigabit networking
* Disk I/O scales with the number of data disks up to ~80,000 4K random
read IOPS and ~70,000 4K random write IOPS.*

How would cascading support something like this?
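
To make the concern concrete, here is a hypothetical sketch of how an
operator defines a flavor today: it carries only abstract resource counts
plus free-form extra specs, nothing that pins down how fast a VCPU or disk
actually is. The names, values, and extra-spec key are made up for
illustration:

    # Hypothetical sketch: a flavor is just operator-defined resource
    # counts plus free-form extra specs; nothing here guarantees comparable
    # hardware between two vendors. Credentials and names are placeholders.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')

    # 4 GB RAM, 2 VCPUs, 40 GB disk -- but "2 VCPUs" says nothing about
    # clock speed, and "40 GB" says nothing about SSD vs spinning disk.
    flavor = nova.flavors.create('io1.medium', 4096, 2, 40)

    # Extra specs are arbitrary key/value pairs; an "I/O optimized" promise
    # only holds if every vendor maps this key to comparable hardware.
    flavor.set_keys({'io_class': 'ssd-raid10'})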




 Best Regards



 Chaoyi Huang ( joehuang )



 *From:* Joe Gordon [mailto:joe.gord...@gmail.com]
 *Sent:* Friday, December 12, 2014 8:17 AM
 *To:* OpenStack Development Mailing List (not for usage questions)

 *Subject:* Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells -
 summit recap and move forward







 On Thu, Dec 11, 2014 at 1:02 AM, joehuang joehu...@huawei.com wrote:

 Hello, Russell,

 Many thanks for your reply. See inline comments.

 -Original Message-
 From: Russell Bryant [mailto:rbry...@redhat.com]
 Sent: Thursday, December 11, 2014 5:22 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [all] [tc] [PTL] Cascading vs. Cells – summit
 recap and move forward

  On Fri, Dec 5, 2014 at 8:23 AM, joehuang joehu...@huawei.com wrote:
  Dear all  TC  PTL,
 
  In the 40-minute cross-project summit session “Approaches for
  scaling out”[1], almost 100 people attended the meeting, and the
  conclusion is that cells cannot cover the use cases and
  requirements which the OpenStack cascading solution[2] aims to
  address; the background, including use cases and requirements, is also
  described in the mail.

 I must admit that this was not the reaction I came away from the
 discussion with.
 There was a lot of confusion, and as we started looking closer, many (or
 perhaps most)
 people speaking up in the room did not agree that the requirements being
 stated are
 things we want to try to satisfy.

 [joehuang] Could you pls. confirm your opinion: 1) cells can not cover the
 use cases and requirements which the OpenStack cascading solution aim to
 address. 2) Need further discussion whether to satisfy the use cases and
 requirements.

 On 12/05/2014 06:47 PM, joehuang wrote:
  Hello, Davanum,
 
  Thanks for your reply.
 
  Cells can't meet the demand for the use cases and requirements
 described in the mail.

 You're right that cells doesn't solve all of the requirements you're
 discussing.
 Cells addresses scale in a region.  My impression from the summit session
  and other discussions is that the scale issues addressed by cells are
 considered
  a priority, while the global API bits are not.

 [joehuang] Agree cells is in the first class priority.

  1. Use cases
  a). Vodafone use case[4](OpenStack summit speech video from 9'02
  to 12'30 ), establishing globally addressable tenants which result
  in efficient services deployment.

  Keystone has been working on federated identity.
 That part makes sense, and is already well under way.

 [joehuang] The major challenge for the VDF use case is cross-OpenStack
 networking for tenants. The tenant's VM/Volume may be allocated in
 different data centers geographically, but the virtual network
 (L2/L3/FW/VPN/LB) should be built for each tenant automatically and
 isolated between tenants. Keystone federation can help automate
 authorization, but the cross-OpenStack network automation challenge is
 still there.
 Using a proprietary orchestration layer can solve the automation issue,
 but VDF doesn't want a proprietary API on the north-bound side, because no
 ecosystem is available. And other issues, for example how to distribute
 images, also cannot be solved by Keystone federation.

  b

Re: [openstack-dev] [nova] Kilo specs review day

2014-12-10 Thread Joe Gordon
On Wed, Dec 10, 2014 at 1:41 PM, Michael Still mi...@stillhq.com wrote:

 Hi,

 at the design summit we said that we would not approve specifications
 after the kilo-1 deadline, which is 18 December. Unfortunately, we’ve
 had a lot of specifications proposed this cycle (166 by my count), and
 haven’t kept up with the review workload.

 Therefore, I propose that Friday this week be a specs review day. We
 need to burn down the queue of specs needing review, as well as
 abandoning those which aren’t getting regular updates based on our
 review comments.

 I’d appreciate nova-specs-core doing reviews on Friday, but it’s always
 super helpful when non-cores review as well. A +1 for a developer or
 operator gives nova-specs-core a good signal of what might be ready to
 approve, and that helps us optimize our review time.

 For reference, the specs to review may be found at:


 https://review.openstack.org/#/q/project:openstack/nova-specs+status:open,n,z


++, count me in!




 Thanks heaps,
 Michael

 --
 Rackspace Australia

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [hacking] hacking package upgrade dependency with setuptools

2014-12-09 Thread Joe Gordon
On Tue, Dec 9, 2014 at 11:05 AM, Surojit Pathak suro.p...@gmail.com wrote:

 Hi all,

 On a RHEL system, as I upgrade the hacking package from 0.8.0 to 0.9.5, I
 see flake8 stop working. Upgrading setuptools resolves the issue, yet I do
 not see a change in the reported version of pep8 or setuptools after the
 setuptools upgrade.

 Any issue in packaging? Any explanation of this behavior?

 Snippet -
 [suro@poweredsoured ~]$ pip list | grep hacking
 hacking (0.8.0)
 [suro@poweredsoured ~]$
 [suro@poweredsoured app]$ sudo pip install hacking==0.9.5
 ... Successful installation
 [suro@poweredsoured app]$ flake8 neutron/
 ...
   File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in
 resolve
     raise DistributionNotFound(req)
 pkg_resources.DistributionNotFound: pep8==1.4.6
 [suro@poweredsoured app]$ pip list | grep pep8
 pep8 (1.5.6)
 [suro@poweredsoured app]$ pip list | grep setuptools
 setuptools (0.6c11)
 [suro@poweredsoured app]$ sudo pip install -U setuptools
 ...
 Successfully installed setuptools
 Cleaning up...
 [suro@poweredsoured app]$ pip list | grep pep8
 pep8 (1.5.6)
 [suro@poweredsoured app]$ pip list | grep setuptools
 setuptools (0.6c11)
 [suro@poweredsoured app]$ flake8 neutron/
 [suro@poweredsoured app]$


Could this be pbr related?

-pbr>=0.5.21,<1.0
+pbr>=0.6,!=0.7,<1.0





 --
 Regards,
 Surojit Pathak


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [qa] How to delete a VM which is in ERROR state?

2014-12-09 Thread Joe Gordon
On Sat, Dec 6, 2014 at 5:08 PM, Danny Choi (dannchoi) dannc...@cisco.com
wrote:

  Hi,

  I have a VM which is in ERROR state.


 +--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
 | ID                                   | Name                                         | Status | Task State | Power State | Networks |
 +--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+
 | 1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | cirros--1cb5bf96-619c-4174-baae-dd0d8c3d40c5 | ERROR  | -          | NOSTATE     |          |
 +--------------------------------------+----------------------------------------------+--------+------------+-------------+----------+

  I tried both the CLI “nova delete” and Horizon “terminate instance”.
 Both accepted the delete command without any error.
 However, the VM never got deleted.

  Is there a way to remove the VM?


What version of nova are you using? This is definitely a serious bug; you
should be able to delete an instance in the ERROR state. Can you file a bug
that includes steps on how to reproduce it, along with all relevant
logs?

bugs.launchpad.net/nova
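
A commonly used workaround while such a bug is investigated is to reset the
instance state and then retry the delete, for example via novaclient (admin
credentials and the auth URL below are placeholders):

    # Sketch of a common workaround, assuming admin credentials: reset the
    # stuck instance's state to 'active', then retry the delete.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')

    server = nova.servers.get('1cb5bf96-619c-4174-baae-dd0d8c3d40c5')
    nova.servers.reset_state(server, 'active')
    nova.servers.delete(server)

The equivalent CLI is "nova reset-state --active <uuid>" followed by
"nova delete <uuid>".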



  Thanks,
 Danny

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][all] Max Complexity Check Considered Harmful

2014-12-09 Thread Joe Gordon
On Mon, Dec 8, 2014 at 5:03 PM, Brant Knudson b...@acm.org wrote:


 Not too long ago projects added a maximum complexity check to tox.ini, for
 example keystone has max-complexity=24. Seemed like a good idea at the
 time, but in a recent attempt to lower the maximum complexity check in
 keystone[1][2], I found that the maximum complexity check can actually lead
 to less understandable code. This is because the check includes an embedded
 function's complexity in the function that it's in.


This behavior is expected.

Nested functions cannot be unit tested on their own.  Part of the issue is
that nested functions can access variables scoped to the outer function, so
the following function is valid:

 def outer():
     var = outer
     def inner():
         print var
     inner()


Because nested functions cannot easily be unit tested, and can be harder to
reason about since they can access variables that are part of the outer
function, I don't think they are easier to understand (there are still
cases where a nested function makes sense though).
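
For contrast, a minimal sketch (with made-up names) of the alternative being
debated: extracting the helper to module level with a leading underscore
keeps it private by convention while making it independently testable:

    def _normalize_name(raw):
        # Module-level helper: unit-testable in isolation, but a reader now
        # has to assume it could be called from anywhere in the module.
        return raw.strip().lower()


    def register_names(raw_names):
        # The outer function stays simple, and its complexity score no
        # longer includes the helper's logic.
        return [_normalize_name(raw) for raw in raw_names]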


 The way I would have lowered the complexity of the function in keystone is
 to extract the complex part into a new function. This can make the existing
 function much easier to understand for all the reasons that one defines a
 function for code. Since this new function is obviously only called from
 the function it's currently in, it makes sense to keep the new function
 inside the existing function. It's simpler to think about an embedded
 function because then you know it's only called from one place. The problem
 is, because of the existing complexity check behavior, this doesn't lower
 the complexity according to the complexity check, so you wind up putting
 the function as a new top-level, and now a reader is has to assume that the
 function could be called from anywhere and has to be much more cautious
 about changes to the function.


 Since the complexity check can lead to code that's harder to understand,
 it must be considered harmful and should be removed, at least until the
 incorrect behavior is corrected.


Why do you think the max complexity check is harmful? Because it prevents
heavy use of nested functions?




 [1] https://review.openstack.org/#/c/139835/
 [2] https://review.openstack.org/#/c/139836/
 [3] https://review.openstack.org/#/c/140188/

 - Brant


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][docker][containers][qa] nova-docker CI failing a lot on unrelated nova patches

2014-12-09 Thread Joe Gordon
On Fri, Dec 5, 2014 at 11:43 AM, Ian Main im...@redhat.com wrote:

 Sean Dague wrote:
  On 12/04/2014 05:38 PM, Matt Riedemann wrote:
  
  
   On 12/4/2014 4:06 PM, Michael Still wrote:
   +Eric and Ian
  
   On Fri, Dec 5, 2014 at 8:31 AM, Matt Riedemann
   mrie...@linux.vnet.ibm.com wrote:
   This came up in the nova meeting today, I've opened a bug [1] for it.
   Since
   this isn't maintained by infra we don't have log indexing so I can't
 use
   logstash to see how pervasive it us, but multiple people are
   reporting the
   same thing in IRC.
  
   Who is maintaining the nova-docker CI and can look at this?
  
   It also looks like the log format for the nova-docker CI is a bit
   weird, can
   that be cleaned up to be more consistent with other CI log results?
  
   [1] https://bugs.launchpad.net/nova-docker/+bug/1399443
  
   --
  
   Thanks,
  
   Matt Riedemann
  
  
   ___
   OpenStack-dev mailing list
   OpenStack-dev@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  
  
  
  
   Also, according to the 3rd party CI requirements [1] I should see
   nova-docker CI in the third party wiki page [2] so I can get details on
   who to contact when this fails but that's not done.
  
   [1] http://ci.openstack.org/third_party.html#requirements
   [2] https://wiki.openstack.org/wiki/ThirdPartySystems
 
  It's not the 3rd party CI job we are talking about, it's the one in the
  check queue which is run by infra.
 
  But, more importantly, jobs in those queues need shepards that will fix
  them. Otherwise they will get deleted.
 
  Clarkb provided the fix for the log structure right now -
  https://review.openstack.org/#/c/139237/1 so at least it will look
  vaguely sane on failures
 
-Sean

 This is one of the reasons we might like to have this in nova core.
 Otherwise
 we will just keep addressing issues as they come up.  We would likely be
 involved doing this if it were part of nova core anyway.


While gating on nova-docker will prevent patches that break nova-docker
100% of the time from landing, it won't do a lot to prevent transient
failures. To fix those we need people dedicated to making sure nova-docker
is working.



 Ian

  --
  Sean Dague
  http://dague.net
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [NFV][Telco] pxe-boot

2014-12-09 Thread Joe Gordon
On Wed, Dec 3, 2014 at 1:16 AM, Pasquale Porreca 
pasquale.porr...@dektech.com.au wrote:

 The use case we were thinking about is a Network Function (e.g. IMS Nodes)
 implementation in which the high availability is based on OpenSAF. In this
 scenario there is an Active/Standby cluster of 2 System Controllers (SC)
 plus several Payloads (PL) that boot from network, controlled by the SC.
 The logic of which service to deploy on each payload is inside the SC.

 In OpenStack both SCs and PLs will be instances running in the cloud,
 anyway the PLs should still boot from network under the control of the SC.
 In fact to use Glance to store the image for the PLs and keep the control
 of the PLs in the SC, the SC should trigger the boot of the PLs with
 requests to Nova/Glance, but an application running inside an instance
 should not directly interact with a cloud infrastructure service like
 Glance or Nova.


Why not? This is a fairly common practice.



  We know that it is still possible to achieve network booting in OpenStack
  using an image stored in Glance that acts like a PXE client; however, this
  workaround has some drawbacks, mainly that it is not possible
  to choose the specific virtual NIC on which the network boot will happen,
  causing DHCP requests to flow on networks where they don't belong and
  possible delays in the boot of the instances.


 On 11/27/14 00:32, Steve Gordon wrote:

 - Original Message -

 From: Angelo Matarazzo angelo.matara...@dektech.com.au
 To: OpenStack Development Mailing openstack-dev@lists.openstack.org,
 openstack-operat...@lists.openstack.org


 Hi all,
  my team and I are working on a PXE boot feature very similar to the
  Diskless VM one in the Active blueprint list [1].
  The blueprint [2] is no longer active and we created a new spec [3][4].

  Nova core reviewers commented on our spec, and the first and most
  important objection is that there is not a compelling reason to
  provide this kind of feature: booting from the network.

  Aside from the specific implementation, I think that some members of
  the TelcoWorkingGroup could be interested in it and provide a use case.
 I would also like to add this item to the agenda of next meeting

 Any thought?

 We did discuss this today, and granted it is listed as a blueprint
 someone in the group had expressed interest in at a point in time - though
 I don't believe any further work was done. The general feeling was that
 there isn't anything really NFV or Telco specific about this over and above
 the more generic use case of legacy applications. Are you able to further
 elaborate on the reason it's NFV or Telco specific other than because of
 who is requesting it in this instance?

 Thanks!

 -Steve

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 --
 Pasquale Porreca

 DEK Technologies
 Via dei Castelli Romani, 22
 00040 Pomezia (Roma)

 Mobile +39 3394823805
 Skype paskporr



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][docker][containers][qa] nova-docker CI failing a lot on unrelated nova patches

2014-12-09 Thread Joe Gordon
On Tue, Dec 9, 2014 at 3:18 PM, Eric Windisch e...@windisch.us wrote:


 While gating on nova-docker will prevent patches that cause nova-docker
 to break 100% to land, it won't do a lot to prevent transient failures. To
 fix those we need people dedicated to making sure nova-docker is working.



 What would be helpful for me is a way to know that our tests are breaking
 without manually checking Kibana, such as an email.



There is also graphite [0], but since the docker job runs in the
check queue the data we are producing is very dirty, because check jobs
often run on broken patches.

[0]
http://graphite.openstack.org/render/?from=-10daysheight=500until=nowwidth=1200bgcolor=fffgcolor=00yMax=100yMin=0target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-tempest-dsvm-docker.FAILURE,sum(stats.zuul.pipeline.check.job.check-tempest-dsvm-docker.{SUCCESS,FAILURE})),%2736hours%27),%20%27check-tempest-dsvm-docker%27),%27orange%27)title=Docker%20Failure%20Rates%20(10%20days)_t=0.3702208176255226



 Regards,
 Eric Windisch


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.

2014-12-09 Thread Joe Gordon
On Tue, Nov 25, 2014 at 2:01 PM, Steve Gordon sgor...@redhat.com wrote:

 - Original Message -
  From: Daniel P. Berrange berra...@redhat.com
  To: Dan Smith d...@danplanet.com
 
  On Thu, Nov 13, 2014 at 05:43:14PM +, Daniel P. Berrange wrote:
   On Thu, Nov 13, 2014 at 09:36:18AM -0800, Dan Smith wrote:
 That sounds like something worth exploring at least, I didn't know
 about that kernel build option until now :-) It sounds like it
 ought
 to be enough to let us test the NUMA topology handling, CPU pinning
 and probably huge pages too.
   
Okay. I've been vaguely referring to this as a potential test vector,
but only just now looked up the details. That's my bad :)
   
 The main gap I'd see is NUMA aware PCI device assignment since the
 PCI - NUMA node mapping data comes from the BIOS and it does not
 look like this is fakeable as is.
   
Yeah, although I'd expect that the data is parsed and returned by a
library or utility that may be a hook for fakeification. However, it
 may
very well be more trouble than it's worth.
   
I still feel like we should be able to test generic PCI in a similar
 way
(passing something like a USB controller through to the guest, etc).
However, I'm willing to believe that the intersection of PCI and
 NUMA is
a higher order complication :)
  
   Oh I forgot to mention with PCI device assignment (as well as having a
   bunch of PCI devices available[1]), the key requirement is an IOMMU.
   AFAIK, neither Xen or KVM provide any IOMMU emulation, so I think we're
   out of luck for even basic PCI assignment testing inside VMs.
 
  Ok, turns out that wasn't entirely accurate in general.
 
  KVM *can* emulate an IOMMU, but it requires that the guest be booted
  with the q35 machine type, instead of the ancient PIIX4 machine type,
  and also QEMU must be launched with -machine iommu=on. We can't do
  this in Nova, so although it is theoretically possible, it is not
  doable for us in reality :-(
 
  Regards,
  Daniel

 Is it worth still pursuing virtual testing of the NUMA awareness work you,
 nikola, and others have been doing? It seems to me it would still be
 preferable to do this virtually (and ideally in the gate) wherever possible?


The more we can test in the gate and without real hardware the better.



 Thanks,

 Steve

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] policy on old / virtually abandoned patches

2014-12-05 Thread Joe Gordon
On Dec 5, 2014 7:07 AM, Daniel P. Berrange berra...@redhat.com wrote:

 On Tue, Nov 18, 2014 at 07:06:59AM -0500, Sean Dague wrote:
  Nova currently has 197 patches that have seen no activity in the last 4
  weeks (project:openstack/nova age:4weeks status:open).

 On a somewhat related note, nova-specs currently has 17 specs
 open against specs/juno, most with -2 votes. I think we should
 just mass-abandon anything still touching the specs/juno directory.
 If people cared about them still they would have submitted for
 specs/kilo.

 So any objection to killing everything in the list below:

+1, makes sense to me.



 +-------------------------------------+--------------------------------------------------------+----------+-------+---------+----------+
 | URL                                 | Subject                                                | Created  | Tests | Reviews | Workflow |
 +-------------------------------------+--------------------------------------------------------+----------+-------+---------+----------+
 | https://review.openstack.org/86938  | Add tasks to the v3 API                                | 237 days |   1   |   -2    |          |
 | https://review.openstack.org/88334  | Add support for USB controller                         | 231 days |   1   |   -2    |          |
 | https://review.openstack.org/89766  | Add useful metrics into utilization based scheduli... | 226 days |   1   |   -2    |          |
 | https://review.openstack.org/90239  | Blueprint for Cinder Multi attach volumes              | 224 days |   1   |   -2    |          |
 | https://review.openstack.org/90647  | Add utilization based weighers on top of MetricsWe... | 221 days |   1   |   -2    |          |
 | https://review.openstack.org/96543  | Smart Scheduler (Solver Scheduler) - Constraint ba... | 189 days |   1   |   -2    |          |
 | https://review.openstack.org/97441  | Add nova spec for bp/isnot-operator                    | 185 days |   1   |   -2    |          |
 | https://review.openstack.org/99476  | Dedicate aggregates for specific tenants               | 176 days |   1   |   -2    |          |
 | https://review.openstack.org/99576  | Add client token to CreateServer                       | 176 days |   1   |   -2    |          |
 | https://review.openstack.org/101921 | Spec for Neutron migration feature                     | 164 days |   1   |   -2    |          |
 | https://review.openstack.org/103617 | Support Identity V3 API                                | 157 days |   1   |   -1    |          |
 | https://review.openstack.org/105385 | Leverage the features of IBM GPFS to store cached ... | 150 days |   1   |   -2    |          |
 | https://review.openstack.org/108582 | Add ironic boot mode filters                           | 136 days |   1   |   -2    |          |
 | https://review.openstack.org/110639 | Blueprint for the implementation of Nested Quota D... | 127 days |   1   |   -2    |          |
 | https://review.openstack.org/111308 | Added VirtProperties object blueprint                  | 125 days |   1   |   -2    |          |
 | https://review.openstack.org/111745 | Improve instance boot time                             | 122 days |   1   |   -2    |          |
 | https://review.openstack.org/116280 | Add a new filter to implement project isolation fe... | 104 days |   1   |   -2    |          |
 +-------------------------------------+--------------------------------------------------------+----------+-------+---------+----------+


 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
:|
 |: http://libvirt.org  -o- http://virt-manager.org
:|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
:|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
:|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Spring cleaning nova-core

2014-12-05 Thread Joe Gordon
On Fri, Dec 5, 2014 at 4:39 PM, Russell Bryant rbry...@redhat.com wrote:

 On 12/05/2014 08:41 AM, Daniel P. Berrange wrote:
  On Fri, Dec 05, 2014 at 11:05:28AM +1100, Michael Still wrote:
  One of the things that happens over time is that some of our core
  reviewers move on to other projects. This is a normal and healthy
  thing, especially as nova continues to spin out projects into other
  parts of OpenStack.
 
  However, it is important that our core reviewers be active, as it
  keeps them up to date with the current ways we approach development in
  Nova. I am therefore removing some no longer sufficiently active cores
  from the nova-core group.
 
  I’d like to thank the following people for their contributions over the
 years:
 
  * cbehrens: Chris Behrens
  * vishvananda: Vishvananda Ishaya
  * dan-prince: Dan Prince
  * belliott: Brian Elliott
  * p-draigbrady: Padraig Brady
 
  I’d love to see any of these cores return if they find their available
  time for code reviews increases.
 
   What stats did you use to decide whether to cull these reviewers?
   Looking at the stats over a 6-month period, I think Padraig Brady is
   still having a significant positive impact on Nova - on a par with both
   cerberus and alaski, whom you're not proposing to cut. I think we should
   keep Padraig on the team, but would probably suggest cutting markmc
   instead.
 
http://russellbryant.net/openstack-stats/nova-reviewers-180.txt
 
 
  +------------------+--------------------------------------------+----------------+
  |     Reviewer     | Reviews   -2   -1   +1   +2   +A    +/- %  | Disagreements* |
  +------------------+--------------------------------------------+----------------+
  |   berrange **    |    1766   26  435   12 1293  357   73.9%  |  157 (  8.9%)  |
  |   jaypipes **    |    1359   11  378  436  534  133   71.4%  |  109 (  8.0%)  |
  |     jogo **      |    1053  131  326    7  589  353   56.6%  |   47 (  4.5%)  |
  |    danms **      |     921   67  381   23  450  167   51.4%  |   32 (  3.5%)  |
  |   oomichi **     |     889    4  306   55  524  182   65.1%  |   40 (  4.5%)  |
  | johngarbutt **   |     808  319  227   10  252  145   32.4%  |   37 (  4.6%)  |
  |   mriedem **     |     642   27  279   25  311  136   52.3%  |   17 (  2.6%)  |
  |   klmitch **     |     606    1   90    2  513   70   85.0%  |   67 ( 11.1%)  |
  |  ndipanov **     |     588   19  179   10  380  113   66.3%  |   62 ( 10.5%)  |
  |  mikalstill **   |     564   31   34    3  496  207   88.5%  |   20 (  3.5%)  |
  |   cyeoh-0 **     |     546   12  207   30  297  103   59.9%  |   35 (  6.4%)  |
  |   sdague **      |     511   23   89    6  393  229   78.1%  |   25 (  4.9%)  |
  |  russellb **     |     465    6   83    0  376  158   80.9%  |   23 (  4.9%)  |
  |   alaski **      |     415    1   65   21  328  149   84.1%  |   24 (  5.8%)  |
  |  cerberus **     |     405    6   25   48  326  102   92.3%  |   33 (  8.1%)  |
  | p-draigbrady **  |     376    2   40    9  325   64   88.8%  |   49 ( 13.0%)  |
  |   markmc **      |     243    2   54    3  184   69   77.0%  |   14 (  5.8%)  |
  |  belliott **     |     231    1   68    5  157   35   70.1%  |   19 (  8.2%)  |
  |  dan-prince **   |     178    2   48    9  119   29   71.9%  |   11 (  6.2%)  |
  |  cbehrens **     |     132    2   49    2   79   19   61.4%  |    6 (  4.5%)  |
  | vishvananda **   |      54    0    5    3   46   15   90.7%  |    5 (  9.3%)  |
  +------------------+--------------------------------------------+----------------+
 

 Yeah, I was pretty surprised to see pbrady on this list, as well.  The
 above was 6 months, but even if you drop it to the most recent 3 months,
 he's still active ...


As you are more than aware, our policy for removing people from core is
to leave that up to the PTL (I believe you wrote that) [0]. And I don't
think numbers alone are a good metric for sorting out who to remove.  That
being said, no matter what happens, with our fast-track policy, if pbrady is
dropped it shouldn't be hard to re-add him.


[0]
https://wiki.openstack.org/wiki/Nova/CoreTeam#Adding_or_Removing_Members




  Reviews for the last 90 days in nova
  ** -- nova-core team member
 
  +--------------+--------------------------------------------+----------------+
  |   Reviewer   | Reviews   -2   -1   +1   +2   +A    +/- %  | Disagreements* |
  +--------------+--------------------------------------------+----------------+
  | berrange **  |     708   13  145    1  549  200   77.7%  |   47 (  6.6%)  |
  |   jogo **    |     594   40  218    4  332  174   56.6%  |   27 (  4.5%)  |
  | jaypipes **  |     509   10  180   17  302   77   62.7%  |   33 (  6.5%)  |
  |  oomichi **  |     392    1  136  

Re: [openstack-dev] [Nova] Spring cleaning nova-core

2014-12-05 Thread Joe Gordon
On Dec 5, 2014 11:39 AM, Russell Bryant rbry...@redhat.com wrote:

 On 12/05/2014 11:23 AM, Joe Gordon wrote:
   As you are more than aware, our policy for removing people from core
   is to leave that up to the PTL (I believe you wrote that) [0]. And I
   don't think numbers alone are a good metric for sorting out who to
   remove.  That being said, no matter what happens, with our fast-track
   policy, if pbrady is dropped it shouldn't be hard to re-add him.

 Yes, I'm aware of and not questioning the policy.  Usually drops are
 pretty obvious.  This one wasn't.  It seems reasonable to discuss.
 Maybe we don't have a common set of expectations.  Anyway, I'll follow
 up in private.


Agreed

 --
 Russell Bryant

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Cinder] Operations: adding new nodes in disabled state, allowed for test tenant only

2014-12-04 Thread Joe Gordon
On Wed, Dec 3, 2014 at 3:31 PM, Mike Scherbakov mscherba...@mirantis.com
wrote:

 Hi all,
 enable_new_services in nova.conf seems to allow adding new compute nodes in
 a disabled state:

 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L507-L508,
 so it would allow checking everything first, before allowing production
 workloads to land on the node. I've filed a bug against Fuel to use this by
 default when we scale up the env (add more computes) [1].

 A few questions:

1. Can we somehow enable the compute service for a test tenant first? So
the cloud administrator would be able to run test VMs on the node and,
after ensuring that everything is fine, enable the service for all tenants.


Although there may be more than one way to set this up in nova, this can
definitely be done via nova host aggregates. Put new compute services into
an aggregate that only specific tenants can access (controlled via a
scheduler filter).
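
A rough sketch of that setup with python-novaclient (names, credentials and
the tenant UUID are placeholders): create an aggregate for the new hosts,
tag it with a tenant filter key, and enable the AggregateMultiTenancyIsolation
scheduler filter so only the test tenant is scheduled there:

    # Rough sketch, assuming admin credentials and placeholder names: put
    # the new compute hosts in an aggregate that only the test tenant can
    # use via the AggregateMultiTenancyIsolation scheduler filter.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')

    agg = nova.aggregates.create('staging-computes', None)
    nova.aggregates.add_host(agg, 'new-compute-01')

    # filter_tenant_id is the metadata key the filter checks; the scheduler
    # must list AggregateMultiTenancyIsolation in scheduler_default_filters.
    nova.aggregates.set_metadata(agg, {'filter_tenant_id': 'TEST_TENANT_UUID'})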



2. What about Cinder? Is there a similar option / ability?
3. What about other OpenStack projects?

 What is your opinion, how we should approach the problem (if there is a
 problem)?

 [1] https://bugs.launchpad.net/fuel/+bug/1398817
 --
 Mike Scherbakov
 #mihgen


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Kilo Project Priorities

2014-12-04 Thread Joe Gordon
Hi all,

Note: Cross posting with operators


After a long double slot summit session, the nova team has come up with its
list of efforts to prioritize for Kilo:
http://specs.openstack.org/openstack/nova-specs/priorities/kilo-priorities.html

What does this mean?

* This is a list of items we think are important to accomplish for Kilo
* We are trying to prioritize work that fits under those categories.
* If you would like to help with one of those priorities, please contact
the owner.


thoughts, comments and feedback are appreciated.

best,
Joe Gordon

Summit etherpad: https://etherpad.openstack.org/p/kilo-nova-priorities
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Can the Kilo nova controller conduct the Juno compute nodes

2014-12-03 Thread Joe Gordon
On Wed, Dec 3, 2014 at 11:09 AM, Li Junhong lijh.h...@gmail.com wrote:

 Hi All,

 Is it possible for a Kilo nova controller to control Juno compute nodes?
 Is this scenario supported naturally by the nova mechanism at the design
 and code level?


Yes,

We gate on making sure we can run Kilo nova with Juno compute nodes.




 --
 Best Regards!
 
 Junhong, Li

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Can the Kilo nova controller conduct the Juno compute nodes

2014-12-03 Thread Joe Gordon
On Wed, Dec 3, 2014 at 11:53 AM, Li Junhong lijh.h...@gmail.com wrote:

 Hi Joe,

 Just want to confirm one more thing: in the gate testing, are
 neutron/cinder/glance Kilo or Juno? Or in other words, is the controller
 in gate testing an all-in-one controller?


Great question. In our current test neutron/cinder/glance are Kilo. But we
do want to support the case where neutron/cinder/glance are Juno, as you
should be able to upgrade each service independently. While we don't test
it, we design around that goal, so with some testing and bug fixing it
should work.



 On Wed, Dec 3, 2014 at 5:49 PM, Li Junhong lijh.h...@gmail.com wrote:

 Hi Joe,

 Thank you for your confirmative answer and the wonderful gate testing
 pipeline.

 On Wed, Dec 3, 2014 at 5:38 PM, Joe Gordon joe.gord...@gmail.com wrote:



 On Wed, Dec 3, 2014 at 11:09 AM, Li Junhong lijh.h...@gmail.com wrote:

 Hi All,

 Is it possible for Kilo nova controller to control the Juno compute
 nodes? Is this scenario supported naturally by the nova mechanism in the
 design and codes level?


 Yes,

 We gate on making sure we can run Kilo nova with Juno compute nodes.




 --
 Best Regards!
 
 Junhong, Li

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




 --
 Best Regards!
 
 Junhong, Li




 --
 Best Regards!
 
 Junhong, Li

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [python-novaclient] Status of novaclient V3

2014-12-03 Thread Joe Gordon
On Tue, Dec 2, 2014 at 5:21 PM, Andrey Kurilin akuri...@mirantis.com
wrote:

 Hi!

  While working on fixing a wrong import in the novaclient v3 shell, I have
  found that a lot of commands listed in the V3 shell (novaclient.v3.shell)
  are broken, because the appropriate managers are missing from the V3
  client (novaclient.v3.client.Client).

  The error template is ERROR (AttributeError): 'Client' object has no
  attribute 'attr', where attr can be floating_ip_pools,
  floating_ip, security_groups, dns_entries, etc.

  I know that novaclient V3 is not finished yet, and I guess it will not be
  finished. So the main question is:
   What should we do with the implemented code of novaclient V3? Should it
  be ported to novaclient V2.1, or can it be removed?


I think it can be removed, as we are not going forward with the V3 API. But
I will defer to Christopher Yeoh/Ken’ichi Ohmichi for the details.




 --
 Best regards,
 Andrey Kurilin.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][qa] Moving the hacking project under the QA program

2014-12-02 Thread Joe Gordon
On Tue, Dec 2, 2014 at 9:48 PM, Doug Hellmann d...@doughellmann.com wrote:

 The hacking project is currently managed by the Oslo team. Normally Joe
 Gordon handles all of the release work, and so I haven’t bothered to look
 at how the Launchpad project is set up or how the branches are managed
 before today. However, today’s issue with oslo.concurrency resulted in a
 need to hurry a release, and in the process of doing that I realized that
 it’s not set up at all like the other Oslo projects. When I started
 thinking about how to get that done, I also realized that maybe it’s a
 better fit for the QA program anyway, since it has to do with code quality
 and isn’t really a “library”.


One of the outcomes of this whole incident is that we just grew the
hacking-release team to include qa-release as well.  So in case another
issue like this arises we just need one of three people to be available to
tag a release.



 I talked to Matt and Joe and they agreed that the QA program would be
 willing to take over managing hacking. Matt posted a governance change, and
 this email thread is mostly so we’ll have the thought process behind the
 move on the record [1].

 I don’t really expect any significant changes to the way hacking is
 managed. From my perspective, the change is more about standardizing the
 Oslo library management further and less about hacking. Joe is happy to
 have the core review team largely stay the same, although we should ask
 members of oslo-core if they want to still be on hacking-core rather than
 just assuming (talk to jogo on IRC to make sure you’re on the list if you
 want to be).


Yup, if you are currently oslo-core and would like to continue being
hacking-core just find me on IRC.



 Doug

 [1] https://review.openstack.org/138499



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] deleting the pylint test job

2014-11-24 Thread Joe Gordon
On Mon, Nov 24, 2014 at 9:52 AM, Sean Dague s...@dague.net wrote:

 The pylint test job has been broken for weeks, no one seemed to care.
 While waiting for other tests to return today I looked into it and
 figured out the fix.

 However, because of nova objects pylint is progressively less and less
 useful. So the fact that no one else looked at it means that people
 didn't seem to care that it was provably broken. I think it's better
 that we just delete the jobs and save a node on every nova patch instead.


+1



 Project Config Proposed here - https://review.openstack.org/#/c/136846/

 If you -1 that you own fixing it, and making nova objects patches
 sensible in pylint.

 -Sean

 --
 Sean Dague
 http://dague.net

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal new hacking rules

2014-11-21 Thread Joe Gordon
On Fri, Nov 21, 2014 at 8:57 AM, Sahid Orentino Ferdjaoui 
sahid.ferdja...@redhat.com wrote:

 On Thu, Nov 20, 2014 at 02:00:11PM -0800, Joe Gordon wrote:
  On Thu, Nov 20, 2014 at 9:49 AM, Sahid Orentino Ferdjaoui 
  sahid.ferdja...@redhat.com wrote:
 
 This is something we can call nitpicking or low priority.
  
 
  This all seems like nitpicking for very little value. I think there are
  better things we can be focusing on instead of thinking of new ways to
 nit
  pick. So I am -1 on all of these.

 Yes, as written this is low priority, but it is something necessary for a
 project like Nova.


Why do you think this is necessary?


 Considering that, I feel sad to take your time. Can I suggest you
 take no notice of this and let other developers working on Nova
 do this job?


As the maintainer of openstack-dev/hacking and as a nova core, I don't
think this is worth doing at all. Nova already has enough on its plate and
doesn't need extra code to review.


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Scale out bug-triage by making it easier for people to contribute

2014-11-21 Thread Joe Gordon
On Tue, Nov 18, 2014 at 11:48 PM, Flavio Percoco fla...@redhat.com wrote:

 On 18/11/14 14:45 -0800, Joe Gordon wrote:



 On Tue, Nov 18, 2014 at 10:58 AM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Flavio Percoco's message of 2014-11-17 08:46:19 -0800:
 Greetings,

 Regardless of how big or small the bug backlog is for each project, I
 believe this is a common, annoying and difficult problem. At the oslo
 meeting today, we're talking about how to address our bug triage
 process and I proposed something that I've seen done in other
 communities (rust-language [0]) that I consider useful and a good
 option for OpenStack too.

 The process consists of a bot that sends an email to every *volunteer*
 with 10 bugs to review/triage for the week. Each volunteer follows
 the
 triage standards, applies tags and provides information on whether
 the
 bug is still valid or not. The volunteer doesn't have to fix the bug,
 just triage it.

 In openstack, we could have a job that does this and then have people
 from each team volunteer to help with triage. The benefits I see are:

 * Interested folks don't have to go through the list and filter the
 bugs they want to triage. The bot should be smart enough to pick the
 oldest, most critical, etc.

 * It's a totally opt-in process and volunteers can obviously ignore
 emails if they don't have time that week.

 * It helps scaling out the triage process without poking people
 around
 and without having to do a call for volunteers every
 meeting/cycle/etc

 The above doesn't solve the problem completely but just like reviews,
 it'd be an optional, completely opt-in process that people can sign
 up
 for.


My experience in Ubuntu, where we encouraged non-developers to triage
bugs, was that non-developers often ask the wrong questions and
sometimes even harm the process by putting something in the wrong
priority or state because of a lack of deep understanding.

Triage in a hospital is done by experienced nurses and doctors working
together, not triagers. This is because it may not always be obvious
to somebody just how important a problem is. We have the same set of
problems. The most important thing is that developers see it as an
important task and take part. New volunteers should be getting involved
at every level, not just bug triage.


 ++, nice analogy.

 Another problem I have seen is that we need to constantly re-triage bugs:
 just because a bug was marked as confirmed 6 months ago doesn't mean it is
 still valid.


 Ideally, the script will take care of this. Bugs that haven't been
 updated for more than N months will fall into the to-triage pool for
 re-triage.


I am willing to sign up and give this a try.

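For concreteness, here is a rough sketch of what such a bot could look like
using launchpadlib (the volunteer list, batch size and selection logic are
made up; the real thing would also need to send mail and respect opt-outs):

from launchpadlib.launchpad import Launchpad

VOLUNTEERS = ['alice@example.org', 'bob@example.org']  # opt-in list (illustrative)
BUGS_PER_VOLUNTEER = 10

lp = Launchpad.login_anonymously('bug-triage-bot', 'production')
project = lp.projects['nova']

# Oldest untriaged bugs first; searchTasks can also filter on importance, tags, etc.
tasks = project.searchTasks(status=['New'], order_by='datecreated')

for i, volunteer in enumerate(VOLUNTEERS):
    chunk = tasks[i * BUGS_PER_VOLUNTEER:(i + 1) * BUGS_PER_VOLUNTEER]
    lines = ['%s\n  %s' % (task.bug.title, task.web_link) for task in chunk]
    # Sending the actual email is left out; just print the assignment for now.
    print('To: %s\n\n%s\n' % (volunteer, '\n'.join(lines)))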


 Flavio





I think the best approach to this, like reviews, is to have a place
where users can go to drive the triage workload to 0. For instance, the
ubuntu server team had this report for triage:

http://reqorts.qa.ubuntu.com/reports/ubuntu-server/triage-report.html

Sadly, it looks like they're overwhelmed or have abandoned the effort
(I hope this doesn't say something about Ubuntu server itself..), but
the basic process was to move bugs off these lists. I'm sure if we ask
nice the author of that code will share it with us and we could adapt
it for OpenStack projects.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



  ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 --
 @flaper87
 Flavio Percoco

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal new hacking rules

2014-11-20 Thread Joe Gordon
On Thu, Nov 20, 2014 at 9:49 AM, Sahid Orentino Ferdjaoui 
sahid.ferdja...@redhat.com wrote:

 This is something we can call nitpicking or low priority.


This all seems like nitpicking for very little value. I think there are
better things we can be focusing on instead of thinking of new ways to nit
pick. So I am -1 on all of these.



 I would like we introduce 3 new hacking rules to enforce the cohesion
 and consistency in the base code.


 Using boolean assertions
 

 Some tests are written with equality assertions to validate boolean
 conditions which is something not clean:

   assertFalse([]) asserts an empty list
   assertEqual(False, []) asserts an empty list is equal to the boolean
   value False which is something not correct.

 Some changes have been started here but still need to be reviewed
 by the community:

  * https://review.openstack.org/#/c/133441/
  * https://review.openstack.org/#/c/119366/


 Using same order of arguments in equality assertions
 

 Most of the code is written with assertEqual(Expected, Observed) but
 some parts are still using the opposite. Even if this does not provide any real
 optimisation, using the same convention helps reviewing and keeps a
 better consistency in the code.

   assertEqual(Expected, Observed) OK
   assertEqual(Observed, Expected) KO

 A change has been started here but still needs to be reviewed by
 the community:

  * https://review.openstack.org/#/c/119366/

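For illustration, the two assertion conventions above in a plain
unittest-style test (runnable as-is):

import unittest


class ExampleTest(unittest.TestCase):
    def test_assertion_conventions(self):
        observed = []
        # Boolean assertions: assert the condition directly instead of
        # comparing against True/False.
        self.assertFalse(observed)           # preferred
        # self.assertEqual(False, observed)  # avoid: [] is falsey, not False
        # Equality assertions: expected value first, observed value second,
        # so that failure messages read the right way around.
        self.assertEqual([], observed)       # preferred
        # self.assertEqual(observed, [])     # avoid: arguments swapped


if __name__ == '__main__':
    unittest.main()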

 Using LOG.warn instead of LOG.warning
 -

 We have seen many times that reviewers -1ed a patch to ask the developer to use
 'warn' instead of 'warning'. This will provide no optimisation,
 but let's finally have something clear about which one we have to use.

   LOG.warning: 74
    LOG.warn: 319

 We probably want to use 'warn'

 Nothing has been started from what I know.

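Whichever spelling wins, the enforcement side would be small: a hacking-style
check along the lines of the existing ones in nova/hacking/checks.py (the
error code, check name and direction below are just illustrative):

import re

LOG_WARNING_RE = re.compile(r"\bLOG\.warning\(")


def check_no_log_warning(logical_line):
    """N3xx - use LOG.warn instead of LOG.warning.

    A local check like this would be registered through the existing
    factory() in nova/hacking/checks.py, next to the other N3xx checks.
    """
    if LOG_WARNING_RE.search(logical_line):
        yield 0, "N3xx: use LOG.warn instead of LOG.warning"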

 Thanks,
 s.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] policy on old / virtually abandoned patches

2014-11-18 Thread Joe Gordon
On Tue, Nov 18, 2014 at 6:17 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 On Tue, Nov 18, 2014 at 07:33:54AM -0500, Sean Dague wrote:
  On 11/18/2014 07:29 AM, Daniel P. Berrange wrote:
   On Tue, Nov 18, 2014 at 07:06:59AM -0500, Sean Dague wrote:
   Nova currently has 197 patches that have seen no activity in the last
 4
   weeks (project:openstack/nova age:4weeks status:open).
  
   Of these
* 108 are currently Jenkins -1 (project:openstack/nova age:4weeks
   status:open label:Verified=-1,jenkins)
* 60 are -2 by a core team member (project:openstack/nova age:4weeks
   status:open label:Code-Review=-2)
  
   (note, those 2 groups sometimes overlap)
  
   Regardless, the fact that Nova currently has 792 open reviews, and 1/4
   of them seem dead, seems like a cleanup thing we could do.
  
   I'd like to propose that we implement our own auto abandon mechanism
   based on reviews that are either held by a -2, or Jenkins -1 after 4
   weeks time. I can write a quick script to abandon with a friendly
   message about why we are doing it, and to restore it if work is
 continuing.
   Yep, purging anything that's older than 4 weeks with negative karma
   seems like a good idea.  It'll make it easier for us to identify those
   patches which are still maintained and target them for review.
  
   That said, there's some edge cases - for example I've got some patches
   up for review that have a -2 on them, becase we're waiting for
 blueprint
   approval. IIRC, previously we would post a warning about pending auto-
   abandon a week before, and thus give the author the chance to add a
   comment to prevent auto-abandon taking place. It would be neccessary to
   have this ability to deal with the case where we're just temporarily
   blocked on other work.
  
   Also sometimes when you have a large patch series, you might have some
   patches later in the series which (temporarily) fail the jenkins jobs.
   It often isn't worth fixing those failures until you have dealt with
   review earlier in the patch series. So I think we should not
 auto-expire
   patches which are in the middle of a patch series, unless the
  preceding
   patches in the series are to be expired too.  Yes this isn't something
   you can figure out with a single gerrit query - you'd have to query
   gerrit for patches and then look at the parent change references.
  Or just abandon and let people restore. I think handling the logic /
  policy for the edge cases isn't worth it when the author can very easily
  hit the restore button to get their patch back (and fresh for another
 4w).
 
  If it was a large patch series, this wouldn't happen anyway, because
  every rebase would make it fresh. 4w is really 4w of nothing changing.

 Ok, that makes sense and is workable I reckon.


++ for bringing back auto abandon in this model.

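For reference, a rough sketch of what that script could look like against
Gerrit's REST API (the queries are the ones from this thread; endpoint paths,
auth and the abandon message are assumptions):

import json

import requests
from requests.auth import HTTPDigestAuth

GERRIT = 'https://review.openstack.org'
# Gerrit used HTTP digest auth for REST at this point; credentials are placeholders.
AUTH = HTTPDigestAuth('username', 'http-password')
QUERIES = [
    'project:openstack/nova age:4weeks status:open label:Verified=-1,jenkins',
    'project:openstack/nova age:4weeks status:open label:Code-Review=-2',
]
MESSAGE = ('Abandoning due to 4 weeks of inactivity with a blocking vote. '
           'Feel free to restore and rebase if work is continuing.')


def query_changes(query):
    resp = requests.get(GERRIT + '/changes/', params={'q': query, 'n': 500})
    # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip it.
    return json.loads(resp.text[4:])


def abandon(change_number):
    requests.post(GERRIT + '/a/changes/%s/abandon' % change_number,
                  data=json.dumps({'message': MESSAGE}),
                  headers={'Content-Type': 'application/json'},
                  auth=AUTH)


if __name__ == '__main__':
    for query in QUERIES:
        for change in query_changes(query):
            abandon(change['_number'])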


 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
 :|
 |: http://libvirt.org  -o- http://virt-manager.org
 :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
 :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
 :|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Scale out bug-triage by making it easier for people to contribute

2014-11-18 Thread Joe Gordon
On Tue, Nov 18, 2014 at 10:58 AM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Flavio Percoco's message of 2014-11-17 08:46:19 -0800:
  Greetings,
 
  Regardless of how big/small bugs backlog is for each project, I
  believe this is a common, annoying and difficult problem. At the oslo
  meeting today, we're talking about how to address our bug triage
  process and I proposed something that I've seen done in other
  communities (rust-language [0]) that I consider useful and a good
  option for OpenStack too.
 
  The process consists of a bot that sends an email to every *volunteer*
  with 10 bugs to review/triage for the week. Each volunteer follows the
  triage standards, applies tags and provides information on whether the
  bug is still valid or not. The volunteer doesn't have to fix the bug,
  just triage it.
 
  In openstack, we could have a job that does this and then have people
  from each team volunteer to help with triage. The benefits I see are:
 
  * Interested folks don't have to go through the list and filter the
  bugs they want to triage. The bot should be smart enough to pick the
  oldest, most critical, etc.
 
  * It's a totally opt-in process and volunteers can obviously ignore
  emails if they don't have time that week.
 
  * It helps scaling out the triage process without poking people around
  and without having to do a call for volunteers every meeting/cycle/etc
 
  The above doesn't solve the problem completely but just like reviews,
  it'd be an optional, completely opt-in process that people can sign up
  for.
 

 My experience in Ubuntu, where we encouraged non-developers to triage
 bugs, was that non-developers often ask the wrong questions and
 sometimes even harm the process by putting something in the wrong
 priority or state because of a lack of deep understanding.

 Triage in a hospital is done by experienced nurses and doctors working
 together, not triagers. This is because it may not always be obvious
 to somebody just how important a problem is. We have the same set of
 problems. The most important thing is that developers see it as an
 important task and take part. New volunteers should be getting involved
 at every level, not just bug triage.


++, nice analogy.

Another problem I have seen is that we need to constantly re-triage bugs:
just because a bug was marked as confirmed 6 months ago doesn't mean it is
still valid.



 I think the best approach to this, like reviews, is to have a place
 where users can go to drive the triage workload to 0. For instance, the
 ubuntu server team had this report for triage:

 http://reqorts.qa.ubuntu.com/reports/ubuntu-server/triage-report.html

 Sadly, it looks like they're overwhelmed or have abandoned the effort
 (I hope this doesn't say something about Ubuntu server itself..), but
 the basic process was to move bugs off these lists. I'm sure if we ask
 nice the author of that code will share it with us and we could adapt
 it for OpenStack projects.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Dynamic VM Consolidation Agent as part of Nova

2014-11-18 Thread Joe Gordon
On Tue, Nov 18, 2014 at 7:40 AM, Mehdi Sheikhalishahi 
mehdi.alish...@gmail.com wrote:

 Hi,

 I would like to bring Dynamic VM Consolidation capability into Nova. That
 is, I would like to check compute node status periodically (let's say every
 15 minutes) and consolidate VMs if there is any opportunity to turn off
 some compute nodes.

 Any hints on how to get into this development process as part of nova?


While I like the idea of having dynamic VM consolidation capabilities
somewhere in OpenStack, it doesn't belong in Nova. This service should
live outside of Nova and just consume Nova's REST APIs. If there is some
piece of information that this service would need that isn't made available
via the REST API, we can fix that.

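To illustrate the shape of such an out-of-tree service, a minimal sketch using
python-novaclient (credentials and the idle threshold are placeholders, and
the actual placement policy is the hard part that is left out):

import time

from novaclient import client

# Placeholder admin credentials; a real agent would read these from config.
nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://keystone.example.com:5000/v2.0')

IDLE_VCPU_THRESHOLD = 2  # treat hosts using this many vCPUs or fewer as drainable


def drainable_hosts():
    # hypervisors.list() exposes per-host usage (running_vms, vcpus_used, ...).
    return [h for h in nova.hypervisors.list()
            if 0 < h.running_vms and h.vcpus_used <= IDLE_VCPU_THRESHOLD]


def drain(host):
    servers = nova.servers.list(
        search_opts={'host': host.hypervisor_hostname, 'all_tenants': 1})
    for server in servers:
        # Let the scheduler pick the target; assumes live migration works here.
        server.live_migrate(host=None)


if __name__ == '__main__':
    while True:
        for host in drainable_hosts():
            drain(host)
            # Powering off the now-empty node would happen outside Nova
            # (e.g. via Ironic or IPMI tooling).
        time.sleep(15 * 60)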


 Thanks,
 Mehdi

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.

2014-11-18 Thread Joe Gordon
On Mon, Nov 17, 2014 at 1:29 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:

 On 12 November 2014 11:11, Steve Gordon sgor...@redhat.com wrote:

 NUMA
 

 We still need to identify some hardware to run third party CI for the
 NUMA-related work, and no doubt other things that will come up. It's
 expected that this will be an interim solution until OPNFV resources can be
 used (note cdub jokingly replied 1-2 years when asked for a rough
 estimate - I mention this because based on a later discussion some people
 took this as a serious estimate).

 Ian did you have any luck kicking this off? Russell and I are also
 endeavouring to see what we can do on our side w.r.t. this short term
 approach - in particular if you find hardware we still need to find an
 owner to actually setup and manage it as discussed.



 In theory to get started we need a physical multi-socket box and a virtual
 machine somewhere on the same network to handle job control etc. I believe
 the tests themselves can be run in VMs (just not those exposed by existing
 public clouds) assuming a recent Libvirt and an appropriately crafted
 Libvirt XML that ensures the VM gets a multi-socket topology etc. (we can
 assist with this).


 With apologies for the late reply, but I was off last week.  And because I
 was off last week I've not done anything about this so far.

 I'm assuming that we'll just set up one physical multisocket box and
 ensure that we can do a cleanup-deploy cycle so that we can run whatever
 x86-dependent but otherwise relatively hardware agnostic tests we might
 need.  Seems easier than worrying about what libvirt and KVM do and don't
 support at a given moment in time.

 I'll go nag our lab people for the machines.  I'm thinking for the
 cleanup-deploy that I might just try booting the physical machine into a
 RAM root disk and then running a devstack setup, as it's probably faster
 than a clean install, but I'm open to options.  (There's quite a lot of
 memory in the servers we have so this is likely to work fine.)

 That aside, where are the tests going to live?


Great question, I am thinking these tests are a good candidate for
functional (devstack) based tests that live in the nova tree.



 --
 Ian.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] CI for NUMA, SR-IOV, and other features that can't be tested on current infra.

2014-11-18 Thread Joe Gordon
On Sun, Nov 16, 2014 at 5:31 AM, Irena Berezovsky ire...@mellanox.com
wrote:

 Hi Steve,
 Regarding SR-IOV testing, at Mellanox we have a CI job running on a bare metal
 node with a Mellanox SR-IOV NIC.  This job is reporting on neutron patches.
 Currently API tests are executed.
 The contact person for SRIOV CI job is listed at driverlog:

 https://github.com/stackforge/driverlog/blob/master/etc/default_data.json#L1439




 The following items are in progress:
  - SR-IOV functional testing


Where do you envision these tests living?


  - Reporting CI job on nova patches


Looking forward to it. I assume you will be working with the other people
trying to set up assorted CI systems in this space.


  - Multi-node setup
 It is worth mentioning that we want to start the collaboration on the SR-IOV
 testing effort as part of the pci pass-through subteam activity.
 Please join the weekly meeting if you want to collaborate or have some
 inputs: https://wiki.openstack.org/wiki/Meetings/Passthrough

 BR,
 Irena

 -Original Message-
 From: Steve Gordon [mailto:sgor...@redhat.com]
 Sent: Wednesday, November 12, 2014 9:11 PM
 To: itai mendelsohn; Adrian Hoban; Russell Bryant; Ian Wells (iawells);
 Irena Berezovsky; ba...@cisco.com
 Cc: Nikola Đipanov; Russell Bryant; OpenStack Development Mailing List
 (not for usage questions)
 Subject: [Nova][Neutron][NFV][Third-party] CI for NUMA, SR-IOV, and other
 features that can't be tested on current infra.

 Hi all,

 We had some discussions last week - particularly in the Nova NFV design
 session [1] - on the subject of ensuring that telecommunications and
 NFV-related functionality has adequate continuous integration testing. In
 particular the focus here is on functionality that can't easily be tested
 on the public clouds that back the gate, including:

 - NUMA (vCPU pinning, vCPU layout, vRAM layout, huge pages, I/O device
 locality)
 - SR-IOV with Intel, Cisco, and Mellanox devices (possibly others)

 In each case we need to confirm where we are at, and the plan going
 forward, with regards to having:

 1) Hardware to run the CI on.
 2) Tests that actively exercise the functionality (if not already in
 existence).
 3) Point person for each setup to maintain it and report into the
 third-party meeting [2].
 4) Getting the jobs operational and reporting [3][4][5][6].

 In the Nova session we discussed a goal of having the hardware by K-1 (Dec
 18) and having it reporting at least periodically by K-2 (Feb 5). I'm not
 sure if similar discussions occurred on the Neutron side of the design
 summit.

 SR-IOV
 ==

 Adrian and Irena mentioned they were already in the process of getting up
 to speed with third party CI for their respective SR-IOV configurations.
 Robert are you attempting similar with regards to Cisco devices? What is
 the status of each of these efforts versus the four items I listed above
 and what do you need assistance with?

 NUMA
 

 We still need to identify some hardware to run third party CI for the
 NUMA-related work, and no doubt other things that will come up. It's
 expected that this will be an interim solution until OPNFV resources can be
 used (note cdub jokingly replied 1-2 years when asked for a rough
 estimate - I mention this because based on a later discussion some people
 took this as a serious estimate).

 Ian did you have any luck kicking this off? Russell and I are also
 endeavouring to see what we can do on our side w.r.t. this short term
 approach - in particular if you find hardware we still need to find an
 owner to actually setup and manage it as discussed.

 In theory to get started we need a physical multi-socket box and a virtual
 machine somewhere on the same network to handle job control etc. I believe
 the tests themselves can be run in VMs (just not those exposed by existing
 public clouds) assuming a recent Libvirt and an appropriately crafted
 Libvirt XML that ensures the VM gets a multi-socket topology etc. (we can
 assist with this).

 Thanks,

 Steve

 [1] https://etherpad.openstack.org/p/kilo-nova-nfv
 [2] https://wiki.openstack.org/wiki/Meetings/ThirdParty
 [3] http://ci.openstack.org/third_party.html
 [4] http://www.joinfu.com/2014/01/understanding-the-openstack-ci-system/
 [5]
 http://www.joinfu.com/2014/02/setting-up-an-external-openstack-testing-system/
 [6]
 http://www.joinfu.com/2014/02/setting-up-an-openstack-external-testing-system-part-2/
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Wednesday Bug Day

2014-11-18 Thread Joe Gordon
Hi All,

At the kilo nova meeting up in Paris, We had a lengthy discussion on better
bug management. One of the action items was to make every Wednesday a bug
day nova in #openstack-nova [0]. So tomorrow will be our first attempt at
Bug Wednesday.   One of the issues we found with previous bug meeting
efforts was holding them off in a meeting room away from most of the nova
developers. Instead we are trying to set aside all of Wednesday as the day
where nova developers get together and discuss potential bugs, bug fixes in
#openstack-nova. So if you found a bug or are working on a bug fix and
would like feedback (or a review), join #openstack-nova and we, the nova
team, will try to help out.

best,
Joe


P.S. We are not sure if this will work, but at the meetup we agreed it was
at least worth a try.

[0] https://etherpad.openstack.org/p/kilo-nova-meetup
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Multi-node testing (for redis and others...)

2014-11-17 Thread Joe Gordon
On Mon, Nov 17, 2014 at 1:06 PM, Joshua Harlow harlo...@outlook.com wrote:

 Hi guys,

 A recent question came up about how do we test better with redis for tooz.
 I think this question is also relevant for ceilometer (and other users of
 redis) and in general applies to the whole of openstack as the larger
 system is what people run (I hope not everyone just runs devstack on a
 single-node and that's where they stop, ha).


https://review.openstack.org/#/c/106043/23



 The basic question is that redis (or zookeeper) has ways to be set up (and
 typically is set up) as multi-node instances (for example with redis +
 sentinel or zookeeper in multi-node configurations, or the newly released
 redis clustering...). It seems though that our testing infrastructure is
 set up to do the basics of tests (which isn't bad, but does have its
 limits), and this got me thinking about what would be needed to actually test
 these multi-node configurations of things like redis (configured in
 sentinel mode, or redis in clustering mode) in a realistic manner that
 tests 'common' failure patterns (net splits for example).

 I guess we can split it up into 3 or 4 (or more questions).

 1. How do we get a multi-node configuration (of say redis) setup in the
 first place, configured so that all nodes are running and sentinel (for
 example) is running as expected?
 2. How do we then inject failures into this setup to ensure that the
 applications and clients built on top of those systems reliably handle these
 types of injected failures (something like https://github.com/aphyr/jepsen
 or similar?).
 3. How do we analyze those results (for when #2 doesn't turn out to work
 as expected) in a meaningful manner, so that we can then turn those
 experiments into more reliable software?

 Anyone else have any interesting ideas for this?

 -Josh

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder] [barbican] Stable check of openstack/cinder failed

2014-11-02 Thread Joe Gordon
The release of the new barbicanclient caused another bug as well:
https://bugs.launchpad.net/cinder/+bug/1388414. This one is causing all
grenade jobs on master to fail. It looks like we have a hole in the gating
logic somewhere.

On Sat, Nov 1, 2014 at 3:42 PM, Alan Pevec ape...@gmail.com wrote:

 Hi,

 cinder juno tests are failing after new barbicanclient release

  - periodic-cinder-python26-juno
 http://logs.openstack.org/periodic-stable/periodic-cinder-python26-juno/d660c21
 : FAILURE in 11m 37s
  - periodic-cinder-python27-juno
 http://logs.openstack.org/periodic-stable/periodic-cinder-python27-juno/d9bf4cb
 : FAILURE in 9m 04s

 I've filed https://bugs.launchpad.net/cinder/+bug/1388461 AFAICT this
 affects master too.

 Cheers,
 Alan

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [All] Finalizing cross-project design summit track

2014-10-30 Thread Joe Gordon
On Thu, Oct 30, 2014 at 4:01 AM, Thierry Carrez thie...@openstack.org
wrote:

 Jay Pipes wrote:
  On 10/29/2014 09:07 PM, Russell Bryant wrote:
  On 10/29/2014 06:46 PM, Rochelle Grober wrote:
  Any chance we could use the opening to move either the Refstack
  session or the logging session from their current joint (and
  conflicting) time (15:40)?  QA really would be appreciated at both.
  And I'd really like to be at both.  I'd say the Refstack one would go
  better in the debug slot, as the API stuff is sort of related to the
  logging.  Switching with one of the 14:50 sessions might also work.
 
  Just hoping.  I really want great participation at all of these
  sessions.
 
  The gate debugging session is most likely going to be dropped at this
  point.  I don't see a big problem with moving the refstack one to that
  slot (the first time).
 
  Anyone else have a strong opinion on this?
 
  Sounds good to me.

 Sounds good.


With the gate debugging session being dropped due to being the wrong
format to be productive, we now need a new session. After looking over the
etherpad of proposed cross project sessions I think there is one glaring
omission: the SDK. In the Kilo Cycle Goals Exercise thread [0] having a
real SDK was one of the top answers. Many folks had great responses that
clearly explained the issues end users are having [1]. As for who could
lead a session like this I have two ideas: Monty Taylor, who had one of the
most colorful explanations of why this is so critical, or Dean Troyer, one
of the few people actually working on this right now. I think it would be
embarrassing if we had no cross project session on SDKs, since there
appears to be a consensus that making life easier for the end user is a
high priority.

The current catch is, the free slot is now at 15:40, so it would compete
with 'How to Tackle Technical Debt in Kilo,' a session which I expect to be
very popular with the same people who would be interested in attending a
SDK session.

[0]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/044766.html
[1] https://etherpad.openstack.org/p/6cWQG9oNsr


 --
 Thierry Carrez (ttx)

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [All] Finalizing cross-project design summit track

2014-10-30 Thread Joe Gordon
On Thu, Oct 30, 2014 at 9:20 AM, Anne Gentle a...@openstack.org wrote:



 On Thu, Oct 30, 2014 at 10:53 AM, Joe Gordon joe.gord...@gmail.com
 wrote:



 On Thu, Oct 30, 2014 at 4:01 AM, Thierry Carrez thie...@openstack.org
 wrote:

 Jay Pipes wrote:
  On 10/29/2014 09:07 PM, Russell Bryant wrote:
  On 10/29/2014 06:46 PM, Rochelle Grober wrote:
  Any chance we could use the opening to move either the Refstack
  session or the logging session from their current joint (and
  conflicting) time (15:40)?  QA really would be appreciated at both.
  And I'd really like to be at both.  I'd say the Refstack one would go
  better in the debug slot, as the API stuff is sort of related to the
  logging.  Switching with one of the 14:50 sessions might also work.
 
  Just hoping.  I really want great participation at all of these
  sessions.
 
  The gate debugging session is most likely going to be dropped at
 this
  point.  I don't see a big problem with moving the refstack one to that
  slot (the first time).
 
  Anyone else have a strong opinion on this?
 
  Sounds good to me.

 Sounds good.


 With the gate debugging session being  dropped due to being the wrong
 format to be productive, we now need a new session. After looking over the
 etherpad of proposed cross project sessions I think there is one glaring
 omission: the SDK. In the Kilo Cycle Goals Exercise thread [0] having a
 real SDK was one of the top answers. Many folks had great responses that
 clearly explained the issues end users are having [1].  As for who could
 lead a session like this I have two ideas: Monty Taylor, who had one of the
 most colorful explanations to why this is so critical, or Dean Troyer, one
 of the few people actually working on this right now. I think it would be
 embarrassing if we had no cross project session on SDKs, since there
 appears to be a consensus that the making life easier for the end user is a
 high priority.


 There are many discussion sessions related to SDKs, they just aren't all
 in the cross-project slots. Plus these don't require an ATC badge
 (something users may not have).


If we want to make sure the end user has a more uniform experience, having
the individual python-*client discussions isn't sufficient.

Also, the issue is not lack of user feedback, the issue here is more of a
lack of people implementing the feedback.


 Application Ecosystem Working Group

 https://wiki.openstack.org/wiki/Application_Ecosystem_Working_Group

 Monday 2:30 (Degas)

 Thursday 1:40 (Hyatt)


These sessions have pretty broad scopes, and I don't think a discussion on
SDKs here is enough, since the issue isn't a lack of feedback.


 I think we can talk about the real SDK at one of these.

 There's also:

 Getting Started with the OpenStack Python SDK

 Monday 4:20 (Room 242AB)

This isn't a design summit session, so it doesn't really make sense to do
future design work here.


 Anne

 The current catch is, the free slot is now at 15:40, so it would compete
 with 'How to Tackle Technical Debt in Kilo,' a session which I expect to be
 very popular with the same people who would be interested in attending a
 SDK session.

 [0]
 http://lists.openstack.org/pipermail/openstack-dev/2014-September/044766.html
 [1] https://etherpad.openstack.org/p/6cWQG9oNsr


 --
 Thierry Carrez (ttx)

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][CI] nova-networking or neutron netwokring for CI

2014-10-29 Thread Joe Gordon
On Wed, Oct 29, 2014 at 2:23 AM, Andreas Scheuring 
scheu...@linux.vnet.ibm.com wrote:

 Thanks for the feedback.

 OK, got it, nova networking is not a requirement for a CI. Then I see
 not a single reason to support it. We will investigate the neutron
 way for our CI and also for production.


I wouldn't go so far as to say there is no reason to support
nova-networking in your CI system; while we are working on deprecating it, it
hasn't been deprecated yet. Ideally you could test both neutron and
nova-network, but if you had to choose one it should be neutron.





 Now coming back to the Hypervisorsupportmatrix
 ( https://wiki.openstack.org/wiki/HypervisorSupportMatrix ).
 I guess the scope of this matrix is only nova and not neutron,cinder,..
 isn't it?

 So in this matrix I have to tick the networking lines (vlan, routing,..)
 as NOT supported, right? (as scope is neutron-networking, although we
 would support it with neutron).


Correct, if you don't support nova-network at all, then you should have an
'X' in those boxes.



 Thanks,
 Andreas





 --
 Andreas
 (irc: scheuran)


 On Tue, 2014-10-28 at 11:06 -0700, Joe Gordon wrote:
 
 
  On Tue, Oct 28, 2014 at 6:44 AM, Dan Smith d...@danplanet.com wrote:
   Are current nova CI platforms configured with
  nova-networking or with
   neutron networking? Or is networking in general not even a
  part of the
   nova CI approach?
 
  I think we have several that only run on Neutron, so I think
  it's fine
  to just do that.
 
 
  Agreed, neutron should be considered required for all of the reasons
  listed above.
 
 
  --Dan
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
 
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] TC election by the numbers

2014-10-29 Thread Joe Gordon
On Wed, Oct 29, 2014 at 6:27 PM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Eoghan Glynn's message of 2014-10-29 16:37:42 -0700:
 
On Oct 29, 2014, at 3:32 PM, Eoghan Glynn egl...@redhat.com wrote:
   
   
Folks,
   
I haven't seen the customary number-crunching on the recent TC
 election,
so I quickly ran the numbers myself.
   
Voter Turnout
=
   
The turnout rate continues to decline, in this case from 29.7% to
 26.7%.
   
Here's how the participation rates have shaped up since the first
 TC2.0
election:
   
Election | Electorate | Voted | Turnout | Change

10/2013  | 1106   | 342   | 30.9%   | -8.0%
04/2014  | 1510   | 448   | 29.7%   | -4.1%
10/2014  | 1892   | 506   | 26.7%   | -9.9%
  
  
   Overall percentage of the electorate voting is declining, but the absolute
   number of voters has increased. And in fact, the electorate has grown
 more
   than the turnout has declined.
 
  True that, but AFAIK the generally accepted metric on participation rates
  in elections is turnout as opposed to absolute voter numbers.
 

 IIRC, there is no method for removing foundation members. So there are
 likely a number of people listed who have moved on to other activities and
 are no longer involved with OpenStack. I'd actually be quite interested
 to see the turnout numbers with voters who missed the last two elections
 prior to this one filtered out.


Sounds like you need to freshen up on your bylaws ;).  There are methods to
remove foundation members: Bylaws Appendix 1 Section 3 [0]. Also you have
to be an ATC not just an Individual Member to vote, Appendix 4 Section 2 [1]

[0] http://www.openstack.org/legal/individual-member-policy/
[1] http://www.openstack.org/legal/technical-committee-member-policy/



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][CI] nova-networking or neutron netwokring for CI

2014-10-28 Thread Joe Gordon
On Tue, Oct 28, 2014 at 6:44 AM, Dan Smith d...@danplanet.com wrote:

  Are current nova CI platforms configured with nova-networking or with
  neutron networking? Or is networking in general not even a part of the
  nova CI approach?

 I think we have several that only run on Neutron, so I think it's fine
 to just do that.


Agreed, neutron should be considered required for all of the reasons listed
above.



 --Dan


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Summit] Coordination between OpenStack lower layer virt stack (libvirt, QEMU/KVM)

2014-10-22 Thread Joe Gordon
On Oct 21, 2014 4:10 AM, Daniel P. Berrange berra...@redhat.com wrote:

 On Tue, Oct 21, 2014 at 12:58:48PM +0200, Kashyap Chamarthy wrote:
  I was discussing $subject on #openstack-nova, Nikola Dipanov suggested

Sounds like a great idea.

  it's worthwhile to bring this up on the list.
 
  I was looking at
 
  http://kilodesignsummit.sched.org/
 
  and noticed there's no specific session (correct me if I'm wrong) that's
  targeted at coordination between OpenStack - libvirt - QEMU/KVM.

 At previous summits, Nova has given each virt driver a dedicated session
 in its track. Those sessions have pretty much just been a walkthrough of
 the various features each virt team was planning.

 We always have far more topics to discuss than we have time available,
 and for this summit we want to change direction to maximise the value
 extracted from face-to-face meetings.

 As such any session which is just duplicating stuff that could easily be
 dealt with over email or irc is being cut, to make room for topics where
 we really need to have the f2f discussions. So the virt driver general
 sessions from previous summits are not likely to be on the schedule this
 time around.

Agreed, this mailing list is a great place to kick off the closer libvirt
QEMU/KVM discussions.


 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
:|
 |: http://libvirt.org  -o- http://virt-manager.org
:|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
:|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
:|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [novaclient] E12* rules

2014-10-20 Thread Joe Gordon
On Fri, Oct 17, 2014 at 6:40 AM, Andrey Kurilin akuri...@mirantis.com
wrote:

 Hi everyone!

 I'm working on enabling E12* PEP8 rules in novaclient (status of my work
 listed below). Imo, PEP8 rules should be ignored only in extreme cases/for
 important reasons and we should decrease the number of ignored rules. This
 helps to keep code in a more strict, readable form, which is very important
 when working in a community.

 While working on rule E126, we started a discussion with Joe Gordon about
 the demand for these rules. I have no idea about the reasons why they should be
 ignored, so I want to know:
 - Why these rules should be ignored?
 - What do you think about enabling these rules?


I found the source of my confusion. See my inline comments in
https://review.openstack.org/#/c/122888/10/tox.ini

Hopefully this patch should clarify things:
https://review.openstack.org/129677




 Please, leave your opinion about E12* rules.

 Already enabled rules:
   E121,E125 - https://review.openstack.org/#/c/122888/
   E122 - https://review.openstack.org/#/c/123830/
   E123 - https://review.openstack.org/#/c/123831/

 Abandoned rule:
   E124 - https://review.openstack.org/#/c/123832/

 Pending review:
   E126 - https://review.openstack.org/#/c/123850/
   E127 - https://review.openstack.org/#/c/123851/
   E128 - https://review.openstack.org/#/c/127559/
   E129 - https://review.openstack.org/#/c/123852/


 --
 Best regards,
 Andrey Kurilin.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] FreeBSD host support

2014-10-20 Thread Joe Gordon
On Sat, Oct 18, 2014 at 10:04 AM, Roman Bogorodskiy 
rbogorods...@mirantis.com wrote:

 Hi,

 In discussion of this spec proposal:
 https://review.openstack.org/#/c/127827/ it was suggested by Joe Gordon
 to start a discussion on the mailing list.

 So I'll share my thoughts and a long term plan on adding FreeBSD host
 support for OpenStack.

 An ultimate goal is to allow using libvirt/bhyve as a compute driver.
 However, I think it would be reasonable to start with libvirt/qemu
 support first as it will allow to prepare the ground.


Before diving into the technical details below, I have one question: why?
What is the benefit of this, besides the obvious 'we now support FreeBSD'?
Adding support for a new kernel introduces yet another column in our
support matrix, and will require a long term commitment to testing and
maintaining OpenStack on FreeBSD.




 High level overview of what needs to be done:

  - Nova
   * linux_net needs to be re-factored to allow to plug in FreeBSD
 support (that's what the spec linked above is about)
   * nova.virt.disk.mount needs to be extended to support FreeBSD's
 mdconfig(8) in a similar way to Linux's losetup
  - Glance and Keystone
 These components are fairly free of system specifics. Most likely
 they will require some small fixes like, e.g., the one I made for Glance:
 https://review.openstack.org/#/c/94100/
  - Cinder
 I didn't look close at Cinder from a porting perspective, tbh.
 Obviously, it'll need some backend driver that would work on
 FreeBSD, e.g. ZFS. I've seen some patches floating around for ZFS
 though. Also, I think it'll need an implementation of iSCSI stack
 on FreeBSD, because it has its own stack, not stgt. On the other
 hand, Cinder is not required for a minimal installation and that
 could be done after adding support of the other components.


What about neutron? We are in the process of trying to deprecate
nova-network, so any new thing needs to support neutron.



 Also, it's worth to mention that a discussion on this topic already
 happened on this maillist:

 http://lists.openstack.org/pipermail/openstack-dev/2014-March/031431.html

 Some of the limitations were resolved since then, specifically,
 libvirt/bhyve has no limitation on count of disk and ethernet devices
 anymore.

 Roman Bogorodskiy

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] request_id deprecation strategy question

2014-10-20 Thread Joe Gordon
On Mon, Oct 20, 2014 at 11:12 AM, gordon chung g...@live.ca wrote:

  The issue I'm highlighting is that those projects using the code now have
  to update their api-paste.ini files to import from the new location,
  presumably while giving some warning to operators about the impending
  removal of the old code.

 This was the issue I ran into when trying to switch projects to
 oslo.middleware where I couldn't get jenkins to pass -- grenade tests
 successfully did their job. We had a discussion on openstack-qa and it was
 suggested to add an upgrade script to grenade to handle the new reference
 and document the switch. [1]

 if there's any issue with this solution, feel free to let us know.


Going down this route means every deployment that wishes to upgrade now has
an extra step, which should be avoided whenever possible. Why not just have a
wrapper in project.openstack.common pointing to the new oslo.middleware
library? If that is not a viable solution, we should give operators one
full cycle where the oslo-incubator version is deprecated and they can
migrate to the new copy outside of the upgrade process itself. Since there
is no deprecation warning in Juno [0], we can deprecate the oslo-incubator
copy in Kilo and remove it in L.


[0] first email in this thread



 [1]
 http://eavesdrop.openstack.org/irclogs/%23openstack-qa/%23openstack-qa.2014-10-10.log
  (search
 for gordc)

 cheers,
 *gord*

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Removing nova-bm support within os-cloud-config

2014-10-20 Thread Joe Gordon
On Mon, Oct 20, 2014 at 8:00 PM, Steve Kowalik ste...@wedontsleep.org
wrote:

 With the move to removing nova-baremetal, I'm concerned that portions
 of os-cloud-config will break once python-novaclient has released with
 the bits of the nova-baremetal gone -- import errors, and such like.


Nova won't be removing nova-baremetal support in the client until Juno is
end-of-lifed, as clients aren't part of the integrated release and need to
work with all supported versions.



 I'm also concerned about backward compatibility -- in that we can't
 really remove the functionality, because it will break that
 compatibility. A further concern is that because nova-baremetal is no
 longer checked in CI, code paths may bitrot.

 Should we pony up and remove support for talking to nova-baremetal in
 os-cloud-config? Or any other suggestions?

 --
 Steve
 If it (dieting) was like a real time strategy game, I'd have loaded a
 save game from ten years ago.
  - Greg, Columbia Internet

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] add cyclomatic complexity check to pep8 target

2014-10-16 Thread Joe Gordon
On Thu, Oct 16, 2014 at 8:11 PM, Angus Salkeld asalk...@mirantis.com
wrote:

 Hi all

 I came across some tools [1]  [2] that we could use to make sure we don't
 increase our code complexity.

 Has anyone had any experience with these or other tools?



Flake8 (and thus hacking) has built in McCabe Complexity checking.

flake8 --select=C --max-complexity 10

https://github.com/flintwork/mccabe
http://flake8.readthedocs.org/en/latest/warnings.html

Example on heat: http://paste.openstack.org/show/121561
Example in nova (max complexity of 20):
http://paste.openstack.org/show/121562

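For the curious, the same McCabe scoring can be driven programmatically,
which is roughly what flake8 does under the hood (a sketch assuming mccabe's
PathGraphingAstVisitor API):

import ast

import mccabe

source = open('nova/virt/libvirt/driver.py').read()
tree = compile(source, 'driver.py', 'exec', ast.PyCF_ONLY_AST)

visitor = mccabe.PathGraphingAstVisitor()
visitor.preorder(tree, visitor)
for graph in sorted(visitor.graphs.values(), key=lambda g: -g.complexity()):
    if graph.complexity() > 10:
        print('%s:%s %s (%d)' % ('driver.py', graph.lineno, graph.entity,
                                 graph.complexity()))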


 radon is the underlying reporting tool and xenon is a monitor - meaning
 it will fail if a threshold is reached.

 To save you the time:
 radon cc -nd heat
 heat/engine/stack.py
 M 809:4 Stack.delete - E
 M 701:4 Stack.update_task - D
 heat/engine/resources/server.py
 M 738:4 Server.handle_update - D
 M 891:4 Server.validate - D
 heat/openstack/common/jsonutils.py
 F 71:0 to_primitive - D
 heat/openstack/common/config/generator.py
 F 252:0 _print_opt - D
 heat/tests/v1_1/fakes.py
 M 240:4 FakeHTTPClient.post_servers_1234_action - F

 It ranks the complexity from A (best) upwards, the command above (-nd)
 says only show D or worse.
 If you look at these methods they are getting out of hand and are
 becoming difficult to understand.
 I like the idea of having a threshold that says we are not going to just
 keep adding to the complexity
 of these methods.

 This can be enforced with:
 xenon --max-absolute E heat
 ERROR:xenon:block heat/tests/v1_1/fakes.py:240 post_servers_1234_action
 has a rank of F

 [1] https://pypi.python.org/pypi/radon
 [2] https://pypi.python.org/pypi/xenon

 If people are open to this, I'd like to add these to the test-requirements
 and trial this in Heat
 (as part of the pep8 tox target).


I think the idea of gating on complexity is a great idea and would like to
see nova adopt this as well. But why not just use flake8's built in stuff?



 Regards
 Angus

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] add cyclomatic complexity check to pep8 target

2014-10-16 Thread Joe Gordon
On Thu, Oct 16, 2014 at 8:53 PM, Morgan Fainberg morgan.fainb...@gmail.com
wrote:

 I agree we should use the flake8 built-in if at all possible. I think complexity
 checking will definitely help us in the long run in keeping code maintainable.


Well this is scary:

./nova/virt/libvirt/driver.py:3736:1: C901
'LibvirtDriver._get_guest_config' is too complex (67)

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py#n3736

to

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py#n4113


First step in fixing this, put a cap on it:
https://review.openstack.org/129125




 +1 from me.


 —
 Morgan Fainberg


 On October 16, 2014 at 20:45:35, Joe Gordon (joe.gord...@gmail.com) wrote:
  On Thu, Oct 16, 2014 at 8:11 PM, Angus Salkeld
  wrote:
 
   Hi all
  
   I came across some tools [1]  [2] that we could use to make sure we
 don't
   increase our code complexity.
  
   Has anyone had any experience with these or other tools?
  
 
 
  Flake8 (and thus hacking) has built in McCabe Complexity checking.
 
  flake8 --select=C --max-complexity 10
 
  https://github.com/flintwork/mccabe
  http://flake8.readthedocs.org/en/latest/warnings.html
 
  Example on heat: http://paste.openstack.org/show/121561
  Example in nova (max complexity of 20):
  http://paste.openstack.org/show/121562
 
 
  
   radon is the underlying reporting tool and xenon is a monitor -
 meaning
   it will fail if a threshold is reached.
  
   To save you the time:
   radon cc -nd heat
   heat/engine/stack.py
   M 809:4 Stack.delete - E
   M 701:4 Stack.update_task - D
   heat/engine/resources/server.py
   M 738:4 Server.handle_update - D
   M 891:4 Server.validate - D
   heat/openstack/common/jsonutils.py
   F 71:0 to_primitive - D
   heat/openstack/common/config/generator.py
   F 252:0 _print_opt - D
   heat/tests/v1_1/fakes.py
   M 240:4 FakeHTTPClient.post_servers_1234_action - F
  
   It ranks the complexity from A (best) upwards, the command above (-nd)
   says only show D or worse.
   If you look at these methods they are getting out of hand and are
   becoming difficult to understand.
   I like the idea of having a threshold that says we are not going to
 just
   keep adding to the complexity
   of these methods.
  
   This can be enforced with:
   xenon --max-absolute E heat
   ERROR:xenon:block heat/tests/v1_1/fakes.py:240
 post_servers_1234_action
   has a rank of F
  
   [1] https://pypi.python.org/pypi/radon
   [2] https://pypi.python.org/pypi/xenon
  
   If people are open to this, I'd like to add these to the
 test-requirements
   and trial this in Heat
   (as part of the pep8 tox target).
  
 
  I think the idea of gating on complexity is a great idea and would like
 to
  see nova adopt this as well. But why not just use flake8's built in
 stuff?
 
 
  
   Regards
   Angus
  
   ___
   OpenStack-dev mailing list
   OpenStack-dev@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  
  
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Automatic evacuate

2014-10-13 Thread Joe Gordon
On Mon, Oct 13, 2014 at 1:32 PM, Adam Lawson alaw...@aqorn.com wrote:

 Looks like this was proposed and denied to be part of Nova for some reason
 last year. Thoughts on why and is the reasoning (whatever it was) still
 applicable?


Link?





 *Adam Lawson*

 AQORN, Inc.
 427 North Tatnall Street
 Ste. 58461
 Wilmington, Delaware 19801-2230
 Toll-free: (844) 4-AQORN-NOW ext. 101
 International: +1 302-387-4660
 Direct: +1 916-246-2072


 On Mon, Oct 13, 2014 at 1:26 PM, Adam Lawson alaw...@aqorn.com wrote:

 [switching to openstack-dev]

 Has anyone automated nova evacuate so that VM's on a failed compute host
 using shared storage are automatically moved onto a new host or is manually
 entering *nova compute instance host* required in all cases?

 If it's manual only or requires custom Heat/Ceilometer templates, how hard
 would it be to enable automatic evacuation within Nova?

 i.e. (within /etc/nova/nova.conf)
 auto_evac = true

 Or is this possible now and I've simply not run across it?


 *Adam Lawson*

 AQORN, Inc.
 427 North Tatnall Street
 Ste. 58461
 Wilmington, Delaware 19801-2230
 Toll-free: (844) 4-AQORN-NOW ext. 101
 International: +1 302-387-4660
 Direct: +1 916-246-2072


 On Sat, Sep 27, 2014 at 12:32 AM, Clint Byrum cl...@fewbar.com wrote:

 So, what you're looking for is basically the same old IT, but with an
 API. I get that. For me, the point of this cloud thing is so that server
 operators can make _reasonable_ guarantees, and application operators
 can make use of them in an automated fashion.

 If you start guaranteeing 4 and 5 nines for single VM's, you're right
 back in the boat of spending a lot on server infrastructure even if your
 users could live without it sometimes.

 Compute hosts are going to go down. Networks are going to partition. It
 is not actually expensive to deal with that at the application layer. In
 fact when you know your business rules, you'll do a better job at doing
 this efficiently than some blanket replicate all the things layer
 might.

 I know, some clouds are just new ways to chop up these fancy 40 core
 megaservers that everyone is shipping. I'm sure OpenStack can do it, but
 I'm saying, I don't think OpenStack _should_ do it.

 Excerpts from Adam Lawson's message of 2014-09-26 20:30:29 -0700:
  Generally speaking that's true when you have full control over how you
  deploy applications as a consumer. As a provider however, cloud
 resiliency
  is king and it's generally frowned upon to associate instances
 directly to
  the underlying physical hardware for any reason. It's good when
 instances
  can come and go as needed, but in a production context, a failed
 compute
  host shouldn't take down every instance hosted on it. Otherwise there
 is no
  real abstraction going on and the cloud loses immense value.
  On Sep 26, 2014 4:15 PM, Clint Byrum cl...@fewbar.com wrote:
 
   Excerpts from Adam Lawson's message of 2014-09-26 14:43:40 -0700:
Hello fellow stackers.
   
I'm looking for discussions/plans re VM continuity.
   
I.e. Protection for instances using ephemeral storage against host
   failures
or auto-failover capability for instances on hosts where the host
 suffers
from an attitude problem?
   
I know fail-overs are supported and I'm quite certain
 auto-fail-overs are
possible in the event of a host failure (hosting instances not
 using
   shared
storage). I just can't find where this has been
 addressed/discussed.
   
Someone help a brother out? ; )
  
   I'm sure some of that is possible, but it's a cloud, so why not do
 things
   the cloud way?
  
   Spin up redundant bits in disparate availability zones. Replicate
 only
   what must be replicated. Use volumes for DR only when replication
 would
   be too expensive.
  
   Instances are cattle, not pets. Keep them alive just long enough to
 make
   your profit.
  
   ___
   Mailing list:
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
   Post to : openst...@lists.openstack.org
   Unsubscribe :
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
  




 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [openstack-qa] Post job failures

2014-10-09 Thread Joe Gordon
What about using graphite + logstash to power a post-job
/nightly-job/post-merge-periodic (the new thing we talked about in Germany)
dashboard?

There are a few different use cases for a dashboard for jobs that don't
report on gerrit changes.

* Track the success and failure rates over time
  * If I am maintaining a job that doesn't vote anywhere, I will check
this daily
  * If I am part of the core team of a project where one feature is tested
post-merge, I want to periodically check this to see if that feature is
being maintained.
* Provide links to logs for failed jobs so the cause of the failure can be
investigated


We can do all this with graphite and logstash: graphite for tracking the
trends (something like http://jogo.github.io/gate/) and logstash to find
the logs for failed jobs (we can get around the 10 day logstash window by
saving the results instead of overwriting them every time we regenerate the
list of log links)

And if we really want some sort of alerts, there are a lot of graphite
tools (http://graphite.readthedocs.org/en/latest/tools.html) that can give
us alerts on metrics (alert me if the last X runs of job-foo-bar failed).
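
As a rough sketch of what that could look like (the endpoint and metric
paths below are assumptions, not the real statsd keys zuul emits, and the
job name is just a placeholder):

    import requests

    GRAPHITE = 'http://graphite.openstack.org'   # assumed endpoint
    # Assumed metric path -- substitute whatever statsd keys zuul really emits.
    METRIC = 'stats_counts.zuul.pipeline.post.job.{job}.{result}'

    def job_count(job, result, days=7):
        """Sum a job's SUCCESS or FAILURE counter over the last N days."""
        target = 'summarize(%s,"1day")' % METRIC.format(job=job, result=result)
        resp = requests.get(GRAPHITE + '/render',
                            params={'target': target,
                                    'from': '-%ddays' % days,
                                    'format': 'json'})
        series = resp.json()
        if not series:
            return 0
        return sum(v for v, _ts in series[0]['datapoints'] if v)

    job = 'periodic-nova-py27'   # placeholder job name
    failures = job_count(job, 'FAILURE')
    total = failures + job_count(job, 'SUCCESS')
    if total:
        print('%s: %d runs, %.1f%% failed' % (job, total, 100.0 * failures / total))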


On Wed, Oct 1, 2014 at 9:46 AM, Jeremy Stanley fu...@yuggoth.org wrote:

 On 2014-10-01 10:39:40 -0400 (-0400), Matthew Treinish wrote:
 [...]
  So I actually think as a first pass this would be the best way to
  handle it. You can leave comments on a closed gerrit changes,
 [...]

 Not so easy as it sounds. Jobs in post are running on an arbitrary
 Git commit (more often than not, a merge commit), and mapping that
 back to a change in Gerrit is nontrivial.
 --
 Jeremy Stanley

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] gerrit based statistics

2014-10-08 Thread Joe Gordon
Recently there has been a lot of discussion around the development growing
pains in nova. Instead of guessing about how bad some of the issues are, I
tried to answer a few questions that may help us better understand the
issues.


Q: How many revisions does it take to merge a patch?

Average: 6.76 revisions
median: 4.0 revisions


Q: How many rechecks/verifies does it take to merge a patch (ignoring
rechecks where the same job failed before and after)?

Average: 0.749 rechecks per patch revision
median: 0.4285  rechecks per patch revision

For comparison here are the same results for tempest, which has a lot more
gating tests:

Average: 1.01591525738
median: 0.6


Q: How long does it take for a patch to get approved?

Average: 28 days
median: 11 days


Q: How long does it take for a patch to get approved that touches
'nova/virt/'?

Average: 34 days
median: 18 days


When looking at these numbers, two things stick out:

* We successfully use recheck an awful lot. More than I expected.
* Patches that touch 'nova/virt' take about 20% more time to land, or about
6 days. While that is definitely a difference, it's smaller than I expected.


Dataset: last 800 patches in nova
Code: https://github.com/jogo/gerrit-fun
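
For anyone who wants to poke at the raw numbers themselves, a simplified
sketch of pulling revision counts out of Gerrit's REST API (this is not the
gerrit-fun code itself; the approval-latency numbers additionally need each
change's created/submitted timestamps):

    import json

    import requests

    GERRIT = 'https://review.openstack.org'

    def merged_changes(project, limit=200):
        """Fetch recently merged changes, including all of their patch sets."""
        resp = requests.get(GERRIT + '/changes/',
                            params={'q': 'project:%s status:merged' % project,
                                    'o': 'ALL_REVISIONS',
                                    'n': limit})
        # Gerrit prefixes its JSON with ")]}'" to defeat XSSI; strip that line.
        return json.loads(resp.text.split('\n', 1)[1])

    def revision_stats(project):
        counts = sorted(len(c['revisions']) for c in merged_changes(project))
        average = float(sum(counts)) / len(counts)
        median = counts[len(counts) // 2]   # crude median, good enough here
        return average, median

    print(revision_stats('openstack/nova'))
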
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] gerrit based statistics

2014-10-08 Thread Joe Gordon
On Wed, Oct 8, 2014 at 4:30 PM, Joe Gordon joe.gord...@gmail.com wrote:

 Recently there has been a lot of discussion around the development growing
 pains in nova. Instead of guessing about how bad some of the issues are, I
 tried to answer a few questions that may help us better understand the
 issues.


 Q: How many revisions does it take to merge a patch?

 Average: 6.76 revisions
 median: 4.0 revisions


 Q: How many rechecks/verifies does it take to merge a patch (ignoring
 rechecks where the same job failed before and after)?

 Average: 0.749 rechecks per patch revision
 median: 0.4285  rechecks per patch revision

 For comparison here are the same results for tempest, which has a lot more
 gating tests:

 Average: 1.01591525738
 median: 0.6


 Q: How long does it take for a patch to get approved?

 Average: 28 days
 median: 11 days


 Q: How long does it take for a patch to get approved that touches
 'nova/virt/'?

 Average: 34 days
 median: 18 days


To expand on these numbers, here are the same results for the last 6 months of commits:

all of nova (1723 patches):
Average: 28.8
median: 11.0

nova/virt (476 patches):
 Average: 34.5




 When looking at these numbers two things stick out out:

 * We successfully use recheck an awful lot. More then I expected
 * Patches that touch 'nova/virt' take about 20% more time to land or about
 6 days. While that is definitely a difference, its smaller then I expected


 Dataset: last 800 patches in nova
 Code: https://github.com/jogo/gerrit-fun

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] gerrit based statistics

2014-10-08 Thread Joe Gordon
On Wed, Oct 8, 2014 at 6:12 PM, Michael Still mi...@stillhq.com wrote:

 On Thu, Oct 9, 2014 at 11:58 AM, Joe Gordon joe.gord...@gmail.com wrote:
  On Wed, Oct 8, 2014 at 4:30 PM, Joe Gordon joe.gord...@gmail.com
 wrote:
 
  Recently there has been a lot of discussion around the development
 growing
  pains in nova. Instead of guessing about how bad some of the issues
 are, I
  tried to answer a few questions that may help us better understand the
  issues.
 
 
  Q: How many revisions does it take to merge a patch?
 
  Average: 6.76 revisions
  median: 4.0 revisions
 
 
  Q: How many rechecks/verifies does it take to merge a patch (ignoring
  rechecks where the same job failed before and after)?
 
  Average: 0.749 rechecks per patch revision
  median: 0.4285  rechecks per patch revision
 
  For comparison here are the same results for tempest, which has a lot
 more
  gating tests:
 
  Average: 1.01591525738
  median: 0.6
 
 
  Q: How long does it take for a patch to get approved?
 
  Average: 28 days
  median: 11 days
 
 
  Q: How long does it take for a patch to get approved that touches
  'nova/virt/'?
 
  Average: 34 days
  median: 18 days
 
 
  To expand on these numbers, same results for last 6 months of commits:
 
  all of nova (1723 patches):
  Average: 28.8
  median: 11.0
 
  nova/virt (476 patches):
   Average: 34.5

 I think it would be interesting to break this up by driver
 directory... Are there drivers which take longer to land code for than
 others?


Like this?

subtree: None (1724 patches):
Average: 28.7
median: 11.0
subtree: nova/virt/ (476 patches):
Average: 34.5
median: 18.0
subtree: nova/virt/hyperv/ (38 patches):
Average: 46.8
median: 33.0
subtree: nova/virt/libvirt/ (224 patches):
Average: 35.9
median: 18.0
subtree: nova/virt/xenapi/ (72 patches):
Average: 39.5
median: 20.0
subtree: nova/virt/vmwareapi/ (134 patches):
Average: 38.7
median: 26.0
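
The bucketing itself is trivial once you have, for each change, the list of
files it touched and how long it took to approve (both obtainable from
Gerrit). Something along these lines, where the record format is just an
assumption of this sketch:

    # Assumed record format for this sketch: (files_touched, days_to_approve),
    # e.g. (['nova/virt/libvirt/driver.py'], 18.0).
    SUBTREES = [None,                    # None == all of nova
                'nova/virt/',
                'nova/virt/hyperv/',
                'nova/virt/libvirt/',
                'nova/virt/xenapi/',
                'nova/virt/vmwareapi/']

    def approval_stats(records):
        for subtree in SUBTREES:
            days = sorted(d for files, d in records
                          if subtree is None
                          or any(f.startswith(subtree) for f in files))
            if not days:
                continue
            print('subtree: %s (%d patches)' % (subtree, len(days)))
            print('Average: %.1f' % (sum(days) / len(days)))
            print('median: %.1f' % days[len(days) // 2])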



  When looking at these numbers two things stick out out:
 
  * We successfully use recheck an awful lot. More then I expected
  * Patches that touch 'nova/virt' take about 20% more time to land or
 about
  6 days. While that is definitely a difference, its smaller then I
 expected
 
 
  Dataset: last 800 patches in nova
  Code: https://github.com/jogo/gerrit-fun
 
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

 Michael

 --
 Rackspace Australia

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Resource tracker

2014-10-07 Thread Joe Gordon
On Tue, Oct 7, 2014 at 10:56 AM, Vishvananda Ishaya vishvana...@gmail.com
wrote:


 On Oct 7, 2014, at 6:21 AM, Daniel P. Berrange berra...@redhat.com
 wrote:

  On Mon, Oct 06, 2014 at 02:55:20PM -0700, Joe Gordon wrote:
  On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton gkot...@vmware.com wrote:
 
  Hi,
  At the moment the resource tracker in Nova ignores that statistics that
  are returned by the hypervisor and it calculates the values on its
 own. Not
  only is this highly error prone but it is also very costly – all of the
  resources on the host are read from the database. Not only the fact
 that we
  are doing something very costly is troubling, the fact that we are over
  calculating resources used by the hypervisor is also an issue. In my
  opinion this leads us to not fully utilize hosts at our disposal. I
 have a
  number of concerns with this approach and would like to know why we
 are not
  using the actual resource reported by the hypervisor.
  The reason for asking this is that I have added a patch which uses the
  actual hypervisor resources returned and it lead to a discussion on the
  particular review (https://review.openstack.org/126237).
 
 
  So it sounds like you have mentioned two concerns here:
 
  1. The current method to calculate hypervisor usage is expensive in
 terms
  of database access.
  2. Nova ignores that statistics that are returned by the hypervisor and
  uses its own calculations.
 
 
  To #1, maybe we can doing something better, optimize the query, cache
 the
  result etc. As for #2 nova intentionally doesn't use the hypervisor
  statistics for a few reasons:
 
  * Make scheduling more deterministic, make it easier to reproduce issues
  etc.
  * Things like memory ballooning and thin provisioning in general, mean
 that
  the hypervisor is not reporting how much of the resources can be
 allocated
  but rather how much are currently in use (This behavior can vary from
  hypervisor to hypervisor today AFAIK -- which makes things confusing).
 So
  if I don't want to over subscribe RAM, and the hypervisor is using
 memory
  ballooning, the hypervisor statistics are mostly useless. I am sure
 there
  are more complex schemes that we can come up with that allow us to
 factor
  in the properties of thin provisioning, but is the extra complexity
 worth
  it?
 
  That is just an example of problems with the way Nova virt drivers
  /currently/ report usage to the schedular. It is easily within the
  realm of possibility for the virt drivers to be changed so that they
  report stats which take into account things like ballooning and thin
  provisioning so that we don't oversubscribe. Ignoring the hypervisor
  stats entirely and re-doing the calculations in the resource tracker
  code is just a crude workaround really. It is just swapping one set
  of problems for a new set of problems.


I agree, let's make reported hypervisor stats actually useful for
scheduling. This would mean we can have fewer config options (currently the
operator has to set aside resources for the underlying OS via a config
option).



 +1 lets make the hypervisors report detailed enough information that we
 can do it without having to recalculate.


Do we have any idea of how expensive recalculating this information is?



 Vish

 
  That being said I am fine with discussing in a spec the idea of adding
 an
  option to use the hypervisor reported statistics, as long as it is off
 by
  default.
 
  I'm against the idea of adding config options to switch between multiple
  codepaths because it is just punting the problem to the admins who are
  in an even worse position to decide what is best. It is saying would you
  rather your cloud have bug A or have bug B. We should be fixing the data
  the hypervisors report so that the resource tracker doesn't have to
 ignore
  them, and give the admins something which just works and avoid having to
  choose between 2 differently broken options.


 
  Regards,
  Daniel
  --
  |: http://berrange.com  -o-
 http://www.flickr.com/photos/dberrange/ :|
  |: http://libvirt.org  -o-
 http://virt-manager.org :|
  |: http://autobuild.org   -o-
 http://search.cpan.org/~danberr/ :|
  |: http://entangle-photo.org   -o-
 http://live.gnome.org/gtk-vnc :|
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Quota management and enforcement across projects

2014-10-07 Thread Joe Gordon
On Fri, Oct 3, 2014 at 10:47 AM, Morgan Fainberg morgan.fainb...@gmail.com
wrote:

 Keeping the enforcement local (same way policy works today) helps limit
 the fragility, big +1 there.

 I also agree with Vish, we need a uniform way to talk about quota
 enforcement similar to how we have a uniform policy language / enforcement
 model (yes I know it's not perfect, but it's far closer to uniform than
 quota management is).


It sounds like maybe we should have an oslo library for quotas? Somewhere
where we can share the code, but keep the operations local to each service.
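
Purely as a strawman for what such a library's surface could look like (the
names are made up, and reservation/commit semantics are deliberately left
out of this sketch):

    class OverQuota(Exception):
        pass

    class QuotaDriver(object):
        """Hypothetical per-service backend: each project keeps its own
        limit and usage storage; only the enforcement logic is shared."""

        def get_limit(self, project_id, resource):
            raise NotImplementedError()

        def get_usage(self, project_id, resource):
            raise NotImplementedError()

    def enforce(driver, project_id, deltas):
        """Check requested resource deltas against the project's limits."""
        for resource, delta in deltas.items():
            limit = driver.get_limit(project_id, resource)
            if limit < 0:   # convention: a negative limit means unlimited
                continue
            if driver.get_usage(project_id, resource) + delta > limit:
                raise OverQuota('%s is over quota for %s' % (project_id, resource))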



 If there is still interest of placing quota in keystone, let's talk about
 how that will work and what will be needed from Keystone . The previous
 attempt didn't get much traction and stalled out early in implementation.
 If we want to revisit this lets make sure we have the resources needed and
 spec(s) in progress / info on etherpads (similar to how the multitenancy
 stuff was handled at the last summit) as early as possible.


Why not centralize quota management via python-openstackclient? What is
the benefit of getting keystone involved?



 Cheers,
 Morgan

 Sent via mobile


 On Friday, October 3, 2014, Salvatore Orlando sorla...@nicira.com wrote:

 Thanks Vish,

 this seems a very reasonable first step as well - and since most projects
 would be enforcing quotas in the same way, the shared library would be the
 logical next step.
 After all this is quite the same thing we do with authZ.

 Duncan is expressing valid concerns which in my opinion can be addressed
 with an appropriate design - and a decent implementation.

 Salvatore

 On 3 October 2014 18:25, Vishvananda Ishaya vishvana...@gmail.com
 wrote:

 The proposal in the past was to keep quota enforcement local, but to
 put the resource limits into keystone. This seems like an obvious first
 step to me. Then a shared library for enforcing quotas with decent
 performance should be next. The quota calls in nova are extremely
 inefficient right now and it will only get worse when we try to add
 hierarchical projects and quotas.

 Vish

 On Oct 3, 2014, at 7:53 AM, Duncan Thomas duncan.tho...@gmail.com
 wrote:

  Taking quota out of the service / adding remote calls for quota
  management is going to make things fragile - you've somehow got to
  deal with the cases where your quota manager is slow, goes away,
  hiccups, drops connections etc. You'll also need some way of
  reconciling actual usage against quota usage periodically, to detect
  problems.
 
  On 3 October 2014 15:03, Salvatore Orlando sorla...@nicira.com
 wrote:
  Hi,
 
  Quota management is currently one of those things where every
 openstack
  project does its own thing. While quotas are obviously managed in a
 similar
  way for each project, there are subtle differences which ultimately
 result
  in lack of usability.
 
  I recall that in the past there have been several calls for unifying
 quota
  management. The blueprint [1] for instance, hints at the possibility
 of
  storing quotas in keystone.
  On the other hand, the blazar project [2, 3] seems to aim at solving
 this
  problem for good enabling resource reservation and therefore
 potentially
  freeing openstack projects from managing and enforcing quotas.
 
  While Blazar is definetely a good thing to have, I'm not entirely
 sure we
  want to make it a required component for every deployment. Perhaps
 single
  projects should still be able to enforce quota. On the other hand, at
 least
  on paper, the idea of making Keystone THE endpoint for managing
 quotas,
  and then letting the various project enforce them, sounds promising -
 is
  there any reason for which this blueprint is stalled to the point
 that it
  seems forgotten now?
 
  I'm coming to the mailing list with these random questions about quota
  management, for two reasons:
  1) despite developing and using openstack on a daily basis I'm still
  confused by quotas
  2) I've found a race condition in neutron quotas and the fix is not
 trivial.
  So, rather than start coding right away, it might probably make more
 sense
  to ask the community if there is already a known better approach to
 quota
  management - and obviously enforcement.
 
  Thanks in advance,
  Salvatore
 
  [1] https://blueprints.launchpad.net/keystone/+spec/service-metadata
  [2] https://wiki.openstack.org/wiki/Blazar
  [3] https://review.openstack.org/#/q/project:stackforge/blazar,n,z
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 
  --
  Duncan Thomas
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 

Re: [openstack-dev] [nova] Resource tracker

2014-10-06 Thread Joe Gordon
On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton gkot...@vmware.com wrote:

  Hi,
 At the moment the resource tracker in Nova ignores that statistics that
 are returned by the hypervisor and it calculates the values on its own. Not
 only is this highly error prone but it is also very costly – all of the
 resources on the host are read from the database. Not only the fact that we
 are doing something very costly is troubling, the fact that we are over
 calculating resources used by the hypervisor is also an issue. In my
 opinion this leads us to not fully utilize hosts at our disposal. I have a
 number of concerns with this approach and would like to know why we are not
 using the actual resource reported by the hypervisor.
 The reason for asking this is that I have added a patch which uses the
 actual hypervisor resources returned and it lead to a discussion on the
 particular review (https://review.openstack.org/126237).


So it sounds like you have mentioned two concerns here:

1. The current method to calculate hypervisor usage is expensive in terms
of database access.
2. Nova ignores the statistics that are returned by the hypervisor and
uses its own calculations.


To #1, maybe we can do something better: optimize the query, cache the
result, etc. As for #2, nova intentionally doesn't use the hypervisor
statistics for a few reasons:

* Make scheduling more deterministic, make it easier to reproduce issues
etc.
* Things like memory ballooning and thin provisioning in general mean that
the hypervisor is not reporting how much of the resources can be allocated,
but rather how much are currently in use (this behavior can vary from
hypervisor to hypervisor today AFAIK -- which makes things confusing). So
if I don't want to oversubscribe RAM, and the hypervisor is using memory
ballooning, the hypervisor statistics are mostly useless. I am sure there
are more complex schemes that we can come up with that allow us to factor
in the properties of thin provisioning, but is the extra complexity worth
it?

That being said, I am fine with discussing in a spec the idea of adding an
option to use the hypervisor-reported statistics, as long as it is off by
default.
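
To make the distinction concrete, a toy illustration (this is not the real
resource tracker code; the names are made up, and the allocation-ratio knob
just mirrors the existing config option):

    def claimed_ram_mb(instances):
        """Nova-style accounting: sum what each instance was *promised*
        (its flavor), regardless of what the guest currently touches."""
        return sum(inst['flavor_ram_mb'] for inst in instances)

    def schedulable_ram_mb(total_ram_mb, instances, ram_allocation_ratio=1.0):
        return total_ram_mb * ram_allocation_ratio - claimed_ram_mb(instances)

    # A hypervisor-style number would instead be "RAM in use right now",
    # which shrinks and grows with ballooning and page sharing -- useful for
    # monitoring, but it can claim a host has room for an instance whose
    # flavor it could never actually honour once the guest touches its memory.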




 Thanks
 Gary

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] governance changes for big tent model

2014-10-03 Thread Joe Gordon
On Fri, Oct 3, 2014 at 6:07 AM, Doug Hellmann d...@doughellmann.com wrote:


 On Oct 3, 2014, at 12:46 AM, Joe Gordon joe.gord...@gmail.com wrote:


 On Thu, Oct 2, 2014 at 4:16 PM, Devananda van der Veen 
 devananda@gmail.com wrote:

 On Thu, Oct 2, 2014 at 2:16 PM, Doug Hellmann d...@doughellmann.com
 wrote:
  As promised at this week’s TC meeting, I have applied the various blog
 posts and mailing list threads related to changing our governance model to
 a series of patches against the openstack/governance repository [1].
 
  I have tried to include all of the inputs, as well as my own opinions,
 and look at how each proposal needs to be reflected in our current policies
 so we do not drop commitments we want to retain along with the processes we
 are shedding [2].
 
  I am sure we need more discussion, so I have staged the changes as a
 series rather than one big patch. Please consider the patches together when
 commenting. There are many related changes, and some incremental steps
 won’t make sense without the changes that come after (hey, just like code!).
 
  Doug
 
  [1]
 https://review.openstack.org/#/q/status:open+project:openstack/governance+branch:master+topic:big-tent,n,z
  [2] https://etherpad.openstack.org/p/big-tent-notes

 I've summed up a lot of my current thinking on this etherpad as well
 (I should really blog, but hey ...)

 https://etherpad.openstack.org/p/in-pursuit-of-a-new-taxonomy


 After seeing Jay's idea of making a yaml file modeling things and talking
 to devananda about this I went ahead and tried to graph the relationships
 out.

 repo: https://github.com/jogo/graphing-openstack
 preliminary YAML file:
 https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml
 sample graph: http://i.imgur.com/LwlkE73.png

 It turns out its really hard to figure out what the relationships are
 without digging deep into the code for each project, so I am sure I got a
 few things wrong (along with missing a lot of projects).


 The relationships are very important for setting up an optimal gate
 structure. I’m less convinced they are important for setting up the
 governance structure, and I do not think we want a specific gate
 configuration embedded in the governance structure at all. That’s why I’ve
 tried to describe general relationships (“optional inter-project
 dependences” vs. “strict co-dependent project groups” [1]) up until the
 very last patch in the series [2], which redefines the integrated release
 in terms of those other relationships and a base set of projects.


I agree the relationships are very important for gate structure and less so
for governance. I thought it would be nice to codify the relationships in a
machine-readable format so we can do things with it, like trying out
different rules and seeing how they would work. For example, we can already
make two groups of things that may be useful for testing:

* services that nothing depends on
* services that don't depend on other services

Latest graph: http://i.imgur.com/y8zmNIM.png
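
For example, deriving those two groups from a relationship map is only a
few lines. The schema and sample entries below are illustrative only, not
the real openstack.yaml layout:

    import yaml

    # Simplified, illustrative schema -- not the real openstack.yaml.
    SAMPLE = """
    keystone: {uses: []}
    glance:   {uses: [keystone]}
    neutron:  {uses: [keystone]}
    nova:     {uses: [keystone, glance, neutron]}
    horizon:  {uses: [keystone, nova, glance, neutron]}
    """

    services = yaml.safe_load(SAMPLE)
    depended_on = set()
    for deps in services.values():
        depended_on.update(deps['uses'])

    print('nothing depends on: %s'
          % sorted(s for s in services if s not in depended_on))
    print('depend on no other service: %s'
          % sorted(s for s in services if not services[s]['uses']))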


 Doug

 [1]
 https://review.openstack.org/#/c/125785/2/reference/project-testing-policies.rst
 [2] https://review.openstack.org/#/c/125789/


 -Deva

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] governance changes for big tent model

2014-10-03 Thread Joe Gordon
On Fri, Oct 3, 2014 at 9:42 AM, Eoghan Glynn egl...@redhat.com wrote:



 - Original Message -
 
 
  On Fri, Oct 3, 2014 at 6:07 AM, Doug Hellmann  d...@doughellmann.com 
  wrote:
 
 
 
 
  On Oct 3, 2014, at 12:46 AM, Joe Gordon  joe.gord...@gmail.com  wrote:
 
 
 
 
 
  On Thu, Oct 2, 2014 at 4:16 PM, Devananda van der Veen 
  devananda@gmail.com  wrote:
 
 
  On Thu, Oct 2, 2014 at 2:16 PM, Doug Hellmann  d...@doughellmann.com 
  wrote:
   As promised at this week’s TC meeting, I have applied the various blog
   posts and mailing list threads related to changing our governance
 model to
   a series of patches against the openstack/governance repository [1].
  
   I have tried to include all of the inputs, as well as my own opinions,
 and
   look at how each proposal needs to be reflected in our current
 policies so
   we do not drop commitments we want to retain along with the processes
 we
   are shedding [2].
  
   I am sure we need more discussion, so I have staged the changes as a
 series
   rather than one big patch. Please consider the patches together when
   commenting. There are many related changes, and some incremental steps
   won’t make sense without the changes that come after (hey, just like
   code!).
  
   Doug
  
   [1]
  
 https://review.openstack.org/#/q/status:open+project:openstack/governance+branch:master+topic:big-tent,n,z
   [2] https://etherpad.openstack.org/p/big-tent-notes
 
  I've summed up a lot of my current thinking on this etherpad as well
  (I should really blog, but hey ...)
 
  https://etherpad.openstack.org/p/in-pursuit-of-a-new-taxonomy
 
 
  After seeing Jay's idea of making a yaml file modeling things and
 talking to
  devananda about this I went ahead and tried to graph the relationships
 out.
 
  repo: https://github.com/jogo/graphing-openstack
  preliminary YAML file:
  https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml
  sample graph: http://i.imgur.com/LwlkE73.png
  It turns out its really hard to figure out what the relationships are
 without
  digging deep into the code for each project, so I am sure I got a few
 things
  wrong (along with missing a lot of projects).
 
  The relationships are very important for setting up an optimal gate
  structure. I’m less convinced they are important for setting up the
  governance structure, and I do not think we want a specific gate
  configuration embedded in the governance structure at all. That’s why
 I’ve
  tried to describe general relationships (“optional inter-project
  dependences” vs. “strict co-dependent project groups” [1]) up until the
 very
  last patch in the series [2], which redefines the integrated release in
  terms of those other relationships and a base set of projects.
 
 
  I agree the relationships are very important for gate structure and less
 so
  for governance. I thought it would be nice to codify the relationships
 in a
  machine readable format so we can do things with it, like try making
  different rules and see how they would work. For example we can already
 make
  two groups of things that may be useful for testing:
 
  * services that nothing depends on
  * services that don't depend on other services
 
  Latest graph: http://i.imgur.com/y8zmNIM.png

 This diagram is missing any relationships for ceilometer.


It sure is, the graph is very much a work in progress. Here is the yaml
that generates it:
https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml -- want
to update that to include ceilometer's relationships?



 Ceilometer calls APIs provided by:

  * keystone
  * nova
  * glance
  * neutron
  * swift

 Ceilometer consumes notifications from:

  * keystone
  * nova
  * glance
  * neutron
  * cinder
  * ironic
  * heat
  * sahara

 Ceilometer serves incoming API calls from:

  * heat
  * horizon

 Cheers,
 Eoghan

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] What's a dependency (was Re: [all][tc] governance changes for big tent...) model

2014-10-03 Thread Joe Gordon
On Fri, Oct 3, 2014 at 9:51 AM, Chris Dent chd...@redhat.com wrote:

 On Fri, 3 Oct 2014, Joe Gordon wrote:

  * services that nothing depends on
 * services that don't depend on other services

 Latest graph: http://i.imgur.com/y8zmNIM.png


 I'm hesitant to open this can but it's just lying there waiting,
 wiggling like good bait, so:

 How are you defining dependency in that picture?


data is coming from here:
https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml
and the key is here: https://github.com/jogo/graphing-openstack

Note ceilometer has no relationships because I wasn't sure what exactly
they were (which are required and which are optional, etc.), not because
there are none. It turns out it's not easy to find this information in an
easily digestible format.


 For example:

 Many of those services expect[1] to be able to send notifications (or
 be polled by) ceilometer[2]. We've got an ongoing thread about the need
 to contractualize notifications. Are those contracts (or the desire
 for them) a form of dependency? Should they be?


So in the case of notifications, I think that is a 'Ceilometer CAN-USE Nova
THROUGH notifications' relationship.
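
In the yaml that might be written down something like this (the exact key
names here are a guess -- see the repo for the real schema):

    ceilometer:
      can-use:
        - service: nova
          through: notifications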




 [1] It's not that it is a strict requirement but lots of people involved
 with the other projects contribute code to ceilometer or make
 changes in their own[3] project specifically to send info to
 ceilometer.

 [2] I'm not trying to defend ceilometer from slings here, just point out
 a good example, since it has _no_ arrows.

 [3] their own, that's hateful, let's have less of that.

 --
 Chris Dent tw:@anticdent freenode:cdent
 https://tank.peermore.com/tanks/cdent

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tc] governance changes for big tent model

2014-10-02 Thread Joe Gordon
On Thu, Oct 2, 2014 at 4:16 PM, Devananda van der Veen 
devananda@gmail.com wrote:

 On Thu, Oct 2, 2014 at 2:16 PM, Doug Hellmann d...@doughellmann.com
 wrote:
  As promised at this week’s TC meeting, I have applied the various blog
 posts and mailing list threads related to changing our governance model to
 a series of patches against the openstack/governance repository [1].
 
  I have tried to include all of the inputs, as well as my own opinions,
 and look at how each proposal needs to be reflected in our current policies
 so we do not drop commitments we want to retain along with the processes we
 are shedding [2].
 
  I am sure we need more discussion, so I have staged the changes as a
 series rather than one big patch. Please consider the patches together when
 commenting. There are many related changes, and some incremental steps
 won’t make sense without the changes that come after (hey, just like code!).
 
  Doug
 
  [1]
 https://review.openstack.org/#/q/status:open+project:openstack/governance+branch:master+topic:big-tent,n,z
  [2] https://etherpad.openstack.org/p/big-tent-notes

 I've summed up a lot of my current thinking on this etherpad as well
 (I should really blog, but hey ...)

 https://etherpad.openstack.org/p/in-pursuit-of-a-new-taxonomy


After seeing Jay's idea of making a yaml file modeling things and talking
to devananda about this, I went ahead and tried to graph the relationships
out.

repo: https://github.com/jogo/graphing-openstack
preliminary YAML file:
https://github.com/jogo/graphing-openstack/blob/master/openstack.yaml
sample graph: http://i.imgur.com/LwlkE73.png

It turns out it's really hard to figure out what the relationships are
without digging deep into the code for each project, so I am sure I got a
few things wrong (along with missing a lot of projects).

-Deva

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Create an instance with a custom uuid

2014-10-01 Thread Joe Gordon
On Wed, Oct 1, 2014 at 8:29 AM, Solly Ross sr...@redhat.com wrote:

 (response inline)

 - Original Message -
  From: Pasquale Porreca pasquale.porr...@dektech.com.au
  To: openstack-dev@lists.openstack.org
  Sent: Wednesday, October 1, 2014 11:08:50 AM
  Subject: Re: [openstack-dev] [nova] Create an instance with a custom uuid
 
  Thank you for the answers.
 
  I understood the concerns about having the UUID completely user defined
 and I
  also understand Nova has no interest in supporting a customized
 algorithm to
  generate UUID. Anyway I may have found a solution that will cover my use
  case and respect the standard for UUID (RFC 4122
  http://www.ietf.org/rfc/rfc4122.txt ) .
 
  The generation of the UUID in Nova make use of the function uuid4() from
 the
  module uuid.py to have an UUID (pseudo)random, according to version 4
  described in RFC 4122. Anyway this is not the only algorithm supported in
  the standard (and implemented yet in uuid.py ).
 
  In particular I focused my attention on UUID version 1 and the method
  uuid1(node=None, clock_seq=None) that allows to pass as parameter a part
 of
  the UUID ( node is the field containing the last 12 hexadecimal digits of
  the UUID).
 
  So my idea was to give the chance to the user to set uiid version (1 or
 4,
  with the latter as default) when creating a new instance and in case of
  version 1 to pass optionally a value for parameter node .

 I would think that we could just have a node parameter here, and
 automatically
 use version 1 if that parameter is passed (if we decided to go the route
 of changing the current UUID behavior).


From what I gather, this requested API change is based on your
blueprint https://blueprints.launchpad.net/nova/+spec/pxe-boot-instance.
Since your blueprint is not approved yet, discussing further work to improve
it is a bit premature.
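
For reference, a quick illustration of the stdlib calls under discussion --
nothing nova-specific, and the node value is just an arbitrary example:

    import uuid

    # Version 4: what nova generates today -- fully (pseudo)random.
    print(uuid.uuid4())

    # Version 1: timestamp + clock sequence + a caller-supplied 48-bit "node",
    # which becomes the last 12 hex digits of the UUID.
    node = 0x52540000ab01           # arbitrary example value
    u = uuid.uuid1(node=node)
    print(u)                        # e.g. xxxxxxxx-xxxx-1xxx-xxxx-52540000ab01
    assert u.node == node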



 
  Any thoughts?
 
  On 09/30/14 14:07, Andrew Laski wrote:
 
 
 
  On 09/30/2014 06:53 AM, Pasquale Porreca wrote:
 
 
  Going back to my original question, I would like to know:
 
  1) Is it acceptable to have the UUID passed from client side?
 
  In my opinion, no. This opens a door to issues we currently don't need to
  deal with, and use cases I don't think Nova should support. Another
  possibility, which I don't like either, would be to pass in some data
 which
  could influence the generation of the UUID to satisfy requirements.
 
  But there was a suggestion to look into addressing your use case on the
 QEMU
  mailing list, which I think would be a better approach.
 
 
 
 
  2) What is the correct way to do it? I started to implement this feature,
  simply passing it as metadata with key uuid, but I feel that this feature
  should have a reserved option rather then use metadata.
 
 
  On 09/25/14 17:26, Daniel P. Berrange wrote:
 
 
  On Thu, Sep 25, 2014 at 05:23:22PM +0200, Pasquale Porreca wrote:
 
 
  This is correct Daniel, except that that it is done by the virtual
  firmware/BIOS of the virtual machine and not by the OS (not yet
 installed at
  that time).
 
  This is the reason we thought about UUID: it is yet used by the iPXE
 client
  to be included in Bootstrap Protocol messages, it is taken from the
 uuid
  field in libvirt template and the uuid in libvirt is set by OpenStack;
 the
  only missing passage is the chance to set the UUID in OpenStack instead
 to
  have it randomly generated.
 
  Having another user defined tag in libvirt won't help for our issue,
 since
  it won't be included in Bootstrap Protocol messages, not without changes
 in
  the virtual BIOS/firmware (as you stated too) and honestly my team
 doesn't
  have interest in this (neither the competence).
 
  I don't think the configdrive or metadata service would help either: the
 OS
  on the instance is not yet installed at that time (the target if the
 network
  boot is exactly to install the OS on the instance!), so it won't be able
 to
  mount it.
  Ok, yes, if we're considering the DHCP client inside the iPXE BIOS
  blob, then I don't see any currently viable options besides UUID.
  There's no mechanism for passing any other data into iPXE that I
  am aware of, though if there is a desire todo that it could be
  raised on the QEMU mailing list for discussion.
 
 
  Regards,
  Daniel
 
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
  --
  Pasquale Porreca
 
  DEK Technologies
  Via dei Castelli Romani, 22
  00040 Pomezia (Roma)
 
  Mobile +39 3394823805
  Skype paskporr
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

 Best Regards,
 Solly Ross

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 

Re: [openstack-dev] [tc][cross-project-work] What about adding cross-project-spec repo?

2014-10-01 Thread Joe Gordon
On Mon, Sep 29, 2014 at 11:58 AM, Doug Hellmann d...@doughellmann.com
wrote:


 On Sep 29, 2014, at 5:51 AM, Thierry Carrez thie...@openstack.org wrote:

  Boris Pavlovic wrote:
  it goes without saying that working on cross-project stuff in OpenStack
  is quite hard task.
 
  Because it's always hard to align something between a lot of people from
  different project. And when topic start being too HOT  the discussion
  goes in wrong direction and attempt to do cross project change fails, as
  a result maybe not *ideal* but *good enough* change in OpenStack will be
  abandoned.
 
  The another issue that we have are specs. Projects are asking to make
  spec for change in their project, and in case of cross project stuff you
  need to make N similar specs (for every project). That is really hard to
  manage, and as a result you have N different specs that are describing
  the similar stuff.
 
  To make this process more formal, clear and simple, let's reuse process
  of specs but do it in one repo /openstack/cross-project-specs.
 
  It means that every cross project topic: Unification of python clients,
  Unification of logging, profiling, debugging api, bunch of others will
  be discussed in one single place..
 
  I think it's a good idea, as long as we truly limit it to cross-project
  specs, that is, to concepts that may apply to every project. The
  examples you mention are good ones. As a counterexample, if we have to
  sketch a plan to solve communication between Nova and Neutron, I don't
  think it would belong to that repository (it should live in whatever
  project would have the most work to do).
 
  Process description of cross-project-specs:
 
   * PTL - person that mange core team members list and puts workflow +1
 on accepted specs
   * Every project have 1 core position (stackforge projects are included)
   * Cores are chosen by project team, they task is to advocate project
 team opinion
   * No more veto, and -2 votes
   * If  75% cores +1 spec it's accepted. It means that all project have
 to accept this change.
   * Accepted specs gret high priority blueprints in all projects
 
  So I'm not sure you can force all projects to accept the change.
  Ideally, projects should see the benefits of alignment and adopt the
  common spec. In our recent discussions we are also going towards more
  freedom to projects, rather than less : imposing common specs to
  stackforge projects sounds like a step backwards there.
 
  Finally, I see some overlap with Oslo, which generally ends up
  implementing most of the common policy into libraries it encourages
  usage of. Therefore I'm not sure having a cross-project PTL makes
  sense, as he would be stuck between the Oslo PTL and the Technical
  Committee.

 There is some overlap with Oslo, and we would want to be involved in the
 discussions — especially if the plan includes any code to land in an Oslo
 library. I have so far been resisting the idea that oslo-specs is the best
 home for this, mostly because I didn’t want us to assume everything related
 to cross-project work is also related to Oslo work.

 That said, our approval process looks for consensus among all of the
 participants on the review, in addition to Oslo cores, so we can use
 oslo-specs and continue incorporating the +1/-1 votes from everyone. One of
 the key challenges we’ve had is signaling buy-in for cross-project work so
 having some sort of broader review process would be good, especially to
 help ensure that all interested parties have a chance to participate in the
 review.

 OTOH, a special repo with different voting permission settings also makes
 sense. I don’t have any good suggestions for who would decide when the
 voting on a proposal had reached consensus, or what to do if no consensus
 emerges. Having the TC manage that seems logical, but impractical. Maybe a
 person designated by the TC would oversee it?


Here is a governance patch to propose an openstack-specs repo:
https://review.openstack.org/125509



 
  With such simple rules we will simplify cross project work:
 
  1) Fair rules for all projects, as every project has 1 core that has 1
  vote.
 
  A project is hardly a metric for fairness. Some projects are 50 times
  bigger than others. What is a project in your mind ? A code repository
  ? Or more like a program (a collection of code repositories being worked
  on by the same team ?)
 
  So in summary, yes we need a place to discuss truly cross-project specs,
  but I think it can't force decisions to all projects (especially
  stackforge ones), and it can live within a larger-scope Oslo effort
  and/or the Technical Committee.
 
  --
  Thierry Carrez (ttx)
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 

Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading

2014-09-30 Thread Joe Gordon
On Tue, Sep 30, 2014 at 6:04 AM, joehuang joehu...@huawei.com wrote:

 Hello, Dear TC and all,

 Large cloud operators prefer to deploy multiple OpenStack instances(as
 different zones), rather than a single monolithic OpenStack instance
 because of these reasons:

 1) Multiple data centers distributed geographically;
 2) Multi-vendor business policy;
 3) Server nodes scale up modularized from 00's up to million;
 4) Fault and maintenance isolation between zones (only REST interface);

 At the same time, they also want to integrate these OpenStack instances
 into one cloud. Instead of proprietary orchestration layer, they want to
 use standard OpenStack framework for Northbound API compatibility with
 HEAT/Horizon or other 3rd ecosystem apps.

 We call this pattern as OpenStack Cascading, with proposal described by
 [1][2]. PoC live demo video can be found[3][4].

 Nova, Cinder, Neutron, Ceilometer and Glance (optional) are involved in
 the OpenStack cascading.

 Kindly ask for cross program design summit session to discuss OpenStack
 cascading and the contribution to Kilo.


Cross-program design summit sessions should be used for things that we are
unable to make progress on via this mailing list, and not as a way to begin
new conversations. With that in mind, I think this thread is a good place
to get initial feedback on the idea and possibly make a plan for how to
tackle this.



 Kindly invite those who are interested in the OpenStack cascading to work
 together and contribute it to OpenStack.

 (I applied for “other projects” track [5], but it would be better to have
 a discussion as a formal cross program session, because many core programs
 are involved )


 [1] wiki: https://wiki.openstack.org/wiki/OpenStack_cascading_solution
 [2] PoC source code: https://github.com/stackforge/tricircle
 [3] Live demo video at YouTube:
 https://www.youtube.com/watch?v=OSU6PYRz5qY
 [4] Live demo video at Youku (low quality, for those who can't access
 YouTube):http://v.youku.com/v_show/id_XNzkzNDQ3MDg4.html
 [5]
 http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg36395.html

 Best Regards
 Chaoyi Huang ( Joe Huang )
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Kilo Blueprints and Specs

2014-09-29 Thread Joe Gordon
On Mon, Sep 29, 2014 at 5:23 AM, Gary Kotton gkot...@vmware.com wrote:

 Hi,
 Is the process documented anywhere? That is, if say for example I had a
 spec approved in J and its code did not land, how do we go about kicking
 the tires for K on that spec.


Specs will need to be re-submitted once we open up the specs repo for Kilo.
The Kilo template will be changing a little bit, so specs will need a
little bit of reworking. But I expect the process to approve previously
approved specs to be quicker.


 Thanks
 Gary

 On 9/29/14, 1:07 PM, John Garbutt j...@johngarbutt.com wrote:

 On 27 September 2014 00:31, Joe Gordon joe.gord...@gmail.com wrote:
  On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt j...@johngarbutt.com
 wrote:
  On 25 September 2014 14:10, Daniel P. Berrange berra...@redhat.com
  wrote:
   The proposal is to keep kilo-1, kilo-2 much the same as juno.
 Except,
   we work harder on getting people to buy into the priorities that are
   set, and actively provoke more debate on their correctness, and we
   reduce the bar for what needs a blueprint.
  
   We can't have 50 high priority blueprints, it doesn't mean anything,
   right? We need to trim the list down to a manageable number, based
 on
   the agreed project priorities. Thats all I mean by slots / runway at
   this point.
  
   I would suggest we don't try to rank high/medium/low as that is
   too coarse, but rather just an ordered priority list. Then you
   would not be in the situation of having 50 high blueprints. We
   would instead naturally just start at the highest priority and
   work downwards.
 
  OK. I guess I was fixating about fitting things into launchpad.
 
  I guess having both might be what happens.
 
The runways
idea is just going to make me less efficient at reviewing. So I'm
very much against it as an idea.
  
   This proposal is different to the runways idea, although it
 certainly
   borrows aspects of it. I just don't understand how this proposal has
   all the same issues?
  
  
   The key to the kilo-3 proposal, is about getting better at saying
 no,
   this blueprint isn't very likely to make kilo.
  
   If we focus on a smaller number of blueprints to review, we should
 be
   able to get a greater percentage of those fully completed.
  
   I am just using slots/runway-like ideas to help pick the high
 priority
   blueprints we should concentrate on, during that final milestone.
   Rather than keeping the distraction of 15 or so low priority
   blueprints, with those poor submitters jamming up the check queue,
 and
   constantly rebasing, and having to deal with the odd stray review
   comment they might get lucky enough to get.
  
   Maybe you think this bit is overkill, and thats fine. But I still
   think we need a way to stop wasting so much of peoples time on
 things
   that will not make it.
  
   The high priority blueprints are going to end up being mostly the big
   scope changes which take alot of time to review  probably go through
   many iterations. The low priority blueprints are going to end up
 being
   the small things that don't consume significant resource to review
 and
   are easy to deal with in the time we're waiting for the big items to
   go through rebases or whatever. So what I don't like about the
 runways
   slots idea is that removes the ability to be agile and take the
   initiative
   to review  approve the low priority stuff that would otherwise never
   make it through.
 
  The idea is more around concentrating on the *same* list of things.
 
  Certainly we need to avoid the priority inversion of concentrating
  only on the big things.
 
  Its also why I suggested that for kilo-1 and kilo-2, we allow any
  blueprint to merge, and only restrict it to a specific list in kilo-3,
  the idea being to maximise the number of things that get completed,
  rather than merging some half blueprints, but not getting to the good
  bits.
 
 
  Do we have to decide this now, or can we see how project priorities go
 and
  reevaluate half way through Kilo-2?
 
 What we need to decide is not to use the runway idea for kilo-1 and
 kilo-2. At this point, I guess we have (passively) decided that now.
 
 I like the idea of waiting till mid kilo-2. Thats around Spec freeze,
 which is handy.
 
 Thanks,
 John
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-29 Thread Joe Gordon
On Wed, Sep 17, 2014 at 8:03 AM, Matt Riedemann mrie...@linux.vnet.ibm.com
wrote:



 On 9/16/2014 1:01 PM, Joe Gordon wrote:


 On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com
 mailto:jaypi...@gmail.com wrote:
  
   On 09/15/2014 08:07 PM, Jeremy Stanley wrote:
  
   On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
   [...]
  
   Sometimes it's pretty hard to determine whether something in the
   E-R check page is due to something in the infra scripts, some
   transient issue in the upstream CI platform (or part of it), or
   actually a bug in one or more of the OpenStack projects.
  
   [...]
  
   Sounds like an NP-complete problem, but if you manage to solve it
   let me know and I'll turn it into the first line of triage for Infra
   bugs. ;)
  
  
   LOL, thanks for making me take the last hour reading Wikipedia pages
 about computational complexity theory! :P
  
   No, in all seriousness, I wasn't actually asking anyone to boil the
 ocean, mathematically. I think doing a couple things just making the
 categorization more obvious (a UI thing, really) and doing some
 (hopefully simple?) inspection of some control group of patches that we
 know do not introduce any code changes themselves and comparing to
 another group of patches that we know *do* introduce code changes to
 Nova, and then seeing if there are a set of E-R issues that consistently
 appear in *both* groups. That set of E-R issues has a higher likelihood
 of not being due to Nova, right?

 We use launchpad's affected projects listings on the elastic recheck
 page to say what may be causing the bug.  Tagging projects to bugs is a
 manual process, but one that works pretty well.

 UI: The elastic recheck UI definitely could use some improvements. I am
 very poor at writing UIs, so patches welcome!

  
   OK, so perhaps it's not the most scientific or well-thought out plan,
 but hey, it's a spark for thought... ;)
  
   Best,
   -jay
  
  
   ___
   OpenStack-dev mailing list
   OpenStack-dev@lists.openstack.org
 mailto:OpenStack-dev@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 I'm not great with UIs either but would a dropdown of the affected
 projects be helpful and then people can filter on their favorite project
 and then the page is sorted by top offenders as we have today?

 There are times when the top bugs are infra issues (pip timeouts for
 exapmle) so you have to scroll a ways before finding something for your
 project (nova isn't the only one).



I think that would be helpful.




 --

 Thanks,

 Matt Riedemann



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Kilo Blueprints and Specs

2014-09-29 Thread Joe Gordon
On Mon, Sep 29, 2014 at 4:46 PM, Christopher Yeoh cbky...@gmail.com wrote:

 On Mon, 29 Sep 2014 13:32:57 -0700
 Joe Gordon joe.gord...@gmail.com wrote:

  On Mon, Sep 29, 2014 at 5:23 AM, Gary Kotton gkot...@vmware.com
  wrote:
 
   Hi,
   Is the process documented anywhere? That is, if say for example I
   had a spec approved in J and its code did not land, how do we go
   about kicking the tires for K on that spec.
  
 
  Specs will need be re-submitted once we open up the specs repo for
  Kilo. The Kilo template will be changing a little bit, so specs will
  need a little bit of reworking. But I expect the process to approve
  previously approved specs to be quicker

 Am biased given I have a spec approved for Juno which we didn't quite
 fully merge which we want to finish off early in Kilo (most of the
 patches are very close already to being ready to merge), but I think we
 should give priority to reviewing specs already approved in Juno and
 perhaps only require one +2 for re-approval.


I like the idea of prioritizing specs that were previously approved and
only requiring a single +2 for re-approval if there are no major changes to
them.



 Otherwise we'll end up wasting weeks of development time just when
 there is lots of review bandwidth available and the CI system is
 lightly loaded. Honestly, ideally I'd like to just start merging as
 soon as Kilo opens. Nothing has changed between Juno FF and Kilo opening
 so there's really no reason that an approved Juno spec should not be
 reapproved.

 Chris

 
 
   Thanks
   Gary
  
   On 9/29/14, 1:07 PM, John Garbutt j...@johngarbutt.com wrote:
  
   On 27 September 2014 00:31, Joe Gordon joe.gord...@gmail.com
   wrote:
On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt
j...@johngarbutt.com
   wrote:
On 25 September 2014 14:10, Daniel P. Berrange
berra...@redhat.com wrote:
 The proposal is to keep kilo-1, kilo-2 much the same as juno.
   Except,
 we work harder on getting people to buy into the priorities
 that are set, and actively provoke more debate on their
 correctness, and we reduce the bar for what needs a
 blueprint.

 We can't have 50 high priority blueprints, it doesn't mean
 anything, right? We need to trim the list down to a
 manageable number, based
   on
 the agreed project priorities. Thats all I mean by slots /
 runway at this point.

 I would suggest we don't try to rank high/medium/low as that
 is too coarse, but rather just an ordered priority list. Then
 you would not be in the situation of having 50 high
 blueprints. We would instead naturally just start at the
 highest priority and work downwards.
   
OK. I guess I was fixating about fitting things into launchpad.
   
I guess having both might be what happens.
   
  The runways
  idea is just going to make me less efficient at reviewing.
  So I'm very much against it as an idea.

 This proposal is different to the runways idea, although it
   certainly
 borrows aspects of it. I just don't understand how this
 proposal has all the same issues?


 The key to the kilo-3 proposal, is about getting better at
 saying
   no,
 this blueprint isn't very likely to make kilo.

 If we focus on a smaller number of blueprints to review, we
 should
   be
 able to get a greater percentage of those fully completed.

 I am just using slots/runway-like ideas to help pick the high
   priority
 blueprints we should concentrate on, during that final
 milestone. Rather than keeping the distraction of 15 or so
 low priority blueprints, with those poor submitters jamming
 up the check queue,
   and
 constantly rebasing, and having to deal with the odd stray
 review comment they might get lucky enough to get.

 Maybe you think this bit is overkill, and thats fine. But I
 still think we need a way to stop wasting so much of peoples
 time on
   things
 that will not make it.

 The high priority blueprints are going to end up being mostly
 the big scope changes which take alot of time to review 
 probably go through many iterations. The low priority
 blueprints are going to end up
   being
 the small things that don't consume significant resource to
 review
   and
 are easy to deal with in the time we're waiting for the big
 items to go through rebases or whatever. So what I don't like
 about the
   runways
 slots idea is that removes the ability to be agile and take
 the initiative
 to review  approve the low priority stuff that would
 otherwise never make it through.
   
The idea is more around concentrating on the *same* list of
things.
   
Certainly we need to avoid the priority inversion of
concentrating only on the big things.
   
Its also why I suggested that for kilo-1 and kilo-2, we allow
any blueprint to merge, and only

Re: [openstack-dev] [nova] Kilo Blueprints and Specs

2014-09-26 Thread Joe Gordon
On Thu, Sep 25, 2014 at 9:21 AM, John Garbutt j...@johngarbutt.com wrote:

 On 25 September 2014 14:10, Daniel P. Berrange berra...@redhat.com
 wrote:
  The proposal is to keep kilo-1, kilo-2 much the same as juno. Except,
  we work harder on getting people to buy into the priorities that are
  set, and actively provoke more debate on their correctness, and we
  reduce the bar for what needs a blueprint.
 
  We can't have 50 high priority blueprints, it doesn't mean anything,
  right? We need to trim the list down to a manageable number, based on
  the agreed project priorities. That's all I mean by slots / runway at
  this point.
 
  I would suggest we don't try to rank high/medium/low as that is
  too coarse, but rather just an ordered priority list. Then you
  would not be in the situation of having 50 high blueprints. We
  would instead naturally just start at the highest priority and
  work downwards.

 OK. I guess I was fixating about fitting things into launchpad.

 I guess having both might be what happens.

   The runways
   idea is just going to make me less efficient at reviewing. So I'm
   very much against it as an idea.
 
  This proposal is different to the runways idea, although it certainly
  borrows aspects of it. I just don't understand how this proposal has
  all the same issues?
 
 
  The key to the kilo-3 proposal, is about getting better at saying no,
  this blueprint isn't very likely to make kilo.
 
  If we focus on a smaller number of blueprints to review, we should be
  able to get a greater percentage of those fully completed.
 
  I am just using slots/runway-like ideas to help pick the high priority
  blueprints we should concentrate on, during that final milestone.
  Rather than keeping the distraction of 15 or so low priority
  blueprints, with those poor submitters jamming up the check queue, and
  constantly rebasing, and having to deal with the odd stray review
  comment they might get lucky enough to get.
 
  Maybe you think this bit is overkill, and that's fine. But I still
  think we need a way to stop wasting so much of people's time on things
  that will not make it.
 
  The high priority blueprints are going to end up being mostly the big
  scope changes which take a lot of time to review and probably go through
  many iterations. The low priority blueprints are going to end up being
  the small things that don't consume significant resource to review and
  are easy to deal with in the time we're waiting for the big items to
  go through rebases or whatever. So what I don't like about the runways
  slots idea is that it removes the ability to be agile and take the
 initiative
  to review and approve the low priority stuff that would otherwise never
  make it through.

 The idea is more around concentrating on the *same* list of things.

 Certainly we need to avoid the priority inversion of concentrating
 only on the big things.

  It's also why I suggested that for kilo-1 and kilo-2, we allow any
 blueprint to merge, and only restrict it to a specific list in kilo-3,
 the idea being to maximise the number of things that get completed,
 rather than merging some half blueprints, but not getting to the good
 bits.


Do we have to decide this now, or can we see how project priorities go and
reevaluate half way through Kilo-2?



 Anyways, it seems like this doesn't hit a middle ground that would
 gain pre-summit approval. Or at least needs some online chat time to
 work out something.


 Thanks,
 John

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Choice of Series goal of a blueprint

2014-09-25 Thread Joe Gordon
On Thu, Sep 25, 2014 at 7:22 AM, Angelo Matarazzo 
angelo.matara...@dektech.com.au wrote:

 Hi all,
  Can I create a blueprint and choose a previous Series goal (e.g. Icehouse)?
 I think that it can be possible but no reviewer or driver will be
 interested in it.
 Right?


I am not sure what the 'why' is here, but Icehouse is under stable
maintenance mode so it is not accepting new features.


 Best regards,
 Angelo

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Thoughts on OpenStack Layers and a Big Tent model

2014-09-23 Thread Joe Gordon
On Tue, Sep 23, 2014 at 9:50 AM, Vishvananda Ishaya vishvana...@gmail.com
wrote:


 On Sep 23, 2014, at 8:40 AM, Doug Hellmann d...@doughellmann.com wrote:

  If we are no longer incubating *programs*, which are the teams of people
 who we would like to ensure are involved in OpenStack governance, then how
 do we make that decision? From a practical standpoint, how do we make a
 list of eligible voters for a TC election? Today we pull a list of
 committers from the git history from the projects associated with “official
 programs”, but if we are dropping “official programs” we need some other
 way to build the list.

 Joe Gordon mentioned an interesting idea to address this (which I am
 probably totally butchering), which is that we make incubation more similar
 to the ASF Incubator. In other words make it more lightweight with no
 promise of governance or infrastructure support.


you only slightly butchered it :). From what I gather, the Apache Software
Foundation's primary goals are to:


* provide a foundation for open, collaborative software development
projects by supplying hardware, communication, and business infrastructure
* create an independent legal entity to which companies and individuals can
donate resources and be assured that those resources will be used for the
public benefit
* provide a means for individual volunteers to be sheltered from legal
suits directed at the Foundation's projects
* protect the 'Apache' brand, as applied to its software products, from
being abused by other organizations
[0]

This roughly translates into: JIRA, SVN, Bugzilla and Confluence etc.
for infrastructure resources. So ASF provides infrastructure, legal
support, a trademark and some basic oversight.


The [Apache] incubator is responsible for:
* filtering the proposals about the creation of a new project or sub-project
* help the creation of the project and the infrastructure that it needs to
operate
* supervise and mentor the incubated community in order for them to reach
an open meritocratic environment
* evaluate the maturity of the incubated project, either promoting it to
official project/ sub-project status or by retiring it, in case of failure.

It must be noted that the incubator (just like the board) does not perform
filtering on the basis of technical issues. This is because the foundation
respects and suggests variety of technical approaches. It doesn't fear
innovation or even internal confrontation between projects which overlap in
functionality. [1]

So my idea, which is very similar to Monty's, is to move all the
non-layer-1 projects into something closer to an ASF model where there is
still incubation and graduation. But the only things a project receives out
of this process are:

* Legal support
* A trademark
* Mentorship
* Infrastructure to use
* Basic oversight via the incubation/graduation process with respect to the
health of the community.

They do not get:

* Required co-gating or integration with any other projects
* People to write their docs for them, etc.
* Technical review/oversight
* Technical requirements
* Evaluation on how the project fits into a bigger picture
* Language requirements
* etc.

Note: this is just an idea, not a fully formed proposal

[0] http://www.apache.org/foundation/how-it-works.html#what
[1] http://www.apache.org/foundation/how-it-works.html#incubator



 It is also interesting to consider that we may not need much governance
 for things outside of layer1. Of course, this may be dancing around the
 actual problem to some extent, because there are a bunch of projects that
 are not layer1 that are already a part of the community, and we need a
 solution that includes them somehow.

 Vish

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-23 Thread Joe Gordon
On Tue, Sep 23, 2014 at 9:13 AM, Zane Bitter zbit...@redhat.com wrote:

 On 22/09/14 22:04, Joe Gordon wrote:

 To me this is less about valid or invalid choices. The Zaqar team is
 comparing Zaqar to SQS, but after digging into the two of them, zaqar
 barely looks like SQS. Zaqar doesn't guarantee what IMHO is the most
  important part of SQS: the message will be delivered and will never be
 lost by SQS.


 I agree that this is the most important feature. Happily, Flavio has
 clarified this in his other thread[1]:

  *Zaqar's vision is to provide a cross-cloud interoperable,
   fully-reliable messaging service at scale that is both, easy and not
   invasive, for deployers and users.*

   ...

   Zaqar aims to be a fully-reliable service, therefore messages should
   never be lost under any circumstances except for when the message's
   expiration time (ttl) is reached

 So Zaqar _will_ guarantee reliable delivery.

  Zaqar doesn't have the same scaling properties as SQS.


 This is true. (That's not to say it won't scale, but it doesn't scale in
 exactly the same way that SQS does because it has a different architecture.)

 It appears that the main reason for this is the ordering guarantee, which
 was introduced in response to feedback from users. So this is clearly a
 different design choice: SQS chose reliability plus effectively infinite
 scalability, while Zaqar chose reliability plus FIFO. It's not feasible to
 satisfy all three simultaneously, so the options are:

 1) Implement two separate modes and allow the user to decide
 2) Continue to choose FIFO over infinite scalability
 3) Drop FIFO and choose infinite scalability instead

 This is one of the key points on which we need to get buy-in from the
 community on selecting one of these as the long-term strategy.

  Zaqar is aiming for low latency per message, SQS doesn't appear to be.


 I've seen no evidence that Zaqar is actually aiming for that. There are
 waaay lower-latency ways to implement messaging if you don't care about
 durability (you wouldn't do store-and-forward, for a start). If you see a
 lot of talk about low latency, it's probably because for a long time people
 insisted on comparing Zaqar to RabbitMQ instead of SQS.


I thought this was why Zaqar uses Falcon and not Pecan/WSME?

For an application like Marconi where throughput and latency is of
paramount importance, I recommend Falcon over Pecan.
https://wiki.openstack.org/wiki/Zaqar/pecan-evaluation#Recommendation

Yes, that statement mentions throughput, but it mentions latency as well.



  (Let's also be careful not to talk about low latency as if it were a
 virtue in itself; it's simply something we would happily trade off for
 other properties. Zaqar _is_ making that trade-off.)

  So if Zaqar isn't SQS what is Zaqar and why should I use it?


 If you are a small-to-medium user of an SQS-like service, Zaqar is like
 SQS but better because not only does it never lose your messages but they
 always arrive in order, and you have the option to fan them out to multiple
 subscribers. If you are a very large user along one particular dimension (I
 believe it's number of messages delivered from a single queue, but probably
 Gordon will correct me :D) then Zaqar may not _yet_ have a good story for
 you.

 cheers,
 Zane.

 [1] http://lists.openstack.org/pipermail/openstack-dev/2014-
 September/046809.html


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-23 Thread Joe Gordon
On Tue, Sep 23, 2014 at 2:40 AM, Flavio Percoco fla...@redhat.com wrote:

 On 09/23/2014 05:13 AM, Clint Byrum wrote:
  Excerpts from Joe Gordon's message of 2014-09-22 19:04:03 -0700:

 [snip]

 
  To me this is less about valid or invalid choices. The Zaqar team is
  comparing Zaqar to SQS, but after digging into the two of them, zaqar
  barely looks like SQS. Zaqar doesn't guarantee what IMHO is the most
   important part of SQS: the message will be delivered and will never be
  lost by SQS. Zaqar doesn't have the same scaling properties as SQS.
 Zaqar
  is aiming for low latency per message, SQS doesn't appear to be. So if
  Zaqar isn't SQS what is Zaqar and why should I use it?
 
 
  I have to agree. I'd like to see a simple, non-ordered, high latency,
  high scale messaging service that can be used cheaply by cloud operators
  and users. What I see instead is a very powerful, ordered, low latency,
  medium scale messaging service that will likely cost a lot to scale out
  to the thousands of users level.

 I don't fully agree :D

 Let me break the above down into several points:

 * Zaqar team is comparing Zaqar to SQS: True, we're comparing to the
 *type* of service SQS is but not *all* the guarantees it gives. We're
 not working on an exact copy of the service but on a service capable of
 addressing the same use cases.

 * Zaqar is not guaranteeing reliability: This is not true. Yes, the
 current default write concern for the mongodb driver is `acknowledge`
 but that's a bug, not a feature [0] ;)

 * Zaqar doesn't have the same scaling properties as SQS: What are SQS
 scaling properties? We know they have a big user base, we know they have
 lots of connections, queues and what not but we don't have numbers to
 compare ourselves with.


Here is *a* number: 30k messages per second on a single queue:
http://java.dzone.com/articles/benchmarking-sqs



 * Zaqar is aiming for low latency per message: This is not true and I'd
 be curious to know where did this come from. A couple of things to
 consider:

 - First and foremost, low latency is a very relative measure  and
 it
 depends on each use-case.
 - The benchmarks Kurt did were purely informative. I believe it's
 good
 to do them every once in a while but this doesn't mean the team is
 mainly focused on that.
 - Not being focused on 'low-latency' does not mean the team will
 overlook performance.

 * Zaqar has FIFO and SQS doesn't: FIFO won't hurt *your use-case* if
 ordering is not a requirement but the lack of it does when ordering is a
 must.

 * Scaling out Zaqar will cost a lot: In terms of what? I'm pretty sure
  it's not for free but I'd like to understand this point better and
 figure out a way to improve it, if possible.

 * If Zaqar isn't SQS then what is it? Why should I use it?: I don't
 believe Zaqar is SQS as I don't believe nova is EC2. Do they share
 similar features and provide similar services? Yes, does that mean you
  can address similar use cases, hence similar users? Yes.

 In addition to the above, I believe Zaqar is a simple service, easy to
 install and to interact with. From a user perspective the semantics are
 few and the concepts are neither new nor difficult to grasp. From an
 operators perspective, I don't believe it adds tons of complexity. It
 does require the operator to deploy a replicated storage environment but
 I believe all services require that.

 Cheers,
 Flavio

 P.S: Sorry for my late answer or lack of it. I lost *all* my emails
 yesterday and I'm working on recovering them.

 [0] https://bugs.launchpad.net/zaqar/+bug/1372335

 --
 @flaper87
 Flavio Percoco

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-22 Thread Joe Gordon
On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote:

 On 22/09/14 10:11, Gordon Sim wrote:

 On 09/19/2014 09:13 PM, Zane Bitter wrote:

 SQS offers very, very limited guarantees, and it's clear that the reason
 for that is to make it massively, massively scalable in the way that
 e.g. S3 is scalable while also remaining comparably durable (S3 is
 supposedly designed for 11 nines, BTW).

 Zaqar, meanwhile, seems to be promising the world in terms of
 guarantees. (And then taking it away in the fine print, where it says
 that the operator can disregard many of them, potentially without the
 user's knowledge.)

 On the other hand, IIUC Zaqar does in fact have a sharding feature
 (Pools) which is its answer to the massive scaling question.


 There are different dimensions to the scaling problem.


 Many thanks for this analysis, Gordon. This is really helpful stuff.

  As I understand it, pools don't help scaling a given queue since all the
 messages for that queue must be in the same pool. At present traffic
 through different Zaqar queues are essentially entirely orthogonal
 streams. Pooling can help scale the number of such orthogonal streams,
 but to be honest, that's the easier part of the problem.


 But I think it's also the important part of the problem. When I talk about
 scaling, I mean 1 million clients sending 10 messages per second each, not
 10 clients sending 1 million messages per second each.

 When a user gets to the point that individual queues have massive
 throughput, it's unlikely that a one-size-fits-all cloud offering like
 Zaqar or SQS is _ever_ going to meet their needs. Those users will want to
 spin up and configure their own messaging systems on Nova servers, and at
 that kind of size they'll be able to afford to. (In fact, they may not be
 able to afford _not_ to, assuming per-message-based pricing.)


Running a message queue that has a high guarantee of not losing a message
is hard and SQS promises exactly that, it *will* deliver your message. If a
use case can handle occasionally dropping messages then running your own MQ
makes more sense.

SQS is designed to handle massive queues as well, while I haven't found any
examples of queues that have 1 million messages/second being sent or
received, 30k to 100k messages/second is not unheard of [0][1][2].

[0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
[1] http://java.dzone.com/articles/benchmarking-sqs
[2]
http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182


  There is also the possibility of using the sharding capabilities of the
 underlying storage. But the pattern of use will determine how effective
 that can be.

 So for example, on the ordering question, if order is defined by a
 single sequence number held in the database and atomically incremented
 for every message published, that is not likely to be something where
  the database's sharding is going to help in scaling the number of
 concurrent publications.

 Though sharding would allow scaling the total number messages on the
 queue (by distributing them over multiple shards), the total ordering of
  those messages reduces its effectiveness in scaling the number of
 concurrent getters (e.g. the concurrent subscribers in pub-sub) since
 they will all be getting the messages in exactly the same order.

 Strict ordering impacts the competing consumers case also (and is in my
 opinion of limited value as a guarantee anyway). At any given time, the
 head of the queue is in one shard, and all concurrent claim requests
 will contend for messages in that same shard. Though the unsuccessful
 claimants may then move to another shard as the head moves, they will
 all again try to access the messages in the same order.

 So if Zaqar's goal is to scale the number of orthogonal queues, and the
 number of messages held at any time within these, the pooling facility
 and any sharding capability in the underlying store for a pool would
 likely be effective even with the strict ordering guarantee.


 IMHO this is (or should be) the goal - support enormous numbers of
 small-to-moderate sized queues.


If 50,000 messages per second doesn't count as small-to-moderate then Zaqar
does not fulfill a major SQS use case.




  If scaling the number of communicants on a given communication channel
 is a goal however, then strict ordering may hamper that. If it does, it
 seems to me that this is not just a policy tweak on the underlying
 datastore to choose the desired balance between ordering and scale, but
 a more fundamental question on the internal structure of the queue
 implementation built on top of the datastore.


 I agree with your analysis, but I don't think this should be a goal.

 Note that the user can still implement this themselves using
 application-level sharding - if you know that in-order delivery is not
 important to you, then randomly assign clients to a queue 

Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-22 Thread Joe Gordon
On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote:

 On 22/09/14 17:06, Joe Gordon wrote:

 On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote:

  On 22/09/14 10:11, Gordon Sim wrote:

  On 09/19/2014 09:13 PM, Zane Bitter wrote:

  SQS offers very, very limited guarantees, and it's clear that the
 reason
 for that is to make it massively, massively scalable in the way that
 e.g. S3 is scalable while also remaining comparably durable (S3 is
 supposedly designed for 11 nines, BTW).

 Zaqar, meanwhile, seems to be promising the world in terms of
 guarantees. (And then taking it away in the fine print, where it says
 that the operator can disregard many of them, potentially without the
 user's knowledge.)

 On the other hand, IIUC Zaqar does in fact have a sharding feature
 (Pools) which is its answer to the massive scaling question.


 There are different dimensions to the scaling problem.


 Many thanks for this analysis, Gordon. This is really helpful stuff.

   As I understand it, pools don't help scaling a given queue since all
 the

 messages for that queue must be in the same pool. At present traffic
 through different Zaqar queues are essentially entirely orthogonal
 streams. Pooling can help scale the number of such orthogonal streams,
 but to be honest, that's the easier part of the problem.


 But I think it's also the important part of the problem. When I talk
 about
 scaling, I mean 1 million clients sending 10 messages per second each,
 not
 10 clients sending 1 million messages per second each.

 When a user gets to the point that individual queues have massive
 throughput, it's unlikely that a one-size-fits-all cloud offering like
 Zaqar or SQS is _ever_ going to meet their needs. Those users will want
 to
 spin up and configure their own messaging systems on Nova servers, and at
 that kind of size they'll be able to afford to. (In fact, they may not be
 able to afford _not_ to, assuming per-message-based pricing.)


  Running a message queue that has a high guarantee of not losing a message
 is hard and SQS promises exactly that, it *will* deliver your message. If
 a
 use case can handle occasionally dropping messages then running your own
 MQ
 makes more sense.

 SQS is designed to handle massive queues as well, while I haven't found
 any
 examples of queues that have 1 million messages/second being sent or
  received, 30k to 100k messages/second is not unheard of [0][1][2].

 [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
 [1] http://java.dzone.com/articles/benchmarking-sqs
 [2]
 http://www.slideshare.net/AmazonWebServices/massive-
 message-processing-with-amazon-sqs-and-amazon-
 dynamodb-arc301-aws-reinvent-2013-28431182


 Thanks for digging those up, that's really helpful input. I think number
 [1] kind of summed up part of what I'm arguing here though:

 But once your requirements get above 35k messages per second, chances are
 you need custom solutions anyway; not to mention that while SQS is cheap,
 it may become expensive with such loads.


If you don't require the reliability guarantees that SQS provides then
perhaps. But I would be surprised to hear that a user can set up something
with this level of uptime for less:

Amazon SQS runs within Amazon’s high-availability data centers, so queues
will be available whenever applications need them. To prevent messages from
being lost or becoming unavailable, all messages are stored redundantly
across multiple servers and data centers. [1]




There is also the possibility of using the sharding capabilities of the

 underlying storage. But the pattern of use will determine how effective
 that can be.

 So for example, on the ordering question, if order is defined by a
 single sequence number held in the database and atomically incremented
 for every message published, that is not likely to be something where
 the databases sharding is going to help in scaling the number of
 concurrent publications.

 Though sharding would allow scaling the total number messages on the
 queue (by distributing them over multiple shards), the total ordering of
 those messages reduces it's effectiveness in scaling the number of
 concurrent getters (e.g. the concurrent subscribers in pub-sub) since
 they will all be getting the messages in exactly the same order.

 Strict ordering impacts the competing consumers case also (and is in my
 opinion of limited value as a guarantee anyway). At any given time, the
 head of the queue is in one shard, and all concurrent claim requests
 will contend for messages in that same shard. Though the unsuccessful
 claimants may then move to another shard as the head moves, they will
 all again try to access the messages in the same order.

 So if Zaqar's goal is to scale the number of orthogonal queues, and the
 number of messages held at any time within these, the pooling facility
 and any sharding capability in the underlying store for a pool would
 likely be effective even

Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-22 Thread Joe Gordon
On Mon, Sep 22, 2014 at 7:04 PM, Joe Gordon joe.gord...@gmail.com wrote:



 On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote:

 On 22/09/14 17:06, Joe Gordon wrote:

 On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote:

  On 22/09/14 10:11, Gordon Sim wrote:

  On 09/19/2014 09:13 PM, Zane Bitter wrote:

  SQS offers very, very limited guarantees, and it's clear that the
 reason
 for that is to make it massively, massively scalable in the way that
 e.g. S3 is scalable while also remaining comparably durable (S3 is
 supposedly designed for 11 nines, BTW).

 Zaqar, meanwhile, seems to be promising the world in terms of
 guarantees. (And then taking it away in the fine print, where it says
 that the operator can disregard many of them, potentially without the
 user's knowledge.)

 On the other hand, IIUC Zaqar does in fact have a sharding feature
 (Pools) which is its answer to the massive scaling question.


 There are different dimensions to the scaling problem.


 Many thanks for this analysis, Gordon. This is really helpful stuff.

   As I understand it, pools don't help scaling a given queue since all
 the

 messages for that queue must be in the same pool. At present traffic
 through different Zaqar queues are essentially entirely orthogonal
 streams. Pooling can help scale the number of such orthogonal streams,
 but to be honest, that's the easier part of the problem.


 But I think it's also the important part of the problem. When I talk
 about
 scaling, I mean 1 million clients sending 10 messages per second each,
 not
 10 clients sending 1 million messages per second each.

 When a user gets to the point that individual queues have massive
 throughput, it's unlikely that a one-size-fits-all cloud offering like
 Zaqar or SQS is _ever_ going to meet their needs. Those users will want
 to
 spin up and configure their own messaging systems on Nova servers, and
 at
 that kind of size they'll be able to afford to. (In fact, they may not
 be
 able to afford _not_ to, assuming per-message-based pricing.)


  Running a message queue that has a high guarantee of not losing a
 message
 is hard and SQS promises exactly that, it *will* deliver your message.
 If a
 use case can handle occasionally dropping messages then running your own
 MQ
 makes more sense.

 SQS is designed to handle massive queues as well, while I haven't found
 any
 examples of queues that have 1 million messages/second being sent or
  received, 30k to 100k messages/second is not unheard of [0][1][2].

 [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
 [1] http://java.dzone.com/articles/benchmarking-sqs
 [2]
 http://www.slideshare.net/AmazonWebServices/massive-
 message-processing-with-amazon-sqs-and-amazon-
 dynamodb-arc301-aws-reinvent-2013-28431182


 Thanks for digging those up, that's really helpful input. I think number
 [1] kind of summed up part of what I'm arguing here though:

 But once your requirements get above 35k messages per second, chances
 are you need custom solutions anyway; not to mention that while SQS is
 cheap, it may become expensive with such loads.


 If you don't require the reliability guarantees that SQS provides then
 perhaps. But I would be surprised to hear that a user can set up something
 with this level of uptime for less:

 Amazon SQS runs within Amazon’s high-availability data centers, so queues
 will be available whenever applications need them. To prevent messages from
 being lost or becoming unavailable, all messages are stored redundantly
 across multiple servers and data centers. [1]




There is also the possibility of using the sharding capabilities of the

 underlying storage. But the pattern of use will determine how effective
 that can be.

 So for example, on the ordering question, if order is defined by a
 single sequence number held in the database and atomically incremented
 for every message published, that is not likely to be something where
 the databases sharding is going to help in scaling the number of
 concurrent publications.

 Though sharding would allow scaling the total number messages on the
 queue (by distributing them over multiple shards), the total ordering
 of
 those messages reduces it's effectiveness in scaling the number of
 concurrent getters (e.g. the concurrent subscribers in pub-sub) since
 they will all be getting the messages in exactly the same order.

 Strict ordering impacts the competing consumers case also (and is in my
 opinion of limited value as a guarantee anyway). At any given time, the
 head of the queue is in one shard, and all concurrent claim requests
 will contend for messages in that same shard. Though the unsuccessful
 claimants may then move to another shard as the head moves, they will
 all again try to access the messages in the same order.

 So if Zaqar's goal is to scale the number of orthogonal queues, and the
 number of messages held at any time within these, the pooling facility
 and any

Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-22 Thread Joe Gordon
On Mon, Sep 22, 2014 at 8:13 PM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Joe Gordon's message of 2014-09-22 19:04:03 -0700:
  On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote:
 
   On 22/09/14 17:06, Joe Gordon wrote:
  
   On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com
 wrote:
  
On 22/09/14 10:11, Gordon Sim wrote:
  
On 09/19/2014 09:13 PM, Zane Bitter wrote:
  
SQS offers very, very limited guarantees, and it's clear that the
   reason
   for that is to make it massively, massively scalable in the way
 that
   e.g. S3 is scalable while also remaining comparably durable (S3 is
   supposedly designed for 11 nines, BTW).
  
   Zaqar, meanwhile, seems to be promising the world in terms of
   guarantees. (And then taking it away in the fine print, where it
 says
   that the operator can disregard many of them, potentially without
 the
   user's knowledge.)
  
   On the other hand, IIUC Zaqar does in fact have a sharding feature
   (Pools) which is its answer to the massive scaling question.
  
  
   There are different dimensions to the scaling problem.
  
  
   Many thanks for this analysis, Gordon. This is really helpful stuff.
  
 As I understand it, pools don't help scaling a given queue since
 all
   the
  
   messages for that queue must be in the same pool. At present traffic
   through different Zaqar queues are essentially entirely orthogonal
   streams. Pooling can help scale the number of such orthogonal
 streams,
   but to be honest, that's the easier part of the problem.
  
  
   But I think it's also the important part of the problem. When I talk
   about
   scaling, I mean 1 million clients sending 10 messages per second
 each,
   not
   10 clients sending 1 million messages per second each.
  
   When a user gets to the point that individual queues have massive
   throughput, it's unlikely that a one-size-fits-all cloud offering
 like
   Zaqar or SQS is _ever_ going to meet their needs. Those users will
 want
   to
   spin up and configure their own messaging systems on Nova servers,
 and at
   that kind of size they'll be able to afford to. (In fact, they may
 not be
   able to afford _not_ to, assuming per-message-based pricing.)
  
  
    Running a message queue that has a high guarantee of not losing a
 message
   is hard and SQS promises exactly that, it *will* deliver your
 message. If
   a
   use case can handle occasionally dropping messages then running your
 own
   MQ
   makes more sense.
  
   SQS is designed to handle massive queues as well, while I haven't
 found
   any
   examples of queues that have 1 million messages/second being sent or
    received, 30k to 100k messages/second is not unheard of [0][1][2].
  
   [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
   [1] http://java.dzone.com/articles/benchmarking-sqs
   [2]
   http://www.slideshare.net/AmazonWebServices/massive-
   message-processing-with-amazon-sqs-and-amazon-
   dynamodb-arc301-aws-reinvent-2013-28431182
  
  
   Thanks for digging those up, that's really helpful input. I think
 number
   [1] kind of summed up part of what I'm arguing here though:
  
   But once your requirements get above 35k messages per second, chances
 are
   you need custom solutions anyway; not to mention that while SQS is
 cheap,
   it may become expensive with such loads.
 
 
  If you don't require the reliability guarantees that SQS provides then
  perhaps. But I would be surprised to hear that a user can set up
 something
  with this level of uptime for less:
 
  Amazon SQS runs within Amazon’s high-availability data centers, so
 queues
  will be available whenever applications need them. To prevent messages
 from
  being lost or becoming unavailable, all messages are stored redundantly
  across multiple servers and data centers. [1]
 

 This is pretty easily doable with gearman or even just using Redis
 directly. But it is still ops for end users. The AWS users I've talked to
 who use SQS do so because they like that they can use RDS, SQS, and ELB,
 and have only one type of thing to operate: their app.
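
For reference, the "just using Redis directly" approach mentioned above can be
as small as the sketch below (assuming redis-py; the key names and the reaper
for crashed workers are illustrative assumptions, not a recommendation), which
is exactly the kind of thing end users then have to operate themselves:

    import json
    import redis

    r = redis.StrictRedis(host="localhost", port=6379)

    def publish(task):
        # Producers push JSON blobs onto a list.
        r.lpush("tasks", json.dumps(task))

    def handle(task):
        print("got", task)                    # application-defined work

    def consume():
        while True:
            # Atomically move the next item to an "in flight" list so a
            # separate reaper could re-queue items from crashed workers.
            raw = r.brpoplpush("tasks", "tasks:inflight", timeout=0)
            handle(json.loads(raw))
            r.lrem("tasks:inflight", 1, raw)  # ack: drop the in-flight copy
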

  
  
  There is also the possibility of using the sharding capabilities of
 the
  
   underlying storage. But the pattern of use will determine how
 effective
   that can be.
  
   So for example, on the ordering question, if order is defined by a
   single sequence number held in the database and atomically
 incremented
   for every message published, that is not likely to be something
 where
   the databases sharding is going to help in scaling the number of
   concurrent publications.
  
   Though sharding would allow scaling the total number messages on the
   queue (by distributing them over multiple shards), the total
 ordering of
   those messages reduces it's effectiveness in scaling the number of
   concurrent getters (e.g. the concurrent subscribers in pub-sub)
 since
   they will all be getting the messages in exactly the same order

Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Joe Gordon
On Thu, Sep 18, 2014 at 9:02 AM, Devananda van der Veen 
devananda@gmail.com wrote:

 On Thu, Sep 18, 2014 at 7:55 AM, Flavio Percoco fla...@redhat.com wrote:
  On 09/18/2014 04:24 PM, Clint Byrum wrote:
  Great job highlighting what our friends over at Amazon are doing.
 
  It's clear from these snippets, and a few other pieces of documentation
  for SQS I've read, that the Amazon team approached SQS from a _massive_
  scaling perspective. I think what may be forcing a lot of this
 frustration
  with Zaqar is that it was designed with a much smaller scale in mind.
 
  I think as long as that is the case, the design will remain in question.
  I'd be comfortable saying that the use cases I've been thinking about
  are entirely fine with the limitations SQS has.
 
  I think these are pretty strong comments with not enough arguments to
  defend them.
 

 Please see my prior email. I agree with Clint's assertions here.

  Saying that Zaqar was designed with a smaller scale in mind without
  actually saying why you think so is not fair besides not being true. So
  please, do share why you think Zaqar was not designed for big scales and
  provide comments that will help the project to grow and improve.
 
  - Is it because the storage technologies that have been chosen?
  - Is it because of the API?
  - Is it because of the programing language/framework ?

 It is not because of the storage technology or because of the
 programming language.

  So far, we've just discussed the API semantics and not zaqar's
  scalability, which makes your comments even more surprising.

 - guaranteed message order
 - not distributing work across a configurable number of back ends

 These are scale-limiting design choices which are reflected in the
 API's characteristics.


I agree with Clint and Devananda



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Joe Gordon
On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco fla...@redhat.com wrote:

 On 09/18/2014 04:09 PM, Gordon Sim wrote:
  On 09/18/2014 12:31 PM, Flavio Percoco wrote:
  On 09/17/2014 10:36 PM, Joe Gordon wrote:
  My understanding of Zaqar is that it's like SQS. SQS uses distributed
  queues, which have a few unusual properties [0]:
 
 
   Message Order
 
  Amazon SQS makes a best effort to preserve order in messages, but due
 to
  the distributed nature of the queue, we cannot guarantee you will
  receive messages in the exact order you sent them. If your system
  requires that order be preserved, we recommend you place sequencing
  information in each message so you can reorder the messages upon
  receipt.
 
 
  Zaqar guarantees FIFO. To be more precise, it does that relying on the
   storage backend's ability to do so as well. Depending on the storage used,
  guaranteeing FIFO may have some performance penalties.
 
  Would it be accurate to say that at present Zaqar does not use
  distributed queues, but holds all queue data in a storage mechanism of
  some form which may internally distribute that data among servers but
  provides Zaqar with a consistent data model of some form?

 I think this is accurate. The queue's distribution depends on the
 storage ability to do so and deployers will be able to choose what
 storage works best for them based on this as well. I'm not sure how
 useful this separation is from a user perspective but I do see the
 relevance when it comes to implementation details and deployments.


  [...]
  As of now, Zaqar fully relies on the storage replication/clustering
  capabilities to provide data consistency, availability and fault
  tolerance.
 
  Is the replication synchronous or asynchronous with respect to client
  calls? E.g. will the response to a post of messages be returned only
  once the replication of those messages is confirmed? Likewise when
  deleting a message, is the response only returned when replicas of the
  message are deleted?

 It depends on the driver implementation and/or storage configuration.
 For example, in the mongodb driver, we use the default write concern
 called acknowledged. This means that as soon as the message gets to
 the master node (note it's not written on disk yet nor replicated) zaqar
 will receive a confirmation and then send the response back to the
 client. This is also configurable by the deployer by changing the
 default write concern in the mongodb uri using `?w=SOME_WRITE_CONCERN`[0].


This means that by default Zaqar cannot guarantee a message will be
delivered at all. A message can be acknowledged and then the 'master node'
crashes and the message is lost. Zaqar's ability to guarantee delivery is
limited by the reliability of a single node, while something like swift can
only lose a piece of data if 3 machines crash at the same time.



 [0] http://docs.mongodb.org/manual/reference/connection-string/#uri.w
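
For illustration only (assuming a recent pymongo and a three-node replica set;
the host, database and collection names here are made up), the difference
between the two write concerns is just the connection URI, but it changes when
the client gets its acknowledgement:

    from pymongo import MongoClient

    # Default "acknowledged" concern: the ack comes back once the primary
    # has the write in memory, so a primary crash at that moment can still
    # lose the message.
    fast = MongoClient("mongodb://mongo1,mongo2,mongo3/?replicaSet=zaqar")

    # w=majority: the ack is withheld until a majority of replica-set
    # members have the write, so a single node failure cannot lose it
    # (at the cost of extra latency on every post).
    durable = MongoClient(
        "mongodb://mongo1,mongo2,mongo3/?replicaSet=zaqar&w=majority")

    durable.demo.messages.insert_one({"queue": "demo", "body": "hello"})
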

 
  However, as far as consuming messages is concerned, it can
  guarantee once-and-only-once and/or at-least-once delivery depending on
  the message pattern used to consume messages. Using pop or claims
  guarantees the former whereas streaming messages out of Zaqar guarantees
  the latter.
 
   From what I can see, pop provides unreliable delivery (i.e. it's similar
  to no-ack). If the delete call using pop fails while sending back the
  response, the messages are removed but didn't get to the client.

 Correct, pop works like no-ack. If you want to have pop+ack, it is
 possible to claim just 1 message and then delete it.
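
As a rough sketch of that claim-then-delete (pop+ack) pattern over the HTTP
API (the endpoint, paths, headers and body fields below are assumptions based
on the v1 API, meant only to show the shape of the exchange, not to be copied
verbatim):

    import json
    import uuid
    import requests

    ZAQAR = "http://zaqar.example.com:8888"          # placeholder endpoint
    HEADERS = {"Client-ID": str(uuid.uuid4()),       # v1 expects a client UUID
               "Content-Type": "application/json"}

    def process(body):
        print("working on", body)                    # application-defined work

    # Claim one message; it stays invisible to other consumers until the
    # claim TTL expires or the message is deleted.
    resp = requests.post(ZAQAR + "/v1/queues/demo/claims?limit=1",
                         headers=HEADERS,
                         data=json.dumps({"ttl": 300, "grace": 60}))

    if resp.status_code == 201:
        message = resp.json()[0]
        process(message["body"])
        # Ack by deleting through the claim-scoped href returned above; if
        # the worker dies first, the claim expires and the message becomes
        # visible again (at-least-once).
        requests.delete(ZAQAR + message["href"], headers=HEADERS)
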

 
  What do you mean by 'streaming messages'?

 I'm sorry, that went out wrong. I had the browsability term in my head
 and went with something even worse. By streaming messages I meant
 polling messages without claiming them. In other words, at-least-once is
  guaranteed by default, whereas once-and-only-once is guaranteed only if
 claims are used.

 
  [...]
  Based on our short conversation on IRC last night, I understand you're
  concerned that FIFO may result in performance issues. That's a valid
  concern and I think the right answer is that it depends on the storage.
  If the storage has a built-in FIFO guarantee then there's nothing Zaqar
  needs to do there. In the other hand, if the storage does not have a
  built-in support for FIFO, Zaqar will cover it in the driver
  implementation. In the mongodb driver, each message has a marker that is
  used to guarantee FIFO.
 
  That marker is a sequence number of some kind that is used to provide
  ordering to queries? Is it generated by the database itself?

 It's a sequence number to provide ordering to queries, correct.
 Depending on the driver, it may be generated by Zaqar or the database.
 In mongodb's case it's generated by Zaqar[0].

 [0]

 https://github.com/openstack/zaqar/blob/master/zaqar/queues/storage/mongodb/queues.py#L103-L185
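
As a much-simplified illustration of that general technique (an atomically
incremented per-queue counter), and explicitly not the actual Zaqar code
linked at [0], a sketch assuming pymongo 3.x and made-up collection names:

    from pymongo import MongoClient, ReturnDocument

    db = MongoClient()["example"]      # illustrative database name

    def next_marker(project, queue):
        """Atomically reserve the next sequence number for one queue."""
        doc = db.counters.find_one_and_update(
            {"p": project, "q": queue},
            {"$inc": {"marker": 1}},
            upsert=True,
            return_document=ReturnDocument.AFTER)
        return doc["marker"]

    def post_message(project, queue, body):
        # Stamping each message with its marker lets listing queries sort
        # on it and hand messages back in FIFO order.
        db.messages.insert_one({"p": project, "q": queue,
                                "k": next_marker(project, queue),
                                "b": body})
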

 --
 @flaper87
 Flavio Percoco

 ___
 OpenStack-dev mailing list

Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

2014-09-17 Thread Joe Gordon
On Tue, Sep 16, 2014 at 8:02 AM, Kurt Griffiths 
kurt.griffi...@rackspace.com wrote:

  Right, graphing those sorts of variables has always been part of our
 test plan. What I’ve done so far was just some pilot tests, and I realize
 now that I wasn’t very clear on that point. I wanted to get a rough idea of
 where the Redis driver sat in case there were any obvious bug fixes that
 needed to be taken care of before performing more extensive testing. As it
 turns out, I did find one bug that has since been fixed.

  Regarding latency, saying that it is “not important” is an exaggeration;
 it is definitely important, just not the *only* thing that is important.
 I have spoken with a lot of prospective Zaqar users since the inception of
 the project, and one of the common threads was that latency needed to be
 reasonable. For the use cases where they see Zaqar delivering a lot of
 value, requests don't need to be as fast as, say, ZMQ, but they do need
 something that isn’t horribly *slow,* either. They also want HTTP,
 multi-tenant, auth, durability, etc. The goal is to find a reasonable
 amount of latency given our constraints and also, obviously, be able to
 deliver all that at scale.


Can you further quantify what you would consider too slow? Is 100ms too
slow?



  In any case, I’ve continue working through the test plan and will be
 publishing further test results shortly.

   graph latency versus number of concurrent active tenants

  By tenants do you mean in the sense of OpenStack Tenants/Project-ID's or
 in  the sense of “clients/workers”? For the latter case, the pilot tests
 I’ve done so far used multiple clients (though not graphed), but in the
 former case only one “project” was used.


multiple  Tenant/Project-IDs



   From: Joe Gordon joe.gord...@gmail.com
 Reply-To: OpenStack Dev openstack-dev@lists.openstack.org
 Date: Friday, September 12, 2014 at 1:45 PM
 To: OpenStack Dev openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

  If zaqar is like amazon SQS, then the latency for a single message and
 the throughput for a single tenant is not important. I wouldn't expect
 anyone who has latency sensitive work loads or needs massive throughput to
 use zaqar, as these people wouldn't use SQS either. The consistency of the
 latency (shouldn't change under load) and zaqar's ability to scale
  horizontally matter much more. What would be great to see is some other
 things benchmarked instead:

  * graph latency versus number of concurrent active tenants
 * graph latency versus message size
 * How throughput scales as you scale up the number of assorted zaqar
 components. If one of the benefits of zaqar is its horizontal scalability,
  let's see it.
  * How does this change with message batching?
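
For the first item on that list, a rough harness could be as simple as the
sketch below (the endpoint, headers and request body are placeholders for
whatever the real deployment uses): one thread per simulated tenant posts
messages for a fixed duration, per-request latency is recorded, and the
median can then be plotted against the tenant count.

    import threading
    import time
    import uuid
    import requests

    ZAQAR = "http://zaqar.example.com:8888/v1"        # placeholder endpoint

    def tenant_worker(project_id, samples, duration=30):
        headers = {"Client-ID": str(uuid.uuid4()), "X-Project-ID": project_id}
        deadline = time.time() + duration
        while time.time() < deadline:
            start = time.time()
            requests.post(ZAQAR + "/queues/bench/messages", headers=headers,
                          json=[{"ttl": 300, "body": {"n": 1}}])
            samples.append(time.time() - start)

    def median_latency(num_tenants):
        samples, threads = [], []
        for i in range(num_tenants):
            t = threading.Thread(target=tenant_worker,
                                 args=("tenant-%d" % i, samples))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
        return sorted(samples)[len(samples) // 2]     # median, in seconds

    for n in (1, 10, 50, 100):
        print(n, median_latency(n))
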

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-17 Thread Joe Gordon
Hi All,

My understanding of Zaqar is that it's like SQS. SQS uses distributed
queues, which have a few unusual properties [0]:
Message Order

Amazon SQS makes a best effort to preserve order in messages, but due to
the distributed nature of the queue, we cannot guarantee you will receive
messages in the exact order you sent them. If your system requires that
order be preserved, we recommend you place sequencing information in each
message so you can reorder the messages upon receipt.
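
To make the "place sequencing information in each message" advice concrete,
here is a rough Python sketch (not tied to any particular queue client; the
message body layout and helper names are just illustrative assumptions): the
producer stamps each message with an application-level sequence number, and
the consumer buffers early arrivals until the gap is filled.

    import itertools
    import json

    # Producer side: embed an application-level sequence number in each
    # message body so the consumer can restore order after delivery.
    _seq = itertools.count()

    def make_message(payload):
        return json.dumps({"seq": next(_seq), "payload": payload})

    # Consumer side: hold early arrivals and release payloads strictly in
    # sequence order.
    class Reorderer(object):
        def __init__(self):
            self.next_seq = 0
            self.pending = {}

        def push(self, raw):
            msg = json.loads(raw)
            released = []
            if msg["seq"] < self.next_seq:
                return released   # duplicate of an already released message
            self.pending[msg["seq"]] = msg["payload"]
            while self.next_seq in self.pending:
                released.append(self.pending.pop(self.next_seq))
                self.next_seq += 1
            return released

Every received body goes through push(), and the caller processes whatever
comes back; anything delivered early simply waits in pending until the gap is
filled.
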
At-Least-Once Delivery

Amazon SQS stores copies of your messages on multiple servers for
redundancy and high availability. On rare occasions, one of the servers
storing a copy of a message might be unavailable when you receive or delete
the message. If that occurs, the copy of the message will not be deleted on
that unavailable server, and you might get that message copy again when you
receive messages. Because of this, you must design your application to be
idempotent (i.e., it must not be adversely affected if it processes the
same message more than once).
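
The idempotency requirement in that last paragraph usually boils down to
de-duplicating on a message id before doing the work. A minimal sketch (the
in-memory set is only for illustration; a real consumer would keep the
processed ids in a durable store such as a database table):

    # Track which message ids have already been handled so a redelivered
    # copy of the same message has no additional effect.
    seen_ids = set()

    def handle(message_id, body):
        if message_id in seen_ids:
            return  # duplicate redelivery; work was already done
        do_work(body)
        seen_ids.add(message_id)

    def do_work(body):
        print("processing", body)

A crash between do_work() and recording the id just means the work runs again
on redelivery, which is why the work itself has to tolerate repeats; that is
all "design your application to be idempotent" really asks for.
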
Message Sample

The behavior of retrieving messages from the queue depends whether you are
using short (standard) polling, the default behavior, or long polling. For
more information about long polling, see Amazon SQS Long Polling
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html
.

With short polling, when you retrieve messages from the queue, Amazon SQS
samples a subset of the servers (based on a weighted random distribution)
and returns messages from just those servers. This means that a particular
receive request might not return all your messages. Or, if you have a small
number of messages in your queue (less than 1000), it means a particular
request might not return any of your messages, whereas a subsequent request
will. If you keep retrieving from your queues, Amazon SQS will sample all
of the servers, and you will receive all of your messages.

The following figure shows short polling behavior of messages being
returned after one of your system components makes a receive request.
Amazon SQS samples several of the servers (in gray) and returns the
messages from those servers (Message A, C, D, and B). Message E is not
returned to this particular request, but it would be returned to a
subsequent request.



Presumably SQS has these properties because it makes the system scalable.
If so, does Zaqar have the same properties (not just making these same
guarantees in the API, but actually having these properties in the
backends)? And if not, why? I looked on the wiki [1] for information on
this, but couldn't find anything.





[0]
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/DistributedQueues.html
[1] https://wiki.openstack.org/wiki/Zaqar
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] PostgreSQL jobs slow in the gate

2014-09-17 Thread Joe Gordon
Postgres is also logging a lot of errors:
http://logs.openstack.org/63/122263/1/check/check-tempest-dsvm-postgres-full/2f27252/logs/postgres.txt.gz

On Wed, Sep 17, 2014 at 4:49 PM, Clark Boylan cboy...@sapwetik.org wrote:

 Hello,

 Recent sampling of test run times shows that our tempest jobs run
 against clouds using PostgreSQL are significantly slower than jobs run
 against clouds using MySQL.

 (check|gate)-tempest-dsvm-full has an average run time of 52.9 minutes
 (stddev 5.92 minutes) over 516 runs.
 (check|gate)-tempest-dsvm-postgres-full has an average run time of 73.78
 minutes (stddev 11.01 minutes) over 493 runs.

  I think this is a bug, and an important one to solve prior to release
  if we want to continue to care for and feed PostgreSQL support. I
 haven't filed a bug in LP because I am not sure where the slowness is
 and creating a bug against all the projects is painful. (If there are
 suggestions for how to do this in a non painful way I will happily go
 file a proper bug).

 Is there interest in fixing this? If not we should probably reconsider
 removing these PostgreSQL jobs from the gate.


++ to getting someone to own and fix this or drop it from the gate.


 Note, a quick spot check indicates the increase in job time is not
 related to job setup. Total time before running tempest appears to be
 just over 18 minutes in the jobs I checked.

 Thank you,
 Clark

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Its time to start identifying release critical bugs

2014-09-17 Thread Joe Gordon
On Tue, Sep 16, 2014 at 2:58 AM, Michael Still mi...@stillhq.com wrote:

 Hi.

  It's time to start identifying (and working on) release critical bugs in
 nova before we ship RC1.

 My initial position is that any critical bug is release critical.
 There are currently critical bugs not targeted to rc1, but that should
 change in the next day or so. If we're not interested in fixing a
 critical bug in rc1, I think we should start to question if it is
 really critical.

 I'd also like help in deciding what other bugs are critical to be
 fixed before release. Please use this thread to suggest such things.


I went through the top nova induced gate bugs, and made sure they are
marked as critical and targeted [0].  As of the writing of this email we
have 5 critical bugs, none of which have an assignee. Of those 5, 4 are
gate bugs, with the two worst being:

* Bug 1323658 - Nova resize/restart results in guest ending up in
inconsistent state (top gate bug)
* Bug 1357578 - Unit test:
nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm
timing out in gate



[0]
https://bugs.launchpad.net/nova/+bugs?field.searchtext=orderby=-importancefield.status%3Alist=NEWfield.status%3Alist=CONFIRMEDfield.status%3Alist=TRIAGEDfield.status%3Alist=INPROGRESSfield.status%3Alist=INCOMPLETE_WITH_RESPONSEfield.status%3Alist=INCOMPLETE_WITHOUT_RESPONSEfield.importance%3Alist=CRITICALassignee_option=anyfield.assignee=field.bug_reporter=field.bug_commenter=field.subscriber=field.structural_subscriber=field.tag=field.tags_combinator=ANYfield.has_cve.used=field.omit_dupes.used=field.omit_dupes=onfield.affects_me.used=field.has_patch.used=field.has_branches.used=field.has_branches=onfield.has_no_branches.used=field.has_no_branches=onfield.has_blueprints.used=field.has_blueprints=onfield.has_no_blueprints.used=field.has_no_blueprints=onsearch=Search




 Thanks,
 Michael

 --
 Rackspace Australia

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] What's holding nova development back?

2014-09-16 Thread Joe Gordon
On Sep 15, 2014 8:31 PM, Jay Pipes jaypi...@gmail.com wrote:

 On 09/15/2014 08:07 PM, Jeremy Stanley wrote:

 On 2014-09-15 17:59:10 -0400 (-0400), Jay Pipes wrote:
 [...]

 Sometimes it's pretty hard to determine whether something in the
 E-R check page is due to something in the infra scripts, some
 transient issue in the upstream CI platform (or part of it), or
 actually a bug in one or more of the OpenStack projects.

 [...]

 Sounds like an NP-complete problem, but if you manage to solve it
 let me know and I'll turn it into the first line of triage for Infra
 bugs. ;)


 LOL, thanks for making me take the last hour reading Wikipedia pages
about computational complexity theory! :P

 No, in all seriousness, I wasn't actually asking anyone to boil the
ocean, mathematically. I think doing a couple of things would help: making the
categorization more obvious (a UI thing, really) and doing some (hopefully
simple?) inspection of some control group of patches that we know do not
introduce any code changes themselves and comparing to another group of
patches that we know *do* introduce code changes to Nova, and then seeing
if there are a set of E-R issues that consistently appear in *both* groups.
That set of E-R issues has a higher likelihood of not being due to Nova,
right?

We use launchpad's affected projects listings on the elastic recheck page
to say what may be causing the bug.  Tagging projects to bugs is a manual
process, but one that works pretty well.

UI: The elastic recheck UI definitely could use some improvements. I am
very poor at writing UIs, so patches welcome!


 OK, so perhaps it's not the most scientific or well-thought out plan, but
hey, it's a spark for thought... ;)

 Best,
 -jay


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

2014-09-12 Thread Joe Gordon
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths 
kurt.griffi...@rackspace.com wrote:

 Hi folks,

 In this second round of performance testing, I benchmarked the new Redis
 driver. I used the same setup and tests as in Round 1 to make it easier to
 compare the two drivers. I did not test Redis in master-slave mode, but
 that likely would not make a significant difference in the results since
 Redis replication is asynchronous[1].

 As always, the usual benchmarking disclaimers apply (i.e., take these
 numbers with a grain of salt; they are only intended to provide a ballpark
 reference; you should perform your own tests, simulating your specific
 scenarios and using your own hardware; etc.).

 ## Setup ##

 Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
 mitigate noisy neighbor when running the performance tests:

 * 1x Load Generator
 * Hardware
 * 1x Intel Xeon E5-2680 v2 2.8Ghz
 * 32 GB RAM
 * 10Gbps NIC
 * 32GB SATADOM
 * Software
 * Debian Wheezy
 * Python 2.7.3
 * zaqar-bench
 * 1x Web Head
 * Hardware
 * 1x Intel Xeon E5-2680 v2 2.8Ghz
 * 32 GB RAM
 * 10Gbps NIC
 * 32GB SATADOM
 * Software
 * Debian Wheezy
 * Python 2.7.3
 * zaqar server
 * storage=mongodb
 * partitions=4
 * MongoDB URI configured with w=majority
 * uWSGI + gevent
 * config: http://paste.openstack.org/show/100592/
 * app.py: http://paste.openstack.org/show/100593/
 * 3x MongoDB Nodes
 * Hardware
 * 2x Intel Xeon E5-2680 v2 2.8Ghz
 * 128 GB RAM
 * 10Gbps NIC
 * 2x LSI Nytro WarpDrive BLP4-1600[2]
 * Software
 * Debian Wheezy
 * mongod 2.6.4
 * Default config, except setting replSet and enabling periodic
   logging of CPU and I/O
 * Journaling enabled
 * Profiling on message DBs enabled for requests over 10ms
 * 1x Redis Node
 * Hardware
 * 2x Intel Xeon E5-2680 v2 2.8Ghz
 * 128 GB RAM
 * 10Gbps NIC
 * 2x LSI Nytro WarpDrive BLP4-1600[2]
 * Software
 * Debian Wheezy
 * Redis 2.4.14
 * Default config (snapshotting and AOF enabled)
 * One process

 As in Round 1, Keystone auth is disabled and requests go over HTTP, not
 HTTPS. The latency introduced by enabling these is outside the control of
 Zaqar, but should be quite minimal (speaking anecdotally, I would expect
 an additional 1-3ms for cached tokens and assuming an optimized TLS
 termination setup).

 For generating the load, I again used the zaqar-bench tool. I would like
 to see the team complete a large-scale Tsung test as well (including a
 full HA deployment with Keystone and HTTPS enabled), but decided not to
 wait for that before publishing the results for the Redis driver using
 zaqar-bench.

 CPU usage on the Redis node peaked at around 75% for the one process. To
 better utilize the hardware, a production deployment would need to run
 multiple Redis processes and use Zaqar's backend pooling feature to
 distribute queues across the various instances.

 Several different messaging patterns were tested, taking inspiration
 from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)

 Each test was executed three times and the best time recorded.

 A ~1K sample message (1398 bytes) was used for all tests.

 ## Results ##

 ### Event Broadcasting (Read-Heavy) ###

 OK, so let's say you have a somewhat low-volume source, but tons of event
 observers. In this case, the observers easily outpace the producer, making
 this a read-heavy workload.

 Options
 * 1 producer process with 5 gevent workers
 * 1 message posted per request
 * 2 observer processes with 25 gevent workers each
 * 5 messages listed per request by the observers
 * Load distributed across 4[6] queues
 * 10-second duration


10 seconds is way too short



 Results
 * Redis
 * Producer: 1.7 ms/req,  585 req/sec
 * Observer: 1.5 ms/req, 1254 req/sec
 * Mongo
 * Producer: 2.2 ms/req,  454 req/sec
 * Observer: 1.5 ms/req, 1224 req/sec


If zaqar is like Amazon SQS, then the latency for a single message and the
throughput for a single tenant are not important. I wouldn't expect anyone
who has latency-sensitive workloads or needs massive throughput to use
zaqar, as these people wouldn't use SQS either. The consistency of the
latency (it shouldn't change under load) and zaqar's ability to scale
horizontally matter much more. What would be great to see instead is some
other things benchmarked:

* graph latency versus number of concurrent active tenants
* graph latency versus message size (a rough sketch of one way to measure
this follows after this list)
* How throughput scales as you scale up the number of assorted zaqar
components. If one of the benefits of zaqar is its horizontal scalability,
let's see it.
* How does this change 
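
To make the message-size bullet concrete, here is a very rough,
single-threaded sketch of the kind of measurement I mean. It is not a
replacement for zaqar-bench or Tsung, and the endpoint, headers, and the
assumption that auth is disabled are all specific to a local dev setup:

    import json
    import time
    import uuid

    import requests  # assumption: available on the load generator

    BASE = 'http://localhost:8888/v1'           # assumed local Zaqar endpoint
    HEADERS = {'Client-ID': str(uuid.uuid4()),  # Zaqar requires a client UUID
               'X-Project-Id': 'bench',         # assumed sufficient with auth off
               'Content-Type': 'application/json'}
    QUEUE = 'bench'

    # Create the queue (idempotent), then post batches of messages of
    # increasing size and record per-request latency.
    requests.put('%s/queues/%s' % (BASE, QUEUE), headers=HEADERS)

    for size in (1024, 4096, 16384, 65536):
        payload = [{'ttl': 300, 'body': {'data': 'x' * size}}]
        samples = []
        for _ in range(100):
            start = time.time()
            resp = requests.post('%s/queues/%s/messages' % (BASE, QUEUE),
                                 data=json.dumps(payload), headers=HEADERS)
            resp.raise_for_status()
            samples.append(time.time() - start)
        samples.sort()
        print('%6d bytes: median %.1f ms, p95 %.1f ms'
              % (size, samples[len(samples) // 2] * 1000,
                 samples[int(len(samples) * 0.95)] * 1000))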

Re: [openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers

2014-09-12 Thread Joe Gordon
On Thu, Sep 11, 2014 at 2:18 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote:
  On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes jaypi...@gmail.com wrote:
 
   a) Sorting out the common code is already accounted for in Dan B's
 original
   proposal -- it's a prerequisite for the split.
 
  Its a big prerequisite though. I think we're talking about a release
  worth of work to get that right. I don't object to us doing that work,
  but I think we need to be honest about how long its going to take. It
  will also make the core of nova less agile, as we'll find it hard to
  change the hypervisor driver interface over time. Do we really think
   it's ready to be stable?

 Yes, in my proposal I explicitly said we'd need to have Kilo
 for all the prep work to clean up the virt API, before only
 doing the split in Lx.

 The actual nova/virt/driver.py has been more stable over the
 past few releases than I thought it would be. In terms of APIs
 we've not really modified existing APIs, mostly added new ones.
 Where we did modify existing APIs, we could have easily taken
 the approach of adding a new API in parallel and deprecating
 the old entry point to maintain compat.

 The big change which isn't visible directly is the conversion
 of internal nova code to use objects. Finishing this conversion
 is clearly a pre-requisite to any such split, since we'd need
 to make sure all data passed into the nova virt APIs as parameters
 is stable & well defined.

  As an alternative approach...
 
  What if we pushed most of the code for a driver into a library?
  Imagine a library which controls the low level operations of a
  hypervisor -- create a vm, attach a NIC, etc. Then the driver would
  become a shim around that which was relatively thin, but owned the
  interface into the nova core. The driver handles the nova specific
  things like knowing how to create a config drive, or how to
  orchestrate with cinder, but hands over all the hypervisor operations
  to the library. If we found a bug in the library we just pin our
  dependancy on the version we know works whilst we fix things.
 
  In fact, the driver inside nova could be a relatively generic library
  driver, and we could have multiple implementations of the library,
  one for each hypervisor.

 I don't think that particularly solves the problem, particularly
 the ones you are most concerned about above of API stability. The
 naive impl of any library for the virt driver would pretty much
 mirror the nova virt API. The virt driver impls would thus have to
 do the job of taking the Nova objects passed in as parameters and
 turning them into something stable to pass to the library. Except
 now instead of us only having to figure out a stable API in one
 place, every single driver has to reinvent the wheel defining their
 own stable interface  objects. I'd also be concerned that ongoing
 work on drivers is still going to require a lot of patches to Nova
 to update the shims all the time, so we're still going to contend
 on resources fairly heavily.

   b) The conflict Dan is speaking of is around the current situation
 where we
   have a limited core review team bandwidth and we have to pick and
 choose
   which virt driver-specific features we will review. This leads to bad
   feelings and conflict.
 
  The way this worked in the past is we had cores who were subject
  matter experts in various parts of the code -- there is a clear set of
   cores who get xen or libvirt for example and I feel like those
  drivers get reasonable review times. What's happened though is that
  we've added a bunch of drivers without adding subject matter experts
  to core to cover those drivers. Those newer drivers therefore have a
  harder time getting things reviewed and approved.

 FYI, for Juno at least I really don't consider that even the libvirt
 driver got acceptable review times in any sense. The pain of waiting
 for reviews in libvirt code I've submitted this cycle is what prompted
 me to start this thread. All the virt drivers are suffering way more
 than they should be, but those without core team representation suffer


Can't you replace the words 'libvirt code' with 'nova code' and this would
still be true? Do you think landing virt driver code is harder than landing
non-virt-driver code? If so, do you have any numbers to back this up?

If the issue here is 'landing code in nova is too painful', then we should
discuss solving that more general issue first, and maybe we conclude
that pulling out the virt drivers gets us the most bang for our buck. But
unless we have that more general discussion, saying the right fix is to
spend a large amount of time working specifically on virt-driver-related
issues seems premature.


 to an even greater degree.  And this is ignoring the point Jay  I
 were making about how the use of a single team means that there is
 always contention for feature approval, so much 

Re: [openstack-dev] Kilo Cycle Goals Exercise

2014-09-09 Thread Joe Gordon
On Wed, Sep 3, 2014 at 8:37 AM, Joe Gordon joe.gord...@gmail.com wrote:

 As you all know, there have recently been several very active discussions
 around how to improve assorted aspects of our development process. One idea
 that was brought up is to come up with a list of cycle goals/project
 priorities for Kilo [0].

 To that end, I would like to propose an exercise as discussed in the TC
 meeting yesterday [1]:
 Have anyone interested (especially TC members) come up with a list of what
 they think the project wide Kilo cycle goals should be and post them on
 this thread by end of day Wednesday, September 10th. After which time we
 can begin discussing the results.
 The goal of this exercise is to help us see if our individual world views
 align with the greater community, and to get the ball rolling on a larger
 discussion of where as a project we should be focusing more time.




1. Strengthen our north bound APIs

* API micro-versioning
* Improved CLIs and SDKs
* Better capability discovery
* Hide usability issues with client side logic
* Improve reliability

As others have said in this thread, trying to use OpenStack as a user is a
very frustrating experience. For a long time now we have focused on
southbound APIs such as drivers, configuration options, supported
architectures, etc., but as a project we have not spent nearly enough time
on the end-user experience. If our northbound APIs aren't something
developers want to use, our southbound API work doesn't matter.

2. 'Fix' our development process

* openstack-specs. Currently we don't have any good way to work on big
entire-project efforts; hopefully something like an openstack-specs repo
(with liaisons from each core team reviewing it) will make it possible
for us to tackle these issues. I see us addressing the API
micro-versioning and capability discovery issues here.
* functional testing and post-merge testing. As discussed elsewhere in this
thread, our current testing model isn't meeting our current requirements.

3. Pay down technical debt

This is the one I am actually least sure about, as I can really only speak
for nova on this one. In our constant push forward we have accumulated a
lot of technical debt. The debt manifests itself as hard-to-maintain code,
bugs (nova had over 1000 open bugs until yesterday), performance/scaling
issues, and missing basic features. I think it's time for us to take
inventory of our technical debt and fix some of the biggest issues.



 best,
 Joe Gordon

 [0]
 http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html
 [1]
 http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-08 Thread Joe Gordon
Hi All,

We have recently started seeing assorted memory issues in the gate
including the oom-killer [0] and libvirt throwing memory errors [1].
Luckily we run ps and dstat on every devstack run, so we have some insight
into why we are running out of memory. Based on the output from a job taken
at random [2][3], a typical run consists of:

* 68 openstack api processes alone
* the following services are running 8 processes (number of CPUs on test
nodes)
  * nova-api (we actually run 24 of these, 8 compute, 8 EC2, 8 metadata)
  * nova-conductor
  * cinder-api
  * glance-api
  * trove-api
  * glance-registry
  * trove-conductor
* together, nova-api, nova-conductor, and cinder-api alone take over 45 %MEM
(note: some of that memory usage is counted multiple times, as RSS
includes shared libraries)
* based on dstat numbers, it looks like we don't use that much memory
before tempest runs, and after tempest runs we use a lot of memory.

Based on this information I have two categories of questions:

1) Should we explicitly set the number of workers that services use in
devstack? Why have so many workers in a small all-in-one environment? What
is the right balance here?
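
For example (the option names below are from memory of the sample configs,
so treat them as assumptions to double check), capping the biggest offenders
at a couple of workers each in an all-in-one devstack would look something
like:

    # nova.conf
    [DEFAULT]
    osapi_compute_workers = 2
    metadata_workers = 2
    ec2_workers = 2
    [conductor]
    workers = 2

    # cinder.conf
    [DEFAULT]
    osapi_volume_workers = 2

    # glance-api.conf / glance-registry.conf
    [DEFAULT]
    workers = 2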

2) Should we be worried that some OpenStack services such as nova-api,
nova-conductor, and cinder-api take up so much memory? Does their memory
usage keep growing over time, and does anyone have any numbers to answer
this? Why do these processes take up so much memory?
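
On the measurement side, something like the following is a quick way to get
a per-service view over time rather than eyeballing ps output. This is a
minimal sketch that assumes psutil is installed on the node, and remember
that summing RSS across workers double counts shared pages:

    import time
    from collections import defaultdict

    import psutil  # assumption: installed on the test node

    SERVICES = ('nova-api', 'nova-conductor', 'cinder-api',
                'glance-api', 'glance-registry', 'trove-api')


    def sample_rss():
        """Sum RSS (bytes) per service across all of its worker processes."""
        totals = defaultdict(int)
        for proc in psutil.process_iter():
            try:
                cmdline = ' '.join(proc.cmdline())
                rss = proc.memory_info().rss
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
            for name in SERVICES:
                if name in cmdline:
                    totals[name] += rss  # RSS double counts shared libraries
        return totals


    while True:
        usage = ', '.join('%s=%dMB' % (name, rss // (1024 * 1024))
                          for name, rss in sorted(sample_rss().items()))
        print('%s %s' % (time.strftime('%H:%M:%S'), usage))
        time.sleep(60)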

best,
Joe


[0]
http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwib29tLWtpbGxlclwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDEwMjExMjA5NzY3fQ==
[1] https://bugs.launchpad.net/nova/+bug/1366931
[2] http://paste.openstack.org/show/108458/
[3]
http://logs.openstack.org/83/119183/4/check/check-tempest-dsvm-full/ea576e7/logs/screen-dstat.txt.gz
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-05 Thread Joe Gordon
On Fri, Sep 5, 2014 at 4:05 AM, Nikola Đipanov ndipa...@redhat.com wrote:

 On 09/04/2014 10:25 PM, Solly Ross wrote:
  Anyway, I think it would be useful to have some sort of page where people
  could say I'm an SME in X, ask me for reviews and then patch
 submitters could go
  and say, oh, I need an someone to review my patch about storage
 backends, let me
  ask sross.
 

 This is a good point - I've been thinking along similar lines that we
 really could have a huge win in terms of the review experience by
 building a tool (maybe a social network looking one :)) that relates
 reviews to people being able to do them, visualizes reviewer karma and
 other things that can help make the code submissions and reviews more
 human friendly.

 Dan seems to dismiss the idea of improved tooling as something that can
 get us only thus far, but I am not convinced. However - this will
 require even more manpower and we are already ridiculously short on that
 so...


I have previously toyed with the idea of making such a tool, and if someone
else wants to work on it I would be happy to help.



 N.


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][FFE] Feature Freeze exception for juno-slaveification

2014-09-05 Thread Joe Gordon
On Fri, Sep 5, 2014 at 9:11 AM, Mike Wilson geekinu...@gmail.com wrote:

 Hi all,

 I am requesting an exception for the juno-slaveification blueprint. There
 is a single outstanding patch [1] which has already been approved before,
 but needed to be re-spun due to gate failures which then necessitated a
 rebase. For those not familiar with the work, the spec[2] can shed some
 more light on the scope of work for Juno.

 All the other patches from this blueprint have merged, the only remaining
 patch really just needs a +W as it has been extensively reviewed and
 already approved previously. This may be an easy candidate since Andrew
 Laski, Jay Pipes and Dan Smith have reviewed and +2'd this already.


I am happy to sponsor this, as this is the last patch needed to finish a BP
and the patch was approved once but needed a rebase over some objects
changes. The change from the approved version is pretty minor.



 Thanks,

 Mike Wilson

 [1] https://review.openstack.org/#/c/103064/
 [2]
 http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/juno-slaveification.rst


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [feature freeze exception] Move to oslo.db

2014-09-04 Thread Joe Gordon
On Wed, Sep 3, 2014 at 11:30 PM, Michael Still mi...@stillhq.com wrote:

 I'm good with this one too, so that makes three if Joe is ok with this.


I am OK with this; I hope the move to oslo.db will fix a few bugs for us,
and the nova patch to review isn't too bad.



 @Josh -- can you please take a look at the TH failures?

 Thanks,
 Michael

 On Wed, Sep 3, 2014 at 8:10 PM, Matt Riedemann
 mrie...@linux.vnet.ibm.com wrote:
 
 
  On 9/3/2014 5:08 PM, Andrey Kurilin wrote:
 
  Hi All!
 
  I'd like to ask for a feature freeze exception for porting nova to use
  oslo.db.
 
   This change not only removes 3k LOC, but also fixes 4 bugs (see commit
   message for more details) and provides relevant, stable common db code.
 
  Main maintainers of oslo.db(Roman Podoliaka and Victor Sergeyev) are OK
  with this.
 
  Joe Gordon and Matt Riedemann are already signing up, so we need one
   more vote from a core developer.
 
   By the way, a lot of core projects have already been using oslo.db for a
   while: keystone, cinder, glance, ceilometer, ironic, heat, neutron and
   sahara. So migration to oslo.db won’t produce any unexpected issues.
 
  Patch is here: https://review.openstack.org/#/c/101901/
 
  --
  Best regards,
  Andrey Kurilin.
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
  Just re-iterating my agreement to sponsor this.  I'm waiting for the
 latest
  patch set to pass Jenkins and for Roman to review after his comments from
  the previous patch set and -1.  Otherwise I think this is nearly ready to
  go.
 
  The turbo-hipster failures on the change appear to be infra issues in t-h
  rather than problems with the code.
 
  --
 
  Thanks,
 
  Matt Riedemann
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 --
 Rackspace Australia

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-04 Thread Joe Gordon
On Thu, Sep 4, 2014 at 2:23 AM, John Garbutt j...@johngarbutt.com wrote:

 Sorry for another top post, but I like how Nikola has pulled this
 problem apart, and wanted to respond directly to his response.

 On 3 September 2014 10:50, Nikola Đipanov ndipa...@redhat.com wrote:
  The reason many features including my own may not make the FF is not
  because there was not enough buy in from the core team (let's be
  completely honest - I have 3+ other core members working for the same
  company that are by nature of things easier to convince), but because of
  any of the following:
 
  * Crippling technical debt in some of the key parts of the code

 +1

 We have problems that need solving.

 One of the ideas behind the slots proposal is to encourage work on
 the urgent technical debt, before related features are even approved.

  * that we have not been acknowledging as such for a long time

 -1

  We keep saying "that's cool, but we have to fix/finish XXX first."

 But... we have been very bad at:
 * remembering that, and recording that
 * actually fixing those problems

  * which leads to proposed code being arbitrarily delayed once it makes
  the glaring flaws in the underlying infra apparent

 Sometimes we only spot this stuff in code reviews, where you throw up
 reading all the code around the change, and see all the extra
 complexity being added to a fragile bit of the code, and well, then
 you really don't want to be the person who clicks approve on that.

  We need to track this stuff better. Every time it happens, we should
  try to make a note to go back there and do more tidy-ups.

  * and that specs process has been completely and utterly useless in
  helping uncover (not that process itself is useless, it is very useful
  for other things)

 Yeah, it hasn't helped for this.

 I don't think we should do this, but I keep thinking about making
 specs two step:
  * write a general direction doc
 * go write the code, maybe upload as WIP
 * write the documentation part of the spec
 * get docs merged before any code

  I am almost positive we can turn this rather dire situation around
  easily in a matter of months, but we need to start doing it! It will not
  happen through pinning arbitrary numbers to arbitrary processes.

 +1

 This is ongoing, but there are some major things, I feel we should
 stop and fix in kilo.

 ...and that will make getting features in much worse for a little
 while, but it will be much better on the other side.

  I will follow up with a more detailed email about what I believe we are
  missing, once the FF settles and I have applied some soothing creme to
  my burnout wounds

 Awesome, please catch up with jogo who was also trying to build this
 list. I would love to continue to contribute to that too.


I am not actually trying to build that list yet; right now I am trying to
get consensus on the idea of having project priorities:
https://review.openstack.org/#/c/112733/  Once that patch lands I plan to
start iterating on a Kilo priorities patch so we have something written
down (in nova-specs) that we can work from for summit planning.



 Might be working moving into here:
 https://etherpad.openstack.org/p/kilo-nova-summit-topics

 The idea was/is to use that list to decide what fills up the majority
 of code slots in Juno.

  but currently my sentiment is:
  Contributing features to Nova nowadays SUCKS!!1 (even as a core
  reviewer) We _have_ to change that!

 Agreed.


 In addition, our bug list would suggest our users are seeing the
 impact of this technical debt.


 My personal feeling is we also need to tidy up our testing debt too:
 * document major bits that are NOT tested, so users are clear
 * document what combinations and features we actually see tested up stream
 * support different levels of testing: on-demand+daily vs every commit
  * making it easier for interested parties to own and maintain some testing
 * plan for removing the untested code paths in L
 * allow for untested code to enter the tree, as experimental, with the
 expectation it gets removed in the following release if not tested,
 and architected so that is possible (note this means supporting
 experimental APIs that can be ripped out at a later date.)

 We have started doing some of the above work. But I think we need to
 hold ALL code to the same standard. It seems it will take time to
 agree on that standard, but the above is an attempt to compromise
 between speed of innovation and stability.


 Thanks,
 John

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Joe Gordon
On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 Position statement
 ==

 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
  steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.

 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
  all the common code outside the virt driver impls. I nonetheless
  urge people to read the whole mail.


 Background information
 ==

 I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed  merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

 Each item on their own may not seem too bad, but combined they
 add up to a big problem.

 Core team burn out
 --

 Having been involved in Nova for several dev cycles now, it is clear
 that the backlog of code up for review never goes away. Even
 intensive code review efforts at various points in the dev cycle
 makes only a small impact on the backlog. This has a pretty
 significant impact on core team members, as their work is never
 done. At best, the dial is sometimes set to 10, instead of 11.

 Many people, myself included, have built tools to help deal with
 the reviews in a more efficient manner than plain gerrit allows
 for. These certainly help, but they can't ever solve the problem
 on their own - just make it slightly more bearable. And this is
 not even considering that core team members might have useful
 contributions to make in ways beyond just code review. Ultimately
 the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they
 have done many times already).

 Even if one person attempts to take the initiative to heavily
 invest in review of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag
 team' it is hard for one person to make a difference. The end
 result is that a patch is +2d and then sits idle for weeks or
 more until a merge conflict requires it to be reposted at which
 point even that one +2 is lost. This is a pretty demotivating
 outcome for both reviewers  the patch contributor.


 New core team talent
 

 It can't escape attention that the Nova core team does not grow
 in size very often. When Nova was younger and its code base was
 smaller, it was easier for contributors to get onto core because
 the base level of knowledge required was that much smaller. To
 get onto core today requires a major investment in learning Nova
 over a year or more. Even people who potentially have the latent
 skills may not have the time available to invest in learning the
 entire of Nova.

 With the number of reviews proposed to Nova, the core team should
 probably be at least double its current size[1]. There is plenty of
 expertize in the project as a whole but it is typically focused
 into specific areas of the codebase. There is nowhere we can find
 20 more people with broad knowledge of the codebase who could be
 promoted even over the next year, let alone today. This is ignoring
 that many existing members of core are relatively inactive due to
 burnout and so need replacing. That means we really need another
 25-30 people for core. That's not going to happen.


 Code review delays
 --

 The obvious result of having too much work for too few reviewers
 is that code contributors face major delays in getting their work
 reviewed and merged. From personal experience, during Juno, I've
 probably spent 1 week in aggregate on actual code development vs
 8 weeks on waiting on code review. You have to constantly be on
 alert for review comments because unless you can respond quickly
 (and repost) while you still have the attention of the reviewer,
  they may not look again for days/weeks.

 The length of time to get work merged serves as a demotivator to
 actually do work in the first place. I've personally avoided doing
  a lot of code refactoring & cleanup work that would improve the
 maintainability of the libvirt driver in the long term, because
 I can't face the battle to get it reviewed  

[openstack-dev] Kilo Cycle Goals Exercise

2014-09-03 Thread Joe Gordon
As you all know, there have recently been several very active discussions
around how to improve assorted aspects of our development process. One idea
that was brought up is to come up with a list of cycle goals/project
priorities for Kilo [0].

To that end, I would like to propose an exercise as discussed in the TC
meeting yesterday [1]:
Have anyone interested (especially TC members) come up with a list of what
they think the project wide Kilo cycle goals should be and post them on
this thread by end of day Wednesday, September 10th. After which time we
can begin discussing the results.
The goal of this exercise is to help us see if our individual world views
align with the greater community, and to get the ball rolling on a larger
discussion of where as a project we should be focusing more time.


best,
Joe Gordon

[0]
http://lists.openstack.org/pipermail/openstack-dev/2014-August/041929.html
[1]
http://eavesdrop.openstack.org/meetings/tc/2014/tc.2014-09-02-20.04.log.html
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][qa][neutron] Neutron full job, advanced services, and the integrated gate

2014-09-03 Thread Joe Gordon
On Tue, Aug 26, 2014 at 4:47 PM, Salvatore Orlando sorla...@nicira.com
wrote:

 TL; DR
 A few folks are proposing to stop running tests for neutron advanced
 services [ie: (lb|vpn|fw)aas] in the integrated gate, and run them only on
 the neutron gate.

 Reason: projects like nova are 100% orthogonal to neutron advanced
 services. Also, there have been episodes in the past of unreliability of
 tests for these services, and it would be good to limit affected projects
 considering that more api tests and scenarios are being added.

 -

 So far the neutron full job runs tests (api and scenarios) for neutron
 core functionality as well as neutron advanced services, which run as
 neutron service plugin.

 It's highly unlikely, if not impossible, that changes in projects such as
 nova, glance or ceilometer can have an impact on the stability of these
 services.
 On the other hand, instability in these services can trigger gate failures
 in unrelated projects as long as tests for these services are run in the
 neutron full job in the integrated gate. There have already been several
  gate-breaking bugs in lbaas scenario tests and firewall api tests.
 Admittedly, advanced services do not have the same level of coverage as
 core neutron functionality. Therefore as more tests are being added, there
 is an increased possibility of unearthing dormant bugs.


I support this split, but for slightly different reasons. I am under the
impression that neutron advanced services are not ready for prime time. If
that is correct, I don't think we should be gating on things that aren't
ready.



 For this reason we are proposing to not run anymore tests for neutron
 advanced services in the integrated gate, but keep them running on the
 neutron gate.
 This means we will have two neutron jobs:
 1) check-tempest-dsvm-neutron-full which will run only core neutron
 functionality
 2) check-tempest-dsvm-neutron-full-ext which will be what the neutron full
 job is today.


Using my breakdown, the extended job would include experimental neutron
features.



 The former will be part of the integrated gate, the latter will be part of
 the neutron gate.
 Considering that other integrating services should not have an impact on
 neutron advanced services, this should not make gate testing asymmetric.

 However, there might be exceptions for:
 - orchestration project like heat which in the future might leverage
 capabilities like load balancing
 - oslo-* libraries, as changes in them might have an impact on neutron
 advanced services, since they consume those libraries


Once another service starts consuming an advanced feature, I think it makes
sense to move it to the main neutron-full job, especially if we assume that
things will only depend on neutron features that are not too experimental.



 Another good question is whether extended tests should be performed as
 part of functional or tempest checks. My take on this is that scenario
 tests should always be part of tempest. On the other hand I reckon API
 tests should exclusively be part of functional tests, but as so far tempest
 is running a gazillion of API tests, this is probably a discussion for the
 medium/long term.

 In order to add this new job there are a few patches under review:
 [1] and [2] Introduces the 'full-ext' job and devstack-gate support for it.
 [3] Are the patches implementing a blueprint which will enable us to
 specify for which extensions test should be executed.

 Finally, one more note about smoketests. Although we're planning to get
 rid of them soon, we still have failures in the pg job because of [4]. For
  this reason smoketests are still running for postgres in the integrated
 gate. As load balancing and firewall API tests are part of it, they should
 be removed from the smoke test executed on the integrated gate ([5], [6]).
 This is a temporary measure until the postgres issue is fixed.


++



 Regards,
 Salvatore

 [1] https://review.openstack.org/#/c/114933/
 [2] https://review.openstack.org/#/c/114932/
 [3]
 https://review.openstack.org/#/q/status:open+branch:master+topic:bp/branchless-tempest-extensions,n,z
 [4] https://bugs.launchpad.net/nova/+bug/1305892
 [5] https://review.openstack.org/#/c/115022/
 [6] https://review.openstack.org/#/c/115023/


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] libvirt version_cap, a postmortem

2014-09-03 Thread Joe Gordon
 letting situations fester.


 Thanks, and sorry for being a windbag,
 Mark.

 ---

 = July 1 =

 The starting point is this review:

https://review.openstack.org/103923

 Dan Smith proposes a policy that the libvirt driver may not use libvirt
 features until they have been available in Ubuntu or Fedora for at least
 30 days.

 The commit message mentions:

   broken us in the past when we add a new feature that requires a newer
libvirt than we test with, and we discover that it's totally broken
when we upgrade in the gate.

 which AIUI is a reference to the libvirt live snapshot issue the
 previous week, which is described here:

   https://review.openstack.org/102643

  where upgrading to Ubuntu Trusty meant the libvirt version in use in the
  gate went from 0.9.8 to 1.2.2, which exercised the live snapshot code
  paths in Nova for the first time and appeared to be related to some
  serious gate instability (although the exact root cause wasn't
  identified).

 Some background on the libvirt version upgrade can be seen here:


 http://lists.openstack.org/pipermail/openstack-dev/2014-March/thread.html#30284

 = July 1 - July 8 =

 Back and forth debate mostly between Dan Smith and Dan Berrange. Sean
 votes +2, Dan Berrange votes -2.

 = July 14 =

 Russell adds his support to Dan Berrange's position, votes -2. Some
 debate between Dan and Dan continues. Joe Gordon votes +2. Matt
 Riedemann expresses support-in-principal for Dan Smith's approach.

 = July 15 =

 Debate continues ...

 16:12 - I -2 the patch and attempt to take a step back and think about
 how we could have prevented (or at least mitigated against) the live
 snapshot issue and suggest the idea of adding a new configuration
 option:

   [libvirt]
   version_cap = 1.2.2

 which would mean we would not automatically start using new libvirt
 features in the gate because of a libvirt version upgrade, but instead
 the new features would only begin to be used when we merge a change to
 the default value of version_cap.
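
 Purely as an illustration of the general shape of such a check (this is
 not the code from the actual patch, and the feature name and version
 numbers below are placeholders):

     def _version_to_int(ver):
         # libvirt encodes versions as major * 1,000,000 + minor * 1,000 + micro
         major, minor, micro = ver
         return major * 1000000 + minor * 1000 + micro

     MIN_LIBVIRT_LIVESNAPSHOT_VERSION = (1, 3, 0)   # placeholder value

     def can_use_feature(conn_version, min_version, version_cap=None):
         """The feature is only used if libvirt is new enough AND does not
         exceed the operator-configured cap, i.e. we stay on code paths the
         gate (pinned to the capped version) has actually exercised."""
         if conn_version < _version_to_int(min_version):
             return False
         if version_cap and conn_version > _version_to_int(version_cap):
             return False
         return True

     # e.g. can_use_feature(conn.getLibVersion(),
     #                      MIN_LIBVIRT_LIVESNAPSHOT_VERSION,
     #                      version_cap=(1, 2, 2))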

 16:31 - I leave a separate comment addressing the broader debate about
 our functional test coverage requirements.

 16:46 - Dan Berrange likes the version_cap idea

 15:37 - Dan Berrange posts an implementation of version_cap:

   https://review.openstack.org/107119

  and links to it from Dan Smith's libvirt testing policy review (#103923)

 21:49 - Matt expresses some support for the config option, but worries
 about the precedent being set.

  23:14 - Dan Berrange explains his point of view that a "test all the
  things" rule must mean "test all the things which can be practically
  tested by our current CI system".

 = July 16 =

 08:04 - I +2 the version_cap patch after Dan fixes up some issues I
 pointed out.

 13:44 - 14:28 - Sean and John Garbutt add further thoughts to the
 libvirt testing policy review without making any comment on the
 version_cap idea. Sean takes the debate to the mailing list:

   http://lists.openstack.org/pipermail/openstack-dev/2014-July/040421.html

 Debate continues in the thread, largely around the mechanics of how to
 allow a newer version of libvirt be used in the gate.

 15:08 - I mention the version_cap proposal on the thread for the first
 time:

   http://lists.openstack.org/pipermail/openstack-dev/2014-July/040436.html

 and the point I make is the configuration option makes it easier for
 operators to run only code paths that are tested by the gate.

 16:44 - Johannes notes that multiple issues with code paths not tested
 in the gate may need to be fixed as part of a future review to increase
 the default value of version_cap.

 http://lists.openstack.org/pipermail/openstack-dev/2014-July/040456.html

 18:50 - Russell approves the version_cap patch.

 https://review.openstack.org/107119

 = July 17 =

 05:38 - The version_cap patch merges.

 13:09 - Somewhat related, Dan Berrange and I explain we won't be at the
  mid-cycle for any test policy discussions. Sean makes a point that
 the discussion is best had on email/IRC where there is a permanent
 record.

 14:33 - Johannes expresses concern in gerrit that version_cap got merged
 too quickly.

 https://review.openstack.org/107119

 15:17 - Dan Berrange responds to Johannes in gerrit, saying that he
 thinks version_cap is useful irrespective of the broader testing
 discussion.

 15:28 - Johannes disagrees, asks for a response to his concerns on the
 mailing list.

 15:40 - Dan Berrange responds on the mailing list to Johannes
 version_cap concerns.

 http://lists.openstack.org/pipermail/openstack-dev/2014-July/040576.html

 18:15 - Russell also responds to Johannes.

 http://lists.openstack.org/pipermail/openstack-dev/2014-July/040597.html

 18:31 - Johannes responds to Dan.

 http://lists.openstack.org/pipermail/openstack-dev/2014-July/040602.html

 18:39 - Russell responds to Johannes again.

 http://lists.openstack.org/pipermail/openstack-dev/2014-July/040604.html

 19:13 - Johannes responds again.

 http://lists.openstack.org

Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-03 Thread Joe Gordon
On Wed, Sep 3, 2014 at 2:50 AM, Nikola Đipanov ndipa...@redhat.com wrote:

 On 09/02/2014 09:23 PM, Michael Still wrote:
  On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com
 wrote:
  On 09/02/2014 08:16 PM, Michael Still wrote:
  Hi.
 
  We're soon to hit feature freeze, as discussed in Thierry's recent
  email. I'd like to outline the process for requesting a freeze
  exception:
 
  * your code must already be up for review
  * your blueprint must have an approved spec
  * you need three (3) sponsoring cores for an exception to be
 granted
 
  Can core reviewers who have features up for review have this number
  lowered to two (2) sponsoring cores, as they in reality then need four
  (4) cores (since they themselves are one (1) core but cannot really
  vote) making it an order of magnitude more difficult for them to hit
  this checkbox?
 
  That's a lot of numbers in that there paragraph.
 
  Let me re-phrase your question... Can a core sponsor an exception they
  themselves propose? I don't have a problem with someone doing that,
  but you need to remember that does reduce the number of people who
  have agreed to review the code for that exception.
 

 Michael has correctly picked up on a hint of snark in my email, so let
 me explain where I was going with that:

 The reason many features including my own may not make the FF is not
 because there was not enough buy in from the core team (let's be
 completely honest - I have 3+ other core members working for the same
 company that are by nature of things easier to convince), but because of
 any of the following:


I find the statement about having multiple cores at the same company very
concerning. To quote Mark McLoughlin, "It is assumed that all core team
members are wearing their upstream hat and aren't there merely to
represent their employers' interests" [0]. Your statement appears to be in
direct conflict with Mark's idea of what a core reviewer is, an idea that
IMHO is one of the basic tenets of OpenStack development.

[0] http://lists.openstack.org/pipermail/openstack-dev/2013-July/012073.html




 * Crippling technical debt in some of the key parts of the code
 * that we have not been acknowledging as such for a long time
 * which leads to proposed code being arbitrarily delayed once it makes
 the glaring flaws in the underlying infra apparent
 * and that specs process has been completely and utterly useless in
 helping uncover (not that process itself is useless, it is very useful
 for other things)

 I am almost positive we can turn this rather dire situation around
 easily in a matter of months, but we need to start doing it! It will not
 happen through pinning arbitrary numbers to arbitrary processes.


Nova is big and complex enough that I don't think any one person is able to
identify what we need to work on to make things better. That is one of the
reasons why I have the project priorities patch [1] up. I would like to see
nova as a team discuss and come up with what we think we need to focus on
to get us back on track.


[1] https://review.openstack.org/#/c/112733/



 I will follow up with a more detailed email about what I believe we are
 missing, once the FF settles and I have applied some soothing creme to
 my burnout wounds, but currently my sentiment is:

 Contributing features to Nova nowadays SUCKS!!1 (even as a core
 reviewer) We _have_ to change that!


Yes, I can agree with you on this part; things in nova land are not good.



 N.

  Michael
 
  * exceptions must be granted before midnight, Friday this week
  (September 5) UTC
  * the exception is valid until midnight Friday next week
  (September 12) UTC when all exceptions expire
 
  For reference, our rc1 drops on approximately 25 September, so the
  exception period needs to be short to maximise stabilization time.
 
  John Garbutt and I will both be granting exceptions, to maximise our
  timezone coverage. We will grant exceptions as they come in and gather
  the required number of cores, although I have also carved some time
  out in the nova IRC meeting this week for people to discuss specific
  exception requests.
 
  Michael
 
 
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Feature Freeze Exception process for Juno

2014-09-03 Thread Joe Gordon
On Wed, Sep 3, 2014 at 8:57 AM, Solly Ross sr...@redhat.com wrote:

  I will follow up with a more detailed email about what I believe we are
  missing, once the FF settles and I have applied some soothing creme to
  my burnout wounds, but currently my sentiment is:
 
  Contributing features to Nova nowadays SUCKS!!1 (even as a core
  reviewer) We _have_ to change that!

 I think this is *very* important.

 rant
 For instance, I have/had two patch series
 up. One is of length 2 and is relatively small.  It's basically sitting
 there
 with one +2 on each patch.  I will now most likely have to apply for a
 FFE
 to get it merged, not because there's more changes to be made before it
 can get merged
 (there was one small nit posted yesterday) or because it's a huge patch
 that needs a lot
 of time to review, but because it just took a while to get reviewed by
 cores,
 and still only appears to have been looked at by one core.

 For the other patch series (which is admittedly much bigger), it was hard
 just to
 get reviews (and it was something where I actually *really* wanted several
 opinions,
 because the patch series touched a couple of things in a very significant
 way).

 Now, this is not my first contribution to OpenStack, or to Nova, for that
 matter.  I
 know things don't always get in.  It's frustrating, however, when it seems
 like the
 reason something didn't get in wasn't because it was fundamentally flawed,
 but instead
 because it didn't get reviews until it was too late to actually take that
 feedback into
 account, or because it just didn't get much attention review-wise at all.
 If I were a
 new contributor to Nova who had successfully gotten a major blueprint
 approved and
 the implemented, only to see it get rejected like this, I might get turned
 off of Nova,
 and go to work on one of the other OpenStack projects that seemed to move
 a bit faster.
 /rant

 So, it's silly to rant without actually providing any ideas on how to fix
 it.
 One suggestion would be, for each approved blueprint, to have one or two
 cores
 explicitly marked as being responsible for providing at least some
 feedback on
 that patch.  This proposal has issues, since we have a lot of blueprints
 and only
 twenty cores, who also have their own stuff to work on.  However, I think
 the
 general idea of having guaranteed reviewers is not unsound by itself.
 Perhaps
 we should have a loose tier of reviewers between core and everybody
 else.
 These reviewers would be known good reviewers who would follow the
 implementation
 of particular blueprints if a core did not have the time.  Then, when
 those reviewers
 gave the +1 to all the patches in a series, they could ping a core, who
 could feel
 more comfortable giving a +2 without doing a deep inspection of the code.

 That's just one suggestion, though.  Whatever the solution may be, this is
 a
 problem that we need to fix.  While I enjoyed going through the blueprint
 process
 this cycle (not sarcastic -- I actually enjoyed the whole structured
 feedback thing),
 the follow up to that was not the most pleasant.

 One final note: the specs referenced above didn't get approved until Spec
 Freeze, which
 seemed to leave me with less time to implement things.  In fact, it seemed
 that a lot
 of specs didn't get approved until spec freeze.  Perhaps if we had more
 staggered
 approval of specs, we'd have more staggered submission of patches, and
 thus less of a
 sudden influx of patches in the couple weeks before feature proposal
 freeze.



While you raise some good points, albeit not new ones but rather
long-standing issues that we really need to address, Nikola appears not to
be commenting on the shortage of reviews but rather on the amount of
technical debt Nova has.


 Best Regards,
 Solly Ross

 - Original Message -
  From: Nikola Đipanov ndipa...@redhat.com
  To: openstack-dev@lists.openstack.org
  Sent: Wednesday, September 3, 2014 5:50:09 AM
  Subject: Re: [openstack-dev] [Nova] Feature Freeze Exception process for
 Juno
 
  On 09/02/2014 09:23 PM, Michael Still wrote:
   On Tue, Sep 2, 2014 at 1:40 PM, Nikola Đipanov ndipa...@redhat.com
 wrote:
   On 09/02/2014 08:16 PM, Michael Still wrote:
   Hi.
  
   We're soon to hit feature freeze, as discussed in Thierry's recent
   email. I'd like to outline the process for requesting a freeze
   exception:
  
   * your code must already be up for review
   * your blueprint must have an approved spec
   * you need three (3) sponsoring cores for an exception to be
 granted
  
   Can core reviewers who have features up for review have this number
   lowered to two (2) sponsoring cores, as they in reality then need four
   (4) cores (since they themselves are one (1) core but cannot really
   vote) making it an order of magnitude more difficult for them to hit
   this checkbox?
  
   That's a lot of numbers in that there paragraph.
  
   Let me re-phrase your question... Can a core sponsor an exception they
   

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-29 Thread Joe Gordon
On Aug 29, 2014 10:42 AM, Dugger, Donald D donald.d.dug...@intel.com
wrote:

 Well, I think that there is a sign of a broken (or at least bent) process
and that's what I'm trying to expose.  Especially given the ongoing
conversations over Gantt it seems wrong that ultimately it was rejected due
to silence.  Maybe rejecting the BP was the right decision but the way the
decision was made was just wrong.

 Note that dealing with silence is `really` difficult.  You point out that
maybe silence means people don't agree with the BP but how do I know?
Maybe it means no one has time, maybe no one has an opinion, maybe it got
lost in the shuffle, maybe I'm being too obnoxious - who knows.  A simple
-1 with a one-sentence explanation would have helped a lot.

How is this:

-1, we already have too many approved blueprints in Juno and it sounds like
there are still concerns about the Gantt split in general. Hopefully after
trunk is open for Kilo we can revisit the Gantt idea. I'm thinking yet
another ML thread outlining why and how to get there.


 --
 Don Dugger
 Censeo Toto nos in Kansa esse decisse. - D. Gale
 Ph: 303/443-3786

 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: Friday, August 29, 2014 10:43 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

 On 08/29/2014 12:25 PM, Zane Bitter wrote:
  On 28/08/14 17:02, Jay Pipes wrote:
  I understand your frustration about the silence, but the silence from
  core team members may actually be a loud statement about where their
  priorities are.
 
  I don't know enough about the Nova review situation to say if the
  process is broken or not. But I can say that if passive-aggressively
  ignoring people is considered a primary communication channel,
  something is definitely broken.

 Nobody is ignoring anyone. There have ongoing conversations about the
scheduler and Gantt, and those conversations haven't resulted in all the
decisions that Don would like. That is unfortunate, but it's not a sign of
a broken process.

 -jay


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] gate debugging

2014-08-28 Thread Joe Gordon
On Thu, Aug 28, 2014 at 10:17 AM, Sean Dague s...@dague.net wrote:

 On 08/28/2014 12:48 PM, Doug Hellmann wrote:
 
  On Aug 27, 2014, at 5:56 PM, Sean Dague s...@dague.net wrote:
 
  On 08/27/2014 05:27 PM, Doug Hellmann wrote:
 
  On Aug 27, 2014, at 2:54 PM, Sean Dague s...@dague.net wrote:
 
  Note: thread intentionally broken, this is really a different topic.
 
  On 08/27/2014 02:30 PM, Doug Hellmann wrote:
  On Aug 27, 2014, at 1:30 PM, Chris Dent chd...@redhat.com wrote:
 
  On Wed, 27 Aug 2014, Doug Hellmann wrote:
 
  I have found it immensely helpful, for example, to have a written
 set
  of the steps involved in creating a new library, from importing the
  git repo all the way through to making it available to other
 projects.
  Without those instructions, it would have been much harder to
 split up
  the work. The team would have had to train each other by word of
  mouth, and we would have had constant issues with inconsistent
  approaches triggering different failures. The time we spent
 building
  and verifying the instructions has paid off to the extent that we
 even
  had one developer not on the core team handle a graduation for us.
 
  +many more for the relatively simple act of just writing stuff down
 
  “Write it down.” is my theme for Kilo.
 
  I definitely get the sentiment. “Write it down” is also hard when you
  are talking about things that do change around quite a bit. OpenStack as
  a whole sees 250 - 500 changes a week, so the interaction pattern moves
  around enough that it's really easy to have *very* stale information
  written down. Stale information is even more dangerous than no
  information sometimes, as it takes people down very wrong paths.
 
  I think we break down on communication when we get into a conversation
  of “I want to learn gate debugging” because I don't quite know what that
  means, or where the starting point of understanding is. So those
  intentions are well meaning, but tend to stall. The reality was there
  was no road map for those of us that dive in, it's just understanding
  how OpenStack holds together as a whole and where some of the high
 risk
  parts are. And a lot of that comes with days staring at code and logs
  until patterns emerge.
 
  Maybe if we can get smaller more targeted questions, we can help folks
  better? I'm personally a big fan of answering the targeted questions
  because then I also know that the time spent exposing that information
  was directly useful.
 
  I'm more than happy to mentor folks. But I just end up finding the “I
  want to learn” at the generic level something that's hard to grasp onto
  or figure out how we turn it into action. I'd love to hear more ideas
  from folks about ways we might do that better.
 
  You and a few others have developed an expertise in this important
 skill. I am so far away from that level of expertise that I don’t know the
 questions to ask. More often than not I start with the console log, find
 something that looks significant, spend an hour or so tracking it down, and
 then have someone tell me that it is a red herring and the issue is really
 some other thing that they figured out very quickly by looking at a file I
 never got to.
 
  I guess what I’m looking for is some help with the patterns. What made
 you think to look in one log file versus another? Some of these jobs save a
 zillion little files, which ones are actually useful? What tools are you
 using to correlate log entries across all of those files? Are you doing it
 by hand? Is logstash useful for that, or is that more useful for finding
 multiple occurrences of the same issue?
 
  I realize there’s not a way to write a how-to that will live forever.
 Maybe one way to deal with that is to write up the research done on bugs
 soon after they are solved, and publish that to the mailing list. Even the
 retrospective view is useful because we can all learn from it without
 having to live through it. The mailing list is a fairly ephemeral medium,
 and something very old in the archives is understood to have a good chance
 of being out of date so we don’t have to keep adding disclaimers.
 
  Sure. Matt's actually working up a blog post describing the thing he
  nailed earlier in the week.
 
  Yes, I appreciate that both of you are responding to my questions. :-)
 
  I have some more specific questions/comments below. Please take all of
 this in the spirit of trying to make this process easier by pointing out
 where I’ve found it hard, and not just me complaining. I’d like to work on
 fixing any of these things that can be fixed, by writing or reviewing
 patches early in Kilo.
 
 
  Here is my off the cuff set of guidelines:
 
  #1 - is it a test failure or a setup failure
 
  This should be pretty easy to figure out. Test failures come at the end
  of the console log and say that tests failed (after you see a bunch of
  passing tempest tests).
 
  Always start at *the end* of files and work backwards.
 
  That’s 
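
To make guideline #1 concrete, here is a minimal sketch of that check: scan
the console log from the end and classify the run as a test failure or a
setup failure. The marker strings are made up for illustration and are not
the exact text any particular job prints:

    # Sketch of guideline #1: work backwards from the end of the console log
    # and decide whether the run died running tests or setting up the node.
    # The marker strings below are assumptions; adjust them to what your
    # jobs actually emit.
    import sys

    TEST_FAILURE_MARKERS = ('FAILED (failures=', 'Tempest test(s) failed')
    SETUP_FAILURE_MARKERS = ('ERROR: setup failed', 'stack.sh failed')


    def classify(console_log_path):
        with open(console_log_path, errors='replace') as f:
            lines = f.readlines()
        # Start at *the end* of the file and work backwards.
        for line in reversed(lines):
            if any(m in line for m in TEST_FAILURE_MARKERS):
                return 'test failure - look at which tests failed'
            if any(m in line for m in SETUP_FAILURE_MARKERS):
                return 'setup failure - the job never got to the tests'
        return 'unclear - read the end of the log by hand'


    if __name__ == '__main__':
        print(classify(sys.argv[1]))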

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Joe Gordon
On Thu, Aug 28, 2014 at 2:40 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 On Thu, Aug 28, 2014 at 01:04:57AM +, Dugger, Donald D wrote:
  I'll try and not whine about my pet project but I do think there
  is a problem here.  For the Gantt project to split out the scheduler
  there is a crucial BP that needs to be implemented (
  https://review.openstack.org/#/c/89893/ ) and, unfortunately, the
  BP has been rejected and we'll have to try again for Kilo.  My question
  is did we do something wrong or is the process broken?
 
  Note that we originally proposed the BP on 4/23/14, went through 10
  iterations to the final version on 7/25/14 and the final version got
  three +1s and a +2 by 8/5.  Unfortunately, even after reaching out
  to specific people, we didn't get the second +2, hence the rejection.

 I see that it did not even get one +2 at the time of the feature
 proposal approval freeze. You then successfully requested an exception
 and after a couple more minor updates got a +2 from John but from no
 one else.

 I do think this shows a flaw in our (core teams) handling of the
 blueprint. When we agreed upon the freeze exception, that should
 have included a firm commitment for at least 2 core devs to review
 it. IOW I think it is reasonable to say that either your feature
 should have ended up with two +2s and +A, or you should have seen
 a -1 from another core dev. I don't think it is acceptable that
 after the exception was approved it only got feedback from one
 core dev.   I actually thought that when approving exceptions, we
 always got 2 cores to agree to review the item to avoid this, so
 I'm not sure why we failed here.

  I understand that reviews are a burden and very hard but it seems
  wrong that a BP with multiple positive reviews and no negative
  reviews is dropped because of what looks like indifference.  Given
  that there is still time to review the actual code patches it seems
  like there should be a simpler way to get a BP approved.  Without
  an approved BP it's difficult to even start the coding process.


So the question "is the BP approval process broken?" doesn't have a simple
answer. There are definitely things we should change, but in this case I
think the process sort of worked. The problem you hit is we just don't have
enough people doing reviews. Your blueprint didn't get approved in part
because the ratio of reviews needed to reviewers is off. If we don't even
have enough bandwidth to approve this spec we certainly don't have enough
bandwidth to review the code associated with the spec.



 
  I see 2 possibilities here:
 
 
  1)  This is an isolated case specific to this BP.  If so,
  there's no need to change the procedures but I would like to
  know what we should be doing differently.  We got a +2 review
  on 8/4 and then silence for 3 weeks.
 
  2)  This is a process problem that other people encounter.
  Maybe there are times when silence means assent.  Something
  like a BP with multiple +1s and at least one +2 should
  automatically be accepted if no one reviews it 2 weeks after
  the +2 is given.
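
A rough sketch of what that kind of automatic flagging could look like:
query Gerrit for open specs that already carry a +2 but have seen no
activity for two weeks. The search operators, the project name, and the
")]}'" response prefix are written from memory and should be checked against
the Gerrit REST API documentation before relying on them:

    # Hedged sketch: list open nova-specs changes that already carry a +2
    # but have not been updated for two weeks, so they can be flagged.
    # Query operators and response handling are assumptions based on
    # Gerrit's documented search/REST API.
    import json

    import requests

    GERRIT = 'https://review.openstack.org'
    QUERY = ('project:openstack/nova-specs status:open '
             'label:Code-Review=2 age:2w')


    def stale_specs():
        resp = requests.get(GERRIT + '/changes/', params={'q': QUERY})
        resp.raise_for_status()
        # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip it.
        body = resp.text.split('\n', 1)[1]
        return [(c['_number'], c['subject']) for c in json.loads(body)]


    if __name__ == '__main__':
        for number, subject in stale_specs():
            print(number, subject)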

 My two thoughts are

  - When we approve something for exception we should actively monitor
progress of the review to ensure it gets the necessary attention to
either approve or reject it. It makes no sense to approve an
exception and then let it lie silently waiting for weeks with no
attention. I'd expect that any time exceptions are approved we
should babysit them and actively review their status in the weekly
meeting to ensure they are followed up on.

  - Core reviewers should prioritize reviews of things which already
have a +2 on them. I wrote about this in the context of code reviews
last week, but all my points apply equally to spec reviews I believe.


 http://lists.openstack.org/pipermail/openstack-dev/2014-August/043657.html

 Also note that in Kilo the process will be slightly less heavyweight in
 that we're going to try to allow some feature changes into tree without
 first requiring a spec/blueprint to be written. I can't say offhand
 whether this particular feature would have qualified for the lighter
 process, but in general by reducing the need for specs for the more trivial
 items, we'll have more time available for review of things which do
 require specs.


Under the proposed changes to the spec/blueprint process, this would still
need a spec.



 Regards,
 Daniel
 --
 |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org -o- http://virt-manager.org :|
 |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___

Re: [openstack-dev] [nova] Is the BP approval process broken?

2014-08-28 Thread Joe Gordon
On Thu, Aug 28, 2014 at 2:43 PM, Alan Kavanagh alan.kavan...@ericsson.com
wrote:

 I share Donald's points here. I believe what would help is to clearly
 describe in the Wiki the process and workflow for BP approval, including how
 to deal with discrepancies/disagreements, with timeframes for each stage and
 a process of appeal, etc.
 The current process would benefit from some fine-tuning: building in
 safeguards and time limits/deadlines so folks can expect responses within a
 reasonable time and are not left waiting in the cold.



This is a resource problem; the nova team simply does not have enough
people doing enough reviews to make this possible.


 My 2cents!
 /Alan

 -Original Message-
 From: Dugger, Donald D [mailto:donald.d.dug...@intel.com]
 Sent: August-28-14 10:43 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

 I would contend that that right there is an indication that there's a
 problem with the process.  You submit a BP and then you have no idea of
 what is happening and no way of addressing any issues.  If the priority is
 wrong, I can explain why I think the priority should be higher; getting
 stonewalled leaves me with no idea what's wrong and no way to address any
 problems.

 I think, in general, almost everyone is more than willing to adjust
 proposals based upon feedback.  Tell me what you think is wrong and I'll
 either explain why the proposal is correct or I'll change it to address the
 concerns.

 Trying to deal with silence is really hard and really frustrating.
 Especially given that we're not supposed to spam the mailing list, it's really
 hard to know what to do.  I don't know the solution but we need to do
 something.  More core team members would help; maybe something like an
 automatic timeout where BPs/patches with no negative scores and no activity
 for a week get flagged for special handling.

 I feel we need to change the process somehow.

 --
 Don Dugger
 Censeo Toto nos in Kansa esse decisse. - D. Gale
 Ph: 303/443-3786

 -Original Message-
 From: Jay Pipes [mailto:jaypi...@gmail.com]
 Sent: Thursday, August 28, 2014 1:44 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Is the BP approval process broken?

 On 08/27/2014 09:04 PM, Dugger, Donald D wrote:
  I'll try and not whine about my pet project but I do think there is a
  problem here.  For the Gantt project to split out the scheduler there
  is a crucial BP that needs to be implemented (
  https://review.openstack.org/#/c/89893/ ) and, unfortunately, the BP
  has been rejected and we'll have to try again for Kilo.  My question
  is did we do something wrong or is the process broken?
 
  Note that we originally proposed the BP on 4/23/14, went through 10
  iterations to the final version on 7/25/14 and the final version got
  three +1s and a +2 by 8/5.  Unfortunately, even after reaching out to
  specific people, we didn't get the second +2, hence the rejection.
 
  I understand that reviews are a burden and very hard but it seems
  wrong that a BP with multiple positive reviews and no negative reviews
  is dropped because of what looks like indifference.

 I would posit that this is not actually indifference. The reason that
 there may not have been 1 +2 from a core team member may very well have
 been that the core team members did not feel that the blueprint's priority
 was high enough to put before other work, or that the core team members did
 not have the time to comment on the spec (due to them not feeling the blueprint
 had the priority to justify the time to do a full review).

 Note that I'm not a core drivers team member.

 Best,
 -jay


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Kilo Specs Schedule

2014-08-28 Thread Joe Gordon
We just finished discussing when to open up Kilo specs at the nova meeting
today [0], and Kilo specs will open right after we cut Juno RC1 (around
Sept 25th [1]). Additionally, the spec template will most likely be revised.

We still have a huge amount of work to do for Juno and the nova team is
mostly concerned with the 50 blueprints we have up for review [2] and the
1000 open bugs [3] (186 of which have patches up for review). The RC1
timeframe is the right point for us to start shifting our focus to
upcoming Kilo items.


[0]
http://eavesdrop.openstack.org/meetings/nova/2014/nova.2014-08-28-21.01.log.html
[1] https://wiki.openstack.org/wiki/Juno_Release_Schedule
[2] https://blueprints.launchpad.net/nova/juno
[3] http://54.201.139.117/nova-bugs.html

best,
Joe
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

