Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Day, Phil
> -----Original Message-----
> From: Robert Collins [mailto:robe...@robertcollins.net]
> Sent: 29 December 2013 05:36
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] minimum review period for functional
> changes that break backwards compatibility
> 
> On 29 December 2013 05:15, John Griffith wrote:
> > I think Sean made some good recommendations in the review (waiting 24
> > hours as well as suggesting ML etc).  It seems that cases like this
> > don't necessarily need mandated time requirements for review but just
> > need good core reviewers to say "hey, this is a big deal... we should
> > probably get some feedback here" etc.
> >
> > One thing I am curious about however, Gary made a good point about
> > using the "default_ephemeral_format=" config setting to make this
> > pretty easy and straight forward.  I didn't see any other responses to
> > that, and it looks like the patch still uses a default of "none".
> > Quick look at the code it seems like this would be a clean way to go
> > about things, any reason why this wasn't discussed further?
> 
> We make a point of running defaults in TripleO: if the defaults aren't
> generally production suitable, they aren't suitable defaults. If/when we find
> a place where there is no sane default, we'll push for having no default and
> forcing a choice to be made.
> 
> ext3 wasn't a sane default :).
>

ext3 may no longer be the best choice of a default, but the fact that it is 
already established as the default means that we have to plan any changes 
carefully.
 
> In fact, for CD environments, the ability to set ext3 via config options means
> this change is easy to convert into an arbitrary-time warning period to users,
> if a cloud needs to.

IMO that puts the emphasis in the wrong place - yes given sufficient notice a 
CD user can make changes to their existing images to protect them from this 
change, but that requires them to have sufficient notification to make and test 
that change.  The responsibility should be on reviewers to not allow through 
changes that break backwards compatibility without some form of notice / 
deprecation period - not on the operators to have to monitor for and react to 
changes as they come through.

Phil

> 
> -Rob
> 
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Andreas Jaeger
On 12/29/2013 07:50 AM, Robert Collins wrote:
> On 29 December 2013 04:21, Day, Phil  wrote:
>> Hi Folks,
>>
>>
>>
>> I know it may seem odd to be arguing for slowing down a part of the review
>> process, but I’d like to float the idea that there should be a minimum
>> review period for patches that change existing functionality in a way that
>> isn’t backwards compatible.
> 
> What is the minimum review period intended to accomplish? I mean:
> everyone that reviewed this *knew* it changed a default, and that
> guest OS's that did support ext3 but don't support ext4 would be
> broken. Would you like to have seen a different judgement call - e.g.
> 'Because this is a backward breaking change, it has to go through one
> release of deprecation warning, and *then* can be made' ?
> 
> One possible reason to want a different judgment call is that the
> logic about impacted OS's was wrong - I claimed (correctly) that every
> OS has support for ext4, but neglected to consider the 13 year
> lifespan of RHEL...
> https://access.redhat.com/site/support/policy/updates/errata/ shows
> that RHEL 3 and 4 are both still supported, and neither support ext4.
> So folk that are running apps in those legacy environments indeed
> cannot move.

SUSE Linux Enterprise Server 11 comes with ext3 as default as well - and
does not include ext4 support, so this is really a bad change for SLES.

Andreas
-- 
 Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
  SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
   GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Robert Collins
On 29 December 2013 04:21, Day, Phil  wrote:
> Hi Folks,
>
>
>
> I know it may seem odd to be arguing for slowing down a part of the review
> process, but I’d like to float the idea that there should be a minimum
> review period for patches that change existing functionality in a way that
> isn’t backwards compatible.

What is the minimum review period intended to accomplish? I mean:
everyone that reviewed this *knew* it changed a default, and that
guest OS's that did support ext3 but don't support ext4 would be
broken. Would you like to have seen a different judgement call - e.g.
'Because this is a backward breaking change, it has to go through one
release of deprecation warning, and *then* can be made' ?

One possible reason to want a different judgment call is that the
logic about impacted OS's was wrong - I claimed (correctly) that every
OS has support for ext4, but neglected to consider the 13 year
lifespan of RHEL...
https://access.redhat.com/site/support/policy/updates/errata/ shows
that RHEL 3 and 4 are both still supported, and neither support ext4.
So folk that are running apps in those legacy environments indeed
cannot move.

Another possible reason is that we should have a strict
no-exceptions-by-default approach to backwards incompatible changes,
even when there are config settings to override them. Whatever the nub
is - let's surface that and target it.

Basically, I'm not sure what problem you're trying to solve - let's
tease that out, and then talk about how to solve it. "Backwards
incompatible change landed" might be the problem - but since every
reviewer knew it, having a longer review period is clearly not
connected to solving the problem :).


> The specific change that got me thinking about this is
> https://review.openstack.org/#/c/63209/ which changes the default fs type
> from ext3 to ext4.  I agree with the comments in the commit message that
> ext4 is a much better filesystem, and it probably does make sense to move to
> that as the new default at some point, however there are some old OS’s that
> may still be in use that don’t support ext4.  By making this change to the

Per above, these seem to be solely RHEL3 and RHEL4.

> default without any significant notification period this change has the
> potential to break existing images and snapshots.  It was already possible
> to use ext4 via existing configuration values, so there was no urgency to
> this change (and no urgency implied in the commit messages, which is neither
> a bug nor a blueprint).

Indeed - the reason for putting up the change was the positive
reception on the list. If the change was requested to wait, we would
have ensured there was a bug (because running non-default for no good
reason is a bug in TripleO+the component whose default is wrong), used
the config setting, and moved on.

> I’m not trying to pick out the folks involved in this change in particular,

I don't feel picked out :).

> it just happened to serve as a good and convenient example of something that
> I think we need to be more aware of and think about having some specific
> policy around.  On the plus side the reviewers did say they would wait 24
> hours to see if anyone objected, and the actual review went over 4 days –
> but I’d suggest that is still far too quick even in a non-holiday period for
> something which is low priority (the functionality could already be achieved
> via existing configuration options) and which is a change in default
> behaviour.  (In the period around a major holiday there probably needs to be
> an even longer wait). I know there are those that don’t want to see
> blueprints for every minor functional change to the system, but maybe this
> is a case where a blueprint being proposed and reviewed may have caught the
> impact of the change.  With a number of people now using a continual

I'm extremely skeptical of 'wait longer' and 'use blueprints' as tools
to get 'big impact noticed': blueprints will get the PTL to see the
change, but not all reviewers. Waiting longer likewise: One could wait
3 weeks and not get reviewed. If the existing system isn't working,
doing more of it will just not work more :).

> deployment approach any change in default behaviour needs to be considered
> not just  for the benefits it brings but what it might break.  The advantage
> we have as a community is that there are a lot of different perspectives that
> can be brought to bear on the impact of functional changes, but we equally
> have to make sure there is sufficient time for those perspectives to emerge.

Sure. What does this break though? Specifically, from this mail and
the research it's prompted me to do I can see that RHEL3 and RHEL4
users would stop having ephemeral work *by default*. They can still
work by requesting an ext3 drive from their cloud provider.

> Somehow it feels that we’re getting the priorities on reviews wrong when a
> low priority change like this can go through in a matter of days, while
> there are bug fixes such as https://review.openstack.org/#/c/57708/ which
> have been sitting for over a month with a number of +1's which don't seem
> to be making any progress.

Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Robert Collins
On 29 December 2013 05:15, John Griffith  wrote:
> I think Sean made some good recommendations in the review (waiting 24
> hours as well as suggesting ML etc).  It seems that cases like this
> don't necessarily need mandated time requirements for review but just
> need good core reviewers to say "hey, this is a big deal... we should
> probably get some feedback here" etc.
>
> One thing I am curious about however, Gary made a good point about
> using the "default_ephemeral_format=" config setting to make this
> pretty easy and straight forward.  I didn't see any other responses to
> that, and it looks like the patch still uses a default of "none".
> Quick look at the code it seems like this would be a clean way to go
> about things, any reason why this wasn't discussed further?

We make a point of running defaults in TripleO: if the defaults aren't
generally production suitable, they aren't suitable defaults. If/when
we find a place where there is no sane default, we'll push for having
no default and forcing a choice to be made.

ext3 wasn't a sane default :).

In fact, for CD environments, the ability to set ext3 via config
options means this change is easy to convert into an arbitrary-time
warning period to users, if a cloud needs to.
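
For concreteness, a minimal sketch of that per-cloud override (this assumes
the default_ephemeral_format option discussed earlier in the thread; check
the exact option name against the nova version you deploy):

    # /etc/nova/nova.conf - keep the old default while guest images migrate
    [DEFAULT]
    default_ephemeral_format = ext3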

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic]Communication between Nova and Ironic

2013-12-28 Thread LeslieWang
Hi Clint,
The current ironic call is for adding/deleting baremetal servers, not for 
auto-scaling, as we discussed in another thread. What I'm thinking of relates 
to auto-scaling baremetal servers. In my mind, the logic could be:
  1. Nova scheduler determines it should scale up by one baremetal server.
  2. Nova scheduler notifies ironic (or another API?) to power up the server.
  3. If ironic (or another service?) returns success, nova scheduler can call 
ironic to add the baremetal server into the cluster.
Of course, this is not the sole way to auto-scale. As you noted in another 
thread, auto-scaling can be triggered from the under-cloud or another 
monitoring service. Just trying to bring up an interesting discussion. :-)

Best Regards,
Leslie

> From: cl...@fewbar.com
> To: openstack-dev@lists.openstack.org
> Date: Sat, 28 Dec 2013 13:40:08 -0800
> Subject: Re: [openstack-dev] [Ironic]Communication between Nova and Ironic
> 
> Excerpts from LeslieWang's message of 2013-12-24 03:01:51 -0800:
> > Hi Oleg,
> > 
> > Thanks for your prompt reply and detailed explanation. Merry Christmas, and 
> > wish you a happy new year!
> > 
> > At the same time, I think we can discuss more on Ironic as a backend 
> > driver for nova. I'm new to ironic. Per my understanding, the purpose of 
> > bare metal as a backend driver is to solve the problem that some appliance 
> > systems cannot be virtualized, but the operator still wants the same cloud 
> > management system to manage them. With the help of ironic, the 
> > operator can achieve that goal and use one openstack to manage these 
> > systems like VMs: create, delete, deploy images etc. This is one typical 
> > use case.
> > 
> > In addition, I'm actually thinking of another interesting use case. Currently 
> > openstack requires ops to pre-install all servers. TripleO tries to solve 
> > this problem and bootstrap openstack using openstack. However, what is 
> > missing here is dynamically powering on only the VMs/switches/storage that 
> > are needed. Imagine the lab initially had only one all-in-one openstack 
> > controller. The whole workflow could be:
> >   1. Users request one VM or baremetal server through the portal.
> >   2. Horizon sends the request to nova-scheduler.
> >   3. Nova-scheduler finds no server, then invokes the ironic api to power 
> > one on through IPMI, and installs either a hypervisor or the appliance 
> > directly.
> >   4. If it needs to create a VM, Nova-scheduler will find one compute node 
> > and send a message for further processing.
> > 
> > Based on this use case, I'm wondering whether it makes sense to embed this 
> > ironic invocation logic in nova-scheduler. Another approach is that, as the 
> > overall orchestration manager, the TripleO project has a TripleO-scheduler 
> > to first intercept the message, invoke the ironic api, then the heat api, 
> > which calls the nova api, neutron api and storage api.  In this case, 
> > TripleO only powers on baremetal servers running VMs; nova is responsible 
> > for powering on baremetal servers running appliance systems. The latter 
> > sounds like a good solution, but the former also works. So can you please 
> > comment on it? Thanks!
> > 
> 
> I think this basically already works the way you desire. The scheduler
> _does_ decide to call ironic, it just does so through nova-compute RPC
> calls. That is important, as this allows the scheduler to hand-off the
> entire work-flow of provisioning a machine to nova-compute in the exact
> same way as is done for regular cloud workloads.
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  ___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] UnderCloud & OverCloud

2013-12-28 Thread LeslieWang
Hi Clint,
Thanks for your reply. Please see inline.
Best RegardsLeslie

> From: cl...@fewbar.com
> To: openstack-dev@lists.openstack.org
> Date: Sat, 28 Dec 2013 08:23:45 -0800
> Subject: Re: [openstack-dev] [Spam]  [TripleO] UnderCloud & OverCloud
> 
> Excerpts from LeslieWang's message of 2013-12-24 19:19:52 -0800:
> > Dear All,
> > Merry Christmas & Happy New Year!
> > I'm new to TripleO. After some investigation, I have one question on 
> > UnderCloud & OverCloud. Per my understanding, UnderCloud will pre-install 
> > and set up all baremetal servers used for OverCloud. Seems like it assumes 
> > all baremetal servers are installed in advance. Then my question is 
> > from a green and elasticity point of view. Initially OverCloud should have 
> > zero baremetal servers. Per user demand, the OverCloud Nova Scheduler 
> > should decide if it needs more baremetal servers, then talk to UnderCloud 
> > to allocate more baremetal servers, which will use Heat to orchestrate 
> > baremetal server starts. Does that make sense? Is it already planned in 
> > the roadmap?
> > If UnderCloud resources are created/deleted elastically, why doesn't 
> > OverCloud talk to Ironic to allocate resources directly? Seems like it can 
> > achieve the same goal. What other features will UnderCloud provide? Thanks 
> > in advance.
> > Best Regards,
> > Leslie
> 
> Having the overcloud scheduler ask for new servers would be pretty
> interesting. It takes most large scale servers several minutes just to
> POST though, so I'm not sure it is going to work out well if you care
> about latency for booting VMs.
Leslie - The Nova API could add an option (latency sensitive or not) to aid 
the scheduler's decision. If a client is sensitive to latency when booting a 
VM, it can pass a parameter to request that the VM boot immediately; then the 
scheduler can start the VM on an already-running baremetal server. Otherwise, 
if the client doesn't care about latency, the scheduler can start new servers, 
then start the VM on top.
> 
> What might work is to use an auto-scaler in the undercloud though, perhaps
> having it informed by the overcloud in some way for more granular policy
> possibilities, but even just knowing how much RAM and CPU are allocated
> across the compute nodes would help to inform us when it is time for
> more compute nodes.
> 
> Also the scale-up is fun, but scaling down is tougher. One can only scale
> down off nodes that have no more compute workloads. If you have live
> migration then you can kick off live migration before scale down, but
> in a highly utilized cluster I think that will be a net loss over time
> as the extra load caused by a large scale live migration will outweigh
> the savings from turning off machines. The story might be different for
> a system built on network based volumes like CEPH,  I'm not sure.
Leslie - agree.
> 
> Anyway, this is really interesting to think about, but it is not
> something we're quite ready for yet. We're just getting to the point
> of being able to deploy software updates using images, and then I hope
> to focus on improving usage of Heat with rolling updates and the new
> software config capabilities. After that it may be that we can look at
> how to scale down a compute cluster automatically. :)
Leslie - understood. Rome wasn't built in a day.
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  ___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic]Communication between Nova and Ironic

2013-12-28 Thread Clint Byrum
Excerpts from LeslieWang's message of 2013-12-24 03:01:51 -0800:
> Hi Oleg,
> 
> Thanks for your prompt reply and detailed explanation. Merry Christmas, and 
> wish you a happy new year!
> 
> At the same time, I think we can discuss more on Ironic as a backend driver 
> for nova. I'm new to ironic. Per my understanding, the purpose of bare metal 
> as a backend driver is to solve the problem that some appliance systems 
> cannot be virtualized, but the operator still wants the same cloud management 
> system to manage them. With the help of ironic, the operator can achieve that 
> goal and use one openstack to manage these systems like VMs: create, delete, 
> deploy images etc. This is one typical use case.
> 
> In addition, I'm actually thinking of another interesting use case. Currently 
> openstack requires ops to pre-install all servers. TripleO tries to solve this 
> problem and bootstrap openstack using openstack. However, what is missing 
> here is dynamically powering on only the VMs/switches/storage that are needed. 
> Imagine the lab initially had only one all-in-one openstack controller. The 
> whole workflow could be:
>   1. Users request one VM or baremetal server through the portal.
>   2. Horizon sends the request to nova-scheduler.
>   3. Nova-scheduler finds no server, then invokes the ironic api to power one 
> on through IPMI, and installs either a hypervisor or the appliance directly.
>   4. If it needs to create a VM, Nova-scheduler will find one compute node, 
> and send a message for further processing.
> 
> Based on this use case, I'm wondering whether it makes sense to embed this 
> ironic invocation logic in nova-scheduler. Another approach is that, as the 
> overall orchestration manager, the TripleO project has a TripleO-scheduler to 
> first intercept the message, invoke the ironic api, then the heat api, which 
> calls the nova api, neutron api and storage api.  In this case, TripleO only 
> powers on baremetal servers running VMs; nova is responsible for powering on 
> baremetal servers running appliance systems. The latter sounds like a good 
> solution, but the former also works. So can you please comment on it? Thanks!
> 

I think this basically already works the way you desire. The scheduler
_does_ decide to call ironic, it just does so through nova-compute RPC
calls. That is important, as this allows the scheduler to hand-off the
entire work-flow of provisioning a machine to nova-compute in the exact
same way as is done for regular cloud workloads.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Announce of Rally - benchmarking system for OpenStack

2013-12-28 Thread Boris Pavlovic
Tim,

First of all, we should finish the pure OpenStack profiling system. Soon I am
going to raise another thread about it.
This will allow us not only to detect that we have issues in some
OpenStack API, but also to find the real reason for them.

Secondly, we should cover all the main functionality with benchmarks. (I think
this will be done pretty soon.)


Thirdly, we (the openstack community) should start running benchmarks against
different clouds - real customer installations & dev environments -
investigate the profiling/benchmark results, and as a result we will be able to:
1) Find the best/optimal hardware set for OS
2) Tune code
3) Find the optimal deployment arch for a specific load/hardware

So we will be able to test different arch/code/hardware and collect all
this information on some OpenStack wiki pages.
But this will require a lot of work from the whole community... I hope the
Rally team will get it done.


Best regards,
Boris Pavlovic



On Sun, Dec 29, 2013 at 12:08 AM, Tim Bell  wrote:

>
>
> Thanks.. can you advise where the accumulated experience from Rally will
> be assembled ?
>
>
>
> Rally gives me the method to test my cloud but we also need to have a set
> of documentation on how to build clouds for scale so we don’t all have to
> tune (and end up with different approaches)
>
>
>
> Tim
>
>
>
> *From:* bo...@pavlovic.ru [mailto:bo...@pavlovic.ru] *On Behalf Of *Boris
> Pavlovic
> *Sent:* 28 December 2013 21:02
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Cc:* Ali Beddah; Tim Bell
>
> *Subject:* Re: [openstack-dev] Announce of Rally - benchmarking system
> for OpenStack
>
>
>
> Ali Gamal,
>
>
>
> ?
>
>
>
>
>
>
>
> Tim,
>
>
>
> Yes it fits.
>
>
>
> There are a couple of use cases that should be covered by Rally:
>
>
>
> 1) Easy way to find & fix bottlenecks/scale issues & improve performance
> of OS (without having tons of servers)
>
> 2) Find the best Arch for your hardware and your typical loads
>
> 3) Ensure that an existing installation passes its SLA
>
> 4) Ensure that OpenStack VMs have their resources and work as expected
> (already started discussion)
>
>
>
>
>
>
>
> Best regards,
>
> Boris Pavlovic
>
>
>
>
>
>
>
> On Sat, Dec 28, 2013 at 11:40 PM, Tim Bell  wrote:
>
>
>
> I think there also needs to be a scalability best practise and reference
> architecture.
>
>
>
> Benchmarking allows us to identify problems with the code but we also need
> some community wisdom on how to deploy at scale.
>
>
>
> Does this fit within Rally or can you advise where this community wisdom
> should be accumulated ?
>
>
>
> Tim
>
>
>
>
>
> *From:* Ali Gamal [mailto:aga...@itsyn.com]
> *Sent:* 28 December 2013 20:31
>
>
> *To:* OpenStack Development Mailing List
>
> *Cc:* Ali Beddah
> *Subject:* Re: [openstack-dev] Announce of Rally - benchmarking system
> for OpenStack
>
>
>
> On Oct 17, 2013 12:45 AM, "Boris Pavlovic"  wrote:
>
>  Hi Stackers,
>
>
> We are thrilled to present to you Rally, the benchmarking system for
> OpenStack.
>
>
> It is not a secret that we have performance & scaling issues and that
> OpenStack won’t scale out of the box. It is also well known that if you get
> your super big DC (5k-15k servers) you are able to find & fix all OpenStack
> issues in a few months (like Rackspace, BlueHost & others have proved). So
> the problem with performance at scale is solvable.
>
>
> The main blocker to fix such issues in the community is that there is no
> simple way to get relevant and repeatable “numbers” that represent
> OpenStack performance at scale. It is not enough to tune an individual
> OpenStack component, because its performance at scale is no guarantee that
> it will not introduce a bottleneck somewhere else.
>
>
> The correct approach to comprehensively test OpenStack scalability, in our
> opinion, consists of the following four steps:
>
> 1)  Deploy OpenStack
> 2)  Create load by simultaneously making OpenStack API calls
> 3)  Collect performance and profile data
> 4)  Make data easy to consume by presenting it in a humanly readable form
>
>
> Rally is the system that implements all the steps above plus it maintains
> an extendable repository of standard performance tests. To use Rally, a
> user has to specify where to deploy OS, select the deployment mechanism
> (DevStack, Triple-O, Fuel, Etc.) and the set of benchmarking tests to run.
>
> For more details and how to use it take a look at our wiki
> https://wiki.openstack.org/wiki/Rally it should already work out of the box.
>
>
> Happy hunting!
>
>
> Links:
>
> 1. Code: https://github.com/stackforge/rally
>
>
>
> 2. Wiki: https://wiki.openstack.org/wiki/Rally
>
> 2. Launchpad: https://launchpad.net/rally
>
> 3. Statistics:
> http://stackalytics.com/?release=havana&project_type=All&module=rally
>
> 4. RoadMap: https://wiki.openstack.org/wiki/Rally/RoadMap
>
>
> Best regards,
> Boris Pavlovic
> ---
> Mirantis Inc.
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] Announce of Rally - benchmarking system for OpenStack

2013-12-28 Thread Tim Bell

Thanks.. can you advise where the accumulated experience from Rally will be 
assembled ?

Rally gives me the method to test my cloud but we also need to have a set of 
documentation on how to build clouds for scale so we don't all have to tune 
(and end up with different approaches)

Tim

From: bo...@pavlovic.ru [mailto:bo...@pavlovic.ru] On Behalf Of Boris Pavlovic
Sent: 28 December 2013 21:02
To: OpenStack Development Mailing List (not for usage questions)
Cc: Ali Beddah; Tim Bell
Subject: Re: [openstack-dev] Announce of Rally - benchmarking system for 
OpenStack

Ali Gamal,

?



Tim,

Yes it fits.

There are a couple of use cases that should be covered by Rally:

1) Easy way to find & fix bottlenecks/scale issues & improve performance of OS 
(without having tons of servers)
2) Find the best Arch for your hardware and your typical loads
3) Ensure that an existing installation passes its SLA
4) Ensure that OpenStack VMs have their resources and work as expected (already 
started discussion)



Best regards,
Boris Pavlovic



On Sat, Dec 28, 2013 at 11:40 PM, Tim Bell <mailto:tim.b...@cern.ch> wrote:

I think there also needs to be a scalability best practise and reference 
architecture.

Benchmarking allows us to identify problems with the code but we also need some 
community wisdom on how to deploy at scale.

Does this fit within Rally or can you advise where this community wisdom should 
be accumulated ?

Tim


From: Ali Gamal [mailto:aga...@itsyn.com]
Sent: 28 December 2013 20:31

To: OpenStack Development Mailing List
Cc: Ali Beddah
Subject: Re: [openstack-dev] Announce of Rally - benchmarking system for 
OpenStack

On Oct 17, 2013 12:45 AM, "Boris Pavlovic" <mailto:bpavlo...@mirantis.com> wrote:
Hi Stackers,


We are thrilled to present to you Rally, the benchmarking system for OpenStack.


It is not a secret that we have performance & scaling issues and that OpenStack 
won't scale out of the box. It is also well known that if you get your super big DC 
(5k-15k servers) you are able to find & fix all OpenStack issues in a few months 
(like Rackspace, BlueHost & others have proved). So the problem with 
performance at scale is solvable.


The main blocker to fix such issues in the community is that there is no simple way 
to get relevant and repeatable "numbers" that represent OpenStack performance 
at scale. It is not enough to tune an individual OpenStack component, because 
its performance at scale is no guarantee that it will not introduce a 
bottleneck somewhere else.


The correct approach to comprehensively test OpenStack scalability, in our 
opinion, consists of the following four steps:

1)  Deploy OpenStack
2)  Create load by simultaneously making OpenStack API calls
3)  Collect performance and profile data
4)  Make data easy to consume by presenting it in a humanly readable form
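
As a toy illustration of step 2 above (this is not Rally's own API - just the
idea of timing many concurrent API calls, with the call argument standing in
for a real OpenStack client operation):

    import time
    # concurrent.futures needs the "futures" backport on python 2
    from concurrent.futures import ThreadPoolExecutor

    def timed(call):
        # run one API call and return its wall-clock duration
        start = time.time()
        call()
        return time.time() - start

    def run_benchmark(call, concurrency=10, iterations=100):
        # fire `iterations` calls, `concurrency` at a time, then summarize
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            durations = sorted(pool.map(lambda _: timed(call),
                                        range(iterations)))
        return {"min": durations[0],
                "median": durations[len(durations) // 2],
                "max": durations[-1]}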


Rally is the system that implements all the steps above plus it maintains an 
extendable repository of standard performance tests. To use Rally, a user has 
to specify where to deploy OS, select the deployment mechanism (DevStack, 
Triple-O, Fuel, Etc.) and the set of benchmarking tests to run.

For more details and how to use it take a look at our wiki 
https://wiki.openstack.org/wiki/Rally it should already work out of the box.


Happy hunting!


Links:

1. Code: https://github.com/stackforge/rally

2. Wiki: https://wiki.openstack.org/wiki/Rally
2. Launchpad: https://launchpad.net/rally
3. Statistics: 
http://stackalytics.com/?release=havana&project_type=All&module=rally
4. RoadMap: https://wiki.openstack.org/wiki/Rally/RoadMap


Best regards,
Boris Pavlovic
---
Mirantis Inc.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Announce of Rally - benchmarking system for OpenStack

2013-12-28 Thread Boris Pavlovic
Ali Gamal,

?



Tim,

Yes it fits.

There are couple of use cases that should be covered by Rally:

1) Easy way to find & fix bottlenecks/scale issues & improve performance of
OS (without having tons of servers)
2) Find the best Arch for your hardware and your typical loads
3) Ensure that existing installation pass SLA
4) Ensure that OpenStack VMs have their resources and work as expected
(already started discussion)



Best regards,
Boris Pavlovic




On Sat, Dec 28, 2013 at 11:40 PM, Tim Bell  wrote:

>
>
> I think there also needs to be a scalability best practise and reference
> architecture.
>
>
>
> Benchmarking allows us to identify problems with the code but we also need
> some community wisdom on how to deploy at scale.
>
>
>
> Does this fit within Rally or can you advise where this community wisdom
> should be accumulated ?
>
>
>
> Tim
>
>
>
>
>
> *From:* Ali Gamal [mailto:aga...@itsyn.com]
> *Sent:* 28 December 2013 20:31
>
> *To:* OpenStack Development Mailing List
> *Cc:* Ali Beddah
> *Subject:* Re: [openstack-dev] Announce of Rally - benchmarking system
> for OpenStack
>
>
>
> On Oct 17, 2013 12:45 AM, "Boris Pavlovic"  wrote:
>
>  Hi Stackers,
>
>
> We are thrilled to present to you Rally, the benchmarking system for
> OpenStack.
>
>
> It is not a secret that we have performance & scaling issues and that
> OpenStack won’t scale out of the box. It is also well known that if you get
> your super big DC (5k-15k servers) you are able to find & fix all OpenStack
> issues in a few months (like Rackspace, BlueHost & others have proved). So
> the problem with performance at scale is solvable.
>
>
> The main blocker to fix such issues in the community is that there is no
> simple way to get relevant and repeatable “numbers” that represent
> OpenStack performance at scale. It is not enough to tune an individual
> OpenStack component, because its performance at scale is no guarantee that
> it will not introduce a bottleneck somewhere else.
>
>
> The correct approach to comprehensively test OpenStack scalability, in our
> opinion, consists of the following four steps:
>
> 1)  Deploy OpenStack
> 2)  Create load by simultaneously making OpenStack API calls
> 3)  Collect performance and profile data
> 4)  Make data easy to consume by presenting it in a humanly readable form
>
>
> Rally is the system that implements all the steps above plus it maintains
> an extendable repository of standard performance tests. To use Rally, a
> user has to specify where to deploy OS, select the deployment mechanism
> (DevStack, Triple-O, Fuel, Etc.) and the set of benchmarking tests to run.
>
> For more details and how to use it take a look at our wiki
> https://wiki.openstack.org/wiki/Rally it should already work out of the box.
>
>
> Happy hunting!
>
>
> Links:
>
> 1. Code: https://github.com/stackforge/rally
>
>
>
> 2. Wiki: https://wiki.openstack.org/wiki/Rally
>
> 2. Launchpad: https://launchpad.net/rally
>
> 3. Statistics:
> http://stackalytics.com/?release=havana&project_type=All&module=rally
>
> 4. RoadMap: https://wiki.openstack.org/wiki/Rally/RoadMap
>
>
> Best regards,
> Boris Pavlovic
> ---
> Mirantis Inc.
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Announce of Rally - benchmarking system for OpenStack

2013-12-28 Thread Tim Bell

I think there also needs to be a scalability best practise and reference 
architecture.

Benchmarking allows us to identify problems with the code but we also need some 
community wisdom on how to deploy at scale.

Does this fit within Rally or can you advise where this community wisdom should 
be accumulated ?

Tim


From: Ali Gamal [mailto:aga...@itsyn.com]
Sent: 28 December 2013 20:31
To: OpenStack Development Mailing List
Cc: Ali Beddah
Subject: Re: [openstack-dev] Announce of Rally - benchmarking system for 
OpenStack

On Oct 17, 2013 12:45 AM, "Boris Pavlovic" <mailto:bpavlo...@mirantis.com> wrote:
Hi Stackers,


We are thrilled to present to you Rally, the benchmarking system for OpenStack.


It is not a secret that we have performance & scaling issues and that OpenStack 
won't scale out of the box. It is also well known that if you get your super big DC 
(5k-15k servers) you are able to find & fix all OpenStack issues in a few months 
(like Rackspace, BlueHost & others have proved). So the problem with 
performance at scale is solvable.


The main blocker to fix such issues in the community is that there is no simple way 
to get relevant and repeatable "numbers" that represent OpenStack performance 
at scale. It is not enough to tune an individual OpenStack component, because 
its performance at scale is no guarantee that it will not introduce a 
bottleneck somewhere else.


The correct approach to comprehensively test OpenStack scalability, in our 
opinion, consists of the following four steps:

1)  Deploy OpenStack
2)  Create load by simultaneously making OpenStack API calls
3)  Collect performance and profile data
4)  Make data easy to consume by presenting it in a humanly readable form


Rally is the system that implements all the steps above plus it maintains an 
extendable repository of standard performance tests. To use Rally, a user has 
to specify where to deploy OS, select the deployment mechanism (DevStack, 
Triple-O, Fuel, Etc.) and the set of benchmarking tests to run.

For more details and how to use it take a look at our wiki 
https://wiki.openstack.org/wiki/Rally it should already work out of the box.


Happy hunting!


Links:

1. Code: https://github.com/stackforge/rally

2. Wiki: https://wiki.openstack.org/wiki/Rally
2. Launchpad: https://launchpad.net/rally
3. Statistics: 
http://stackalytics.com/?release=havana&project_type=All&module=rally
4. RoadMap: https://wiki.openstack.org/wiki/Rally/RoadMap


Best regards,
Boris Pavlovic
---
Mirantis Inc.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Announce of Rally - benchmarking system for OpenStack

2013-12-28 Thread Ali Gamal
ali.bed...@gmail.com
On Oct 17, 2013 12:45 AM, "Boris Pavlovic"  wrote:

> Hi Stackers,
>
>
> We are thrilled to present to you Rally, the benchmarking system for
> OpenStack.
>
>
> It is not a secret that we have performance & scaling issues and that
> OpenStack won’t scale out of the box. It is also well known that if you get
> your super big DC (5k-15k servers) you are able to find & fix all OpenStack
> issues in a few months (like Rackspace, BlueHost & others have proved). So
> the problem with performance at scale is solvable.
>
>
> The main blocker to fix such issues in the community is that there is no
> simple way to get relevant and repeatable “numbers” that represent
> OpenStack performance at scale. It is not enough to tune an individual
> OpenStack component, because its performance at scale is no guarantee that
> it will not introduce a bottleneck somewhere else.
>
>
> The correct approach to comprehensively test OpenStack scalability, in our
> opinion, consists of the following four steps:
>
> 1)  Deploy OpenStack
> 2)  Create load by simultaneously making OpenStack API calls
> 3)  Collect performance and profile data
> 4)  Make data easy to consume by presenting it in a humanly readable form
>
>
> Rally is the system that implements all the steps above plus it maintains
> an extendable repository of standard performance tests. To use Rally, a
> user has to specify where to deploy OS, select the deployment mechanism
> (DevStack, Triple-O, Fuel, Etc.) and the set of benchmarking tests to run.
>
> For more details and how to use it take a look at our wiki
> https://wiki.openstack.org/wiki/Rally it should already work out of the box.
>
>
> Happy hunting!
>
>
> Links:
>
> 1. Code: https://github.com/stackforge/rally
>
> 2. Wiki: https://wiki.openstack.org/wiki/Rally
>
> 2. Launchpad: https://launchpad.net/rally
>
> 3. Statistics:
> http://stackalytics.com/?release=havana&project_type=All&module=rally
>
> 4. RoadMap: https://wiki.openstack.org/wiki/Rally/RoadMap
>
>
> Best regards,
> Boris Pavlovic
> ---
> Mirantis Inc.
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread John Griffith
On Sat, Dec 28, 2013 at 8:57 AM, Clint Byrum  wrote:
> Hi Phil. Thanks for the well reasoned and poignant message urging
> caution and forethought in change management. I agree with all of the
> sentiments and think that we can do better in reasoning about the impact
> of changes. I think this just puts further exposure on the fact that
> Nova needs reviewers desperately so that reviewers can slow down.
>
> However, I think this is primarily an exposure in our gate testing. If
> there are older OS's we want to be able to support, we should be booting
> them in the gate and testing that the ephemeral disk works on them. What
> is a cloud that can't boot workloads?
>
> While our ability to reason is a quite effective way to stop emergent
> problems, we know these are precious and scarce resources, and thus
> we should use mechanical methods before falling back to reviewers and
> developers.
>
> So, I'd suggest that we add a test that the ephemeral disk mounts in
> any desired OS's to tempest. If that is infeasible (due to nested KVM
> in the gate being slow) then I'm afraid I don't have a solution.
>
> Excerpts from Day, Phil's message of 2013-12-28 07:21:16 -0800:
>> Hi Folks,
>>
>> I know it may seem odd to be arguing for slowing down a part of the review 
>> process, but I'd like to float the idea that there should be a minimum 
>> review period for patches that change existing functionality in a way that 
>> isn't backwards compatible.
>>
>> The specific change that got me thinking about this is 
>> https://review.openstack.org/#/c/63209/ which changes the default fs type 
>> from ext3 to ext4.  I agree with the comments in the commit message that 
>> ext4 is a much better filesystem, and it probably does make sense to move to 
>> that as the new default at some point, however there are some old OS's that 
>> may still be in use that don't support ext4.  By making this change to the 
>> default without any significant notification period this change has the 
>> potential to break existing images and snapshots.  It was already possible 
>> to use ext4 via existing configuration values, so there was no urgency to 
>> this change (and no urgency implied in the commit messages, which is neither 
>> a bug nor a blueprint).
>>
>> I'm not trying to pick out the folks involved in this change in particular, 
>> it just happened to serve as a good and convenient example of something that 
>> I think we need to be more aware of and think about having some specific 
>> policy around.  On the plus side the reviewers did say they would wait 24 
>> hours to see if anyone objected, and the actual review went over 4 days - 
>> but I'd suggest that is still far too quick even in a non-holiday period for 
>> something which is low priority (the functionality could already be achieved 
>> via existing configuration options) and which is a change in default 
>> behaviour.  (In the period around a major holiday there probably needs to be 
>> an even longer wait). I know there are those that don't want to see 
>> blueprints for every minor functional change to the system, but maybe this 
>> is a case where a blueprint being proposed and reviewed may have caught the 
>> impact of the change.  With a number of people now using a continual 
>> deployment approach any change in default behaviour needs to be considered 
>> not just for the benefits it brings but what it might break.  The advantage 
>> we have as a community is that there are a lot of different perspectives 
>> that can be brought to bear on the impact of functional changes, but we 
>> equally have to make sure there is sufficient time for those perspectives 
>> to emerge.
>>
>> Somehow it feels that we're getting the priorities on reviews wrong when a 
>> low priority change like this can go through in a matter of days, while 
>> there are bug fixes such as https://review.openstack.org/#/c/57708/ 
>> which have been sitting for over a month with a number of +1's which don't 
>> seem to be making any progress.
>>
>> Cheers,
>> Phil
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

I think Sean made some good recommendations in the review (waiting 24
hours as well as suggesting ML etc).  It seems that cases like this
don't necessarily need mandated time requirements for review but just
need good core reviewers to say "hey, this is a big deal... we should
probably get some feedback here" etc.

One thing I am curious about however, Gary made a good point about
using the "default_ephemeral_format=" config setting to make this
pretty easy and straight forward.  I didn't see any other responses to
that, and it looks like the patch still uses a default of "none".
Quick look at the code it seems like this would be a clean way to go
about things, any reason why this wasn't discussed further?

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Spam] [TripleO] UnderCloud & OverCloud

2013-12-28 Thread Clint Byrum
Excerpts from LeslieWang's message of 2013-12-24 19:19:52 -0800:
> Dear All,
> Merry Christmas & Happy New Year!
> I'm new to TripleO. After some investigation, I have one question on 
> UnderCloud & OverCloud. Per my understanding, UnderCloud will pre-install and 
> set up all baremetal servers used for OverCloud. Seems like it assumes all 
> baremetal servers are installed in advance. Then my question is from a 
> green and elasticity point of view. Initially OverCloud should have zero 
> baremetal servers. Per user demand, the OverCloud Nova Scheduler should 
> decide if it needs more baremetal servers, then talk to UnderCloud to 
> allocate more baremetal servers, which will use Heat to orchestrate 
> baremetal server starts. Does that make sense? Is it already planned in the 
> roadmap?
> If UnderCloud resources are created/deleted elastically, why doesn't 
> OverCloud talk to Ironic to allocate resources directly? Seems like it can 
> achieve the same goal. What other features will UnderCloud provide? Thanks 
> in advance.
> Best Regards,
> Leslie

Having the overcloud scheduler ask for new servers would be pretty
interesting. It takes most large scale servers several minutes just to
POST though, so I'm not sure it is going to work out well if you care
about latency for booting VMs.

What might work is to use an auto-scaler in the undercloud though, perhaps
having it informed by the overcloud in some way for more granular policy
possibilities, but even just knowing how much RAM and CPU are allocated
across the compute nodes would help to inform us when it is time for
more compute nodes.
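
A minimal sketch of that idea (not TripleO code; it assumes python-novaclient's
hypervisor statistics call, and request_node is an illustrative hook that would
grow the undercloud, e.g. via its Heat stack):

    import time
    from novaclient.v1_1 import client as nova_client

    RAM_THRESHOLD = 0.8   # assumed policy: grow once 80% of RAM is allocated

    def needs_more_compute(nova):
        # aggregate allocation across all overcloud compute nodes
        stats = nova.hypervisors.statistics()
        return (float(stats.memory_mb_used) / stats.memory_mb > RAM_THRESHOLD or
                float(stats.vcpus_used) / stats.vcpus > RAM_THRESHOLD)

    def watch(overcloud_nova, request_node, period=60):
        # overcloud_nova = nova_client.Client(user, password, tenant, auth_url)
        while True:
            if needs_more_compute(overcloud_nova):
                request_node()
            time.sleep(period)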

Also the scale-up is fun, but scaling down is tougher. One can only scale
down off nodes that have no more compute workloads. If you have live
migration then you can kick off live migration before scale down, but
in a highly utilized cluster I think that will be a net loss over time
as the extra load caused by a large scale live migration will outweigh
the savings from turning off machines. The story might be different for
a system built on network based volumes like CEPH,  I'm not sure.

Anyway, this is really interesting to think about, but it is not
something we're quite ready for yet. We're just getting to the point
of being able to deploy software updates using images, and then I hope
to focus on improving usage of Heat with rolling updates and the new
software config capabilities. After that it may be that we can look at
how to scale down a compute cluster automatically. :)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Tim Bell

I think there is a need for an incompatible change review process which 
includes more of the community than just those performing the code reviews. 
This kind of change can cause a lot of disruption for those of us running 
clouds so it is great to see that you are looking for more input.

In the past, it has been proposed to also highlight incompatible changes to the 
openstack-operators list which is likely to reach those of us who will be most 
affected by the change. A similar process for API changes could also be applied 
to reach out for those who use OpenStack clouds. The change can then be 
reviewed as to how to minimise the impact (if significant) along with getting a 
larger group of people involved in understanding the merits of the change 
compared to the risks/effort for those running clouds in production.

Are there any other proposals for how to handle incompatible changes ?

Tim

From: Day, Phil [mailto:philip@hp.com] 
Sent: 28 December 2013 16:21
To: OpenStack Development Mailing List (openstack-dev@lists.openstack.org)
Subject: [openstack-dev] [nova] minimum review period for functional changes 
that break backwards compatibility

Hi Folks,

I know it may seem odd to be arguing for slowing down a part of the review 
process, but I'd like to float the idea that there should be a minimum review 
period for patches that change existing functionality in a way that isn't 
backwards compatible.   

The specific change that got me thinking about this is 
https://review.openstack.org/#/c/63209/ which changes the default fs type from 
ext3 to ext4.    I agree with the comments in the commit message that ext4 is a 
much better filesystem, and it probably does make sense to move to that as the 
new default at some point, however there are some old OS's that may still be in 
use that don't support ext4.  By making this change to the default without any 
significant notification period this change has the potential to brake existing 
images and snapshots.  It was already possible to use ext4 via existing 
configuration values, so there was no urgency to this change (and no urgency 
implied in the commit messages, which is neither a bug or blueprint). 

I'm not trying to pick out the folks involved in this change in particular, it 
just happened to serve as a good and convenient example of something that I 
think we need to be more aware of and think about having some specific policy 
around.  On the plus side the reviewers did say they would wait 24 hours to see 
if anyone objected, and the actual review went over 4 days - but I'd suggest 
that is still far too quick even in a non-holiday period for something which is 
low priority (the functionality could already be achieved via existing 
configuration options) and which is a change in default behaviour.  (In the 
period around a major holiday there probably needs to be an even longer wait).  
   I know there are those that don't want to see blueprints for every minor 
functional change to the system, but maybe this is a case where a blueprint 
being proposed and reviewed may have caught the impact of the change.    With a 
number of people now using a continual deployment approach any change in 
default behaviour needs to be considered not just  for the benefits it brings 
but what it might break.  The advantage we have as a community is that there 
are a lot of different perspectives that can be brought to bear on the impact of 
functional changes, but we equally have to make sure there is sufficient time 
for those perspectives to emerge.

Somehow it feels that we're getting the priorities on reviews wrong when a low 
priority change like this can go through in a matter of days, while 
there are bug fixes such as https://review.openstack.org/#/c/57708/ which have 
been sitting for over a month with a number of +1's which don't seem to be 
making any progress.

Cheers,
Phil 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ceilometer] time consuming of listing resource

2013-12-28 Thread Jay Pipes

On 12/28/2013 05:51 AM, 刘胜 wrote:

Hi all:
I have reported a bug about the time consumed by “resource-list” in the
ceilometer CLI:
https://bugs.launchpad.net/ceilometer/+bug/1264434

In order to identify the causes of this phenomenon, I have stepped through
the code with pdb in my environment (configured with mysql as the db driver).
The most important part of the process of listing resources is implemented in
the following code:

code of get_resources() in /ceilometer/storage/impl_sqlalchemy.py:

    for meter, first_ts, last_ts in query.all():
        yield api_models.Resource(
            resource_id=meter.resource_id,
            project_id=meter.project_id,
            first_sample_timestamp=first_ts,
            last_sample_timestamp=last_ts,
            source=meter.sources[0].id,
            user_id=meter.user_id,
            metadata=meter.resource_metadata,
            meter=[
                api_models.ResourceMeter(
                    counter_name=m.counter_name,
                    counter_type=m.counter_type,
                    counter_unit=m.counter_unit,
                )
                for m in meter.resource.meters
            ],
        )
The method generates an iterator of api_models.Resource objects for the
ceilometer API to show.
1. The operation “query.all()” will query the DB table “meter” with the
expression generated above; in my environment the DB table “meter” has
more than 30 items, so this operation may consume about 30 seconds.
2. The operation “for m in meter.resource.meters” will iterate over the
meters of this resource; a server resource may have more than 10 meter
items in my environment. So the time of the whole process is too long.
I think the meter list of the Resource object can be reduced; I have
tested this modification, and it works for listing resources while
removing most of the time consumption.
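
For illustration, a rough sketch of the kind of modification described above
(an assumption about its shape, not the exact tested patch) - simply not
expanding the per-resource meter list:

    for meter, first_ts, last_ts in query.all():
        yield api_models.Resource(
            resource_id=meter.resource_id,
            project_id=meter.project_id,
            first_sample_timestamp=first_ts,
            last_sample_timestamp=last_ts,
            source=meter.sources[0].id,
            user_id=meter.user_id,
            metadata=meter.resource_metadata,
            # meters elided: avoids the extra per-resource iteration;
            # clients can fetch the meter list separately when needed
            meter=[],
        )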

I have noticed that there are many db operation methods that may be time
consuming.

ps: I have changed the ceilometer polling interval from 600s to 60s
in /etc/ceilometer/pipeline.yaml, and the environment has only run for 10 days!
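
(For reference, a pipeline entry of this shape carries that interval setting;
the keys follow the Havana-era pipeline.yaml format as an assumption - check
the installed version:)

    -
        name: meter_pipeline
        interval: 60        # was 600; roughly 10x more samples are collected
        meters:
            - "*"
        transformers:
        publishers:
            - rpc://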

I'm a beginner with ceilometer and want to fix this bug, but I haven't
found a suitable way.
Maybe someone can help me with this?


Yep. The performance of the SQL driver in Ceilometer out-of-the-box with 
that particular line is unusable in our experience. We have our Chef 
cookbook literally patch Ceilometer's source code and comment out that 
particular line because it makes performance of Ceilometer unusable.


I hate to say it, but the SQL driver in Ceilometer really needs an 
overhaul, both at the schema level and the code level:


On the schema level:

* The indexes, especially on sourceassoc, are wrong:
 ** The order of the columns in the multi-column indexes like idx_sr, 
idx_sm, idx_su, idx_sp is incorrect. Columns used in predicates should 
*precede* columns (like source_id) that are used in joins. The way the 
indexes are structured now makes them unusable by the query optimizer 
for 99% of queries on the sourceassoc table, which means any queries on 
sourceassoc trigger a full table scan of the hundreds of millions of 
records in the table. Things are made worse by the fact that INSERT 
operations are slowed for each index on a table, and the fact that none 
of these indexes are used just means we're wasting cycles on each INSERT 
for no reason.
 ** The indexes are across the entire VARCHAR(255) field width. This 
isn't necessary (and I would argue that the base field type should be 
smaller). Index width can be reduced (and performance increased) by 
limiting the indexable width to 32 (or smaller).


The solution to the main indexing issues is to do the following:

DROP INDEX idx_sr ON sourceassoc;
CREATE INDEX idx_sr ON sourceassoc (resource_id(32), source_id(32));
DROP INDEX idx_sp ON sourceassoc;
CREATE INDEX idx_sp ON sourceassoc (project_id(32), source_id(32));
DROP INDEX idx_su ON sourceassoc;
CREATE INDEX idx_su ON sourceassoc (user_id(32), source_id(32));
DROP INDEX idx_sm ON sourceassoc;
CREATE INDEX idx_sm ON sourceassoc (meter_id, source_id(32));

Keep in mind if you have (hundreds of) millions of records in the 
sourceassoc table, the above will take a long time to run. It will take 
hours, but you'll be happy you did it. You'll see the database 
performance increase dramatically.


* The columns that refer to IDs of various kinds should not be UTF8. 
Changing these columns to a latin1 or even binary charset would cut the 
space requirements for the data and index storage by 65%. This means you 
can fit around 3x as many records in the same data and index pages. The 
more records you fit into an index page, the faster seeks and scans will be.
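
For example, something along these lines would do the conversion in MySQL (a 
sketch only -- check the exact column list against the real schema first, and 
note these ALTERs rebuild the table, so expect them to take as long as the 
index rebuilds above):

ALTER TABLE sourceassoc
  MODIFY resource_id VARCHAR(255) CHARACTER SET latin1,
  MODIFY project_id VARCHAR(255) CHARACTER SET latin1,
  MODIFY user_id VARCHAR(255) CHARACTER SET latin1;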


* sourceassoc has no primary key.

* The meter table has the following:

  KEY ix_meter_id (id)

  which is entirely redundant (id is the primary key) and does nothing 
but slow down insert operations for every record in the meter table.
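
Dropping it is a one-liner (MySQL syntax, as with the DDL above):

DROP INDEX ix_meter_id ON meter;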


* The meter table mixes frequently s

Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Clint Byrum
Hi Phil. Thanks for the well-reasoned and poignant message urging
caution and forethought in change management. I agree with all of the
sentiments and think that we can do better in reasoning about the impact
of changes. I think this just further exposes the fact that
Nova desperately needs reviewers, so that reviewers can slow down.

However, I think this is primarily an exposure in our gate testing. If
there are older OS's we want to be able to support, we should be booting
them in the gate and testing that the ephemeral disk works on them. What
is a cloud that can't boot workloads?

While our ability to reason is a quite effective way to stop emergent
problems, we know these are precious and scarce resources, and thus
we should use mechanical methods before falling back to reviewers and
developers.

So, I'd suggest that we add a tempest test that the ephemeral disk mounts
on any OS's we care about. If that is infeasible (due to nested KVM
in the gate being so slow) then I'm afraid I don't have a solution.
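
To make the shape of such a check concrete, here is an illustrative sketch -- 
not actual tempest code, and the ssh helper is a stand-in rather than a real 
API -- of the assertion the scenario test would make after booting a guest 
with an ephemeral disk:

    # Verify the ephemeral disk was formatted and mounted inside the guest.
    # 'ssh_exec' is a hypothetical helper that runs a command over SSH and
    # returns its output; the device and mount point defaults are assumptions.
    def check_ephemeral_mounted(ssh_exec, device='/dev/vdb', mountpoint='/mnt'):
        mounts = ssh_exec('mount')
        if not any(device in line and mountpoint in line
                   for line in mounts.splitlines()):
            raise AssertionError('%s not mounted at %s' % (device, mountpoint))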

Excerpts from Day, Phil's message of 2013-12-28 07:21:16 -0800:
> Hi Folks,
> 
> I know it may seem odd to be arguing for slowing down a part of the review 
> process, but I'd like to float the idea that there should be a minimum review 
> period for patches that change existing functionality in a way that isn't 
> backwards compatible.
> 
> The specific change that got me thinking about this is 
> https://review.openstack.org/#/c/63209/ which changes the default fs type 
> from ext3 to ext4.  I agree with the comments in the commit message that 
> ext4 is a much better filesystem, and it probably does make sense to move to 
> that as the new default at some point; however there are some old OS's that 
> may still be in use that don't support ext4.  By making this change to the 
> default without any significant notification period, this change has the 
> potential to break existing images and snapshots.  It was already possible to 
> use ext4 via existing configuration values, so there was no urgency to this 
> change (and no urgency implied in the commit messages, which reference 
> neither a bug nor a blueprint).
> 
> I'm not trying to pick out the folks involved in this change in particular, 
> it just happened to serve as a good and convenient example of something that 
> I think we need to be more aware of and think about having some specific 
> policy around.  On the plus side the reviewers did say they would wait 24 
> hours to see if anyone objected, and the actual review went over 4 days - but 
> I'd suggest that is still far too quick even in a non-holiday period for 
> something which is low priority (the functionality could already be achieved 
> via existing configuration options) and which is a change in default 
> behaviour.  (In the period around a major holiday there probably needs to be 
> an even longer wait.)  I know there are those that don't want to see 
> blueprints for every minor functional change to the system, but maybe this is 
> a case where a blueprint being proposed and reviewed may have caught the 
> impact of the change.  With a number of people now using a continual 
> deployment approach, any change in default behaviour needs to be considered 
> not just for the benefits it brings but for what it might break.  The 
> advantage we have as a community is that there are a lot of different 
> perspectives that can be brought to bear on the impact of functional 
> changes, but we equally have to make sure there is sufficient time for those 
> perspectives to emerge.
> 
> Somehow it feels that we're getting the priorities on reviews wrong when a 
> low-priority change like this can go through in a matter of days, while bug 
> fixes such as https://review.openstack.org/#/c/57708/ have been sitting for 
> over a month with a number of +1's and don't seem to be making any progress.
> 
> Cheers,
> Phil



Re: [openstack-dev] [Openstack] Some question about image provision in openstack

2013-12-28 Thread Clint Byrum
Excerpts from Pengfei Zhang's message of 2013-12-25 21:47:29 -0800:
> Hi,
>  I've come across two questions about image provisioning and distribution in 
> openstack (nova):
> 1. AFAIK, in the current version, nova-compute uses curl to download the 
> image from glance (or other places). Are there any alternative methods to 
> choose (such as torrent)?

We've been looking into alternatives as part of the TripleO project to
deploy OpenStack on top of itself. Torrent-like methods do look
promising.

> 2. In fact, to boot a VM on a hypervisor, there is no need to transfer the 
> whole image file to the local host. Would a mechanism to transfer the image 
> on demand make sense?
>

The simplest such solution today that I know of is to put your
image storage in Ceph, set up Cinder to use Ceph, and then boot from
volume with thin provisioning. The image is never "transferred";
the root volume just always lives on a writable Ceph snapshot.
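
The wiring for that is mostly configuration. As a sketch (option names as I 
recall them from the Havana-era drivers -- double-check against your release):

    # glance-api.conf: store images in RBD and expose their location so
    # Cinder can clone them copy-on-write instead of copying bytes
    [DEFAULT]
    default_store = rbd
    show_image_direct_url = True
    rbd_store_pool = images

    # cinder.conf: back volumes with the same Ceph cluster
    [DEFAULT]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    glance_api_version = 2

Note the images need to be in raw format for the copy-on-write clone to work.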



[openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

2013-12-28 Thread Day, Phil
Hi Folks,

I know it may seem odd to be arguing for slowing down a part of the review 
process, but I'd like to float the idea that there should be a minimum review 
period for patches that change existing functionality in a way that isn't 
backwards compatible.

The specific change that got me thinking about this is 
https://review.openstack.org/#/c/63209/ which changes the default fs type from 
ext3 to ext4.  I agree with the comments in the commit message that ext4 is a 
much better filesystem, and it probably does make sense to move to that as the 
new default at some point; however there are some old OS's that may still be in 
use that don't support ext4.  By making this change to the default without any 
significant notification period, this change has the potential to break existing 
images and snapshots.  It was already possible to use ext4 via existing 
configuration values, so there was no urgency to this change (and no urgency 
implied in the commit messages, which reference neither a bug nor a blueprint).

I'm not trying to pick out the folks involved in this change in particular, it 
just happened to serve as a good and convenient example of something that I 
think we need to be more aware of and think about having some specific policy 
around.  On the plus side the reviewers did say they would wait 24 hours to see 
if anyone objected, and the actual review went over 4 days - but I'd suggest 
that is still far too quick even in a non-holiday period for something which is 
low priority (the functionality could already be achieved via existing 
configuration options) and which is a change in default behaviour.  (In the 
period around a major holiday there probably needs to be an even longer wait.)  
I know there are those that don't want to see blueprints for every minor 
functional change to the system, but maybe this is a case where a blueprint 
being proposed and reviewed may have caught the impact of the change.  With a 
number of people now using a continual deployment approach, any change in 
default behaviour needs to be considered not just for the benefits it brings 
but for what it might break.  The advantage we have as a community is that 
there are a lot of different perspectives that can be brought to bear on the 
impact of functional changes, but we equally have to make sure there is 
sufficient time for those perspectives to emerge.

Somehow it feels that we're getting the priorities on reviews wrong when a 
low-priority change like this can go through in a matter of days, while bug 
fixes such as https://review.openstack.org/#/c/57708/ have been sitting for 
over a month with a number of +1's and don't seem to be making any progress.

Cheers,
Phil



Re: [openstack-dev] [Neutron][Testr] Brand new checkout of Neutron... getting insane unit test run results

2013-12-28 Thread Jay Pipes

On 12/27/2013 11:11 PM, Robert Collins wrote:

I'm really sorry about the horrid UI - we're in the middle of fixing
the plumbing to report this and support things like tempest better -
from the bottom up. The subunit listing -> testr reporting of listing
errors is fixed on the subunit side, but not on the testr side
yet.

If you look at the end of the error:

\rimport 
errors4neutron.tests.unit.linuxbridge.test_lb_neutron_agent\x85\xc5\x1a\\',
stderr=None
error: testr failed (3)

You can see this^

which translates as
import errors
neutron.tests.unit.linuxbridge.test_lb_neutron_agent

so

neutron/tests/unit/linuxbridge/test_lb_neutron_agent.py

is failing to import.


Phew, thanks Rob! I was a bit stumped there :) I have identified the 
import issue (this is on a fresh checkout of Neutron, BTW, so I'm a 
little confused how this made it through the gate...)


(.venv)jpipes@uberbox:~/repos/openstack/neutron$ python
Python 2.7.4 (default, Sep 26 2013, 03:20:26)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import neutron.tests.unit.linuxbridge.test_lb_neutron_agent
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "neutron/tests/unit/linuxbridge/test_lb_neutron_agent.py", line 29, in <module>
    from neutron.plugins.linuxbridge.agent import linuxbridge_neutron_agent
  File "neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py", line 33, in <module>
    import pyudev
ImportError: No module named pyudev

Looks like pyudev needs to be added to requirements.txt... I've filed a bug:

https://bugs.launchpad.net/neutron/+bug/1264687

with a patch here:

https://review.openstack.org/#/c/64333/

Thanks again, much appreciated!
-jay


On 28 December 2013 13:41, Jay Pipes  wrote:

Please see:

http://paste.openstack.org/show/57627/

This is on a brand new git clone of neutron and then running ./run_tests.sh
-V (FWIW, the same behavior occurs when running with tox -epy27 as well...)

I have zero idea what to do...any help would be appreciated!

It's almost like the subunit stream is being dumped as-is into the console.

Best!
-jay




Re: [openstack-dev] [Ceilometer] time consuming of listing resource

2013-12-28 Thread Haomai Wang
I think a better way is to save the meters as a field in the resource table.

You can look at the MongoDB model and may get some ideas from it.

Besides the above, the SQL backend could introduce Memcache to improve
performance. IMO, the best way may be to redesign the SQL model to match
the workload.
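
As a purely illustrative sketch of the first idea -- saving the meter summary 
as a field on the resource row so that listing resources needs no join; the 
table and column names here are hypothetical, not the real ceilometer schema:

    import json

    from sqlalchemy import Column, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Resource(Base):
        __tablename__ = 'resource'
        id = Column(String(255), primary_key=True)
        # JSON-encoded list of {"name": ..., "type": ..., "unit": ...}
        # dicts, maintained whenever a sample is recorded
        meter_summary = Column(Text)

        def meters(self):
            return json.loads(self.meter_summary or '[]')

The trade-off is that sample inserts may have to update the summary, trading 
write work for read speed.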



On Sat, Dec 28, 2013 at 6:51 PM, 刘胜  wrote:

> Hi all:
> I have reported a bug about the time consumption of “resource-list” in the
> ceilometer CLI:
> https://bugs.launchpad.net/ceilometer/+bug/1264434
>
> In order to identify the causes of this phenomenon, I have stepped through
> the code with pdb in my environment (configured with mysql as the db driver):
> the most important part of the resource-listing process is implemented in
> the following code:
>
> code of get_resources() in /ceilometer/storage/impl_sqlalchemy.py:
> 
>  for meter, first_ts, last_ts in query.all():
>      yield api_models.Resource(
>          resource_id=meter.resource_id,
>          project_id=meter.project_id,
>          first_sample_timestamp=first_ts,
>          last_sample_timestamp=last_ts,
>          source=meter.sources[0].id,
>          user_id=meter.user_id,
>          metadata=meter.resource_metadata,
>          meter=[
>              api_models.ResourceMeter(
>                  counter_name=m.counter_name,
>                  counter_type=m.counter_type,
>                  counter_unit=m.counter_unit,
>              )
>              for m in meter.resource.meters
>          ],
>      )
> This method generates an iterator of api_models.Resource objects for the
> ceilometer API to return.
> 1. The operation “query.all()” queries the DB table “meter” with the
> expression generated earlier; in my environment the DB table “meter” has
> more than 30 items, so this operation may take about 30 seconds;
> 2. The operation "for m in meter.resource.meters" iterates over the meters
> of this resource. A server resource may have more than 10 meter items in
> my environment, so the whole process takes far too long. I think the meter
> list of the Resource object can be reduced; I have tested this
> modification, it works for listing resources and removes most of the time
> consumption.
>
> I have noticed that many of the DB operation methods may be similarly
> time-consuming.
>
> ps: I have changed the ceilometer polling interval from 600s to 60s in
> /etc/ceilometer/pipeline.yaml, but the environment has only been running
> for 10 days!
>
> I'm a beginner with ceilometer and want to fix this bug, but I haven't
> found a suitable approach; maybe someone can help me with this?
>
> Best Regards
> liusheng
>
>


-- 

Best regards,

Haomai Wang, UnitedStack Inc.


[openstack-dev] [Ceilometer] time consuming of listing resource

2013-12-28 Thread 刘胜
Hi all:
I have reported a bug about the time consumption of “resource-list” in the 
ceilometer CLI:
https://bugs.launchpad.net/ceilometer/+bug/1264434


In order to identify the causes of this phenomenon, I have stepped through the 
code with pdb in my environment (configured with mysql as the db driver):
the most important part of the resource-listing process is implemented in the 
following code:


code of get_resources() in /ceilometer/storage/impl_sqlalchemy.py:

 for meter, first_ts, last_ts in query.all():
     yield api_models.Resource(
         resource_id=meter.resource_id,
         project_id=meter.project_id,
         first_sample_timestamp=first_ts,
         last_sample_timestamp=last_ts,
         source=meter.sources[0].id,
         user_id=meter.user_id,
         metadata=meter.resource_metadata,
         meter=[
             api_models.ResourceMeter(
                 counter_name=m.counter_name,
                 counter_type=m.counter_type,
                 counter_unit=m.counter_unit,
             )
             for m in meter.resource.meters
         ],
     )
This method generates an iterator of api_models.Resource objects for the 
ceilometer API to return.
1. The operation “query.all()” queries the DB table “meter” with the 
expression generated earlier; in my environment the DB table “meter” has more 
than 30 items, so this operation may take about 30 seconds;
2. The operation "for m in meter.resource.meters" iterates over the meters of 
this resource. A server resource may have more than 10 meter items in my 
environment, so the whole process takes far too long. I think the meter list 
of the Resource object can be reduced; I have tested this modification, it 
works for listing resources and removes most of the time consumption.


I have noticed that many of the DB operation methods may be similarly 
time-consuming.


ps: I have changed the ceilometer polling interval from 600s to 60s in 
/etc/ceilometer/pipeline.yaml, but the environment has only been running for 
10 days!


I'm a beginner with ceilometer and want to fix this bug, but I haven't found a 
suitable approach;
maybe someone can help me with this?


Best Regards
liusheng



Re: [openstack-dev] [Openstack] Quota delegation tool (for nova) ?

2013-12-28 Thread Tim Bell

I'm not sure how Climate would map to the non-predictable nature of the 
workload. I had understood Climate as providing a booking system to reserve 
resources in the future (which is a valuable use case, but not quite the 
problem Ulrich is describing, namely delegation of quota).

Looking at https://blueprints.launchpad.net/nova/+spec/domain-quota-driver, it 
appears that there is a quota driver using Domains being developed for Icehouse 
in Nova. I don't know if it completely covers the use case (i.e. quotas on both 
projects and domains), but if this is the case, the delegation might be handled 
with the domain/project structure and an appropriate policy definition 
(http://docs.openstack.org/trunk/openstack-ops/content/customize_auth.html) 
where the domain manager, as well as the project manager, has the rights to 
modify the quota of the projects.

With group membership mapping onto roles, I think this functionality could be 
built using the domain quota driver (or a derivative of it), policies and 
groups, and could allow other kinds of delegation in addition to quotas (such 
as shared image upload).
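
As a sketch of the policy piece (the role name "domain_admin" is an 
assumption; the rule key is Nova's quota-update policy from the policy.json of 
that era -- check it against your release):

    "compute_extension:quotas:update": "rule:admin_api or role:domain_admin"

With a rule like that, someone holding the delegated role could adjust the 
quotas of the projects beneath them without being a cloud admin.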

Tim


On 26 Dec 2013, at 20:51, Dina Belova <dbel...@mirantis.com> wrote:

That quota stuff has been following me around since the summit where we 
discussed it with Tim. Also, Ulrich, Sylvain is right - speaking about one 
piece of cake for one customer, our Climate (Reservation-as-a-Service) might 
help with that. That piece might be some number of hosts with specific 
(customer-specific) characteristics, or just some already created and reserved 
virtual capacity measured in a certain number of VMs, volumes, etc.

I'll be here on the mailing list (and, probably, on our IRC channel 
#openstack-climate) throughout the holidays, so you are welcome! I'm working 
on better documentation for Climate so that I can just give a link and that's 
it, but for now I can only explain things by mail and so on :)

[Climate Launchpad] https://launchpad.net/climate
[Hosts Reservation BP] 
https://wiki.openstack.org/wiki/Blueprint-nova-planned-resource-reservation-api
[Climate wiki (not complete yet)] 
https://wiki.openstack.org/wiki/Resource-reservation-service


On Thu, Dec 26, 2013 at 9:44 PM, Sylvain Bauza <sylvain.ba...@gmail.com> wrote:

Hi Ulrich,
I already discussed with Tim, during the last Swiss meetup at CERN, how Climate 
could maybe help with your use cases. There are still many things to discuss 
and a demo to run so we can see if it matches your needs.

Basically, Climate is a new Stackforge project planning to implement resource 
reservations in OpenStack, including, but not limited to, Nova instances and 
nova-compute nodes. Resources can be allocated either to full tenants or to a 
specific user, and can be provisioned now or at a certain point in the future.

About quotas: that's not yet planned, but it is the kind of feature that would be nice to have.

Sorry, but as I'm on vacation I have no way to give you more input on this 
(typing from my very limited phone...), but should you be interested, just 
give it a shot and search the ML; you'll find previous pointers.

-Sylvain

On 26 Dec 2013 08:04, "Ulrich Schwickerath" <ulrich.schwicker...@cern.ch> wrote:

Dear all,

I'd like to trigger a new discussion about the future of quota management in 
OpenStack. Let me start with our main user story to clarify what we need.
I'm working for CERN in the IT department. We're providing computing 
resources to our customers, either through traditional batch farms or through 
an OpenStack IaaS
infrastructure. Our main customers are the LHC experiments, which are 
themselves fairly large, dynamic organizations with complex internal 
structures, specific requirements,
and many thousands of users from many different countries and regions. 
Computing resources are centralized, and each customer organization gets its 
share of the cake.

Instead of trying to keep track of the internal structures of our customers and 
their changing needs, we need a way to allocate one piece of the big cake to 
each customer (and adjust it regularly), and give them the possibility to 
manage these resources themselves. What I have in mind here is the idea of a 
"quota delegation":

- The main resource manager determines the fraction of the resources for each 
customer
- He allocates a quota to each customer by giving it to a "computing 
coordinator" who is nominated by the customer
- The computing coordinator in turn takes his piece of the cake, chops it up 
and gives it to the coordinators of the different research groups in his 
experiment

and so on.

I'd like to ask people for their opinion on how such a scheme should be 
implemented. There are several aspects which need to be taken into account here:
- There are people with different roles in this game:
  +- The main resource manager role is a super-user role which can, but does 
not have to, be identical to the cloud manager.
 Persons with this role should be able to change all numbers down in