Re: [Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2016-03-15 Thread Gustavo Randich
We recently had a power outage, and one scenario that controller capacity
planning should perhaps cover is starting all of the compute nodes at once,
or in large batches, when power is restored.

We learned the hard way that our nova-conductor was short on workers/cores,
but we still wondered whether the problem was specific to our deployment.
Now we know that nova-conductor is very resource hungry.

Official recommendations about node ratios would be much appreciated.
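
One way to soften the restart scenario described above is to bring compute
nodes back in smaller groups instead of all at once. The following is only a
minimal sketch of the batching step: the host names, the batch size, and the
idea that staggering reduces conductor load enough are assumptions, not
something measured in this thread.

```python
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size hosts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


# Hypothetical host names; a real inventory would come from your own tooling.
compute_hosts = [f"compute-{n:03d}" for n in range(1, 11)]

# Hypothetical batch size; pick one your conductors are known to handle.
plan = list(batches(compute_hosts, batch_size=4))

for number, group in enumerate(plan, start=1):
    print(f"batch {number}: {', '.join(group)}")
```

Each batch would then be powered on and allowed to settle before the next one
starts; how long to wait between batches depends on the deployment.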



On Thu, Nov 19, 2015 at 8:36 PM, Rochelle Grober wrote:

> Sorry this doesn't thread properly, but cut and pasted out of the digest...
>
>
>
> > Since providing the OpenStack community with understandable
> > recommendations and instructions on performant OpenStack cloud
> > deployments is part of the Performance Team's mission, I'm kindly
> > asking you to share your experience with safe deployment ratios
> > between the various types of nodes you run right now, and any issues
> > you have observed (as an example: GoDaddy's cloud, as discussed, has
> > 3 conductor boxes vs 250 computes in the cell, and there was an
> > opinion that it's simply not enough and that's it).
>
>
>
> That was my opinion, and it was based on an apparently incorrect
> assumption that they had a lot of things coming and going on their cloud. I
> think they've demonstrated at this point that (other issues aside) three is
> enough for them, given their environment, workload, and configuration.
>
>
>
> This information is great for building rules of thumb, so to speak.
> GoDaddy has an example configuration that is adequate for low-frequency
> construct/destruct (low number of VM create/destroy) cloud architectures.
> This provides a lower bound and might be representative of a lot of
> enterprise cloud deployments.
>
>
>
> The problem with coming up with any sort of metric that will apply to
> everyone is that it's highly variable. If you have 250 compute nodes and
> never create or destroy any instances, you'll be able to get away with
> *many* fewer conductors than if you have a very active cloud. Similarly,
> during a live upgrade (or following any upgrade where we do some online
> migration of data), your conductor load will be higher than normal. Of
> course, 4-core and 96-core conductor nodes aren't equal either.
>
>
>
> And here we have another rule of thumb, but no numbers put to it yet.  If
> you have a low frequency construct/destruct cloud model, you will need to
> temporarily increase your number of conductors by {x amount OR x%} when
> performing OpenStack live upgrades.
>
>
>
> So, by all means, we should gather information on what people are doing
> successfully, but keep in mind that it depends *a lot* on what sort of
> workloads the cloud is supporting.
>
>
>
> Right, but we can start applying fuzzy logic (the human kind, not machine)
> and get a better understanding of working configurations and **why** they
> work, then start examining where the transition states between
> configurations are.   You need data before you can create information ;-)
>
>
>
> --Rocky
>
>
>
> --Dan
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2015-11-19 Thread Rochelle Grober
Sorry this doesn't thread properly, but cut and pasted out of the digest...



> Since providing the OpenStack community with understandable
> recommendations and instructions on performant OpenStack cloud
> deployments is part of the Performance Team's mission, I'm kindly
> asking you to share your experience with safe deployment ratios
> between the various types of nodes you run right now, and any issues
> you have observed (as an example: GoDaddy's cloud, as discussed, has
> 3 conductor boxes vs 250 computes in the cell, and there was an
> opinion that it's simply not enough and that's it).



That was my opinion, and it was based on an apparently incorrect assumption
that they had a lot of things coming and going on their cloud. I think they've
demonstrated at this point that (other issues aside) three is enough for them,
given their environment, workload, and configuration.



This information is great for building rules of thumb, so to speak.  GoDaddy
has an example configuration that is adequate for low-frequency
construct/destruct (low number of VM create/destroy) cloud architectures.  This
provides a lower bound and might be representative of a lot of enterprise
cloud deployments.



The problem with coming up with any sort of metric that will apply to everyone 
is that it's highly variable. If you have 250 compute nodes and never create or 
destroy any instances, you'll be able to get away with
*many* fewer conductors than if you have a very active cloud. Similarly, during
a live upgrade (or following any upgrade where we do some online migration of 
data), your conductor load will be higher than normal. Of course, 4-core and 
96-core conductor nodes aren't equal either.



And here we have another rule of thumb, but no numbers put to it yet.  If you 
have a low frequency construct/destruct cloud model, you will need to 
temporarily increase your number of conductors by {x amount OR x%} when 
performing OpenStack live upgrades.
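
The rule of thumb above can be written down as a small helper even before
anyone has numbers for it. This is only a sketch: the function name is made
up for illustration, and the values of x (absolute or percentage) are left as
inputs because the thread does not supply them.

```python
import math
from typing import Optional


def conductors_for_upgrade(baseline: int,
                           extra: Optional[int] = None,
                           percent: Optional[float] = None) -> int:
    """Return the temporary conductor count for a live upgrade.

    Exactly one of extra (the "x amount") or percent (the "x%") must be
    given. A percentage is rounded up so the result never undershoots.
    """
    if (extra is None) == (percent is None):
        raise ValueError("give exactly one of extra or percent")
    if extra is not None:
        return baseline + extra
    return baseline + math.ceil(baseline * percent / 100)


# Placeholder inputs only; these are not recommendations from the thread.
print(conductors_for_upgrade(3, extra=2))
print(conductors_for_upgrade(3, percent=50))
```

Collecting real before/after conductor counts from operators would be what
turns the two placeholder arguments into actual guidance.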



So, by all means, we should gather information on what people are doing 
successfully, but keep in mind that it depends *a lot* on what sort of 
workloads the cloud is supporting.



Right, but we can start applying fuzzy logic (the human kind, not machine) and 
get a better understanding of working configurations and *why* they work, then 
start examining where the transition states between configurations are.   You 
need data before you can create information ;-)



--Rocky


--Dan


Re: [Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2015-11-19 Thread Dina Belova
Dan,

Sure, I did not mean that we should collect this information without
understanding the workloads running on the cloud, but it is still very
interesting information to gather.

Cheers,
Dina

On Thu, Nov 19, 2015 at 1:52 AM, Dan Smith wrote:

> > Since providing the OpenStack community with understandable
> > recommendations and instructions on performant OpenStack cloud
> > deployments is part of the Performance Team's mission, I'm kindly
> > asking you to share your experience with safe deployment ratios
> > between the various types of nodes you run right now, and any issues
> > you have observed (as an example: GoDaddy's cloud, as discussed, has
> > 3 conductor boxes vs 250 computes in the cell, and there was an
> > opinion that it's simply not enough and that's it).
>
> That was my opinion, and it was based on an apparently incorrect
> assumption that they had a lot of things coming and going on their
> cloud. I think they've demonstrated at this point that (other issues
> aside) three is enough for them, given their environment, workload, and
> configuration.
>
> The problem with coming up with any sort of metric that will apply to
> everyone is that it's highly variable. If you have 250 compute nodes and
> never create or destroy any instances, you'll be able to get away with
> *many* fewer conductors than if you have a very active cloud. Similarly,
> during a live upgrade (or following any upgrade where we do some online
> migration of data), your conductor load will be higher than normal. Of
> course, 4-core and 96-core conductor nodes aren't equal either.
>
> So, by all means, we should gather information on what people are doing
> successfully, but keep in mind that it depends *a lot* on what sort of
> workloads the cloud is supporting.
>
> --Dan
>



-- 
Best regards,
Dina Belova
Software Engineer
Mirantis Inc.


Re: [Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2015-11-18 Thread Dan Smith
> Since providing the OpenStack community with understandable
> recommendations and instructions on performant OpenStack cloud
> deployments is part of the Performance Team's mission, I'm kindly
> asking you to share your experience with safe deployment ratios
> between the various types of nodes you run right now, and any issues
> you have observed (as an example: GoDaddy's cloud, as discussed, has
> 3 conductor boxes vs 250 computes in the cell, and there was an
> opinion that it's simply not enough and that's it).

That was my opinion, and it was based on an apparently incorrect
assumption that they had a lot of things coming and going on their
cloud. I think they've demonstrated at this point that (other issues
aside) three is enough for them, given their environment, workload, and
configuration.

The problem with coming up with any sort of metric that will apply to
everyone is that it's highly variable. If you have 250 compute nodes and
never create or destroy any instances, you'll be able to get away with
*many* fewer conductors than if you have a very active cloud. Similarly,
during a live upgrade (or following any upgrade where we do some online
migration of data), your conductor load will be higher than normal. Of
course, 4-core and 96-core conductor nodes aren't equal either.

So, by all means, we should gather information on what people are doing
successfully, but keep in mind that it depends *a lot* on what sort of
workloads the cloud is supporting.
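
To make the point about unequal conductor nodes concrete, here is a rough
sketch that normalises the 3-conductor / 250-compute example by conductor
core count. The 4 and 96 core figures are taken from the comparison above
purely as illustrations; the actual core count of the GoDaddy conductors is
not stated in this thread, and the calculation deliberately ignores workload,
which is the bigger variable.

```python
def computes_per_conductor_core(compute_nodes: int,
                                conductor_nodes: int,
                                cores_per_conductor: int) -> float:
    """Return how many compute nodes each conductor core has to serve."""
    total_cores = conductor_nodes * cores_per_conductor
    if total_cores <= 0:
        raise ValueError("need at least one conductor core")
    return compute_nodes / total_cores


# Same node ratio, very different capacity behind it.
for cores in (4, 96):
    ratio = computes_per_conductor_core(250, 3, cores)
    print(f"{cores:>2}-core conductors: {ratio:.2f} computes per core")
```

A bare "conductors per compute" ratio hides this difference, which is one
reason to report core or worker counts alongside node counts.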

--Dan



Re: [Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2015-11-18 Thread Belmiro Moreira
Hi,
we are still running Nova Juno and I don't see this performance issue.
(I can comment on Kilo next week.)

Per cell, we have a node that runs conductor + other control plane services.
The number of conductor workers varies between 16 and 48.
We try not to have more than 200 compute nodes per cell.

Belmiro
CERN
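
For comparison with other deployments, the figures above can be reduced to
compute nodes per conductor worker. This is a back-of-the-envelope sketch
only: it uses the 200-node cell limit as the compute count (actual cells may
be smaller) and assumes the worker count is the one set by the workers option
in the [conductor] section of nova.conf.

```python
def computes_per_worker(compute_nodes: int, conductor_workers: int) -> float:
    """Return the number of compute nodes served per conductor worker."""
    if conductor_workers < 1:
        raise ValueError("need at least one conductor worker")
    return compute_nodes / conductor_workers


MAX_COMPUTES_PER_CELL = 200  # upper bound quoted in the message above

for workers in (16, 48):  # low and high ends of the quoted worker range
    ratio = computes_per_worker(MAX_COMPUTES_PER_CELL, workers)
    print(f"{workers} workers: up to {ratio:.1f} computes per worker")
```

Because the compute count is an upper bound, the real ratios in a given cell
would be at or below what this prints.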

On Wed, Nov 18, 2015 at 10:56 AM, Dina Belova wrote:

> Dear operators,
>
> yesterday we (the Performance Team) had our weekly IRC meeting, and one
> of the items on the agenda was the nova-conductor performance issue
> raised by Kris Lindgren (GoDaddy).
>
> There are still things to investigate, and several suggestions about the
> nature of this behaviour, but one of the questions raised was whether the
> ratio of conductor nodes to compute nodes was OK.
>
> In fact, I have never seen any recommendations on a safe enough ratio of
> conductor nodes, networking nodes, or any other kind of node to compute
> nodes.
>
> Since providing the OpenStack community with understandable
> recommendations and instructions on performant OpenStack cloud
> deployments is part of the Performance Team's mission, I'm kindly
> asking you to share your experience with safe deployment ratios
> between the various types of nodes you run right now, and any issues
> you have observed (as an example: GoDaddy's cloud, as discussed, has
> 3 conductor boxes vs 250 computes in the cell, and there was an
> opinion that it's simply not enough and that's it).
>
> Thanks in advance!
>
> Cheers,
> Dina
>


[Openstack-operators] [Scale][Performance] / compute_nodes ratio experience

2015-11-18 Thread Dina Belova
Dear operators,

yesterday we (the Performance Team) had our weekly IRC meeting, and one
of the items on the agenda was the nova-conductor performance issue
raised by Kris Lindgren (GoDaddy).

There are still things to investigate, and several suggestions about the
nature of this behaviour, but one of the questions raised was whether the
ratio of conductor nodes to compute nodes was OK.

In fact, I have never seen any recommendations on a safe enough ratio of
conductor nodes, networking nodes, or any other kind of node to compute
nodes.

Since providing the OpenStack community with understandable
recommendations and instructions on performant OpenStack cloud
deployments is part of the Performance Team's mission, I'm kindly
asking you to share your experience with safe deployment ratios
between the various types of nodes you run right now, and any issues
you have observed (as an example: GoDaddy's cloud, as discussed, has
3 conductor boxes vs 250 computes in the cell, and there was an
opinion that it's simply not enough and that's it).

Thanks in advance!

Cheers,
Dina