Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-03 Thread Zane Bitter

On 03/09/15 02:56, Angus Salkeld wrote:

On Thu, Sep 3, 2015 at 3:53 AM Zane Bitter wrote:

On 02/09/15 04:55, Steven Hardy wrote:
 > On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:
 >> On 2 September 2015 at 11:53, Angus Salkeld
> wrote:
 >>
 >>> 1. limit the number of resource actions in parallel (maybe base
on the
 >>> number of cores)
 >>
 >> I'm having trouble mapping that back to 'and heat-engine is
running on
 >> 3 separate servers'.
 >
 > I think Angus was responding to my test feedback, which was a
different
 > setup, one 4-core laptop running heat-engine with 4 worker processes.
 >
 > In that environment, the level of additional concurrency becomes
a problem
 > because all heat workers become so busy that creating a large stack
 > DoSes the Heat services, and in my case also the DB.
 >
 > If we had a configurable option, similar to num_engine_workers, which
 > enabled control of the number of resource actions in parallel, I
probably
 > could have controlled that explosion in activity to a more
manageable series
 > of tasks, e.g. I'd set num_resource_actions to
(num_engine_workers*2) or
 > something.

I think that's actually the opposite of what we need.

The resource actions are just sent to the worker queue to get processed
whenever. One day we will get to the point where we are overflowing the
queue, but I guarantee that we are nowhere near that day. If we are
DoSing ourselves, it can only be because we're pulling *everything* off
the queue and starting it in separate greenthreads.


worker does not use a greenthread per job like service.py does.
The issue is that if you have actions that are fast, you can hit the DB hard.

QueuePool limit of size 5 overflow 10 reached, connection timed out,
timeout 30

It seems like it's not very hard to hit this limit. It comes from simply
loading
the resource in the worker:
"/home/angus/work/heat/heat/engine/worker.py", line 276, in check_resource
"/home/angus/work/heat/heat/engine/worker.py", line 145, in _load_resource
"/home/angus/work/heat/heat/engine/resource.py", line 290, in load
resource_objects.Resource.get_obj(context, resource_id)


This is probably me being naive, but that sounds strange. I would have 
thought that there is no way to exhaust the connection pool by doing 
lots of actions in rapid succession. I'd have guessed that the only way 
to exhaust a connection pool would be to have lots of connections open 
simultaneously. That suggests to me that either we are failing to 
expeditiously close connections and return them to the pool, or that we 
are - explicitly or implicitly - processing a bunch of messages in parallel.
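
To make the pool mechanics concrete, a minimal, self-contained sketch
(plain SQLAlchemy, not Heat code) of that claim: a QueuePool of size 5
with overflow 10 only times out when more than 15 connections are held
open at the same time, so checkouts that are promptly returned can't
exhaust it.

    import sqlalchemy
    from sqlalchemy import exc, pool

    engine = sqlalchemy.create_engine(
        'sqlite://',                    # any URL; only the pool behaviour matters
        poolclass=pool.QueuePool,
        pool_size=5, max_overflow=10, pool_timeout=2)  # real default timeout is 30s

    held = [engine.connect() for _ in range(15)]  # 5 pooled + 10 overflow checked out
    try:
        engine.connect()                # 16th simultaneous checkout -> TimeoutError
    except exc.TimeoutError as e:
        print('pool exhausted: %s' % e)
    finally:
        for conn in held:
            conn.close()                # returning connections frees the pool again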



In an ideal world, we might only ever pull one task off that queue at a
time. Any time the task is sleeping, we would use that time for processing stuff
off the engine queue (which needs a quick response, since it is serving
the ReST API). The trouble is that you need a *huge* number of
heat-engines to handle stuff in parallel. In the reductio-ad-absurdum
case of a single engine only processing a single task at a time, we're
back to creating resources serially. So we probably want a higher number
than 1. (Phase 2 of convergence will make tasks much smaller, and may
even get us down to the point where we can pull only a single task at a
time.)

However, the fewer engines you have, the more greenthreads we'll have to
allow to get some semblance of parallelism. To the extent that more
cores means more engines (which assumes all running on one box, but
still), the number of cores is negatively correlated with the number of
tasks that we want to allow.

Note that all of the greenthreads run in a single CPU thread, so having
more cores doesn't help us at all with processing more stuff in
parallel.


Except, as I said above, we are not creating greenthreads in worker.


Well, maybe we'll need to in order to make things still work sanely with 
a low number of engines :) (Should be pretty easy to do with a semaphore.)
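
For illustration, a rough sketch of that semaphore idea (hypothetical
names, not the actual worker code): the queue is still drained promptly,
but only a bounded number of check_resource jobs run at once in each
engine.

    import eventlet
    from eventlet import semaphore

    # made-up knob; cf. the num_resource_actions idea earlier in the thread
    MAX_CONCURRENT_RESOURCE_ACTIONS = 8
    _throttle = semaphore.Semaphore(MAX_CONCURRENT_RESOURCE_ACTIONS)

    def check_resource(resource_id):
        with _throttle:                      # blocks once the limit is reached
            do_resource_action(resource_id)  # placeholder for the real work

    def do_resource_action(resource_id):
        eventlet.sleep(0.1)                  # stand-in for loading + handling a resource

    green_pool = eventlet.GreenPool()
    for rid in range(100):                   # 100 messages arriving at once
        green_pool.spawn_n(check_resource, rid)
    green_pool.waitall()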


I think what y'all are suggesting is limiting the number of jobs that go 
into the queue... that's quite wrong IMO. Apart from the fact it's 
impossible (resources put jobs into the queue entirely independently, 
and have no knowledge of the global state required to throttle inputs), 
we shouldn't implement an in-memory queue with long-running tasks 
containing state that can be lost if the process dies - the whole point 
of convergence is we have... a message queue for that. We need to limit 
the rate that stuff comes *out* of the queue. And, again, since we have 
no knowledge of global state, we can only control the rate at which an 
individual worker processes tasks. The way to avoid killing the DB 

Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-03 Thread Angus Salkeld
On Thu, Sep 3, 2015 at 3:53 AM Zane Bitter  wrote:

> On 02/09/15 04:55, Steven Hardy wrote:
> > On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:
> >> On 2 September 2015 at 11:53, Angus Salkeld 
> wrote:
> >>
> >>> 1. limit the number of resource actions in parallel (maybe base on the
> >>> number of cores)
> >>
> >> I'm having trouble mapping that back to 'and heat-engine is running on
> >> 3 separate servers'.
> >
> > I think Angus was responding to my test feedback, which was a different
> > setup, one 4-core laptop running heat-engine with 4 worker processes.
> >
> > In that environment, the level of additional concurrency becomes a
> problem
> > because all heat workers become so busy that creating a large stack
> > DoSes the Heat services, and in my case also the DB.
> >
> > If we had a configurable option, similar to num_engine_workers, which
> > enabled control of the number of resource actions in parallel, I probably
> > could have controlled that explosion in activity to a more manageable
> series
> > of tasks, e.g. I'd set num_resource_actions to (num_engine_workers*2) or
> > something.
>
> I think that's actually the opposite of what we need.
>
> The resource actions are just sent to the worker queue to get processed
> whenever. One day we will get to the point where we are overflowing the
> queue, but I guarantee that we are nowhere near that day. If we are
> DoSing ourselves, it can only be because we're pulling *everything* off
> the queue and starting it in separate greenthreads.
>

worker does not use a greenthread per job like service.py does.
The issue is that if you have actions that are fast, you can hit the DB hard.

QueuePool limit of size 5 overflow 10 reached, connection timed out,
timeout 30

It seems like it's not very hard to hit this limit. It comes from simply
loading
the resource in the worker:
"/home/angus/work/heat/heat/engine/worker.py", line 276, in check_resource
"/home/angus/work/heat/heat/engine/worker.py", line 145, in _load_resource
"/home/angus/work/heat/heat/engine/resource.py", line 290, in load
resource_objects.Resource.get_obj(context, resource_id)



>
> In an ideal world, we might only ever pull one task off that queue at a
> time. Any time the task is sleeping, we would use that time for processing stuff
> off the engine queue (which needs a quick response, since it is serving
> the ReST API). The trouble is that you need a *huge* number of
> heat-engines to handle stuff in parallel. In the reductio-ad-absurdum
> case of a single engine only processing a single task at a time, we're
> back to creating resources serially. So we probably want a higher number
> than 1. (Phase 2 of convergence will make tasks much smaller, and may
> even get us down to the point where we can pull only a single task at a
> time.)
>
> However, the fewer engines you have, the more greenthreads we'll have to
> allow to get some semblance of parallelism. To the extent that more
> cores means more engines (which assumes all running on one box, but
> still), the number of cores is negatively correlated with the number of
> tasks that we want to allow.
>
> Note that all of the greenthreads run in a single CPU thread, so having
> more cores doesn't help us at all with processing more stuff in parallel.
>

Except, as I said above, we are not creating greenthreads in worker.

-A


>
> cheers,
> Zane.
>


Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-03 Thread Angus Salkeld
On Fri, Sep 4, 2015 at 12:48 AM Zane Bitter  wrote:

> On 03/09/15 02:56, Angus Salkeld wrote:
> > On Thu, Sep 3, 2015 at 3:53 AM Zane Bitter wrote:
> >
> > On 02/09/15 04:55, Steven Hardy wrote:
> >  > On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:
> >  >> On 2 September 2015 at 11:53, Angus Salkeld
> > > wrote:
> >  >>
> >  >>> 1. limit the number of resource actions in parallel (maybe base
> > on the
> >  >>> number of cores)
> >  >>
> >  >> I'm having trouble mapping that back to 'and heat-engine is
> > running on
> >  >> 3 separate servers'.
> >  >
> >  > I think Angus was responding to my test feedback, which was a
> > different
> >  > setup, one 4-core laptop running heat-engine with 4 worker
> processes.
> >  >
> >  > In that environment, the level of additional concurrency becomes
> > a problem
> >  > because all heat workers become so busy that creating a large
> stack
> >  > DoSes the Heat services, and in my case also the DB.
> >  >
> >  > If we had a configurable option, similar to num_engine_workers,
> which
> >  > enabled control of the number of resource actions in parallel, I
> > probably
> >  > could have controlled that explosion in activity to a more
> > manageable series
> >  > of tasks, e.g. I'd set num_resource_actions to
> > (num_engine_workers*2) or
> >  > something.
> >
> > I think that's actually the opposite of what we need.
> >
> > The resource actions are just sent to the worker queue to get
> processed
> > whenever. One day we will get to the point where we are overflowing
> the
> > queue, but I guarantee that we are nowhere near that day. If we are
> > DoSing ourselves, it can only be because we're pulling *everything*
> off
> > the queue and starting it in separate greenthreads.
> >
> >
> > worker does not use a greenthread per job like service.py does.
> > The issue is that if you have actions that are fast, you can hit the DB hard.
> >
> > QueuePool limit of size 5 overflow 10 reached, connection timed out,
> > timeout 30
> >
> > It seems like it's not very hard to hit this limit. It comes from simply
> > loading
> > the resource in the worker:
> > "/home/angus/work/heat/heat/engine/worker.py", line 276, in
> check_resource
> > "/home/angus/work/heat/heat/engine/worker.py", line 145, in
> _load_resource
> > "/home/angus/work/heat/heat/engine/resource.py", line 290, in load
> > resource_objects.Resource.get_obj(context, resource_id)
>
> This is probably me being naive, but that sounds strange. I would have
> thought that there is no way to exhaust the connection pool by doing
> lots of actions in rapid succession. I'd have guessed that the only way
> to exhaust a connection pool would be to have lots of connections open
> simultaneously. That suggests to me that either we are failing to
> expeditiously close connections and return them to the pool, or that we
> are - explicitly or implicitly - processing a bunch of messages in
> parallel.
>

I suspect we are leaking sessions; I have updated this bug to make sure we
focus on figuring out the root cause of this before jumping to conclusions:
https://bugs.launchpad.net/heat/+bug/1491185
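
For reference, the pattern that rules a leak out is pairing every
checkout with a close() so the connection always goes back to the pool,
roughly like this (plain SQLAlchemy for illustration, not Heat's actual
session helpers):

    import contextlib

    import sqlalchemy
    from sqlalchemy import orm

    engine = sqlalchemy.create_engine('sqlite://')
    Session = orm.sessionmaker(bind=engine)

    @contextlib.contextmanager
    def session_scope():
        session = Session()
        try:
            yield session
            session.commit()
        except Exception:
            session.rollback()
            raise
        finally:
            session.close()   # always hand the connection back to the pool

    # any code path that raises still releases the connection
    with session_scope() as session:
        session.execute(sqlalchemy.text('SELECT 1'))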

-A


>
> > In an ideal world, we might only ever pull one task off that queue
> at a
> > time. Any time the task is sleeping, we would use that time for processing
> stuff
> > off the engine queue (which needs a quick response, since it is
> serving
> > the ReST API). The trouble is that you need a *huge* number of
> > heat-engines to handle stuff in parallel. In the reductio-ad-absurdum
> > case of a single engine only processing a single task at a time,
> we're
> > back to creating resources serially. So we probably want a higher
> number
> > than 1. (Phase 2 of convergence will make tasks much smaller, and may
> > even get us down to the point where we can pull only a single task
> at a
> > time.)
> >
> > However, the fewer engines you have, the more greenthreads we'll
> have to
> > allow to get some semblance of parallelism. To the extent that more
> > cores means more engines (which assumes all running on one box, but
> > still), the number of cores is negatively correlated with the number
> of
> > tasks that we want to allow.
> >
> > Note that all of the greenthreads run in a single CPU thread, so
> having
> > more cores doesn't help us at all with processing more stuff in
> > parallel.
> >
> >
> > Except, as I said above, we are not creating greenthreads in worker.
>
> Well, maybe we'll need to in order to make things still work sanely with
> a low number of engines :) (Should be pretty easy to do with a semaphore.)
>
> I think what y'all are suggesting is limiting the number of jobs that go
> into 

Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-02 Thread Steven Hardy
On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:
> On 2 September 2015 at 11:53, Angus Salkeld  wrote:
> 
> > 1. limit the number of resource actions in parallel (maybe base on the
> > number of cores)
> 
> I'm having trouble mapping that back to 'and heat-engine is running on
> 3 separate servers'.

I think Angus was responding to my test feedback, which was a different
setup, one 4-core laptop running heat-engine with 4 worker processes.

In that environment, the level of additional concurrency becomes a problem
because all heat workers become so busy that creating a large stack
DoSes the Heat services, and in my case also the DB.

If we had a configurable option, similar to num_engine_workers, which
enabled control of the number of resource actions in parallel, I probably
could have controlled that explosion in activity to a more manageable series
of tasks, e.g. I'd set num_resource_actions to (num_engine_workers*2) or
something.

Steve



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-02 Thread Zane Bitter

On 02/09/15 04:55, Steven Hardy wrote:

On Wed, Sep 02, 2015 at 04:33:36PM +1200, Robert Collins wrote:

On 2 September 2015 at 11:53, Angus Salkeld  wrote:


1. limit the number of resource actions in parallel (maybe base on the
number of cores)


I'm having trouble mapping that back to 'and heat-engine is running on
3 separate servers'.


I think Angus was responding to my test feedback, which was a different
setup, one 4-core laptop running heat-engine with 4 worker processes.

In that environment, the level of additional concurrency becomes a problem
because all heat workers become so busy that creating a large stack
DoSes the Heat services, and in my case also the DB.

If we had a configurable option, similar to num_engine_workers, which
enabled control of the number of resource actions in parallel, I probably
could have controlled that explosion in activity to a more manageable series
of tasks, e.g. I'd set num_resource_actions to (num_engine_workers*2) or
something.


I think that's actually the opposite of what we need.

The resource actions are just sent to the worker queue to get processed 
whenever. One day we will get to the point where we are overflowing the 
queue, but I guarantee that we are nowhere near that day. If we are 
DoSing ourselves, it can only be because we're pulling *everything* off 
the queue and starting it in separate greenthreads.


In an ideal world, we might only ever pull one task off that queue at a 
time. Any time the task is sleeping, we would use that time for processing stuff 
off the engine queue (which needs a quick response, since it is serving 
the ReST API). The trouble is that you need a *huge* number of 
heat-engines to handle stuff in parallel. In the reductio-ad-absurdum 
case of a single engine only processing a single task at a time, we're 
back to creating resources serially. So we probably want a higher number 
than 1. (Phase 2 of convergence will make tasks much smaller, and may 
even get us down to the point where we can pull only a single task at a 
time.)


However, the fewer engines you have, the more greenthreads we'll have to 
allow to get some semblance of parallelism. To the extent that more 
cores means more engines (which assumes all running on one box, but 
still), the number of cores is negatively correlated with the number of 
tasks that we want to allow.


Note that all of the greenthreads run in a single CPU thread, so having 
more cores doesn't help us at all with processing more stuff in parallel.
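
A quick way to convince yourself of this, if you want one: spawn a few
greenthreads and note that they all report the same OS thread, i.e.
cooperative concurrency rather than parallelism (minimal sketch, assumes
eventlet is installed):

    import threading

    import eventlet

    def which_thread(i):
        eventlet.sleep(0)                 # yield to the hub
        return i, threading.current_thread().ident

    green_pool = eventlet.GreenPool()
    results = list(green_pool.imap(which_thread, range(5)))
    assert len({ident for _, ident in results}) == 1   # one OS thread for all of them
    print(results)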


cheers,
Zane.



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Clint Byrum
Excerpts from Anant Patil's message of 2015-08-30 23:01:29 -0700:
> Hi Angus,
> 
> Thanks for doing the tests with convergence. We are now assured that
> convergence has not impacted the performance in a negative way. Given
> that, in convergence, a stack provisioning process goes through a lot of
> RPC calls, it puts a lot of load on the message broker and the request
> loses time in network traversal etc., and in effect would hamper the
> performance. As the results show, having more than 2 engines will always
> yield better results with convergence. Since the deployments usually
> have 2 or more engines, this works in favor of convergence.
> 
> I have always held that convergence is more for scale (how much/many)
> than for performance (response time), due to its design of distributing
> load (resource provisioning from single stack) among heat engines and
> also due to the fact that heat actually spends a lot of time waiting for
> the delegated resource request to be completed, not doing much
> computation. However, with these tests, we can eliminate any
> apprehension of performance issues which would have inadvertently
> sneaked in, with our focus more on scalability and reliability, than on
> performance.
> 
> I was thinking we should be doing some scale testing where we have many
> bigger stacks provisioned and compare the results with legacy, where we
> measure memory, CPU and network bandwidth.
> 

Convergence would be worth it if it was 2x slower in response time, and
scaled 10% worse. Because while scalability is super important, the main
point is resilience to failure of an engine. Add in engine restarts,
failures, etc, to these tests, and I think the advantages will be quite
a bit more skewed toward convergence.

Really nice work everyone!



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Fox, Kevin M
You can default it to the number of cores, but please make it configurable. 
Some ops cram lots of services onto one node, and one service doesn't get to 
monopolize all cores.
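
A hedged sketch of what that could look like with oslo.config; the option
name and default are made up, the point is just "default to the core
count, but let operators turn it down":

    import multiprocessing

    from oslo_config import cfg

    opts = [
        cfg.IntOpt('max_resource_actions_in_parallel',
                   default=multiprocessing.cpu_count(),
                   help='Maximum number of resource actions a single '
                        'heat-engine will process concurrently.'),
    ]
    cfg.CONF.register_opts(opts)

Operators could then override it in heat.conf instead of being pinned to
the core count of a shared box.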

Thanks,
Kevin

From: Angus Salkeld [asalk...@mirantis.com]
Sent: Tuesday, September 01, 2015 4:53 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Heat] convergence rally test results (so far)

On Tue, Sep 1, 2015 at 10:45 PM Steven Hardy <sha...@redhat.com> wrote:
On Fri, Aug 28, 2015 at 01:35:52AM +, Angus Salkeld wrote:
>Hi
>I have been running some rally tests against convergence and our existing
>implementation to compare.
>So far I have done the following:
> 1. defined a template with a resource
>group
> https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
> 2. the inner resource looks like
>this:
> https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
>  (it
>uses TestResource to attempt to be a reasonable simulation of a
>server+volume+floatingip)
> 3. defined a rally
>job:
> https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
>  that
>creates X resources then updates to X*2 then deletes.
> 4. I then ran the above with/without convergence and with 2,4,8
>heat-engines
>Here are the results compared:
>
> https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
>Some notes on the results so far:
>  * convergence with only 2 engines does suffer from RPC overload (it
>gets message timeouts on larger templates). I wonder if this is the
>problem in our convergence gate...
>  * convergence does very well with a reasonable number of engines
>running.
>  * delete is slightly slower on convergence
>Still to test:
>  * the above, but measure memory usage
>  * many small templates (run concurrently)

So, I tried running my many-small-templates here with convergence enabled:

https://bugs.launchpad.net/heat/+bug/1489548

In heat.conf I set:

max_resources_per_stack = -1
convergence_engine = true

Most other settings (particularly RPC and DB settings) are defaults.

Without convergence (but with max_resources_per_stack disabled) I see the
time to create a ResourceGroup of 400 nested stacks (each containing one
RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD, 4 heat
workers, e.g. the default for a 4 core machine).

With convergence enabled, I see these errors from sqlalchemy:

File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in
_checkout\nfairy = _ConnectionRecord.checkout(pool)\n', u'  File
"/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in
checkout\nrec = pool._do_get()\n', u'  File
"/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in
_do_get\n(self.size(), self.overflow(), self._timeout))\n',
u'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection
timed out, timeout 30\n'].

I assume this means we're loading the DB much more in the convergence case
and overflowing the QueuePool?

Yeah, looks like it.


This seems to happen when the RPC call from the ResourceGroup tries to
create some of the 400 nested stacks.

Interestingly after this error, the parent stack moves to CREATE_FAILED,
but the engine remains (very) busy, to the point of being partially
responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
assuming it isn't error_wait_time because the parent stack has been marked
FAILED and I'm pretty sure it's been more than 240s).

I'll dig a bit deeper when I get time, but for now you might like to try
the stress test too.  It's a bit of a synthetic test, but it turns out to
be a reasonable proxy for some performance issues we observed when creating
large-ish TripleO deployments (which also create a large number of nested
stacks concurrently).

Thanks a lot for testing, Steve! I'll make 2 bugs for what you have raised:
1. limit the number of resource actions in parallel (maybe base on the number 
of cores)
2. the cancel on fail error

-Angus


Steve



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Angus Salkeld
On Tue, Sep 1, 2015 at 10:45 PM Steven Hardy  wrote:

> On Fri, Aug 28, 2015 at 01:35:52AM +, Angus Salkeld wrote:
> >Hi
> >I have been running some rally tests against convergence and our
> existing
> >implementation to compare.
> >So far I have done the following:
> > 1. defined a template with a resource
> >group
> https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
> > 2. the inner resource looks like
> >this:
> https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
> (it
> >uses TestResource to attempt to be a reasonable simulation of a
> >server+volume+floatingip)
> > 3. defined a rally
> >job:
> https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
> that
> >creates X resources then updates to X*2 then deletes.
> > 4. I then ran the above with/without convergence and with 2,4,8
> >heat-engines
> >Here are the results compared:
> >
> https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
> >Some notes on the results so far:
> >  * convergence with only 2 engines does suffer from RPC overload
> (it
> >gets message timeouts on larger templates). I wonder if this is
> the
> >problem in our convergence gate...
> >  * convergence does very well with a reasonable number of engines
> >running.
> >  * delete is slightly slower on convergence
> >Still to test:
> >  * the above, but measure memory usage
> >  * many small templates (run concurrently)
>
> So, I tried running my many-small-templates here with convergence enabled:
>
> https://bugs.launchpad.net/heat/+bug/1489548
>
> In heat.conf I set:
>
> max_resources_per_stack = -1
> convergence_engine = true
>
> Most other settings (particularly RPC and DB settings) are defaults.
>
> Without convergence (but with max_resources_per_stack disabled) I see the
> time to create a ResourceGroup of 400 nested stacks (each containing one
> RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD, 4 heat
> workers, e.g. the default for a 4 core machine).
>
> With convergence enabled, I see these errors from sqlalchemy:
>
> File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in
> _checkout\nfairy = _ConnectionRecord.checkout(pool)\n', u'  File
> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in
> checkout\nrec = pool._do_get()\n', u'  File
> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in
> _do_get\n(self.size(), self.overflow(), self._timeout))\n',
> u'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection
> timed out, timeout 30\n'].
>
> I assume this means we're loading the DB much more in the convergence case
> and overflowing the QueuePool?
>

Yeah, looks like it.


>
> This seems to happen when the RPC call from the ResourceGroup tries to
> create some of the 400 nested stacks.
>
> Interestingly after this error, the parent stack moves to CREATE_FAILED,
> but the engine remains (very) busy, to the point of being partially
> responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
> assuming it isn't error_wait_time because the parent stack has been marked
> FAILED and I'm pretty sure it's been more than 240s).
>
> I'll dig a bit deeper when I get time, but for now you might like to try
> the stress test too.  It's a bit of a synthetic test, but it turns out to
> be a reasonable proxy for some performance issues we observed when creating
> large-ish TripleO deployments (which also create a large number of nested
> stacks concurrently).
>

Thanks a lot for testing, Steve! I'll make 2 bugs for what you have raised:
1. limit the number of resource actions in parallel (maybe base on the
number of cores)
2. the cancel on fail error

-Angus


>
> Steve
>


Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Steven Hardy
On Fri, Aug 28, 2015 at 01:35:52AM +, Angus Salkeld wrote:
>Hi
>I have been running some rally tests against convergence and our existing
>implementation to compare.
>So far I have done the following:
> 1. defined a template with a resource
>group
> https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
> 2. the inner resource looks like
>this:
> https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
>  (it
>uses TestResource to attempt to be a reasonable simulation of a
>server+volume+floatingip)
> 3. defined a rally
>job:
> https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
>  that
>creates X resources then updates to X*2 then deletes.
> 4. I then ran the above with/without convergence and with 2,4,8
>heat-engines
>Here are the results compared:
>
> https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
>Some notes on the results so far:
>  * convergence with only 2 engines does suffer from RPC overload (it
>gets message timeouts on larger templates). I wonder if this is the
>problem in our convergence gate...
>  * convergence does very well with a reasonable number of engines
>running.
>  * delete is slightly slower on convergence
>Still to test:
>  * the above, but measure memory usage
>  * many small templates (run concurrently)

So, I tried running my many-small-templates here with convergence enabled:

https://bugs.launchpad.net/heat/+bug/1489548

In heat.conf I set:

max_resources_per_stack = -1
convergence_engine = true

Most other settings (particularly RPC and DB settings) are defaults.

Without convergence (but with max_resources_per_stack disabled) I see the
time to create a ResourceGroup of 400 nested stacks (each containing one
RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD, 4 heat
workers, e.g. the default for a 4 core machine).

With convergence enabled, I see these errors from sqlalchemy:

File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in
_checkout\nfairy = _ConnectionRecord.checkout(pool)\n', u'  File
"/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in
checkout\nrec = pool._do_get()\n', u'  File
"/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in
_do_get\n(self.size(), self.overflow(), self._timeout))\n',
u'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection
timed out, timeout 30\n'].

I assume this means we're loading the DB much more in the convergence case
and overflowing the QueuePool?
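
For what it's worth, the size 5 / overflow 10 / timeout 30 in that
traceback are the usual oslo.db defaults; assuming the standard
[database] options are honoured in heat.conf, they can be raised with
something like:

    [database]
    max_pool_size = 20
    max_overflow = 40
    pool_timeout = 60

though that only papers over the problem if sessions really are being
leaked.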

This seems to happen when the RPC call from the ResourceGroup tries to
create some of the 400 nested stacks.

Interestingly after this error, the parent stack moves to CREATE_FAILED,
but the engine remains (very) busy, to the point of being partially
responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
assuming it isn't error_wait_time because the parent stack has been marked
FAILED and I'm pretty sure it's been more than 240s).

I'll dig a bit deeper when I get time, but for now you might like to try
the stress test too.  It's a bit of a synthetic test, but it turns out to
be a reasonable proxy for some performance issues we observed when creating
large-ish TripleO deployments (which also create a large number of nested
stacks concurrently).

Steve



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Anant Patil
When the stack fails, it is marked as FAILED and all the sync points
that are needed to trigger the next set of resources are deleted. The
resources at the same level in the graph, like here, are supposed to
time out or fail with an exception. Many DB hits mean that the cache
data we were maintaining is not being used in the way we intended.

I don't see why we really need 1; if it works with legacy w/o putting any such
constraints, it should work with convergence as well.

--
Anant

On Wed, Sep 2, 2015 at 5:23 AM, Angus Salkeld  wrote:

> On Tue, Sep 1, 2015 at 10:45 PM Steven Hardy  wrote:
>
>> On Fri, Aug 28, 2015 at 01:35:52AM +, Angus Salkeld wrote:
>> >Hi
>> >I have been running some rally tests against convergence and our
>> existing
>> >implementation to compare.
>> >So far I have done the following:
>> > 1. defined a template with a resource
>> >group
>> https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
>> > 2. the inner resource looks like
>> >this:
>> https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
>> (it
>> >uses TestResource to attempt to be a reasonable simulation of a
>> >server+volume+floatingip)
>> > 3. defined a rally
>> >job:
>> https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
>> that
>> >creates X resources then updates to X*2 then deletes.
>> > 4. I then ran the above with/without convergence and with 2,4,8
>> >heat-engines
>> >Here are the results compared:
>> >
>> https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
>> >Some notes on the results so far:
>> >  * convergence with only 2 engines does suffer from RPC overload
>> (it
>> >gets message timeouts on larger templates). I wonder if this is
>> the
>> >problem in our convergence gate...
>> >  * convergence does very well with a reasonable number of engines
>> >running.
>> >  * delete is slightly slower on convergence
>> >Still to test:
>> >  * the above, but measure memory usage
>> >  * many small templates (run concurrently)
>>
>> So, I tried running my many-small-templates here with convergence enabled:
>>
>> https://bugs.launchpad.net/heat/+bug/1489548
>>
>> In heat.conf I set:
>>
>> max_resources_per_stack = -1
>> convergence_engine = true
>>
>> Most other settings (particularly RPC and DB settings) are defaults.
>>
>> Without convergence (but with max_resources_per_stack disabled) I see the
>> time to create a ResourceGroup of 400 nested stacks (each containing one
>> RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD, 4 heat
>> workers, e.g. the default for a 4 core machine).
>>
>> With convergence enabled, I see these errors from sqlalchemy:
>>
>> File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in
>> _checkout\nfairy = _ConnectionRecord.checkout(pool)\n', u'  File
>> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in
>> checkout\nrec = pool._do_get()\n', u'  File
>> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in
>> _do_get\n(self.size(), self.overflow(), self._timeout))\n',
>> u'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection
>> timed out, timeout 30\n'].
>>
>> I assume this means we're loading the DB much more in the convergence case
>> and overflowing the QueuePool?
>>
>
> Yeah, looks like it.
>
>
>>
>> This seems to happen when the RPC call from the ResourceGroup tries to
>> create some of the 400 nested stacks.
>>
>> Interestingly after this error, the parent stack moves to CREATE_FAILED,
>> but the engine remains (very) busy, to the point of being partially
>> responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
>> assuming it isn't error_wait_time because the parent stack has been marked
>> FAILED and I'm pretty sure it's been more than 240s).
>>
>> I'll dig a bit deeper when I get time, but for now you might like to try
>> the stress test too.  It's a bit of a synthetic test, but it turns out to
>> be a reasonable proxy for some performance issues we observed when
>> creating
>> large-ish TripleO deployments (which also create a large number of nested
>> stacks concurrently).
>>
>
> Thanks a lot for testing, Steve! I'll make 2 bugs for what you have raised:
> 1. limit the number of resource actions in parallel (maybe base on the
> number of cores)
> 2. the cancel on fail error
>
> -Angus
>
>
>>
>> Steve
>>

Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-09-01 Thread Robert Collins
On 2 September 2015 at 11:53, Angus Salkeld  wrote:

> 1. limit the number of resource actions in parallel (maybe base on the
> number of cores)

I'm having trouble mapping that back to 'and heat-engine is running on
3 separate servers'.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-08-31 Thread Anant Patil
Hi Angus,

Thanks for doing the tests with convergence. We are now assured that
convergence has not impacted the performance in a negative way. Given
that, in convergence, a stack provisioning process goes through a lot of
RPC calls, it puts a lot of load on the message broker and the request
> loses time in network traversal etc., and in effect would hamper the
performance. As the results show, having more than 2 engines will always
yield better results with convergence. Since the deployments usually
have 2 or more engines, this works in favor of convergence.

I have always held that convergence is more for scale (how much/many)
> than for performance (response time), due to its design of distributing
load (resource provisioning from single stack) among heat engines and
also due to the fact that heat actually spends a lot of time waiting for
the delegated resource request to be completed, not doing much
computation. However, with these tests, we can eliminate any
apprehension of performance issues which would have inadvertently
sneaked in, with our focus more on scalability and reliability, than on
performance.

I was thinking we should be doing some scale testing where we have many
bigger stacks provisioned and compare the results with legacy, where we
measure memory, CPU and network bandwidth.

--
Anant


On Fri, Aug 28, 2015 at 2:45 PM, Angus Salkeld 
wrote:

> On Fri, Aug 28, 2015 at 6:35 PM Sergey Lukjanov 
> wrote:
>
>> Hi,
>>
>> great, it seems like migration to convergence could happen soon.
>>
>> How many times did you run each test case? Does the time change with the
>> number of iterations? Are you planning to test parallel stack creation?
>>
>
> Given the test matrix convergence/non-convergence and 2,4,8 engines, I
> have not done a lot of iterations - it's just time consuming. I might kill
> off the 2-engine case to gain more iterations.
> But from what I have observed the duration does not vary significantly.
>
> I'll test smaller stacks with lots of iterations and with a high
> concurrency. All this testing is currently on just one host so it is
> somewhat limited. Hopefully this is at least giving a useful comparison
> with these limitations.
>
> -Angus
>
>
>>
>> Thanks.
>>
>> On Fri, Aug 28, 2015 at 10:17 AM, Sergey Kraynev 
>> wrote:
>>
>>> Angus!
>>>
>>> it's Awesome!  Thank you for the investigation.
>>> I had a talk with guys from Sahara team and we decided to start testing
>>> convergence with Sahara after L release.
>>> I suppose, that Murano can also join to this process.
>>>
>>> Also AFAIK the Sahara team plans to create functional tests with Heat-engine.
>>> We may add it as a non-voting job for our gate.
>>> Probably it will be good to have two different types of this job: with
>>> convergence and with default Heat.
>>>
>>> On 28 August 2015 at 04:35, Angus Salkeld  wrote:
>>>
 Hi

 I have been running some rally tests against convergence and our
 existing implementation to compare.

 So far I have done the following:

1. defined a template with a resource group

 https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
2. the inner resource looks like this:

 https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
  (it
uses TestResource to attempt to be a reasonable simulation of a
server+volume+floatingip)
3. defined a rally job:

 https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
  that
creates X resources then updates to X*2 then deletes.
4. I then ran the above with/without convergence and with 2,4,8
heat-engines

 Here are the results compared:

 https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing

>>>
>>> Results look pretty nice (especially for create) :)
>>> The strange thing for me: why does update with "8 engines" show worse results
>>> than with "4 engines"? (maybe a mistake in the graph...?)
>>>
>>>
>>>


 Some notes on the results so far:

-  convergence with only 2 engines does suffer from RPC overload
(it gets message timeouts on larger templates). I wonder if this is the
problem in our convergence gate...

 Good spotting. If it's true, probably we should try to change the number
>>> of engines... (not sure how the gate hardware would react to it).
>>>

- convergence does very well with a reasonable number of engines
running.
- delete is slightly slower on convergence


>>> Also about delete - maybe we can optimize it later, when the convergence
>>> approach gets more feedback.
>>>

 Still to test:

- the above, but measure memory usage
- many small templates (run 

Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-08-28 Thread Sergey Kraynev
Angus!

it's Awesome!  Thank you for the investigation.
I had a talk with guys from Sahara team and we decided to start testing
convergence with Sahara after L release.
I suppose, that Murano can also join to this process.

Also AFAIK the Sahara team plans to create functional tests with Heat-engine. We
may add it as a non-voting job for our gate.
Probably it will be good to have two different types of this job: with
convergence and with default Heat.

On 28 August 2015 at 04:35, Angus Salkeld asalk...@mirantis.com wrote:

 Hi

 I have been running some rally tests against convergence and our existing
 implementation to compare.

 So far I have done the following:

1. defined a template with a resource group

 https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
2. the inner resource looks like this:

 https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
  (it
uses TestResource to attempt to be a reasonable simulation of a
server+volume+floatingip)
3. defined a rally job:

 https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
  that
creates X resources then updates to X*2 then deletes.
4. I then ran the above with/without convergence and with 2,4,8
heat-engines

 Here are the results compared:

 https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing


Results look pretty nice (especially for create) :)
The strange thing for me: why does update with 8 engines show worse results
than with 4 engines? (maybe a mistake in the graph...?)





 Some notes on the results so far:

-  convergence with only 2 engines does suffer from RPC overload (it
gets message timeouts on larger templates). I wonder if this is the problem
in our convergence gate...

 Good spotting. If it's true, probably we should try to change the number of
engines... (not sure how the gate hardware would react to it).


- convergence does very well with a reasonable number of engines
running.
- delete is slightly slower on convergence


Also about delete - maybe we can optimize it later, when the convergence
approach gets more feedback.


 Still to test:

- the above, but measure memory usage
- many small templates (run concurrently)
- we need to ask projects using Heat to try with convergence (Murano,
TripleO, Magnum, Sahara, etc..)

 Any feedback welcome (suggestions on what else to test).

 -Angus





Regards,
Sergey.


Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-08-28 Thread Angus Salkeld
On Fri, Aug 28, 2015 at 6:35 PM Sergey Lukjanov slukja...@mirantis.com
wrote:

 Hi,

 great, it seems like migration to convergence could happen soon.

 How many times did you run each test case? Does the time change with the
 number of iterations? Are you planning to test parallel stack creation?


Given the test matrix convergence/non-convergence and 2,4,8 engines, I have
not done a lot of iterations - it's just time consuming. I might kill off
the 2-engine case to gain more iterations.
But from what I have observed the duration does not vary significantly.

I'll test smaller stacks with lots of iterations and with a high
concurrency. All this testing is currently on just one host so it is
somewhat limited. Hopefully this is at least giving a useful comparison
with these limitations.

-Angus



 Thanks.

 On Fri, Aug 28, 2015 at 10:17 AM, Sergey Kraynev skray...@mirantis.com
 wrote:

 Angus!

 it's Awesome!  Thank you for the investigation.
 I had a talk with guys from Sahara team and we decided to start testing
 convergence with Sahara after L release.
 I suppose, that Murano can also join to this process.

 Also AFAIK the Sahara team plans to create functional tests with Heat-engine.
 We may add it as a non-voting job for our gate.
 Probably it will be good to have two different types of this job: with
 convergence and with default Heat.

 On 28 August 2015 at 04:35, Angus Salkeld asalk...@mirantis.com wrote:

 Hi

 I have been running some rally tests against convergence and our
 existing implementation to compare.

 So far I have done the following:

1. defined a template with a resource group

 https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
2. the inner resource looks like this:

 https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
  (it
uses TestResource to attempt to be a reasonable simulation of a
server+volume+floatingip)
3. defined a rally job:

 https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
  that
creates X resources then updates to X*2 then deletes.
4. I then ran the above with/without convergence and with 2,4,8
heat-engines

 Here are the results compared:

 https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing


 Results look pretty nice (especially for create) :)
 The strange thing for me: why does update with 8 engines show worse results
 than with 4 engines? (maybe a mistake in the graph...?)





 Some notes on the results so far:

-  convergence with only 2 engines does suffer from RPC overload (it
gets message timeouts on larger templates). I wonder if this is the 
 problem
in our convergence gate...

 Good spotting. If it's true, probably we should try to change the number of
 engines... (not sure how the gate hardware would react to it).


- convergence does very well with a reasonable number of engines
running.
- delete is slightly slower on convergence


 Also about delete - maybe we can optimize it later, when the convergence
 approach gets more feedback.


 Still to test:

- the above, but measure memory usage
- many small templates (run concurrently)
- we need to ask projects using Heat to try with convergence
(Murano, TripleO, Magnum, Sahara, etc..)

 Any feedback welcome (suggestions on what else to test).

 -Angus






 Regards,
 Sergey.







 --
 Sincerely yours,
 Sergey Lukjanov
 Sahara Technical Lead
 (OpenStack Data Processing)
 Principal Software Engineer
 Mirantis Inc.



Re: [openstack-dev] [Heat] convergence rally test results (so far)

2015-08-28 Thread Sergey Lukjanov
Hi,

great, it seems like migration to convergence could happen soon.

How many times did you run each test case? Does the time change with the
number of iterations? Are you planning to test parallel stack creation?

Thanks.

On Fri, Aug 28, 2015 at 10:17 AM, Sergey Kraynev skray...@mirantis.com
wrote:

 Angus!

 it's Awesome!  Thank you for the investigation.
 I had a talk with guys from Sahara team and we decided to start testing
 convergence with Sahara after L release.
 I suppose, that Murano can also join to this process.

 Also AFAIK the Sahara team plans to create functional tests with Heat-engine.
 We may add it as a non-voting job for our gate.
 Probably it will be good to have two different types of this job: with
 convergence and with default Heat.

 On 28 August 2015 at 04:35, Angus Salkeld asalk...@mirantis.com wrote:

 Hi

 I have been running some rally tests against convergence and our existing
 implementation to compare.

 So far I have done the following:

1. defined a template with a resource group

 https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
2. the inner resource looks like this:

 https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
  (it
uses TestResource to attempt to be a reasonable simulation of a
server+volume+floatingip)
3. defined a rally job:

 https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
  that
creates X resources then updates to X*2 then deletes.
4. I then ran the above with/without convergence and with 2,4,8
heat-engines

 Here are the results compared:

 https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing


 Results look pretty nice (especially for create) :)
 The strange thing for me: why does update with 8 engines show worse results
 than with 4 engines? (maybe a mistake in the graph...?)





 Some notes on the results so far:

-  convergence with only 2 engines does suffer from RPC overload (it
gets message timeouts on larger templates). I wonder if this is the 
 problem
in our convergence gate...

 Good spotting. If it's true, probably we should try to change the number of
 engines... (not sure how the gate hardware would react to it).


- convergence does very well with a reasonable number of engines
running.
- delete is slightly slower on convergence


 Also about delete - maybe we can optimize it later, when the convergence
 approach gets more feedback.


 Still to test:

- the above, but measure memory usage
- many small templates (run concurrently)
- we need to ask projects using Heat to try with convergence (Murano,
TripleO, Magnum, Sahara, etc..)

 Any feedback welcome (suggestions on what else to test).

 -Angus





 Regards,
 Sergey.







-- 
Sincerely yours,
Sergey Lukjanov
Sahara Technical Lead
(OpenStack Data Processing)
Principal Software Engineer
Mirantis Inc.


[openstack-dev] [Heat] convergence rally test results (so far)

2015-08-27 Thread Angus Salkeld
Hi

I have been running some rally tests against convergence and our existing
implementation to compare.

So far I have done the following:

   1. defined a template with a resource group
   
https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
   2. the inner resource looks like this:
   
https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
(it
   uses TestResource to attempt to be a reasonable simulation of a
   server+volume+floatingip)
   3. defined a rally job:
   
https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
that
   creates X resources then updates to X*2 then deletes.
   4. I then ran the above with/without convergence and with 2,4,8
   heat-engines

Here are the results compared:
https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing

Some notes on the results so far:

   -  convergence with only 2 engines does suffer from RPC overload (it
   gets message timeouts on larger templates). I wonder if this is the problem
   in our convergence gate...
   - convergence does very well with a reasonable number of engines running.
   - delete is slightly slower on convergence


Still to test:

   - the above, but measure memory usage
   - many small templates (run concurrently)
   - we need to ask projects using Heat to try with convergence (Murano,
   TripleO, Magnum, Sahara, etc..)

Any feedback welcome (suggestions on what else to test).

-Angus