On 08 Aug 2016, at 11:51, Ricardo Rocha <[email protected]> wrote:

> Hi.
>
> On Mon, Aug 8, 2016 at 1:52 AM, Clint Byrum <[email protected]> wrote:
>> Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
>>> On 05/08/16 21:48, Ricardo Rocha wrote:
>>>> Hi.
>>>>
>>>> Quick update: it's 1000 nodes and 7 million reqs/sec :) - the number
>>>> of requests should be higher, but we had some internal issues. We
>>>> have a submission for Barcelona to provide a lot more details.
>>>>
>>>> But a couple of questions came up during the exercise:
>>>>
>>>> 1. Do we really need a volume in the VMs? On large clusters this is
>>>> a burden; shouldn't local storage alone be enough?
>>>>
>>>> 2. We observe a significant delay (~10 min, which is half the total
>>>> time to deploy the cluster) in Heat when it seems to be crunching
>>>> the kube_minions nested stacks. Once it's done, it still adds new
>>>> stacks gradually, so it doesn't look like it precomputed all the
>>>> info in advance.
>>>>
>>>> Has anyone tried to scale Heat to stacks this size? We end up with a
>>>> stack with:
>>>> * 1000 nested stacks (depth 2)
>>>> * 22000 resources
>>>> * 47008 events
>>>>
>>>> We already changed most of the timeout/retry values for RPC to get
>>>> this working.
>>>>
>>>> This delay is already visible in clusters of 512 nodes, but 40% of
>>>> the time in 1000 nodes seems like something we could improve. Any
>>>> hints on Heat configuration optimizations for large stacks are very
>>>> welcome.
>>>
>>> Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
>>>
>>>     max_resources_per_stack = -1
>>>
>>> Enforcing this limit for large stacks has a very high overhead; we
>>> make this change in the TripleO undercloud too.
>>
>> Wouldn't this necessitate having a private Heat just for Magnum? Not
>> having a resource limit per stack would leave your Heat engines
>> vulnerable to being DoS'd by malicious users, since one can create
>> many, many thousands of resources, and thus Python objects, in just a
>> couple of cleverly crafted templates (which is why I added the
>> setting).
>>
>> This makes perfect sense in the undercloud of TripleO, which is a
>> private, single-tenant OpenStack. But for Magnum... now you're talking
>> about the Heat that users have access to.
>
> We have it already at -1 for these tests. As you say, a malicious user
> could DoS it; right now this is manageable in our environment. But
> maybe move it to a per-tenant value, or some special policy?
>
> The stacks are created under a separate domain for Magnum (for
> trustees); we could also use that for separation.

If there were a quota system within Heat for items like stacks and
resources, this could be controlled through that. It looks like
https://blueprints.launchpad.net/heat/+spec/add-quota-api-for-heat did
not make it into upstream, though.

Tim

> A separate Heat instance sounds like overkill.
>
> Cheers,
> Ricardo
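
For reference, the settings discussed above all live in /etc/heat/heat.conf
under [DEFAULT]. A minimal sketch, assuming a Mitaka/Newton-era Heat:
max_resources_per_stack is the option named in the thread, while
rpc_response_timeout, num_engine_workers and max_nested_stack_depth are
standard options the thread does not name, and the numeric values are
illustrative only, not the ones actually used on the 1000-node cluster.

    [DEFAULT]
    # Drop the per-stack resource limit check (the recommendation above);
    # note Clint's caveat about DoS on a public, multi-tenant Heat.
    max_resources_per_stack = -1

    # "timeout/retry values for RPC": the usual first knob is the
    # oslo.messaging response timeout (default 60 seconds).
    rpc_response_timeout = 600

    # More engine workers help when many nested stacks are created at
    # once; the default scales with the number of CPUs on the controller.
    num_engine_workers = 8

    # The cluster above nests two levels deep, well under the default
    # limit of 5, so this normally needs no change.
    max_nested_stack_depth = 5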
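
On the quota point: Heat's existing caps in the same [DEFAULT] section are
global, i.e. the same for every project, which is roughly the gap the
add-quota-api-for-heat blueprint was meant to fill. The defaults, again
assuming a Mitaka/Newton-era release:

    [DEFAULT]
    # Global limits applied identically to every tenant - not per-project
    # quotas, so they cannot express "unlimited for the Magnum domain only".
    max_stacks_per_tenant = 100
    max_resources_per_stack = 1000    # set to -1 for the tests above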
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
