On 07/08/16 19:52, Clint Byrum wrote:
Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
On 05/08/16 21:48, Ricardo Rocha wrote:
Hi.
Quick update: we're at 1000 nodes and 7 million reqs/sec :) - the number
of requests should be higher, but we hit some internal issues. We have
a submission for Barcelona that will provide a lot more details.
But a couple questions came during the exercise:
1. Do we really need a volume in the VMs? On large clusters this is a
burden; shouldn't local storage alone be enough?
2. We observe a significant delay (~10min, which is half the total
time to deploy the cluster) in Heat while it seems to be crunching the
kube_minions nested stacks. Once it's done, it still adds new stacks
gradually, so it doesn't look like it precomputed all the info in advance.
Anyone tried to scale Heat to stacks this size? We end up with a stack
with:
* 1000 nested stacks (depth 2)
* 22000 resources
* 47008 events
We have already increased most of the timeout/retry values for RPC to
get this working.
This delay is already visible in clusters of 512 nodes, but spending
40% of the deployment time on it at 1000 nodes seems like something we
could improve. Any hints on Heat configuration optimizations for large
stacks are very welcome.
Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
max_resources_per_stack = -1
Enforcing this limit on large stacks has a very high overhead; we make
the same change in the TripleO undercloud too.
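(For reference, the recommended setting above would look like this in the
config file; `max_resources_per_stack` is the real Heat option being
discussed, shown here in context:)

```ini
# /etc/heat/heat.conf
[DEFAULT]
# -1 disables the per-stack resource count check entirely,
# avoiding the recursive counting overhead on large nested-stack trees
max_resources_per_stack = -1
```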
Wouldn't this necessitate having a private Heat just for Magnum? Not
having a resource limit per stack would leave your Heat engines
vulnerable to being DoS'd by malicious users, since one can create many
thousands of resources, and thus Python objects, with just a couple
of cleverly crafted templates (which is why I added the setting).
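(For illustration only: the amplification pattern described here can be
sketched as a hypothetical pair of short HOT templates, where a small
ResourceGroup fans out into nested stacks that each fan out again; file
names and counts are invented:)

```yaml
# bomb.yaml - hypothetical sketch of the amplification pattern
heat_template_version: 2015-10-15
resources:
  fan_out:
    type: OS::Heat::ResourceGroup
    properties:
      count: 1000
      resource_def:
        type: leaf.yaml   # each member is itself a nested stack

# leaf.yaml - each nested stack creates another 1000 resources,
# yielding ~1,000,000 resources from two tiny template files
heat_template_version: 2015-10-15
resources:
  inner:
    type: OS::Heat::ResourceGroup
    properties:
      count: 1000
      resource_def:
        type: OS::Heat::RandomString
```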
Although when you added it, all of the resources in a tree of nested
stacks got handled by a single engine, so sending a really big tree of
nested stacks was an easy way to DoS Heat. That's no longer the case
since Kilo; we farm the child stacks out over RPC, so the difficulty of
carrying out a DoS increases in proportion to the number of cores you
have running Heat, whereas before it was constant. (This is also the
cause of the performance problem: counting all the resources in the
tree was easy when the entire thing was already loaded in memory.)
Convergence splits it up even further, farming out each _resource_ as
well as each stack over RPC.
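(At the time of this thread, convergence was opt-in; the real
`convergence_engine` option controls it, sketched here for context:)

```ini
# /etc/heat/heat.conf
[DEFAULT]
# Enable the convergence engine, which distributes per-resource
# work over RPC instead of processing a stack in one engine
convergence_engine = true
```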
I had the thought that having a per-tenant resource limit might be both
more effective at protecting the limited resource and more
efficient to calculate, since we could have the DB simply count the
Resource rows for stacks in a given tenant instead of recursively
loading all of the stacks in a tree and counting the resources in
heat-engine. However, the tenant isn't stored directly in the Stack
table, and people who know databases tell me the resulting joins would
be fearsome.
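(A minimal sketch of the idea, using an invented two-table schema and
sqlite3 rather than Heat's actual models: because the tenant lives only
on root stacks, counting a tenant's Resource rows in the database means
walking the stack ownership tree, e.g. with a recursive query, instead of
loading every nested stack into heat-engine:)

```python
import sqlite3

# Hypothetical schema: nested stacks point at their parent via owner_id,
# and (as described above) only root stacks carry the tenant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stack (id INTEGER PRIMARY KEY, owner_id INTEGER, tenant TEXT);
CREATE TABLE resource (id INTEGER PRIMARY KEY, stack_id INTEGER);
""")
conn.executemany("INSERT INTO stack VALUES (?, ?, ?)",
                 [(1, None, "demo"),   # root stack owned by tenant "demo"
                  (2, 1, None),        # nested stack (no tenant recorded)
                  (3, 2, None)])       # depth-2 nested stack
conn.executemany("INSERT INTO resource (stack_id) VALUES (?)",
                 [(1,), (2,), (2,), (3,)])

# Recursive CTE: walk the ownership tree from the tenant's root stacks,
# then count resources in a single DB query rather than in heat-engine.
count = conn.execute("""
WITH RECURSIVE tree(id) AS (
    SELECT id FROM stack WHERE tenant = ?
    UNION ALL
    SELECT s.id FROM stack s JOIN tree t ON s.owner_id = t.id
)
SELECT COUNT(*) FROM resource WHERE stack_id IN (SELECT id FROM tree)
""", ("demo",)).fetchone()[0]
print(count)  # 4
```

Whether the equivalent joins are tolerable on Heat's real schema at scale
is exactly the open question raised above.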
I'm still not convinced it'd be worse than what we have now, even after
Steve did a lot of work to make it much, much better than it was at one
point ;)
This makes perfect sense in the undercloud of TripleO, which is a
private, single-tenant OpenStack. But for Magnum... now you're talking
about the Heat that users have access to.
Indeed, and now that we're seeing other users of very large stacks
(Sahara is another) I think we need to come up with a solution that is
both efficient enough to use on a large/deep tree of nested stacks but
can still be tuned to protect against DoS at whatever scale Heat is
deployed at.
cheers,
Zane.
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev