Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
> On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
> > Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
> > > On 11/08/14 10:46, Clint Byrum wrote:
> > > > Right now we're stuck with an update that just doesn't work. It isn't
> > > > just about update-failure-recovery, which is coming along nicely, but
> > > > it is also about the lack of signals to control rebuild, poor support
> > > > for addressing machines as groups, and unacceptable performance in
> > > > large stacks.
> > >
> > > Are there blueprints/bugs filed for all of these issues?
> > >
> >
> > Convergence addresses the poor performance for large stacks in general.
> > We also have this:
> >
> > https://bugs.launchpad.net/heat/+bug/1306743
> >
> > Which shows how slow metadata access can get. I have worked on patches
> > but haven't been able to complete them. We made big strides but we are
> > at a point where 40 nodes polling Heat every 30s is too much for one CPU
> > to handle. When we scaled Heat out onto more CPUs on one box by forking
> > we ran into eventlet issues. We also ran into issues because even with
> > many processes we can only use one to resolve templates for a single
> > stack during update, which was also excessively slow.
>
> Related to this, and a discussion we had recently at the TripleO meetup is
> this spec I raised today:
>
> https://review.openstack.org/#/c/113296/
>
> It's following up on the idea that we could potentially address (or at
> least mitigate, pending the fully convergence-ified heat) some of these
> scalability concerns, if TripleO moves from the one-giant-template model
> to a more modular nested-stack/provider model (e.g. what Tomas has been
> working on).
>
> I've not got into enough detail on that yet to be sure if it's achievable
> for Juno, but it seems initially to be complex-but-doable.
>
> I'd welcome feedback on that idea and how it may fit in with the more
> granular convergence-engine model.
>
> Can you link to the eventlet/forking issues bug please? I thought since
> bug #1321303 was fixed that multiple engines and multiple workers should
> work OK, and obviously that being true is a precondition to expending
> significant effort on the nested stack decoupling plan above.
>
That was the issue. So we fixed that bug, but we never un-reverted the
patch that forks enough engines to use up all the CPUs on a box by
default. That would likely help a lot with metadata access speed (we could
manually do it in TripleO but we tend to push defaults. :)

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
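The reverted default mentioned in the reply (fork enough engine processes to use every CPU on the box) can be illustrated with a minimal Python sketch. It assumes a generic `multiprocessing` worker model rather than Heat's actual service code, and it ignores the eventlet interaction raised earlier in the thread. `default_worker_count`, `run_engine` and `launch_engines` are hypothetical names chosen for illustration only.

```python
import multiprocessing
import os
from typing import Optional


def default_worker_count(configured: Optional[int] = None) -> int:
    """Hypothetical helper: decide how many engine processes to fork.

    An explicit positive setting wins; otherwise fall back to one
    worker per CPU on the host (os.cpu_count() may return None).
    """
    if configured is not None and configured > 0:
        return configured
    return os.cpu_count() or 1


def run_engine(worker_id: int) -> None:
    # Hypothetical stand-in for an engine main loop. A real engine would
    # keep consuming requests here instead of printing and exiting.
    print(f"engine worker {worker_id} running in pid {os.getpid()}")


def launch_engines(configured: Optional[int] = None) -> int:
    """Start one process per worker, wait for all to exit, return the count."""
    count = default_worker_count(configured)
    procs = [
        multiprocessing.Process(target=run_engine, args=(i,))
        for i in range(count)
    ]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return count


if __name__ == "__main__":
    print(f"launched {launch_engines()} engine workers")
```

The design point is simply that the CPU-count fallback applies only when no explicit value is configured, so a deployment such as TripleO could still pin the number manually while everyone else gets the higher default.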