Hey folks, it's me again with the latest news on performance :).
As some of you probably know: our current loadbalancer strategy is quite "simple" and doesn't take the load in the system into account at all. It hops to the next available invoker after you've invoked an action X times (where X is a fixed value defined at deployment time). In many cases that's suboptimal behavior and induces lots of cold starts, even in a fairly unused system.

To improve on this, here is a proposal to take the loadbalancer state we already have and make something out of it. In a nutshell, the plan is:

Before you schedule to an invoker, take into account how much load is on the invoker you want to schedule to. If it seems full already (determined by outstanding active-ack responses), search for another invoker.

Via hashing, we define a home invoker for every subject/action combination. That is the invoker with the highest probability of having a warm container for that action. If that invoker is already busy, choose another invoker.

"Stepping" through the invokers should be stable as well, as in: for a given subject/action it should always try the invokers in the same order. That way, the probability of getting a warm container is higher than if we chose randomly, but of course it gets lower the more "hops" you need to make.

The step width is determined by hashing into a list of numbers that are coprime to the number of invokers in the system. A coprime step visits every invoker exactly once before the sequence repeats, and spreading subject/action pairs over different step widths minimizes collisions and chasing.

The proposal is expected to lead to a more stable warm-container rate and to better utilization of the system as a whole.

I already took a stab at implementing the proposal above. The pull request can be found here:
https://github.com/apache/incubator-openwhisk/pull/2360

As always: comments, objections, praise, all feedback is very welcome :)

Cheers,
Markus
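
P.S. For illustration, the stepping scheme described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code from the pull request: the function names, the use of SHA-256 as the hash, the `busy_threshold` parameter, and returning `None` when every invoker is full are all hypothetical choices made for the example.

```python
import hashlib
from math import gcd


def coprime_steps(n):
    """Step widths in 1..n that are coprime to n."""
    return [s for s in range(1, n + 1) if gcd(s, n) == 1]


def stable_hash(subject, action):
    """Deterministic hash (assumption: SHA-256; Python's hash() is salted per process)."""
    digest = hashlib.sha256(f"{subject}/{action}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def probe_order(subject, action, n):
    """Stable order in which the n invokers are tried for a subject/action."""
    h = stable_hash(subject, action)
    steps = coprime_steps(n)
    home = h % n                         # home invoker
    step = steps[(h // n) % len(steps)]  # step width, coprime to n
    return [(home + i * step) % n for i in range(n)]


def schedule(subject, action, outstanding, busy_threshold):
    """Pick the first invoker in probe order that is below the busy threshold.

    outstanding[i] is the number of outstanding active-acks for invoker i.
    Returns None if all invokers are busy (assumption: the fallback is
    left to the caller).
    """
    for invoker in probe_order(subject, action, len(outstanding)):
        if outstanding[invoker] < busy_threshold:
            return invoker
    return None
```

Because the step width is coprime to the number of invokers, `probe_order` returns each invoker index exactly once, so a busy home invoker never causes the search to loop over a subset of the system.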
