Hey folks,

it's me again with the latest news on performance :).

As some of you probably know: Our current loadbalancer strategy is quite
"simple" and doesn't take the load in the system into account at all. It hops to
the next available invoker after you've invoked an action X times (where X is a
fixed value defined at deployment time). In many cases that's suboptimal
behavior and induces lots of cold-starts, even in a fairly unused system.
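To make the contrast concrete, here is a tiny, purely illustrative sketch of
that counter-based rotation (the class and all names are made up for this mail,
this is not the actual OpenWhisk code):

    import scala.collection.mutable

    // Illustrative only: rotate to the next invoker every
    // `activationsBeforeNextInvoker` invocations of a given action,
    // ignoring how loaded the invokers actually are.
    class NaiveRoundRobin(invokerCount: Int, activationsBeforeNextInvoker: Int) {
      private val counters = mutable.Map.empty[String, Long]

      def schedule(actionKey: String): Int = {
        val count = counters.getOrElse(actionKey, 0L)
        counters.update(actionKey, count + 1)
        // Hop to the next invoker after every X invocations, regardless of load.
        ((count / activationsBeforeNextInvoker) % invokerCount).toInt
      }
    }

Every X-th invocation lands on a fresh invoker even if the previous one still
has a perfectly warm container sitting idle, which is where the cold-starts
come from.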

To improve on this, here is a proposal to take the loadbalancer state we already
have and make something out of it.

In a nutshell, the plan is: Before you schedule to an invoker, take into
account how much load is on the invoker you want to schedule to. If it seems
full already (determined by outstanding active-ack responses), search for
another invoker.

Via hashing, we define a home invoker for every subject/action combination.
That is the invoker with the highest probability of having a warm container
for that action. If that invoker is already busy, choose another invoker.
"Stepping" through the invokers should be stable as well, as in: For a given
subject/action combination it should always try the invokers in the same
order. That way, the probability of getting a warm container is higher than if
we chose randomly, but of course it gets lower the more "hops" you need to
make.

The step-width is determined by hashing into a set of numbers that are coprime
to the number of invokers in the system, to minimize collisions and chasing.

The proposal is expected to lead to a more stable warm-container rate and
better utilization of the system as a whole.

I already took a stab at implementing the proposal above. The pull-request can 
be found here: https://github.com/apache/incubator-openwhisk/pull/2360

As always: comments, objections, praise. All feedback is very welcome :)

Cheers,
Markus
