Great discussion, though I'm not entirely convinced by part of this point.


> We need a work-stealing queue here to dynamically rebalance between the
> Routers since the layer above the Routers has no idea about capacity and
> (at least that's my assumption) schedules randomly.

I agree we can't really keep track of actual current capacity outside of
the individual Router.  But I don't want to jump immediately from that to
assuming truly random scheduling at the layer above because it pushes a
pretty key problem down into the ContainerManager/ContainerRouter layer
(dealing with the "edge case" of finding hot containers for the very long
tail of actions that can be serviced by a very small number of running
containers).

The layer above could route based on runtime kind to increase the
probability of container reuse.
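
To make that concrete, here is a rough sketch in Python of kind-based
routing. The router names, the ROUTERS_BY_KIND table, and
pick_router_by_kind are all made up for illustration; nothing like this
exists in the codebase as far as I know.

```python
import random

# Hypothetical table: which Routers tend to hold warm containers per kind.
ROUTERS_BY_KIND = {
    "nodejs": ["router-0", "router-1"],
    "python": ["router-2"],
}
# Hypothetical full set of Routers in the deployment.
ALL_ROUTERS = ["router-0", "router-1", "router-2", "router-3"]


def pick_router_by_kind(kind, rng=random):
    """Prefer Routers associated with this runtime kind; else pick any."""
    candidates = ROUTERS_BY_KIND.get(kind, ALL_ROUTERS)
    # Still random, but within a smaller set, so reuse is more likely.
    return rng.choice(candidates)
```

This keeps the layer above ignorant of capacity; it only narrows the set
of Routers it picks from.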

The layer above could still use a hash-based scheme to map each action to
an initial "home" Router (or a subset of Routers on a very large
deployment) and rely on the work-stealing/overflow queue to deal with
"noisy neighbor" hash collisions if a Router gets badly overloaded.
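
A minimal sketch of the "home" Router plus overflow idea, again in Python.
The Router class, its capacity field, and the dispatch/steal helpers are
assumptions of mine for illustration, not the actual ContainerRouter
design. Note that only the Router itself checks its own capacity; the
layer above just hashes.

```python
import hashlib
from collections import deque


class Router:
    """Toy Router; capacity is known only to the Router itself."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.running = []

    def try_accept(self, action):
        # Accept the action only if there is spare local capacity.
        if len(self.running) < self.capacity:
            self.running.append(action)
            return True
        return False


def home_router(action, routers):
    """Stable hash of the action name onto one Router."""
    digest = hashlib.sha256(action.encode("utf-8")).digest()
    return routers[int.from_bytes(digest[:8], "big") % len(routers)]


def dispatch(action, routers, overflow):
    """Send to the home Router; spill to the shared queue if it refuses."""
    home = home_router(action, routers)
    if not home.try_accept(action):
        overflow.append(action)
    return home


def steal(router, overflow):
    """A Router with spare capacity pulls work from the overflow queue."""
    stolen = 0
    while overflow and router.try_accept(overflow[0]):
        overflow.popleft()
        stolen += 1
    return stolen
```

With two Routers of capacity 1, dispatching the same action twice fills
its home Router and spills the second activation into the overflow queue,
where the other Router can steal it.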

Each Router is potentially managing a fairly large pool of containers.  The
pools don't have to be the same size between Routers.  More crazily, the
Routers could even autoscale themselves to deal with uneven load (in
effect, hierarchical routing).

Lots of half-baked ideas are possible here :)

--dave
