Re: Kafka and Proposal on a future architecture of OpenWhisk

Markus Thömmes Thu, 23 Aug 2018 01:24:20 -0700

Hi Tyson,

Am Do., 23. Aug. 2018 um 00:33 Uhr schrieb Tyson Norris
<tnor...@adobe.com.invalid>:


> Hi - thanks for the discussion! More inline...
>
> On 8/22/18, 2:55 PM, "Markus Thömmes" <markusthoem...@apache.org> wrote:
>
>     Hi Tyson,
>
>     Am Mi., 22. Aug. 2018 um 23:37 Uhr schrieb Tyson Norris
>     <tnor...@adobe.com.invalid>:
>
>     > Hi -
>     >     >
>     >     > When exactly is the case that a ContainerRouter should put a
> blocking
>     >     > activation to a queue for stealing? Since a) it is not spawning
>     > containers
>     >     > and b) it is not parsing request/response bodies, can we say
> this
>     > would
>     >     > only happen when a ContainerRouter maxes out its incoming
> request
>     > handling?
>     >     >
>     >
>     >     That's exactly the idea! The work-stealing queue will only be
> used if
>     > the
>     >     Router where to request landed cannot serve the demand right
> now. For
>     >     example, if it maxed out the slots it has for a certain action
> (all
>     >     containers are working to their full extent) it requests more
>     > resources and
>     >     puts the request-token on the work-stealing queue.
>     >
>     > So to clarify, ContainerRouter "load" (which can trigger use of
> queue) is
>     > mostly (only?) based on:
>     > * the number of Container references
>     > * the number of outstanding inbound  HTTP requests, e.g. when lots of
>     > requests can be routed to the same container
>     > * the number of outstand outbound HTTP requests to remote action
>     > containers (assume all are remote)
>     > It is unclear the order of magnitude considered for "maxed out
> slots",
>     > since container refs should be simple (like ip+port, action metadata,
>     > activation count, and warm state), inbound connection handling is
> basically
>     > a http server, and outbound is a connection pool per action container
>     > (let's presume connection reuse for the moment).
>     > I think it will certainly need testing to determine these and to be
>     > configurable in any case, for each of these separate stats.. Is there
>     > anything else that affects the load for ContainerRouter?
>     >
>
>     "Overload" is determined by the availability of free slots on any
> container
>     being able to serve the current action invocation (or rather the
> absence
>     thereof). An example:
>     Say RouterA has 2 containers for action X. Each container has an
> allowed
>     concurrency of 10. On each of those 2 there are 10 active invocations
>     already running (the ContainerRouter knows this, these are open
> connections
>     to the containers). If another request comes in for X, we know we don't
>     have capacity for it. We request more resources and offer the work we
> got
>     for stealing.
>
>     I don't think there are tweaks needed here. The Router keeps an
>     "activeInvocations" number per container and compares that to the
> allowed
>     concurrency on that container. If activeInvocations ==
> allowedConcurrency
>     we're out of capacity and need more.
>
>     We need a work-stealing queue here to dynamically rebalance between the
>     Routers since the layer above the Routers has no idea about capacity
> and
>     (at least that's my assumption) schedules randomly.
>
> I think it is confusing to say that the ContainerRouter doesn't have
> capacity for it - rather, the existing set of continers in the
> ContainerRouter don't have capacity for it. I understand now, in any case.
>

Noted, will adjust future wording on this, thanks!


> So there are a couple of active paths in ContainerRouter, still only
> considering sync/blocking activations:
> * warmpath - run immediately
> * coldpath - send to queue
>
> And each ContainerRouter has a queue consumer that presumably pulls from
> the queue constantly? Or is consumption based on something else? If all
> ContainerRouters are consuming at the same rate, then while this does
> distribute the load across ContainerRouters, it doesn't really guarantee
> any similar state (number of containers, active connections, etc) at each
> ContainerRouter, I think. Maybe I am missing something here?
>


The idea is that ContainerRouters do **not** pull from the queue
constantly. They pull work for actions that they have idle containers for.

Similar state in terms of number of containers is done via the
ContainerManager. Active connections should roughly even out with the queue
being pulled on idle.


>
>

>
>
>     >
>     >     That request-token will then be taken by any Router that has free
>     > capacity
>     >     for that action (note: this is not simple with kafka, but might
> be
>     > simpler
>     >     with other MQ technologies). Since new resources have been
> requested,
>     > it is
>     >     guaranteed that one Router will eventually become free.
>     >
>     > Is "requests resources" here requesting new action containers, which
> it
>     > won't be able to process itself immediately, but should startup +
> warm and
>     > be provided to "any ContainerRouter"? This makes, sense, just want to
>     > clarify that "resources == containers".
>     >
>
>     Yes, resources == containers.
>
>
>     >
>     >     >
>     >     > If ContainerManager has enough awareness of ContainerRouters'
>     > states, I'm
>     >     > not sure where using a queue would be used (for redirecting to
> other
>     >     > ContainerRouters) vs ContainerManager responding with a
>     > ContainerRouters
>     >     > reference (instead of an action container reference) - I'm not
>     > following
>     >     > the logic of the edge case in the proposal - there is mention
> of
>     > "which
>     >     > controller the request needs to go", but maybe this is a typo
> and
>     > should
>     >     > say ContainerRouter?
>     >     >
>     >
>     >     Indeed that's a typo, it should say ContainerRouter.
>     >
>     >     The ContainerManager only knows which Router has which
> Container. It
>     > does
>     >     not know whether the respective Router has capacity on that
> container
>     > (the
>     >     capacity metric is very hard to share since it's ever changing).
>     >
>     >     Hence, in an edge-case where there are less Containers than
> Routers,
>     > the
>     >     ContainerManager can hand out references to the Routers it gave
>     > Containers
>     >     to the Routers that have none. (This is the edge-case described
> in the
>     >     proposal).
>     >
>     > I'm not sure why in this case the ContainerManager does not just
> create a
>     > new container, instead of sending to another Router? If there is some
>     > intended limit on "number of containers for a particular action",
> that
>     > would be a reason, but given that the ContainerManager cannot know
> the
>     > state of the existing containers, it seems like sending to another
> Router
>     > which has the container, but may not be able to use it immediately,
> may
>     > cause failures in some cases.
>     >
>
>     The edge-case here is for very slow load. It's minimizing the amount of
>     Containers needed. Another example:
>     Say you have 3 Routers. A request for action X comes in, goes to
> Router1.
>     It requests a container, puts the work on the queue, nobody steals it,
> as
>     soon as the Container gets ready, the work is taken from the queue and
>     executed. All nice and dandy.
>
>     Important remark: The Router that requested more Containers is not
>     necessarily the one that's getting the Containers. We need to make
> sure to
>     evenly distribute Containers across the system.
>
>     So back to our example: What happens if requests for action X are made
> one
>     after the other? Well, the layer above the Routers (something needs to
>     loadbalance them, be it DNS or another type of routing layer) isn't
> aware
>     of the locality of the Container that we created to execute action X.
> As it
>     schedules fairly randomly (round-robin in a multi-tenant system is
>     essentially random) the action will hit each Router once very soon. As
>     we're only generating one request after the other, arguably we only
> want to
>     create only one container.
>
>     That's why in this example the 2 remaining Routers with no container
> get a
>     reference to Router1.
>
>     In the case you mentioned:
>     > it seems like sending to another Router which has the container, but
> may
>     not be able to use it immediately, may cause failures in some cases.
>
>     I don't recall if it's in the document or in the discussion on the
>     dev-list: The router would respond to the proxied request with a 503
>     immediatly. That tells the proxying router: Oh, apparently we need more
>     resources. So it requests another container etc etc.
>
>     Does that clarify that specific edge-case?
>
> Yes, but I would not call this an edge-case -  I think it is more of a
> ramp up to maximum container reuse, and will probably dramatically impacted
> by containers that do NOT support concurrency (will get a 503 when a single
> activation is in flight, vs high concurrency container, which would cause
> 503 only once max concurrency reached).
> If each ContainerRouter is as likely to receive the original request, and
> each is also as likely to receive the queued item from the stealing queue,
> then there will be a lot of cross traffic during the ramp up from 1
> container to <Router count> containers. E.g.
>
> From client:
> Request1 -> Router 1 -> queue (no containers)
> Request2 -> Router 2 -> queue (no containers)
> Request3 -> Router 3 -> queue (no containers)
> From queue:
> Request1 -> Router1  -> create and use container
> Reuqest2 -> Router2 -> Router1 -> 503 -> create container
> Request3 -> Router3 -> Router1 -> 503 -> Router2 -> 503 -> create container
>
> In other words - the 503 may help when there is one container existing,
> and it is deemed to be busy, but what if there are 10 containers existing
> (on different Routers other than where the request was pulled from the
> stealing queue) - do you make HTTP requests to all 10 Routers to see if
> they are busy before creating a new container?
>

Good point, haven't really thought about that to be frank. Gut feeling is
that we should only have 1 direct reference per Router/Action to another
Router. If that yields a 503, just generate new resources immediately. That
might overshoot what we really need, but might just be good enough? Maybe
I'm overcomplicating here...

Alternative to the whole work-stealing algorithm (which could unify this
ramp-up phase and work-stealing itself nicely):
What if it is possible to detect that when an event is pushed to the
work-stealing queue, there are no active consumers for that event right
now. If that is possible, we could use that to signal that more Containers
are needed, because apparently nobody has idling resources for a specific
container.
We would then also use this mechanism to generate Containers in general.
The ContainerRouters would **never** request new containers. They would
instead put their work on the stealing queue to see let someone else work
on it. If the system detects that nobody is waiting, it requests a new
Container (maybe the ContainerManager could be used to detect that signal?)

For our "edge-case" that'd mean: No references to other Routers are handed
out at all. If a request arrives at a Router that has no Container to serve
it, it just puts it on the queue. If there's a consumer for it, great,
done. If not, we know  we need more resources.

This boils down to needing an efficient mechanism to signal free capacities
though. Something to think deeper about, thanks for bringing it up!

>
>
>     Memo to self: I need to incooperate all the stuff from these
> discussions
>     into the document.
>
>
>     >
>     >
>     >     The work-stealing queue though is used to rebalance work in case
> one
>     > of the
>     >     Routers get overloaded.
>     >
>     > Got it.
>     >
>     > Thanks
>     > Tyson
>     >
>     >
>     >
>
>
>

Re: Kafka and Proposal on a future architecture of OpenWhisk

Reply via email to