Re: Kafka and Proposal on a future architecture of OpenWhisk

Tyson Norris Thu, 23 Aug 2018 12:28:56 -0700

    >
    > And each ContainerRouter has a queue consumer that presumably pulls from
    > the queue constantly? Or is consumption based on something else? If all
    > ContainerRouters are consuming at the same rate, then while this does
    > distribute the load across ContainerRouters, it doesn't really guarantee
    > any similar state (number of containers, active connections, etc) at each
    > ContainerRouter, I think. Maybe I am missing something here?
    >
    
    
    The idea is that ContainerRouters do **not** pull from the queue
    constantly. They pull work for actions that they have idle containers for.


The Router is not pulling from the queue for "specific actions", just for any
action that might replace idle containers, right? Concurrency complicates
this, though: while a container is not idle (paused + removable), it may still
be usable, but only if the action received is the same as that of an existing
warm container, and that container has concurrency slots available for
additional activations. It may be helpful to diagram some of this stealing
queue flow a bit more; I'm not seeing how it will work out other than by
creating more containers than are absolutely required, which may be OK, not
sure.
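
To make the concurrency case concrete, here is a minimal Python sketch of the
check a Router might run before pulling work for an action. The names
(WarmContainer, max_concurrency, can_accept) are hypothetical and not taken
from OpenWhisk code; the sketch only assumes that a container is usable when
the action matches and a concurrency slot is free.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WarmContainer:
    """Hypothetical view of a warm container held by a Router."""
    action: str            # action this container was initialized for
    max_concurrency: int   # concurrent activations the container supports
    in_flight: int = 0     # activations currently running in it

    def is_idle(self) -> bool:
        # "Idle" in the strict sense: nothing is running in the container.
        return self.in_flight == 0

    def has_free_slot(self) -> bool:
        # Usable without being idle: at least one concurrency slot is open.
        return self.in_flight < self.max_concurrency


def can_accept(containers: List[WarmContainer], action: str) -> bool:
    """True if some warm container for `action` still has a free slot."""
    return any(c.action == action and c.has_free_slot() for c in containers)
```

With this definition a container with max_concurrency 2 and one activation in
flight is not idle, yet still counts as capacity for the same action.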
    
    Similar state in terms of number of containers is done via the
    ContainerManager. Active connections should roughly even out with the queue
    being pulled on idle.
    
Yeah, carefully defining "idle" may be tricky if we want to achieve the
absolute minimum number of containers in use for a specific action at any time.

    
    >
    >     The edge-case here is for very slow load. It's minimizing the amount
    >     of Containers needed. Another example:
    >     Say you have 3 Routers. A request for action X comes in, goes to
    >     Router1. It requests a container, puts the work on the queue, nobody
    >     steals it, as soon as the Container gets ready, the work is taken
    >     from the queue and executed. All nice and dandy.
    >
    >     Important remark: The Router that requested more Containers is not
    >     necessarily the one that's getting the Containers. We need to make
    >     sure to evenly distribute Containers across the system.
    >
    >     So back to our example: What happens if requests for action X are
    >     made one after the other? Well, the layer above the Routers
    >     (something needs to loadbalance them, be it DNS or another type of
    >     routing layer) isn't aware of the locality of the Container that we
    >     created to execute action X. As it schedules fairly randomly
    >     (round-robin in a multi-tenant system is essentially random) the
    >     action will hit each Router once very soon. As we're only generating
    >     one request after the other, arguably we want to create only one
    >     container.
    >
    >     That's why in this example the 2 remaining Routers with no container
    >     get a reference to Router1.
    >
    >     In the case you mentioned:
    >     > it seems like sending to another Router which has the container,
    >     > but may not be able to use it immediately, may cause failures in
    >     > some cases.
    >
    >     I don't recall if it's in the document or in the discussion on the
    >     dev-list: The router would respond to the proxied request with a 503
    >     immediately. That tells the proxying router: Oh, apparently we need
    >     more resources. So it requests another container etc etc.
    >
    >     Does that clarify that specific edge-case?
    >
    > Yes, but I would not call this an edge-case - I think it is more of a
    > ramp up to maximum container reuse, and will probably be dramatically
    > impacted by containers that do NOT support concurrency (will get a 503
    > when a single activation is in flight, vs high concurrency container,
    > which would cause 503 only once max concurrency reached).
    > If each ContainerRouter is as likely to receive the original request, and
    > each is also as likely to receive the queued item from the stealing queue,
    > then there will be a lot of cross traffic during the ramp up from 1
    > container to <Router count> containers. E.g.
    >
    > From client:
    > Request1 -> Router 1 -> queue (no containers)
    > Request2 -> Router 2 -> queue (no containers)
    > Request3 -> Router 3 -> queue (no containers)
    > From queue:
    > Request1 -> Router1 -> create and use container
    > Request2 -> Router2 -> Router1 -> 503 -> create container
    > Request3 -> Router3 -> Router1 -> 503 -> Router2 -> 503 -> create container
    >
    > In other words - the 503 may help when there is one container existing,
    > and it is deemed to be busy, but what if there are 10 containers existing
    > (on different Routers other than where the request was pulled from the
    > stealing queue) - do you make HTTP requests to all 10 Routers to see if
    > they are busy before creating a new container?
    >
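
A rough way to count the cross traffic in the ramp-up trace quoted above. This
is an illustrative sketch only: it assumes request k is pulled by Router k,
that every Router already holding a container is busy, and that the pulling
Router probes each of them in turn before creating its own container. The
function name is made up for this example.

```python
def ramp_up_503s(router_count: int) -> int:
    """Count 503 responses while ramping from 1 to `router_count` containers."""
    total = 0
    for k in range(1, router_count + 1):
        busy_routers = k - 1      # Routers that already created a container
        total += busy_routers     # one 503 per busy Router probed
    return total


print(ramp_up_503s(3))   # 3, matching the three-Router trace
print(ramp_up_503s(10))  # 45
```

Under these assumptions the number of wasted proxy calls grows quadratically
with the Router count, which is why probing every Router does not scale.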
    
    Good point, haven't really thought about that to be frank. Gut feeling is
    that we should only have 1 direct reference per Router/Action to another
    Router. If that yields a 503, just generate new resources immediately. That
    might overshoot what we really need, but might just be good enough? Maybe
    I'm overcomplicating here...
    
    Alternative to the whole work-stealing algorithm (which could unify this
    ramp-up phase and work-stealing itself nicely):
    What if it were possible to detect, when an event is pushed to the
    work-stealing queue, that there are no active consumers for that event
    right now? If that is possible, we could use that to signal that more
    Containers are needed, because apparently nobody has idling resources for
    a specific container.

I think this is only possible in the case where 0 containers (for that action) 
existed before; in other cases, without having Router state visibility, it will 
be impossible to detect Routers that have capacity to service that event.

    We would then also use this mechanism to generate Containers in general.
    The ContainerRouters would **never** request new containers. They would
    instead put their work on the stealing queue to let someone else work
    on it. If the system detects that nobody is waiting, it requests a new
    Container (maybe the ContainerManager could be used to detect that signal?)
    
    For our "edge-case" that'd mean: No references to other Routers are handed
    out at all. If a request arrives at a Router that has no Container to serve
    it, it just puts it on the queue. If there's a consumer for it, great,
    done. If not, we know we need more resources.
    
    This boils down to needing an efficient mechanism to signal free capacities
    though. Something to think deeper about, thanks for bringing it up!
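
The "nobody is waiting" signal could look roughly like the following toy
in-memory queue. This is a sketch under assumptions, not a Kafka API:
StealingQueue, register_consumer and offer are hypothetical names, and whether
a real broker can expose such a signal cheaply is exactly the open question.

```python
from collections import defaultdict, deque


class StealingQueue:
    """Toy in-memory stand-in for the work-stealing queue (not Kafka)."""

    def __init__(self):
        self.waiting = defaultdict(int)     # action -> Routers waiting for work
        self.backlog = defaultdict(deque)   # action -> queued activations

    def register_consumer(self, action):
        """A Router signals free capacity; returns queued work if there is any."""
        if self.backlog[action]:
            return self.backlog[action].popleft()
        self.waiting[action] += 1
        return None

    def offer(self, action, activation):
        """Return "handed_off" if a consumer was waiting, else "need_container"."""
        if self.waiting[action] > 0:
            self.waiting[action] -= 1       # a waiting Router takes the work
            return "handed_off"
        self.backlog[action].append(activation)
        return "need_container"             # signal: request a new Container
```

The sketch only works because Routers register their free capacity per action
before work arrives; without that registration the queue cannot tell a busy
Router from one that could serve the event.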

Yes, I think "capacity" is a more accurate way to think of it than "idle", and
yes, I think a 503 should generate a container immediately, but I think we need
to supply more data to intelligently limit the 503 workflows.

What about this:
- ContainerRouters report capacity ("possible capacity" or "busy") to the
ContainerManager periodically (not accurate at any point in time, but close
enough in some cases)
- When no capacity exists in a ContainerRouter's existing warm containers, the
ContainerRouter requests a container from the ContainerManager
- The ContainerManager responds with either a ContainerRouter (if one exists
with "possible capacity") or a Container (if none exists with "possible
capacity")
- In case a Container is returned, if C > 1 (C being the container's
concurrency limit), its capacity state should start as "possible capacity";
otherwise, "busy" (we know it will be immediately used)
- In case a ContainerRouter is returned, attempt (once) to proxy to that
ContainerRouter
- On a proxied request, ContainerRouter CR2 either services the request, OR
immediately creates a container (in CR2), OR, since we may limit the number of
same-action containers in a ContainerRouter, returns a 503, at which point a
container is immediately created (in CR1).
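
The ContainerManager side of the steps above could be sketched as follows.
This is a minimal illustration, not OpenWhisk code: the class layout, method
names and container ids are hypothetical, and it assumes a new container is
always handed to the requesting ContainerRouter.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

POSSIBLE_CAPACITY = "possible capacity"
BUSY = "busy"


@dataclass
class ContainerManager:
    """Toy ContainerManager: last-reported capacity per (router, action)."""
    reports: Dict[Tuple[str, str], str] = field(default_factory=dict)
    created: int = 0

    def report(self, router: str, action: str, state: str) -> None:
        # Periodic report from a ContainerRouter; may be stale when read.
        self.reports[(router, action)] = state

    def request(self, requester: str, action: str, concurrency: int) -> Tuple[str, str]:
        """Answer with ("router", id) or ("container", id)."""
        for (router, act), state in self.reports.items():
            if act == action and router != requester and state == POSSIBLE_CAPACITY:
                return ("router", router)   # requester proxies once to it
        # Nobody reported capacity: hand a new container to the requester.
        self.created += 1
        # With C > 1 the container still has slots after its first activation.
        initial = POSSIBLE_CAPACITY if concurrency > 1 else BUSY
        self.report(requester, action, initial)
        return ("container", f"{action}-{self.created}")
```

Because the reports are periodic, request() can return a ContainerRouter that
has become busy in the meantime; the single proxy attempt followed by a 503
covers that case.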

This would:
- encourage multiple containers for same action to be managed in a subset of 
routers (better chance of reusing containers)
- not restrict the number of routers used for a specific action when under load 
(e.g. say that each router can handle active requests for n containers, meaning 
n connection pools).
- allow designating a ContainerRouter capacity config for both a) activations 
on warm containers as configured per action and b) overall number of containers 
(connection pools) as configured at ContainerRouter (to smooth hot spots across 
ContainerRouters).
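
A sketch of how a ContainerRouter might apply the two limits (a) and (b) when
it receives a proxied request. All names here are hypothetical, the pool state
is reduced to in-flight counts per container, and a real Router would obtain
the new container through the ContainerManager rather than create it locally.

```python
from typing import Dict, List


def handle_proxied(in_flight: Dict[str, List[int]], action: str,
                   concurrency: int, max_containers: int) -> str:
    """Decide how a Router answers a request proxied to it.

    in_flight maps action -> in-flight activation count per warm container.
    concurrency is limit (a): activations per container for this action.
    max_containers is limit (b): containers this Router may hold overall.
    """
    counts = in_flight.setdefault(action, [])
    for i, n in enumerate(counts):
        if n < concurrency:
            counts[i] += 1              # reuse a warm container with a free slot
            return "serve"
    if sum(len(c) for c in in_flight.values()) < max_containers:
        counts.append(1)                # add a container on this Router
        return "create"
    return "503"                        # the caller creates a container instead
```

Limit (b) is what turns a busy Router into a 503 rather than an ever-growing
hot spot.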

I think the biggest problem would be cases where containers hover around the
"busy" state (C requests), causing requests to the CM and inaccurate data in
the CM (which causes extra containers to be created). It may be OK for most
load patterns, though; nothing will work perfectly for all.

Reporting capacity can either be metrics based or a health ping similar to what 
exists today (but with more details of pool states).

I will try to diagram this, it's getting complicated...

Thanks
Tyson


    >
    >     Memo to self: I need to incorporate all the stuff from these
    >     discussions into the document.
    >
    >
    >     >
    >     >
    >     >     The work-stealing queue though is used to rebalance work in
    >     >     case one of the Routers gets overloaded.
    >     >
    >     > Got it.
    >     >
    >     > Thanks
    >     > Tyson
    >     >
    >     >
    >     >
    >
    >
    >
