Re: Kafka and Proposal on a future architecture of OpenWhisk

Tyson Norris Wed, 22 Aug 2018 14:38:14 -0700

Hi -     
    >
    > When exactly is the case that a ContainerRouter should put a blocking
    > activation to a queue for stealing? Since a) it is not spawning containers
    > and b) it is not parsing request/response bodies, can we say this would
    > only happen when a ContainerRouter maxes out its incoming request handling?
    >
    
    That's exactly the idea! The work-stealing queue will only be used if the
    Router where the request landed cannot serve the demand right now. For
    example, if it maxed out the slots it has for a certain action (all
    containers are working to their full extent) it requests more resources and
    puts the request-token on the work-stealing queue.
    
So to clarify, ContainerRouter "load" (which can trigger use of the queue) is 
mostly (only?) based on:
* the number of Container references
* the number of outstanding inbound HTTP requests, e.g. when lots of requests 
can be routed to the same container
* the number of outstanding outbound HTTP requests to remote action containers 
(assume all are remote)
The order of magnitude meant by "maxed out slots" is unclear to me, since 
container refs should be simple (like ip+port, action metadata, activation 
count, and warm state), inbound connection handling is basically an HTTP 
server, and outbound is a connection pool per action container (let's presume 
connection reuse for the moment).
I think it will certainly need testing to determine these limits, and each of 
these separate stats should be configurable in any case. Is there anything 
else that affects the load for ContainerRouter?
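
To make the question concrete, here is a minimal sketch of how I picture those three stats being checked against configurable limits. All names (RouterLimits, RouterLoad, exceeded, should_use_queue) and the default values are made up for illustration; nothing here comes from the proposal or from the OpenWhisk code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouterLimits:
    # Hypothetical ceilings; real values would have to come out of testing.
    max_container_refs: int = 1000
    max_inbound_requests: int = 5000
    max_outbound_requests: int = 5000


@dataclass
class RouterLoad:
    container_refs: int = 0
    inbound_requests: int = 0   # outstanding inbound HTTP requests
    outbound_requests: int = 0  # outstanding outbound HTTP requests

    def exceeded(self, limits: RouterLimits) -> List[str]:
        """Return the names of the stats that are at or above their limit."""
        checks = {
            "container_refs": (self.container_refs, limits.max_container_refs),
            "inbound_requests": (self.inbound_requests, limits.max_inbound_requests),
            "outbound_requests": (self.outbound_requests, limits.max_outbound_requests),
        }
        return [name for name, (value, limit) in checks.items() if value >= limit]

    def should_use_queue(self, limits: RouterLimits) -> bool:
        """Assumption: any single maxed-out stat is enough to fall back to the queue."""
        return bool(self.exceeded(limits))
```

Whether one maxed-out stat should be enough, or whether the stats need to be weighted against each other, is exactly the part I would expect testing to answer.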


    That request-token will then be taken by any Router that has free capacity
    for that action (note: this is not simple with kafka, but might be simpler
    with other MQ technologies). Since new resources have been requested, it is
    guaranteed that one Router will eventually become free.
    
Is "requests resources" here requesting new action containers, which it won't 
be able to process itself immediately, but which should start up + warm and be 
provided to "any ContainerRouter"? This makes sense, I just want to clarify 
that "resources == containers".
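
Here is how I read the flow, as a rough in-memory sketch. It assumes "resources == containers", uses a plain per-action FIFO in place of Kafka or any other MQ, and does not model a slot being released when an activation finishes. WorkStealingQueue, Router, handle, and on_container_added are names I invented for the sketch.

```python
from collections import defaultdict, deque


class WorkStealingQueue:
    """In-memory stand-in for the shared queue (not Kafka): one FIFO per action."""

    def __init__(self):
        self._tokens = defaultdict(deque)

    def put(self, action, token):
        self._tokens[action].append(token)

    def steal(self, action):
        """Return the oldest token for the action, or None if there is none."""
        tokens = self._tokens[action]
        return tokens.popleft() if tokens else None


class Router:
    def __init__(self, name, queue, request_resources):
        self.name = name
        self.queue = queue
        # Hypothetical callback that asks the ContainerManager for more containers.
        self.request_resources = request_resources
        self.free_slots = defaultdict(int)  # action -> free slots on this Router

    def handle(self, action, token):
        """Serve locally if a slot is free; otherwise request resources and enqueue."""
        if self.free_slots[action] > 0:
            self.free_slots[action] -= 1
            return f"{self.name} served {token}"
        self.request_resources(action)
        self.queue.put(action, token)
        return None

    def on_container_added(self, action, slots):
        """New capacity arrived: drain queued tokens for this action while slots last."""
        self.free_slots[action] += slots
        served = []
        while self.free_slots[action] > 0:
            token = self.queue.steal(action)
            if token is None:
                break
            self.free_slots[action] -= 1
            served.append(f"{self.name} served {token}")
        return served
```

In this reading, the Router that queued the token and the Router that eventually serves it can be different ones, which is why the new container only has to land on "any ContainerRouter".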
    
    >
    > If ContainerManager has enough awareness of ContainerRouters' states, I'm
    > not sure where using a queue would be used (for redirecting to other
    > ContainerRouters) vs ContainerManager responding with a ContainerRouters
    > reference (instead of an action container reference) - I'm not following
    > the logic of the edge case in the proposal - there is mention of "which
    > controller the request needs to go", but maybe this is a typo and should
    > say ContainerRouter?
    >
    
    Indeed that's a typo, it should say ContainerRouter.
    
    The ContainerManager only knows which Router has which Container. It does
    not know whether the respective Router has capacity on that container (the
    capacity metric is very hard to share since it's ever changing).
    
    Hence, in an edge-case where there are fewer Containers than Routers, the
    ContainerManager can hand out references to the Routers it gave Containers
    to, to the Routers that have none. (This is the edge-case described in the
    proposal).

I'm not sure why, in this case, the ContainerManager does not just create a new 
container instead of sending the request to another Router. If there is some 
intended limit on "number of containers for a particular action", that would be 
a reason, but given that the ContainerManager cannot know the state of the 
existing containers, it seems like sending to another Router which has the 
container, but may not be able to use it immediately, may cause failures in 
some cases. 
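
To illustrate the two options I am comparing, here is a small sketch. It assumes a per-action container limit, which is my guess at why the redirect would exist at all; the limit, the class shape, and the assign method are hypothetical and not taken from the proposal.

```python
class ContainerManager:
    def __init__(self, max_containers_per_action):
        if max_containers_per_action < 1:
            raise ValueError("need at least one container per action")
        # Hypothetical limit on "number of containers for a particular action".
        self.max_containers_per_action = max_containers_per_action
        self.owners = {}  # action -> Router names, one entry per container handed out

    def assign(self, router, action):
        """Return ("container", id) below the limit, else ("router", name)."""
        owners = self.owners.setdefault(action, [])
        if len(owners) < self.max_containers_per_action:
            owners.append(router)
            return ("container", f"{action}-{len(owners)}")
        # At the limit the manager only knows which Router holds a container,
        # not whether that Router has spare capacity on it right now.
        return ("router", owners[0])
```

The second branch is the one that worries me: the Router named in the reply may be fully busy, and the manager has no way to tell. This sketch always names the first owner, which can even be the requesting Router itself.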


    The work-stealing queue though is used to rebalance work in case one of the
    Routers gets overloaded.
    
Got it.

Thanks
Tyson
