Hi Tyson,

On Thu, Aug 23, 2018 at 21:28, Tyson Norris <tnor...@adobe.com.invalid> wrote:
> > > And each ContainerRouter has a queue consumer that presumably pulls
> > > from the queue constantly? Or is consumption based on something else?
> > > If all ContainerRouters are consuming at the same rate, then while
> > > this does distribute the load across ContainerRouters, it doesn't
> > > really guarantee any similar state (number of containers, active
> > > connections, etc) at each ContainerRouter, I think. Maybe I am
> > > missing something here?
> >
> > The idea is that ContainerRouters do **not** pull from the queue
> > constantly. They pull work for actions that they have idle containers
> > for.
>
> Router is not pulling at queue for "specific actions", just for any
> action that might replace idle containers - right? This is complicated
> with concurrency though, since while a container is not idle (paused +
> removable), it may be usable, but only if the action received is the
> same as one of the existing warm containers, and that container has
> concurrency slots available for additional activations. It may be
> helpful to diagram some of this stealing-queue flow a bit more; I'm not
> seeing how it will work out other than by creating more containers than
> are absolutely required, which may be OK, not sure.

Yes, I will diagram things out soonish; I'm a little short on time
currently. The idea is that the Router does indeed pull for *specific*
actions. This is a problem when using Kafka, but might be solvable when we
don't require Kafka. I have to test this for feasibility though.

> > Similar state in terms of number of containers is achieved via the
> > ContainerManager. Active connections should roughly even out with the
> > queue being pulled on idle.
>
> Yeah, carefully defining "idle" may be tricky if we want to achieve the
> absolute minimum of containers in use for a specific action at any time.

> > > > The edge-case here is for very slow load. It's about minimizing
> > > > the amount of Containers needed. Another example:
> > > > Say you have 3 Routers.
> > > > A request for action X comes in, goes to Router1. It requests a
> > > > container, puts the work on the queue, nobody steals it, and as
> > > > soon as the Container gets ready, the work is taken from the queue
> > > > and executed. All nice and dandy.
> > > >
> > > > Important remark: The Router that requested more Containers is not
> > > > necessarily the one that gets the Containers. We need to make sure
> > > > to evenly distribute Containers across the system.
> > > >
> > > > So back to our example: What happens if requests for action X are
> > > > made one after the other? Well, the layer above the Routers
> > > > (something needs to load-balance them, be it DNS or another type
> > > > of routing layer) isn't aware of the locality of the Container
> > > > that we created to execute action X. As it schedules fairly
> > > > randomly (round-robin in a multi-tenant system is essentially
> > > > random), the action will hit each Router once very soon. As we're
> > > > only generating one request after the other, arguably we want to
> > > > create only one container.
> > > >
> > > > That's why in this example the 2 remaining Routers with no
> > > > container get a reference to Router1.
> > > >
> > > > In the case you mentioned:
> > > >
> > > > > it seems like sending to another Router which has the container,
> > > > > but may not be able to use it immediately, may cause failures in
> > > > > some cases.
> > > >
> > > > I don't recall if it's in the document or in the discussion on the
> > > > dev-list: The router would respond to the proxied request with a
> > > > 503 immediately. That tells the proxying router: Oh, apparently we
> > > > need more resources. So it requests another container, etc.
> > > >
> > > > Does that clarify that specific edge-case?
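To make the proxy-then-503 flow concrete, here is a rough sketch. All names (`Router`, `ContainerManager`, the return strings) are made up for illustration; this is a toy model of the behavior described above, not the actual implementation.

```python
# Toy model of the proxy-then-503 flow: a Router with no warm container
# proxies to the Router it was referred to; an immediate 503 from that
# Router triggers a request for a fresh container. Names are illustrative.

class ContainerManager:
    """Creates containers; tracks how many exist per action."""
    def __init__(self):
        self.containers = {}  # action -> number of containers created

    def create_container(self, action):
        self.containers[action] = self.containers.get(action, 0) + 1


class Router:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.warm = {}        # action -> list of idle warm containers
        self.reference = {}   # action -> Router known to own a container

    def handle(self, action):
        # 1. Use a local warm container if one is idle.
        if self.warm.get(action):
            self.warm[action].pop()
            return f"{self.name}: executed {action} locally"
        # 2. Otherwise, proxy to the Router we hold a reference to.
        ref = self.reference.get(action)
        if ref is not None and ref.accept_proxied(action) == 200:
            return f"{self.name}: proxied {action} to {ref.name}"
        # 3. No reference, or a 503: apparently more resources are
        #    needed, so request another container.
        self.manager.create_container(action)
        return f"{self.name}: requested new container for {action}"

    def accept_proxied(self, action):
        # Respond 503 immediately when no idle container is available.
        if self.warm.get(action):
            self.warm[action].pop()
            return 200
        return 503
```

The point of the immediate 503 is that the proxying Router never blocks waiting for capacity; the 503 itself is the "we need more resources" signal.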
> > > Yes, but I would not call this an edge-case - I think it is more of
> > > a ramp-up to maximum container reuse, and it will probably be
> > > dramatically impacted by containers that do NOT support concurrency
> > > (these return a 503 when a single activation is in flight, vs. a
> > > high-concurrency container, which would cause a 503 only once max
> > > concurrency is reached).
> > > If each ContainerRouter is as likely to receive the original
> > > request, and each is also as likely to receive the queued item from
> > > the stealing queue, then there will be a lot of cross traffic during
> > > the ramp-up from 1 container to <Router count> containers. E.g.
> > >
> > > From client:
> > > Request1 -> Router 1 -> queue (no containers)
> > > Request2 -> Router 2 -> queue (no containers)
> > > Request3 -> Router 3 -> queue (no containers)
> > > From queue:
> > > Request1 -> Router1 -> create and use container
> > > Request2 -> Router2 -> Router1 -> 503 -> create container
> > > Request3 -> Router3 -> Router1 -> 503 -> Router2 -> 503 -> create
> > > container
> > >
> > > In other words - the 503 may help when there is one existing
> > > container and it is deemed to be busy, but what if there are 10
> > > existing containers (on different Routers other than where the
> > > request was pulled from the stealing queue) - do you make HTTP
> > > requests to all 10 Routers to see if they are busy before creating a
> > > new container?

> > Good point, I haven't really thought about that, to be frank. Gut
> > feeling is that we should only have 1 direct reference per
> > Router/Action to another Router. If that yields a 503, just generate
> > new resources immediately. That might overshoot what we really need,
> > but might just be good enough? Maybe I'm overcomplicating here...
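To spell out that gut feeling as a dispatch policy: at most one proxy attempt per Router/Action, and any failure creates a container rather than probing the other N-1 Routers. A hypothetical sketch (function and parameter names are invented):

```python
# Sketch of the suggested policy: a Router holds at most ONE direct
# reference per action, and a 503 from that reference immediately
# triggers container creation instead of probing further Routers.
# All names here are hypothetical.

def dispatch(action, reference_status, create_container):
    """reference_status: HTTP status from the single referenced Router,
    or None when no reference is held for this action.
    create_container: callback to request a new container."""
    if reference_status == 200:
        return "proxied"
    # No reference, or the one reference answered 503: do not probe the
    # remaining Routers -- overshoot slightly and create a container.
    create_container(action)
    return "created"
```

This trades a possible overshoot in container count for a hard bound of one extra HTTP hop per request, which sidesteps the "ask all 10 Routers" problem.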
> > Alternative to the whole work-stealing algorithm (which could unify
> > this ramp-up phase and work-stealing itself nicely):
> > What if it were possible to detect that, when an event is pushed to
> > the work-stealing queue, there are no active consumers for that event
> > right now? If that is possible, we could use it to signal that more
> > Containers are needed, because apparently nobody has idling resources
> > for a specific container.
>
> I think this is only possible in the case where 0 containers (for that
> action) existed before; in other cases, without having Router state
> visibility, it will be impossible to detect Routers that have capacity
> to service that event.

Why? If Routers have a container that has capacity, they will be
listening for more work (per my answer above).

> > We would then also use this mechanism to generate Containers in
> > general. The ContainerRouters would **never** request new containers.
> > They would instead put their work on the stealing queue to let someone
> > else work on it. If the system detects that nobody is waiting, it
> > requests a new Container (maybe the ContainerManager could be used to
> > detect that signal?)
> >
> > For our "edge-case" that'd mean: No references to other Routers are
> > handed out at all. If a request arrives at a Router that has no
> > Container to serve it, it just puts it on the queue. If there's a
> > consumer for it, great, done. If not, we know we need more resources.
> >
> > This boils down to needing an efficient mechanism to signal free
> > capacities though. Something to think deeper about, thanks for
> > bringing it up!
>
> Yes, I think "capacity" is a more accurate way to think of it than
> "idle", and yes, I think a 503 should generate a container immediately,
> but I think we need to supply more data to intelligently limit the 503
> workflows.
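For illustration, the "no active consumer" signal could look like the sketch below: Routers register as consumers for an action only while they have capacity for it, and a publish that finds zero consumers triggers the "more containers needed" callback. The names are invented, and whether a real queue can expose consumer counts like this is exactly the open question (it is a problem with Kafka, per the discussion above).

```python
# Sketch of the "no active consumer" signal: enqueuing work for an
# action that no Router currently polls triggers a callback (e.g. to
# the ContainerManager). Illustrative names only; a production queue
# would need to expose consumer counts, which Kafka does not readily do.

from collections import defaultdict, deque

class StealingQueue:
    def __init__(self, on_no_consumer):
        self.queues = defaultdict(deque)     # action -> pending work
        self.consumers = defaultdict(int)    # action -> polling Routers
        self.on_no_consumer = on_no_consumer

    def register_consumer(self, action):
        # A Router with capacity for `action` starts polling for it.
        self.consumers[action] += 1

    def publish(self, action, work):
        self.queues[action].append(work)
        if self.consumers[action] == 0:
            # Nobody has capacity for this action right now: signal
            # that more containers are needed.
            self.on_no_consumer(action)
```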
> What about this:
> - ContainerRouters report capacity ("possible capacity", or "busy") to
> ContainerManager periodically (not accurate at a point in time, but
> close enough in some cases)
> - When no capacity exists in a ContainerRouter's existing warm
> containers, the ContainerRouter requests a container from
> ContainerManager
> - ContainerManager responds with either a ContainerRouter (if one
> exists with "possible capacity"), or a Container (if none exists with
> "possible capacity")

I think these 3 points are addressed by being able to pull for specific
actions and/or being able to detect that pull signal. In fact, it's the
same mechanism: The ContainerManager essentially is your work-stealing
backend.

> - In case a Container is returned, if C > 1, its capacity state should
> start as "possible capacity"; otherwise, "busy" (we know it will be
> immediately used)
> - In case a ContainerRouter is returned, attempt (once) to proxy to
> that ContainerRouter
> - On a proxied request to ContainerRouter CR2, either service the
> request, OR immediately create a container (in CR2), OR - since we may
> limit the number of same-action containers in a ContainerRouter -
> return 503, at which point a container is immediately created (in CR1)
>
> This would:
> - encourage multiple containers for the same action to be managed in a
> subset of routers (better chance of reusing containers)
> - not restrict the number of routers used for a specific action when
> under load (e.g. say that each router can handle active requests for n
> containers, meaning n connection pools)
> - allow designating a ContainerRouter capacity config for both a)
> activations on warm containers as configured per action and b) the
> overall number of containers (connection pools) as configured at the
> ContainerRouter (to smooth hot spots across ContainerRouters).
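The ContainerManager side of the proposal above could be sketched roughly like this. The report format and function names are assumptions made up for the example; the real protocol is still to be designed.

```python
# Sketch of the proposed ContainerManager response logic: given the
# periodically reported per-action capacity states, answer a container
# request with either a ContainerRouter to proxy to, or a fresh
# Container whose initial state depends on the action's concurrency
# limit C. Report format and names are assumptions.

def respond(capacity_reports, action, max_concurrency):
    """capacity_reports: {router: {action: "possible capacity" | "busy"}}
    Returns ("router", name) or ("container", initial_state)."""
    for name, report in capacity_reports.items():
        if report.get(action) == "possible capacity":
            # Some Router still has warm capacity: refer the caller
            # there instead of creating a container.
            return ("router", name)
    # No Router has spare capacity: create a container. With C > 1 it
    # still has free slots after its first activation, so it starts in
    # "possible capacity"; with C == 1 it is immediately busy.
    state = "possible capacity" if max_concurrency > 1 else "busy"
    return ("container", state)
```

Note the reports are only periodically refreshed, so this decision is made on possibly stale data; that staleness is exactly the "inaccurate data in CM" concern raised below.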
> I think the biggest problem would be cases where containers hover
> around the "busy" state (C requests), causing requests to the CM and
> inaccurate data in the CM (causing extra containers to be created), but
> it may be OK for most load patterns; nothing will work perfectly for
> all.
>
> Reporting capacity can either be metrics-based or a health ping similar
> to what exists today (but with more details of the pool states).
>
> I will try to diagram this, it's getting complicated...

Yes, I will as well.

> Thanks
> Tyson

> > Memo to self: I need to incorporate all the stuff from these
> > discussions into the document.

> > > > The work-stealing queue though is used to rebalance work in case
> > > > one of the Routers gets overloaded.
> > >
> > > Got it.
> > >
> > > Thanks
> > > Tyson