On Sun, Aug 19, 2018 at 6:59 PM TzuChiao Yeh <su3g4284zo...@gmail.com> wrote:
> On Sun, Aug 19, 2018 at 7:13 PM Markus Thömmes <markusthoem...@apache.org> wrote:
>
> > Hi Tzu-Chiao,
> >
> > On Sat, Aug 18, 2018 at 6:56 AM TzuChiao Yeh <su3g4284zo...@gmail.com> wrote:
> >
> > > Hi Markus,
> > >
> > > Nice thoughts on separating the logic in this revision! I'm not sure whether this question has already been clarified, sorry if it's a duplicate.
> > >
> > > Same question on the cluster singleton:
> > >
> > > I think there are two possibilities for container deletion: 1. the ContainerRouter removes it (on error or idle state); 2. the ContainerManager decides to remove it (i.e. to clear space for a new creation).
> > >
> > > For case 2, how do we ensure safe deletion in the ContainerManager? If there is still a similar busy/free/prewarmed pool model, it might require additional container state transitions from busy to free before we can safely remove a container, or reject the request if nothing is found (system overloaded).
> > >
> > > Via a paused state or other states/messages? There might be some trade-offs between granularity (the time-slice in scheduling) and a performance bottleneck on the ClusterSingleton.
> >
> > I'm not sure if I quite got the point, but here's an attempt at an explanation:
> >
> > Yes, container removal in case 2 is triggered from the ContainerManager. To be able to safely remove a container, it requests all ContainerRouters owning that container to stop serving it and hand it back. Once it's been handed back, the ContainerManager can safely delete it. The contract should also say: a container must be handed back in unpaused state, so it can be deleted safely. Since the ContainerRouters handle pause/unpause, they'll need to stop serving the container, unpause it, remove it from their state and acknowledge to the ContainerManager that they handed it back.
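To make that hand-back contract a bit more concrete, a minimal sketch of the messages between ContainerManager and ContainerRouter could look like the following. All names are made up for illustration; nothing like this exists in the code today:

    object HandBackProtocol {
      final case class ContainerId(asString: String)

      // ContainerManager -> ContainerRouter: stop serving the container,
      // unpause it if needed and hand it back.
      final case class ReleaseContainer(id: ContainerId)

      // ContainerRouter -> ContainerManager: the router no longer serves the
      // container, it is unpaused and removed from the router's state.
      final case class ContainerReleased(id: ContainerId)
    }

The ContainerManager would only issue the actual delete once it has collected a ContainerReleased for that container from every router it handed it out to.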
> Thank you, it's clear to me.
>
> > There is an open question on when to consider a system to be in overflow state, or rather: how to handle the edge situation. If you cannot generate more containers, we need to decide whether we remove another container (the case you're describing) or call it quits and say "503, overloaded, go away for now". The logic deciding this is up for discussion as well. The heuristic could take into account how many resources in the whole system you already own, how many resources others own, and whether we want to share those fairly or not. Note that this is also very much related to being able to scale up the resources themselves (to be able to generate new containers). If we assume a bounded system though, yes, we'll need to find a strategy for handling this case. I believe with the state the ContainerManager has, it can provide a more eloquent answer to that question than what we can do today (nothing really, we just keep on churning through containers).
>
> I agree. An additional problem is that in the case of burst requests, the ContainerManager will "over-estimate" container allocation, whether work-stealing between ContainerRouters is enabled or not. For a bounded system, we had better handle this carefully to avoid frequent creation/deletion. I'm wondering whether sharing a message queue with the ContainerManager (since it's not on a critical path), or some mechanism for checking queue size (i.e. checking Kafka lag), could eliminate this? However, this may only happen for short-running tasks, and throttling is already helpful there.

Are you saying: it will over-estimate container allocation because it will create a container for each request as they arrive if there are no containers around currently, and the actual number of containers needed might be lower for very short-running use-cases where requests arrive in short bursts? If so: I agree, though I don't see how any system could solve this without taking the estimated runtime of each request into account. Can you elaborate on your thoughts on checking queue-size etc.?

> > Does that answer the question?
>
> > > Thanks!
> > >
> > > Tzu-Chiao
> > >
> > > On Sat, Aug 18, 2018 at 5:55 AM Tyson Norris <tnor...@adobe.com.invalid> wrote:
> > >
> > > > Ugh my reply formatting got removed!!! Trying this again with some >>
> > > >
> > > > On Aug 17, 2018, at 2:45 PM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:
> > > >
> > > > If the failover of the singleton is too long (I think it will be based on cluster size, the oldest node becomes the singleton host iirc), I think we need to consider how containers can launch in the meantime. A first step might be to test out the singleton behavior in clusters of various sizes.
> > > >
> > > > I agree this bit of design is crucial, a few thoughts:
> > > > Pre-warm wouldn't help here, the ContainerRouters only know warm containers. Pre-warming is managed by the ContainerManager.
> > > >
> > > > >> Ah right
> > > >
> > > > Considering a fail-over scenario: We could consider sharing the state via EventSourcing. That is: all state lives inside of frequently snapshotted events and thus can be shared between multiple instances of the ContainerManager seamlessly. Alternatively, we could also think about only working on persisted state. That way, a cold-standby model could fly. We should make sure that the state is not "slightly stale" but rather that both instances see the same state at any point in time. I believe that on the cold path of generating new containers, we can live with the extra latency of persisting what we're doing, as the path will still be dominated by the container creation latency.
> > > >
> > > > >> Wasn't clear if you mean not using ClusterSingleton? To be clear, in the ClusterSingleton case there are 2 issues:
> > > > - time it takes for the akka ClusterSingletonManager to realize it needs to start a new actor
> > > > - time it takes for the new actor to assume a usable state
> > > >
> > > > EventSourcing (or ext persistence) may help with the latter, but we will need to be sure the former is tolerable to start with.
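For reference, the wiring itself is small; a minimal sketch of running the ContainerManager behind the stock ClusterSingletonManager/Proxy pair from akka-cluster-tools could look like the snippet below. The ContainerManager actor is just a placeholder here, not a proposal for the real Props, and the usual cluster configuration is assumed. If I read the docs right, the failover time Tyson mentions is mostly not in this wiring at all: on a hard crash the singleton only moves once the failed node has actually been removed (downed) from the cluster.

    import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
    import akka.cluster.singleton.{
      ClusterSingletonManager,
      ClusterSingletonManagerSettings,
      ClusterSingletonProxy,
      ClusterSingletonProxySettings
    }

    // Placeholder for the real ContainerManager actor.
    class ContainerManager extends Actor {
      def receive: Receive = Actor.emptyBehavior
    }

    object SingletonWiring extends App {
      // Assumes akka-cluster-tools on the classpath and
      // akka.actor.provider = "cluster" in the configuration.
      val system = ActorSystem("controller-actor-system")

      // Runs exactly one ContainerManager, on the oldest cluster node.
      system.actorOf(
        ClusterSingletonManager.props(
          singletonProps = Props[ContainerManager],
          terminationMessage = PoisonPill,
          settings = ClusterSingletonManagerSettings(system)),
        name = "containerManager")

      // ContainerRouters would talk to the singleton through a proxy that
      // keeps track of where it currently lives.
      val containerManager = system.actorOf(
        ClusterSingletonProxy.props(
          singletonManagerPath = "/user/containerManager",
          settings = ClusterSingletonProxySettings(system)),
        name = "containerManagerProxy")
    }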
> > > > Here is an example test from akka source that may be useful (multi-jvm, but all local):
> > > >
> > > > https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala
> > > >
> > > > Some things to consider, that I don't know the details of:
> > > > - will the size of the cluster affect the singleton behavior in case of failure? (I think so, but not sure to what extent); in the simple test above it takes ~6s for the replacement singleton to begin startup, but if we have 100s of nodes, I'm not sure how much time it will take. (I don't think this should be hard to test, but I haven't done it)
> > > > - in case of a hard crash, what is the singleton behavior? In graceful jvm termination, I know the cluster behavior is good, but there is always this question about how downing nodes will be handled. If this critical piece of the system relies on akka cluster functionality, we will need to make sure that the singleton can be reconstituted, both in case of graceful termination (restart/deployment events) and non-graceful termination (hard vm crash, hard container crash). This is ignoring more complicated cases of extended network partitions, which will also have bad effects on many of the downstream systems.
> > > >
> > > > Handover time as you say is crucial, but I'd say as it only impacts container creation, we could live with, let's say, 5 seconds of failover downtime on this path? What's your experience been with singleton failover? How long did it take?
> > > >
> > > > >> Seconds in the simplest case, so I think we need to test it in a scaled case (100s of cluster nodes), as well as the hard crash case (where not downing the node may affect the cluster state).
> > > >
> > > > On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:
> > > >
> > > > A couple comments on singleton:
> > > > - use of a cluster singleton will introduce a new single point of failure
> > > > - from the time of singleton node failure to singleton resurrection on a different instance, there will be an outage from the point of view of any ContainerRouter that does not already have a warm+free container to service an activation
> > > > - resurrecting the singleton will require transferring or rebuilding the state when recovery occurs - in my experience this was tricky, and requires replicating the data (which will be slightly stale, but better than rebuilding from nothing); I don't recall the handover delay (to transfer the singleton to a new akka cluster node) when I tried last, but I think it was not as fast as I hoped it would be.
> > > > I don't have a great suggestion for the singleton failure case, but would like to consider this carefully, and discuss the ramifications (which may or may not be tolerable) before pursuing this particular aspect of the design.
> > > >
> > > > On prioritization:
> > > > - if concurrency is enabled for an action, this is another prioritization aspect, of sorts - if the action supports concurrency, there is no reason (except for destruction coordination…) that it cannot be shared across shards. This could be added later, but may be worth considering, since there is a general reuse problem where a series of activations that arrives at different ContainerRouters will create a new container in each, while they could be reused (and avoid creating new containers) if concurrency is tolerated in that container. This would only (ha ha) require changing how container destroy works, where it cannot be destroyed until the last ContainerRouter is done with it. And if container destruction is coordinated in this way to increase reuse, it would also be good to coordinate construction (don't concurrently construct the same container for multiple ContainerRouters IFF a single container would enable concurrent activations once it is created). I'm not sure if others are desiring this level of container reuse, but if so, it would be worth considering these aspects (sharding/isolation vs sharing/coordination) as part of any redesign.
> > > >
> > > > Yes, I can see where you're heading here. I think this can be generalized:
> > > >
> > > > Assume intra-container concurrency C and number of ContainerRouters R.
> > > > If C > R: Shard the "slots" on this container evenly across R. The container can only be destroyed after you receive R acknowledgements of doing so.
> > > > If C < R: Hand out 1 slot each to C Routers, point the remaining Routers to the ones that got slots.
> > > >
> > > > >> Yes, mostly - I think there is also a case where the destruction message is revoked by the same router (receiving a new activation for the container which it previously requested destruction of). But I think this is covered in the details of tracking "after you receive R acks of destructions".
> > > >
> > > > Concurrent creation: Batch creation requests while one container is being created. Say you received a request for a new container that has C slots. If there are more requests for that container arriving while it is being created, don't act on them and fold the creation into the first one. Only start creating a new container if the number of resource requests exceeds C.
> > > >
> > > > Does that make sense? I think in that model you can set C=1 and it works as I envisioned it to work, or set it to C=200 and things will be shared even across routers.
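To make sure I read my own two rules correctly, here is a tiny sketch of the slot distribution and the creation batching, with entirely made-up names; it's only meant to illustrate the arithmetic, not to propose actual code:

    // For a container with c slots and a list of routers:
    //   c >= number of routers: spread the c slots evenly across all routers
    //   c <  number of routers: c routers get 1 slot each, the rest get none
    //                           and forward their requests to the ones that do
    def distributeSlots(c: Int, routers: List[String]): Map[String, Int] = {
      val r = routers.size
      if (c >= r) {
        val base = c / r
        val remainder = c % r
        routers.zipWithIndex.map { case (router, i) =>
          router -> (base + (if (i < remainder) 1 else 0))
        }.toMap
      } else {
        (routers.take(c).map(_ -> 1) ++ routers.drop(c).map(_ -> 0)).toMap
      }
    }

    // Creation batching: only ask for additional containers once outstanding
    // demand exceeds the slots already provided by containers being created.
    def containersToCreate(pendingRequests: Int, slotsPerContainer: Int, inFlight: Int): Int = {
      val covered = inFlight * slotsPerContainer
      val uncovered = math.max(pendingRequests - covered, 0)
      (uncovered + slotsPerContainer - 1) / slotsPerContainer // ceil division
    }

Setting slotsPerContainer to 1 gives the one-container-per-request behavior I described; a large value shares a single in-flight container across routers.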
> > > > >> Side note: One detail about the pending concurrency impl today is that due to the async nature of tracking the active activations within the container, there is no guarantee (when C>1) that the number is exact, so if you specify C=200, you may actually get a different container at 195 or 205. This is not really related to this discussion, but is based on the current messaging/future behavior in ContainerPool/ContainerProxy, so wanted to mention it explicitly, in case it matters to anyone.
> > > >
> > > > Thanks
> > > > Tyson
> > >
> > > --
> > > Tzu-Chiao Yeh (@tz70s)
>
> --
> Tzu-Chiao Yeh (@tz70s)