Thinking more about the singleton aspect, I guess this is mostly an issue for blackbox containers, whereas manifest/managed containers will mitigate at least some of the singleton failure delays through prewarm/stemcell containers.
So in the case of singleton failure, the impacts would be:
- managed containers once prewarms are exhausted (may be improved by being more intelligent about prewarm pool sizing based on load etc.)
- managed containers that don’t match any prewarms (similar: if the prewarm pool is dynamically configured based on load, this is less of a problem)
- blackbox containers (no help)

If the failover of the singleton takes too long (I think it will depend on cluster size; the oldest node becomes the singleton host, iirc), I think we need to consider how containers can launch in the meantime. A first step might be to test out the singleton behavior in clusters of various sizes.

> On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:
>
> A couple comments on singleton:
> - use of cluster singleton will introduce a new single point of failure -
> from time of singleton node failure, to singleton resurrection on a different
> instance, will be an outage from the point of view of any ContainerRouter
> that does not already have a warm+free container to service an activation
> - resurrecting the singleton will require transferring or rebuilding the
> state when recovery occurs - in my experience this was tricky, and requires
> replicating the data (which will be slightly stale, but better than
> rebuilding from nothing); I don’t recall the handover delay (to transfer
> singleton to a new akka cluster node) when I tried last, but I think it was
> not as fast as I hoped it would be.
>
> I don’t have a great suggestion for the singleton failure case, but would
> like to consider this carefully, and discuss the ramifications (which may or
> may not be tolerable) before pursuing this particular aspect of the design.
>
>
> On prioritization:
> - if concurrency is enabled for an action, this is another prioritization
> aspect, of sorts - if the action supports concurrency, there is no reason
> (except for destruction coordination…) that it cannot be shared across
> shards. This could be added later, but may be worth considering since there
> is a general reuse problem where a series of activations that arrives at
> different ContainerRouters will create a new container in each, while they
> could be reused (and avoid creating new containers) if concurrency is
> tolerated in that container. This would only (ha ha) require changing how
> container destroy works, where it cannot be destroyed until the last
> ContainerRouter is done with it. And if container destruction is coordinated
> in this way to increase reuse, it would also be good to coordinate
> construction (don’t concurrently construct the same container for multiple
> ContainerRouters IFF a single container would enable concurrent activations
> once it is created). I’m not sure if others are desiring this level of
> container reuse, but if so, it would be worth considering these aspects
> (sharding/isolation vs sharing/coordination) as part of any redesign.
>
>
> WDYT?
>
> Thanks
> Tyson
>
> On Aug 15, 2018, at 8:55 AM, Carlos Santana <csantan...@gmail.com> wrote:
>
> I think we should add a section on prioritization for blocking vs. async
> invokes (non-blocking actions and triggers)
>
> The front door has the luxury of knowing some intent from the incoming
> request, I feel it would make sense to give high priority to blocking invokes,
> and for async they go straight to the queue to be picked up by the system to
> eventually run, even if it takes 10 times longer to execute than a blocking
> invoke, for example a webaction would take 10ms vs. a DB trigger fire, or an
> async webhook takes 100ms.
>
> Also the controller takes time to convert a trigger and process the rules,
> this is something that can also be taken out of the hot path.
>
> So I'm just saying we could optimize the system because we know if the
> incoming request is a hot or hotter path :-)
>
> -- Carlos
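
As a rough illustration of the blocking-vs-async prioritization suggested in the quoted message above, here is a minimal Python sketch. It is not OpenWhisk code: `InvokeQueue`, `submit`, `next_invoke`, and the two priority constants are hypothetical names chosen for the example, and it assumes the front door can tell whether a request is blocking at the time it is queued.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels (assumption): lower number is served first.
BLOCKING = 0
ASYNC = 1


@dataclass(order=True)
class QueuedInvoke:
    priority: int
    seq: int                           # arrival order, keeps FIFO within a class
    action: str = field(compare=False)


class InvokeQueue:
    """Serves blocking invokes before async ones, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, action, blocking):
        priority = BLOCKING if blocking else ASYNC
        heapq.heappush(
            self._heap, QueuedInvoke(priority, next(self._counter), action)
        )

    def next_invoke(self):
        # Returns the next action name, or None when the queue is empty.
        return heapq.heappop(self._heap).action if self._heap else None


queue = InvokeQueue()
queue.submit("db-trigger", blocking=False)
queue.submit("web-action", blocking=True)
queue.submit("async-webhook", blocking=False)
order = [queue.next_invoke() for _ in range(3)]
print(order)
```

With strict priority like this, a steady stream of blocking invokes can starve the async ones, so a real design would likely need some bound on how long async work can wait.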