Re: Proposal on a future architecture of OpenWhisk

Tyson Norris Thu, 16 Aug 2018 14:14:36 -0700

Thinking more about the singleton aspect, I guess this is mostly an issue for 
blackbox containers, whereas manifest/managed containers will mitigate at least 
some of the singleton failure delays via prewarm/stemcell containers.


So in the case of singleton failure, impacts would be:
- managed containers once prewarms are exhausted (may be improved by being more 
intelligent about prewarm pool sizing based on load etc)
- managed containers that don’t match any prewarms (similar - if prewarm pool 
is dynamically configured based on load, this is less of a problem)
- blackbox containers (no help)
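
The three cases above can be sketched as a small decision function. This is 
only an illustration, not OpenWhisk code: PrewarmPool, 
can_start_during_outage, and the kind names are hypothetical, and it assumes 
prewarms can be handed out without involving the singleton.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PrewarmPool:
    """Hypothetical pool: runtime kind -> number of idle prewarm containers."""
    available: Counter = field(default_factory=Counter)

    def take(self, kind):
        # Hand out one prewarm of this kind if any are left.
        if self.available[kind] > 0:
            self.available[kind] -= 1
            return True
        return False


def can_start_during_outage(pool, kind, blackbox):
    """Decide whether a cold start can proceed while the singleton is down.

    Returns (ok, reason). Assumes new containers cannot be created
    until the singleton has failed over.
    """
    if blackbox:
        return False, "blackbox"            # prewarms never match blackbox images
    if kind not in pool.available:
        return False, "no matching prewarm"  # pool not configured for this kind
    if pool.take(kind):
        return True, "prewarm"
    return False, "prewarms exhausted"


pool = PrewarmPool(Counter({"nodejs": 1}))
print(can_start_during_outage(pool, "nodejs", blackbox=False))
print(can_start_during_outage(pool, "nodejs", blackbox=False))
```

Sizing the pool from observed load would shrink the second and third failure 
cases, but does nothing for blackbox.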

If the failover of the singleton is too long (I think it will be based on 
cluster size, oldest node becomes the singleton host iirc), I think we need to 
consider how containers can launch in the meantime. A first step might be to 
test out the singleton behavior in clusters of various sizes.

> On Aug 16, 2018, at 11:01 AM, Tyson Norris <tnor...@adobe.com.INVALID> wrote:
> 
> A couple comments on singleton:
> - use of cluster singleton will introduce a new single point of failure - 
> from the time of singleton node failure to singleton resurrection on a different 
> instance, will be an outage from the point of view of any ContainerRouter 
> that does not already have a warm+free container to service an activation
> - resurrecting the singleton will require transferring or rebuilding the 
> state when recovery occurs - in my experience this was tricky, and requires 
> replicating the data (which will be slightly stale, but better than 
> rebuilding from nothing); I don’t recall the handover delay (to transfer 
> singleton to a new akka cluster node) when I tried last, but I think it was 
> not as fast as I hoped it would be.
> 
> I don’t have a great suggestion for the singleton failure case, but would 
> like to consider this carefully, and discuss the ramifications (which may or 
> may not be tolerable) before pursuing this particular aspect of the design.
> 
> 
> On prioritization:
> - if concurrency is enabled for an action, this is another prioritization 
> aspect, of sorts - if the action supports concurrency, there is no reason 
> (except for destruction coordination…) that it cannot be shared across 
> shards. This could be added later, but may be worth considering since there 
> is a general reuse problem where a series of activations that arrives at 
> different ContainerRouters will create a new container in each, while they 
> could be reused (and avoid creating new containers) if concurrency is 
> tolerated in that container. This would only (ha ha) require changing how 
> container destroy works, where it cannot be destroyed until the last 
> ContainerRouter is done with it. And if container destruction is coordinated 
> in this way to increase reuse, it would also be good to coordinate 
> construction (don’t concurrently construct the same container for multiple 
> ContainerRouters IFF a single container would enable concurrent activations 
> once it is created). I’m not sure if others are desiring this level of 
> container reuse, but if so, it would be worth considering these aspects 
> (sharding/isolation vs sharing/coordination) as part of any redesign.
> 
> 
> WDYT?
> 
> Thanks
> Tyson
> 
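
To make the coordinated construction/destruction idea above a bit more 
concrete, here is a minimal single-process sketch. All names 
(SharedContainerRegistry, acquire, release) are hypothetical; a real version 
would have to coordinate across ContainerRouter nodes rather than through a 
local lock.

```python
import threading


class Container:
    def __init__(self, action):
        self.action = action
        self.routers = set()    # ids of ContainerRouters currently using it
        self.destroyed = False


class SharedContainerRegistry:
    """Hypothetical coordinator for containers shared across routers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shared = {}       # action name -> live shared Container
        self.created = 0

    def acquire(self, router_id, action, concurrent):
        with self._lock:
            # Only concurrency-enabled actions may reuse an existing container,
            # so two routers never construct the same shareable container twice.
            container = self._shared.get(action) if concurrent else None
            if container is None:
                container = Container(action)
                self.created += 1
                if concurrent:
                    self._shared[action] = container
            container.routers.add(router_id)
            return container

    def release(self, router_id, container):
        """Drop one router's claim; destroy only when the last router is done."""
        with self._lock:
            container.routers.discard(router_id)
            if not container.routers and not container.destroyed:
                container.destroyed = True
                if self._shared.get(container.action) is container:
                    del self._shared[container.action]
            return container.destroyed


registry = SharedContainerRegistry()
a = registry.acquire("router-1", "hello", concurrent=True)
b = registry.acquire("router-2", "hello", concurrent=True)
print(a is b, registry.created)
```

The trade-off is the one named above: sharing saves container creations at 
the cost of coordination that sharding would otherwise avoid.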
> On Aug 15, 2018, at 8:55 AM, Carlos Santana <csantan...@gmail.com> wrote:
> 
> I think we should add a section on prioritization for blocking vs. async
> invokes (non-blocking actions and triggers)
> 
> The front door has the luxury of knowing some intent from the incoming
> request. I feel it would make sense to give high priority to blocking invokes,
> while async invokes go straight to the queue to be picked up by the system to
> eventually run, even if it takes 10 times longer to execute than a blocking
> invoke; for example, a webaction would take 10ms vs. a DB trigger fire or an
> async webhook taking 100ms.
> 
> Also the controller takes time to convert a trigger and process the rules,
> this is something that can also be taken out of the hot path.
> 
> So I'm just saying we could optimize the system because we know if the
> incoming request is a hot or hotter path :-)
> 
> -- Carlos
> 
>
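
The blocking vs. async prioritization suggested in the quoted message could 
look roughly like the sketch below. ActivationQueue and its methods are 
hypothetical names; the only assumption is that the front door knows whether 
an invoke is blocking when it enqueues it.

```python
import heapq
import itertools

BLOCKING, ASYNC = 0, 1  # lower value is served first


class ActivationQueue:
    """Hypothetical queue: blocking invokes first, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, name, blocking):
        priority = BLOCKING if blocking else ASYNC
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def next(self):
        # Returns the next activation name, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ActivationQueue()
queue.submit("db-trigger", blocking=False)
queue.submit("async-webhook", blocking=False)
queue.submit("webaction", blocking=True)
print(queue.next())
```

Note that strict priority like this can starve async invokes under sustained 
blocking load, so some aging or quota would probably be needed in practice.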
