On Sun, Aug 19, 2018 at 7:13 PM Markus Thömmes <[email protected]> wrote:
> Hi Tzu-Chiao,
>
> On Sat, Aug 18, 2018 at 06:56, TzuChiao Yeh <[email protected]> wrote:
>
> > Hi Markus,
> >
> > Nice thoughts on separating the logic in this revision! I'm not sure
> > this question has already been clarified, sorry if it's a duplicate.
> >
> > Same question on cluster singleton:
> >
> > I think there will be two possibilities for container deletion:
> > 1. The ContainerRouter removes it (on error or idle state).
> > 2. The ContainerManager decides to remove it (i.e. to clear space for
> > new creations).
> >
> > For case 2, how do we ensure safe deletion in the ContainerManager?
> > If there's still a similar model of busy/free/prewarmed pools, it
> > might require additional state transitions for containers from busy to
> > free, after which we can safely remove one, or reject the request if
> > none is found (system overloaded).
> >
> > Via a paused state or other states/messages? There might be trade-offs
> > between granularity (the time-slice in scheduling) and a performance
> > bottleneck on the ClusterSingleton.
>
> I'm not sure if I quite got the point, but here's an attempt at an
> explanation:
>
> Yes, container removal in case 2 is triggered from the ContainerManager.
> To be able to safely remove it, it requests all ContainerRouters owning
> that container to stop serving it and hand it back. Once it's been
> handed back, the ContainerManager can safely delete it. The contract
> should also say: A container must be handed back in unpaused state, so
> it can be deleted safely. Since the ContainerRouters handle
> pause/unpause, they'll need to stop serving the container, unpause it,
> remove it from their state and acknowledge to the ContainerManager that
> they handed it back.

Thank you, it's clear to me.
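To double-check my understanding, here's a rough sketch of that
handshake in akka terms. All message, actor and helper names below are
mine for illustration, they're not from the proposal:

  import akka.actor.{Actor, ActorRef}

  final case class ContainerRef(id: String)
  final case class ContainerState(paused: Boolean)

  // ContainerManager -> ContainerRouter: stop serving and hand back.
  final case class StopServing(container: ContainerRef)
  // ContainerRouter -> ContainerManager: the container is unpaused,
  // removed from the router's state, and safe to delete.
  final case class HandedBack(container: ContainerRef)

  class ContainerRouter extends Actor {
    private var served = Map.empty[ContainerRef, ContainerState]

    private def unpause(c: ContainerRef): Unit = () // hypothetical stub

    def receive: Receive = {
      case StopServing(c) =>
        served.get(c).foreach { state =>
          served -= c                  // 1. stop routing activations to it
          if (state.paused) unpause(c) // 2. hand back unpaused, per contract
          sender() ! HandedBack(c)     // 3. acknowledge the hand-back
        }
    }
  }

  class ContainerManager extends Actor {
    // Routers that still own a container we want to delete; populated
    // when the manager sends StopServing to the owning routers (omitted).
    private var pendingAcks = Map.empty[ContainerRef, Set[ActorRef]]

    private def deleteContainer(c: ContainerRef): Unit = () // hypothetical stub

    def receive: Receive = {
      case HandedBack(c) =>
        val remaining = pendingAcks.getOrElse(c, Set.empty) - sender()
        if (remaining.isEmpty) { pendingAcks -= c; deleteContainer(c) }
        else pendingAcks += c -> remaining
    }
  }

Is that roughly the contract you have in mind, i.e. deletion only fires
once the last owning router has acknowledged the hand-back?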
> There is an open question on when to consider a system to be in
> overflow state, or rather: How to handle the edge-situation. If you
> cannot generate more containers, we need to decide whether we remove
> another container (the case you're describing) or if we call it quits
> and say "503, overloaded, go away for now". The logic deciding this is
> up for discussion as well. The heuristic could take into account how
> many resources in the whole system you already own, how many resources
> others own, and whether we want to share those fairly or unfairly. Note
> that this is also very much related to being able to scale the
> resources up in themselves (to be able to generate new containers). If
> we assume a bounded system though, yes, we'll need to find a strategy
> on how to handle this case. I believe with the state the
> ContainerManager has, it can provide a more eloquent answer to that
> question than what we can do today (nothing really, we just keep on
> churning through containers).

I agree. An additional problem: in the case of burst requests, the
ContainerManager will "over-estimate" container allocation, whether or
not work-stealing between ContainerRouters is enabled. For a bounded
system, we had better handle this carefully to avoid frequent
creation/deletion. I'm wondering if sharing a message queue with the
ContainerManager (since it's not on the critical path) or some mechanism
for checking queue size (i.e. checking Kafka lag) could eliminate this?
However, this may only happen with short-running tasks, and throttling
already helps there.

> Does that answer the question?
>
> > Thanks!
> >
> > Tzu-Chiao
> >
> > On Sat, Aug 18, 2018 at 5:55 AM Tyson Norris <[email protected]>
> > wrote:
> >
> > > Ugh my reply formatting got removed!!! Trying this again with some >>
> > >
> > > On Aug 17, 2018, at 2:45 PM, Tyson Norris <[email protected]> wrote:
> > >
> > > If the failover of the singleton is too long (I think it will be
> > > based on cluster size, oldest node becomes the singleton host iirc),
> > > I think we need to consider how containers can launch in the
> > > meantime. A first step might be to test out the singleton behavior
> > > in clusters of various sizes.
> > >
> > > I agree this bit of design is crucial, a few thoughts:
> > > Pre-warm wouldn't help here, the ContainerRouters only know warm
> > > containers. Pre-warming is managed by the ContainerManager.
> > >
> > > >> Ah right
> > >
> > > Considering a fail-over scenario: We could consider sharing the
> > > state via EventSourcing. That is: All state lives inside of
> > > frequently snapshotted events and thus can be shared between
> > > multiple instances of the ContainerManager seamlessly.
> > > Alternatively, we could also think about only working on persisted
> > > state. That way, a cold-standby model could fly. We should make sure
> > > that the state is not "slightly stale" but rather both instances see
> > > the same state at any point in time. I believe on that cold-path of
> > > generating new containers, we can live with the extra latency of
> > > persisting what we're doing, as the path will still be dominated by
> > > the container creation latency.
> > >
> > > >> Wasn't clear if you mean not using ClusterSingleton? To be clear,
> > > in the ClusterSingleton case there are 2 issues:
> > > - the time it takes for the akka ClusterSingletonManager to realize
> > > it needs to start a new actor
> > > - the time it takes for the new actor to assume a usable state
> > >
> > > EventSourcing (or external persistence) may help with the latter,
> > > but we will need to be sure the former is tolerable to start with.
> > > Here is an example test from akka source that may be useful
> > > (multi-jvm, but all local):
> > >
> > > https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala
> > >
> > > Some things to consider, that I don't know details of:
> > > - will the size of the cluster affect the singleton behavior in case
> > > of failure? (I think so, but not sure to what extent); in the simple
> > > test above it takes ~6s for the replacement singleton to begin
> > > startup, but if we have 100s of nodes, I'm not sure how much time it
> > > will take. (I don't think this should be hard to test, but I haven't
> > > done it)
> > > - in case of a hard crash, what is the singleton behavior? In
> > > graceful jvm termination, I know the cluster behavior is good, but
> > > there is always this question about how downing nodes will be
> > > handled. If this critical piece of the system relies on akka cluster
> > > functionality, we will need to make sure that the singleton can be
> > > reconstituted, both in case of graceful termination
> > > (restart/deployment events) and non-graceful termination (hard vm
> > > crash, hard container crash). This is ignoring more complicated
> > > cases of extended network partitions, which will also have bad
> > > effects on many of the downstream systems.
> > >
> > > Handover time as you say is crucial, but I'd say as it only impacts
> > > container creation, we could live with, let's say, 5 seconds of
> > > failover-downtime on this path? What's your experience been on
> > > singleton failover? How long did it take?
> > >
> > > >> Seconds in the simplest case, so I think we need to test it in a
> > > scaled case (100s of cluster nodes), as well as the hard crash case
> > > (where not downing the node may affect the cluster state).
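For anyone who wants to measure that failover gap, a minimal singleton
setup to experiment with could look like the following, using the
classic akka-cluster-tools API (the ContainerManager here is just the
stub from my sketch above, and all names are placeholders):

  import akka.actor.{ActorSystem, PoisonPill, Props}
  import akka.cluster.singleton.{
    ClusterSingletonManager, ClusterSingletonManagerSettings,
    ClusterSingletonProxy, ClusterSingletonProxySettings
  }

  object SingletonFailoverSketch {
    def main(args: Array[String]): Unit = {
      val system = ActorSystem("controller-cluster")

      // The ContainerManager runs as a cluster singleton: it lives on
      // the oldest node and is restarted on the next-oldest node when
      // that node leaves or dies.
      system.actorOf(
        ClusterSingletonManager.props(
          singletonProps = Props[ContainerManager], // placeholder actor
          terminationMessage = PoisonPill,
          settings = ClusterSingletonManagerSettings(system)),
        name = "containerManager")

      // ContainerRouters would talk to it via a proxy that tracks the
      // singleton's current location and buffers while it moves.
      val manager = system.actorOf(
        ClusterSingletonProxy.props(
          singletonManagerPath = "/user/containerManager",
          settings = ClusterSingletonProxySettings(system)),
        name = "containerManagerProxy")
    }
  }

The number to measure would then be the gap between killing the oldest
node and the proxy delivering messages again; for a hard crash that gap
includes failure detection and downing, which is the part I'd expect to
grow with cluster size.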
> > > On Aug 16, 2018, at 11:01 AM, Tyson Norris <[email protected]> wrote:
> > >
> > > A couple comments on singleton:
> > > - use of a cluster singleton will introduce a new single point of
> > > failure
> > > - from the time of singleton node failure to singleton resurrection
> > > on a different instance, there will be an outage from the point of
> > > view of any ContainerRouter that does not already have a warm+free
> > > container to service an activation
> > > - resurrecting the singleton will require transferring or rebuilding
> > > the state when recovery occurs - in my experience this was tricky,
> > > and requires replicating the data (which will be slightly stale, but
> > > better than rebuilding from nothing); I don't recall the handover
> > > delay (to transfer the singleton to a new akka cluster node) when I
> > > tried last, but I think it was not as fast as I hoped it would be.
> > >
> > > I don't have a great suggestion for the singleton failure case, but
> > > would like to consider this carefully, and discuss the ramifications
> > > (which may or may not be tolerable) before pursuing this particular
> > > aspect of the design.
> > >
> > > On prioritization:
> > > - if concurrency is enabled for an action, this is another
> > > prioritization aspect, of sorts - if the action supports concurrency,
> > > there is no reason (except for destruction coordination…) that it
> > > cannot be shared across shards. This could be added later, but may be
> > > worth considering, since there is a general reuse problem where a
> > > series of activations that arrives at different ContainerRouters will
> > > create a new container in each, while they could be reused (and avoid
> > > creating new containers) if concurrency is tolerated in that
> > > container. This would only (ha ha) require changing how container
> > > destroy works, where it cannot be destroyed until the last
> > > ContainerRouter is done with it. And if container destruction is
> > > coordinated in this way to increase reuse, it would also be good to
> > > coordinate construction (don't concurrently construct the same
> > > container for multiple ContainerRouters IFF a single container would
> > > enable concurrent activations once it is created). I'm not sure if
> > > others are desiring this level of container reuse, but if so, it
> > > would be worth considering these aspects (sharding/isolation vs
> > > sharing/coordination) as part of any redesign.
> > >
> > > Yes, I can see where you're heading here. I think this can be
> > > generalized:
> > >
> > > Assume intra-container concurrency C and number of ContainerRouters
> > > R.
> > > If C > R: Shard the "slots" on this container evenly across R. The
> > > container can only be destroyed after you receive R acknowledgements
> > > of doing so.
> > > If C < R: Hand out 1 slot to C Routers, point the remaining Routers
> > > to the ones that got slots.
> > >
> > > >> Yes, mostly - I think there is also a case where the destruction
> > > message is revoked by the same router (receiving a new activation for
> > > the container which it previously requested destruction of). But I
> > > think this is covered in the details of tracking "after you receive R
> > > acks of destructions"
> > >
> > > Concurrent creation: Batch creation requests while one container is
> > > being created. Say you received a request for a new container that
> > > has C slots. If there are more requests for that container arriving
> > > while it is being created, don't act on them and fold the creation
> > > into the first one. Only start creating a new container if the number
> > > of resource requests exceeds C.
> > >
> > > Does that make sense? I think in that model you can set C=1 and it
> > > works as I envisioned it to work, or set it to C=200 and things will
> > > be shared even across routers.
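If it helps the discussion, here's how I'd sketch that slot math and the
creation batching. Everything here is invented for illustration, and the
batching deliberately ignores creation completions/failures for brevity:

  object SlotMathSketch {

    // How many slots of a container with concurrency C each of R
    // routers gets. If C >= R, distribute the C slots as evenly as
    // possible (and destroy only after all R routers have acked). If
    // C < R, routers 0..C-1 get one slot each and the remaining R-C
    // routers forward to those.
    def assignSlots(c: Int, r: Int): Map[Int, Int] =
      if (c >= r)
        (0 until r).map(i => i -> (c / r + (if (i < c % r) 1 else 0))).toMap
      else
        (0 until r).map(i => i -> (if (i < c) 1 else 0)).toMap

    // Creation batching: while containers with C slots each are in
    // flight, fold further resource requests into them; only start
    // another creation once outstanding requests exceed the slots
    // already being created.
    final case class CreationState(inFlight: Int, pending: Int)

    // Returns the new state and whether to start creating a container.
    def onResourceRequest(s: CreationState, c: Int): (CreationState, Boolean) = {
      val pending = s.pending + 1
      val coveredSlots = s.inFlight * c
      if (pending > coveredSlots) (CreationState(s.inFlight + 1, pending), true)
      else (s.copy(pending = pending), false)
    }

    def main(args: Array[String]): Unit = {
      println(assignSlots(200, 10)) // every router gets 20 slots
      println(assignSlots(1, 3))    // router 0 gets the single slot
    }
  }

With C=1 this degenerates as you say: assignSlots gives at most one
router a slot and onResourceRequest starts a creation per request, which
matches today's model; with C=200 the slots get spread across routers.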
> > > >> Side note: One detail about the pending concurrency impl today
> > > is that, due to the async nature of tracking the active activations
> > > within the container, there is no guarantee (when C > 1) that the
> > > number is exact, so if you specify C=200, you may actually get a
> > > different container at 195 or 205. This is not really related to
> > > this discussion, but is based on the current messaging/future
> > > behavior in ContainerPool/ContainerProxy, so wanted to mention it
> > > explicitly, in case it matters to anyone.
> > >
> > > Thanks
> > > Tyson
> >
> > --
> > Tzu-Chiao Yeh (@tz70s)

--
Tzu-Chiao Yeh (@tz70s)
