Re: Proposal on a future architecture of OpenWhisk

2018-08-19 Thread TzuChiao Yeh
On Sun, Aug 19, 2018 at 7:13 PM Markus Thömmes wrote:

> Hi Tzu-Chiao,
>
> On Sat, Aug 18, 2018 at 6:56 AM TzuChiao Yeh <su3g4284zo...@gmail.com> wrote:
>
> > Hi Markus,
> >
> > Nice thoughts on separating the logic in this revision! I'm not sure whether
> > this question has already been clarified; sorry if it's a duplicate.
> >
> > Same question on cluster singleton:
> >
> > I think there are two possibilities for container deletion: 1. the
> > ContainerRouter removes it (on error or when idle); 2. the ContainerManager
> > decides to remove it (e.g. to clear space for new creations).
> >
> > For case 2, how do we ensure safe deletion in the ContainerManager?
> > If there's still a similar busy/free/prewarmed pool model, it might require
> > additional state transitions to move a container from busy to free; only then
> > can we safely remove it, or reject the request if no such container is found
> > (system overloaded).
> >
> > Via a paused state or other states/messages? There might be some trade-offs
> > on granularity (the time-slice in scheduling) and a performance bottleneck
> > on the ClusterSingleton.
> >

> I'm not sure if I quite got the point, but here's an attempt at an
> explanation:
>
> Yes, Container removal in case 2 is triggered from the ContainerManager. To
> be able to safely remove it, it requests all ContainerRouters owning that
> container to stop serving it and hand it back. Once it's been handed back,
> the ContainerManager can safely delete it. The contract should also say: A
> container must be handed back in unpaused state, so it can be deleted
> safely. Since the ContainerRouters handle pause/unpause, they'll need to
> stop serving the container, unpause it, remove it from their state and
> acknowledge to the ContainerManager that they handed it back.
>

Thank you, it's clear to me.


> There is an open question on when to consider a system to be in overflow
> state, or rather: How to handle the edge-situation. If you cannot generate
> more containers, we need to decide whether we remove another container (the
> case you're describing) or if we call it quits and say "503, overloaded, go
> away for now". The logic deciding this is up for discussion as well. The
> heuristic could take into account how many resources in the whole system
> you already own, how many resources others own, and whether we want to
> share those fairly or not. Note that this is also very much
> related to being able to scale the resources up in themselves (to be able
> to generate new containers). If we assume a bounded system though, yes,
> we'll need to find a strategy on how to handle this case. I believe with
> the state the ContainerManager has, it can provide a more eloquent answer
> to that question than what we can do today (nothing really, we just keep on
> churning through containers).
>

I agree. An additional problem is that in the case of burst requests, the
ContainerManager will "over-estimate" container allocation, whether or not
work-stealing between ContainerRouters is enabled. For a bounded system, we
had better handle this carefully to avoid frequent creation/deletion. I'm
wondering whether sharing a message queue with the ContainerManager (since
it's not on the critical path), or some mechanism for checking queue size
(e.g. checking Kafka consumer lag), could eliminate this? However, this may
only happen for short-running tasks, where throttling already helps.
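
As an illustration of the "checking Kafka consumer lag" idea, here is a minimal
sketch using the plain Kafka consumer API (topic, group and the threshold in the
trailing comment are made up; nothing here is part of the proposal):

    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer

    // Sum of (end offset - committed offset) over all partitions of a topic,
    // i.e. how far behind the consumer group currently is.
    def consumerLag(bootstrapServers: String, groupId: String, topic: String): Long = {
      val props = new Properties()
      props.put("bootstrap.servers", bootstrapServers)
      props.put("group.id", groupId)
      props.put("key.deserializer", classOf[StringDeserializer].getName)
      props.put("value.deserializer", classOf[StringDeserializer].getName)

      val consumer = new KafkaConsumer[String, String](props)
      try {
        val partitions = consumer.partitionsFor(topic).asScala
          .map(p => new TopicPartition(topic, p.partition))
        val endOffsets = consumer.endOffsets(partitions.asJava).asScala
        partitions.map { tp =>
          val committed = Option(consumer.committed(tp)).map(_.offset).getOrElse(0L)
          endOffsets.get(tp).map(_.longValue).getOrElse(0L) - committed
        }.sum
      } finally consumer.close()
    }

    // e.g. hold off on creating more containers while the backlog is large:
    // if (consumerLag("kafka:9092", "invokers", "activations") > 10000) { /* back off */ }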


> Does that answer the question?


> >
> > Thanks!
> >
> > Tzu-Chiao
> >
> > On Sat, Aug 18, 2018 at 5:55 AM Tyson Norris wrote:
> >
> > > Ugh my reply formatting got removed!!! Trying this again with some >>
> > >
> > > On Aug 17, 2018, at 2:45 PM, Tyson Norris wrote:
> > >
> > >
> > > If the failover of the singleton is too long (I think it will be based on
> > > cluster size, oldest node becomes the singleton host iirc), I think we need
> > > to consider how containers can launch in the meantime. A first step might
> > > be to test out the singleton behavior in the cluster of various sizes.
> > >
> > >
> > > I agree this bit of design is crucial, a few thoughts:
> > > Pre-warm wouldn't help here, the ContainerRouters only know warm
> > > containers. Pre-warming is managed by the ContainerManager.
> > >
> > >
> > > >> Ah right
> > >
> > >
> > >
> > > Considering a fail-over scenario: We could consider sharing the state via
> > > EventSourcing. That is: All state lives inside of frequently snapshotted
> > > events and thus can be shared between multiple instances of the
> > > ContainerManager seamlessly. Alternatively, we could also think about only
> > > working on persisted state. That way, a cold-standby model could fly. We
> > > should make sure that the state is not "slightly stale" but rather both
> > > instances see the same state at any point in time. I believe on that
> > > cold-path of generating new containers, we can live with the extra-latency
> > > of persisting what we're doing as the path will still be dominated by the
> > > container creation latency.

Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Dascalita Dragos
“... FWIW I should change that to no longer say "Kafka" but "buffer" or
"message queue" ...”
+1. One idea could be to use Akka Streams and let the OW operator make a
decision on using Kafka with Akka Streams, or not [1]. This would make OW
deployment easier, Kafka becoming optional, while opening the door for
other connectors like AWS Kinesis, Azure Event Hub, and others (see the
link at [1] for a more complete list of connectors).

[1] - https://developer.lightbend.com/docs/alpakka/current/
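
To sketch what that optionality could look like (names and the String payload are
purely illustrative): the execution layer would only depend on an abstract stream,
and the operator decides whether it is backed by Kafka via Alpakka or by something
in-process.

    import akka.actor.ActorSystem
    import akka.kafka.scaladsl.Consumer
    import akka.kafka.{ConsumerSettings, Subscriptions}
    import akka.stream.scaladsl.Source
    import org.apache.kafka.common.serialization.StringDeserializer

    implicit val system: ActorSystem = ActorSystem("openwhisk")

    // Kafka-backed buffer via Alpakka Kafka.
    def kafkaBackedBuffer(bootstrapServers: String, topic: String): Source[String, _] = {
      val settings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
        .withBootstrapServers(bootstrapServers)
        .withGroupId("activation-buffer")
      Consumer.plainSource(settings, Subscriptions.topics(topic)).map(_.value)
    }

    // A Kafka-less deployment could plug in an in-memory buffer instead.
    def inMemoryBuffer(pending: List[String]): Source[String, _] = Source(pending)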
On Sun, Aug 19, 2018 at 7:30 AM Markus Thömmes wrote:

> Hi Tyson, Carlos,
>
> FWIW I should change that to no longer say "Kafka" but "buffer" or "message
> queue".
>
> I see two use-cases for a queue here:
> 1. What you two are alluding to: Buffering asynchronous requests because of
> a different notion of "latency sensitivity" if the system is in an overload
> scenario.
> 2. As a work-stealing type balancing layer between the ContainerRouters. If
> we assume round-robin/least-connected (essentially random) scheduling
> between ContainerRouters, we will get load discrepancies between them. To
> smooth those out, a ContainerRouter can put the work on a queue to be
> stolen by a Router that actually has space for that work (for example:
> Router1 requests a new container, puts the work on the queue while it waits
> for that container, Router2 already has a free container and executes the
> action by stealing it from the queue). This does have the added complexity
> of breaking a streaming communication between User and Container (to
> support essentially unbounded payloads). A nasty wrinkle that might render
> this design alternative invalid! We could come up with something smarter
> here, i.e. only putting a reference to the work on the queue and the
> stealer connects to the initial owner directly which then streams the
> payload through to the stealer, rather than persisting it somewhere.
>
> It is important to note that in this design, blocking invokes could
> potentially gain the ability to have unbounded entities, where
> trigger/non-blocking invokes might need to be subject to a bound here to be
> able to support eventual execution efficiently.
>
> Personally, I'm much more torn on the work-stealing case. It implies a
> wholly different notion of using the queue though and doesn't have much to
> do with the way we use it today, which might be confusing. It could also
> well be the case that work-stealing type algorithms are easier to back on
> a proper MQ vs. trying to make it work on Kafka.
>
> It might also be important to note that those two use-cases might require
> different technologies (buffering vs. queue-backend for work-stealing) and
> could well be separated in the design as well. For instance, buffering
> trigger fires etc. does not necessarily need to be done on the execution
> layer but could instead be pushed to another layer. Having the notion of
> "async" vs "sync" in the execution layer could be beneficial for
> load balancing itself though. Something worth exploring imho.
>
> Sorry for the wall of text, I hope this clarifies things!
>
> Cheers,
> Markus
>
> On Sat, Aug 18, 2018 at 2:36 AM Carlos Santana <csantan...@gmail.com> wrote:
>
> > Triggers get responded to right away (202) with an activation id and are then
> > sent to the queue to be processed async, same as async action invokes.
> >
> > I think we would keep the same contract as today for these types of
> > activations that are eventually processed, different from blocking invokes,
> > including web actions where the http client holds a connection waiting for
> > the result back.
> >
> > - Carlos Santana
> > @csantanapr
> >
> > > On Aug 17, 2018, at 6:14 PM, Tyson Norris wrote:
> > >
> > > Hi -
> > > Separate thread regarding the proposal: what is considered for routing
> > > activations as overload and destined for kafka?
> > >
> > > In general, if kafka is not on the blocking activation path, why would
> > > it be used at all, if the timeouts and processing expectations of blocking
> > > and non-blocking are the same?
> > >
> > > One case I can imagine: triggers + non-blocking invokes, but only in the
> > > case where those have some different timeout characteristics. e.g. if a
> > > trigger fires an action, is there any case where the activation should be
> > > buffered to kafka if it will time out the same as a blocking activation?
> > >
> > > Sorry if I’m missing something obvious.
> > >
> > > Thanks
> > > Tyson
> > >
> > >
> >
>


Re: Concurrency PR

2018-08-19 Thread Markus Thömmes
Hi Tyson, thanks for pushing forward on this! I'll try to get a review in
on it soon.

On Fri, Aug 17, 2018 at 7:04 PM Tyson Norris wrote:

> Hi -
> I have been noodling with a few tests and the akka http client and gotten
> the concurrency PR [1] to a good place, I think, so if anyone can help
> review that would be appreciated.
>
> A couple of notes:
> - akka http client has some different notion of connection reuse than the
> apache client; to address this I created a separate PR [2] which, instead
> of dissuading connection reuse, simply destroys the client (and connection
> pool) when the container is paused. (This change is not reflected in 2795
> FWIW). AFAIK the connection reuse issue only comes up with container
> pauses, so I wanted to address this where it is relevant, and not impose
> additional performance costs for concurrency cases. This client is still
> not enabled by default.
> - There was mention in the comments (for 2795) about the need to handle the
> case where a container doesn’t support concurrency, but the action dev has
> enabled it on the action - this PR does NOT deal with that.
>
> To summarize, enabling concurrency requires:
> - all actions may signal that they support concurrency, so all images that
> might be used would need to support concurrency, if concurrency is enabled
> in your deployment
> - log collection must be handled outside of invoker (since invoker does
> not deal with interleaved log parsing)
> - wsk cli will require changes to allow action devs to set the concurrency
> limits on actions (current PR only exposes the OW api for doing this); I
> have a PR queued up for that [3]. (Will need another PR for the cli once
> the client-go lib is updated)
>
> To better handle the case of images that don’t support concurrency, or
> don’t support log collection from invoker, I would suggest we change the
> container protocol to allow containers to broadcast their support either
> via the /init endpoint, or via a new /info endpoint. This of course would
> not give feedback until an action is executed (as opposed to when action is
> created), but I think this is ok. I will work on a separate PR for this,
> but want to mention some thoughts here about possible approaches to address
> these known concerns.
>

Why not make this part of the runtimes manifest? Handling this as late as
actually invoking the action feels kinda weird if we can just as well know
ahead of time that creating an action with a concurrency > 1 will not work,
and therefore forbid creation altogether. Any strong reason not to encode
that information into the runtimes manifest?
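
As a sketch of what that could look like (the field name supportsConcurrency and
the types are assumptions, not the actual manifest code):

    // Sketch only: `supportsConcurrency` is an assumed field, not the real manifest schema.
    final case class RuntimeEntry(kind: String, image: String, supportsConcurrency: Boolean)

    def validateCreate(runtime: RuntimeEntry, requestedConcurrency: Int): Either[String, Unit] =
      if (requestedConcurrency > 1 && !runtime.supportsConcurrency)
        Left(s"runtime '${runtime.kind}' does not support intra-container concurrency")
      else
        Right(())

    // e.g. validateCreate(RuntimeEntry("nodejs:8", "action-nodejs-v8", supportsConcurrency = false), 4)
    // would yield a Left, and the action creation request could be rejected up front.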


>
> Thanks
> Tyson
>
>
> [1] https://github.com/apache/incubator-openwhisk/pull/2795
> [2] https://github.com/apache/incubator-openwhisk/pull/3976
> [3] https://github.com/apache/incubator-openwhisk-client-go/pull/94
>
>


Re: Kafka and Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Markus Thömmes
Hi Tyson, Carlos,

FWIW I should change that to no longer say "Kafka" but "buffer" or "message
queue".

I see two use-cases for a queue here:
1. What you two are alluding to: Buffering asynchronous requests because of
a different notion of "latency sensitivity" if the system is in an overload
scenario.
2. As a work-stealing type balancing layer between the ContainerRouters. If
we assume round-robin/least-connected (essentially random) scheduling
between ContainerRouters, we will get load discrepancies between them. To
smooth those out, a ContainerRouter can put the work on a queue to be
stolen by a Router that actually has space for that work (for example:
Router1 requests a new container, puts the work on the queue while it waits
for that container, Router2 already has a free container and executes the
action by stealing it from the queue). This does have the added complexity
of breaking a streaming communication between User and Container (to
support essentially unbounded payloads). A nasty wrinkle that might render
this design alternative invalid! We could come up with something smarter
here, i.e. only putting a reference to the work on the queue and the
stealer connects to the initial owner directly which then streams the
payload through to the stealer, rather than persisting it somewhere.
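
A tiny sketch of that "reference only" variant (all names are illustrative, and the
queue here is just a stand-in for whatever MQ ends up being used):

    // Instead of the payload, only a pointer to the owning router goes on the queue.
    final case class WorkReference(activationId: String, owner: String) // owner = e.g. "router-3:8080"

    // Owner side: no free container, so advertise the activation instead of shipping its payload.
    def advertise(queue: java.util.concurrent.ConcurrentLinkedQueue[WorkReference],
                  activationId: String, self: String): Unit =
      queue.offer(WorkReference(activationId, self))

    // Stealer side: has a free container, takes a reference and pulls (streams) the payload
    // directly from the owning router, so the payload itself never needs to be persisted.
    def steal(queue: java.util.concurrent.ConcurrentLinkedQueue[WorkReference],
              fetchFromOwner: WorkReference => Array[Byte]): Option[Array[Byte]] =
      Option(queue.poll()).map(fetchFromOwner)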

It is important to note that in this design, blocking invokes could
potentially gain the ability to have unbounded entities, where
trigger/non-blocking invokes might need to be subject to a bound here to be
able to support eventual execution efficiently.

Personally, I'm much more torn on the work-stealing case. It implies a
wholly different notion of using the queue though and doesn't have much to
do with the way we use it today, which might be confusing. It could also
well be the case that work-stealing type algorithms are easier to back on
a proper MQ vs. trying to make it work on Kafka.

It might also be important to note that those two use-cases might require
different technologies (buffering vs. queue-backend for work-stealing) and
could well be separated in the design as well. For instance, buffering
trigger fires etc. does not necessarily need to be done on the execution
layer but could instead be pushed to another layer. Having the notion of
"async" vs "sync" in the execution layer could be beneficial for
load balancing itself though. Something worth exploring imho.

Sorry for the wall of text, I hope this clarifies things!

Cheers,
Markus

On Sat, Aug 18, 2018 at 2:36 AM Carlos Santana <csantan...@gmail.com> wrote:

> Triggers get responded to right away (202) with an activation id and are then
> sent to the queue to be processed async, same as async action invokes.
>
> I think we would keep the same contract as today for these types of
> activations that are eventually processed, different from blocking invokes,
> including web actions where the http client holds a connection waiting for
> the result back.
>
> - Carlos Santana
> @csantanapr
>
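
For illustration, the contract described above as a minimal sketch with made-up
types: trigger fires and non-blocking invokes are acknowledged with 202 plus an
activation id and executed later off the queue, while blocking invokes hold the
connection until the result is back.

    final case class ActivationId(asString: String)

    sealed trait InvokeResponse
    final case class Accepted(id: ActivationId) extends InvokeResponse                  // HTTP 202
    final case class Completed(id: ActivationId, result: String) extends InvokeResponse // HTTP 200

    def invoke(blocking: Boolean, run: () => String, enqueue: ActivationId => Unit): InvokeResponse = {
      val id = ActivationId(java.util.UUID.randomUUID().toString)
      if (blocking) {
        Completed(id, run())  // client holds the connection until the result is back
      } else {
        enqueue(id)           // async path: trigger fires and non-blocking invokes
        Accepted(id)          // client can poll the activation by id later
      }
    }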
> > On Aug 17, 2018, at 6:14 PM, Tyson Norris wrote:
> >
> > Hi -
> > Separate thread regarding the proposal: what is considered for routing
> > activations as overload and destined for kafka?
> >
> > In general, if kafka is not on the blocking activation path, why would
> > it be used at all, if the timeouts and processing expectations of blocking
> > and non-blocking are the same?
> >
> > One case I can imagine: triggers + non-blocking invokes, but only in the
> > case where those have some different timeout characteristics. e.g. if a
> > trigger fires an action, is there any case where the activation should be
> > buffered to kafka if it will time out the same as a blocking activation?
> >
> > Sorry if I’m missing something obvious.
> >
> > Thanks
> > Tyson
> >
> >
>


Re: Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Markus Thömmes
Hi Tzu-Chiao,

On Sat, Aug 18, 2018 at 6:56 AM TzuChiao Yeh <su3g4284zo...@gmail.com> wrote:

> Hi Markus,
>
> Nice thoughts on separating the logic in this revision! I'm not sure whether
> this question has already been clarified; sorry if it's a duplicate.
>
> Same question on cluster singleton:
>
> I think there are two possibilities for container deletion: 1. the
> ContainerRouter removes it (on error or when idle); 2. the ContainerManager
> decides to remove it (e.g. to clear space for new creations).
>
> For case 2, how do we ensure safe deletion in the ContainerManager?
> If there's still a similar busy/free/prewarmed pool model, it might require
> additional state transitions to move a container from busy to free; only then
> can we safely remove it, or reject the request if no such container is found
> (system overloaded).
>
> Via a paused state or other states/messages? There might be some trade-offs
> on granularity (the time-slice in scheduling) and a performance bottleneck
> on the ClusterSingleton.
>

I'm not sure if I quite got the point, but here's an attempt at an
explanation:

Yes, Container removal in case 2 is triggered from the ContainerManager. To
be able to safely remove it, it requests all ContainerRouters owning that
container to stop serving it and hand it back. Once it's been handed back,
the ContainerManager can safely delete it. The contract should also say: A
container must be handed back in unpaused state, so it can be deleted
safely. Since the ContainerRouters handle pause/unpause, they'll need to
stop serving the container, unpause it, remove it from their state and
acknowledge to the ContainerManager that they handed it back.
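
Expressed as a rough sketch of actor messages (names are illustrative, not the
proposal's actual protocol), the hand-back contract could look like this:

    import akka.actor.Actor

    trait ContainerRef { def unpause(): Unit }

    final case class StopServing(containerId: String) // ContainerManager -> ContainerRouter
    final case class HandedBack(containerId: String)  // ContainerRouter -> ContainerManager

    class ContainerRouter extends Actor {
      private var served = Map.empty[String, ContainerRef]

      def receive: Receive = {
        case StopServing(id) =>
          served.get(id).foreach { container =>
            served -= id        // stop routing new activations to this container
            container.unpause() // contract: hand it back in unpaused state
            sender() ! HandedBack(id)
          }
        // the ContainerManager deletes the container only after every owning
        // router has answered with HandedBack(id)
      }
    }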

There is an open question on when to consider a system to be in overflow
state, or rather: How to handle the edge-situation. If you cannot generate
more containers, we need to decide whether we remove another container (the
case you're describing) or if we call it quits and say "503, overloaded, go
away for now". The logic deciding this is up for discussion as well. The
heuristic could take into account how many resources in the whole system
you already own, how many resources others own, and whether we want to
share those fairly or not. Note that this is also very much
related to being able to scale the resources up in themselves (to be able
to generate new containers). If we assume a bounded system though, yes,
we'll need to find a strategy on how to handle this case. I believe with
the state the ContainerManager has, it can provide a more eloquent answer
to that question than what we can do today (nothing really, we just keep on
churning through containers).
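
To make the trade-off concrete, a deliberately naive sketch of such a decision
function (the policy and all names are invented for illustration):

    final case class ResourceView(ownedByRequester: Int, ownedByOthers: Int)

    sealed trait OverflowDecision
    case object EvictAndCreate extends OverflowDecision // reclaim someone else's idle container
    case object Overloaded503  extends OverflowDecision // "503, overloaded, go away for now"

    def decide(view: ResourceView, fairShare: Double): OverflowDecision = {
      val total = view.ownedByRequester + view.ownedByOthers
      val requesterShare = if (total == 0) 0.0 else view.ownedByRequester.toDouble / total
      // Below its fair share while others still own containers: evict one of theirs.
      // Otherwise the requester already (over)uses its share, so shed the load instead.
      if (requesterShare < fairShare && view.ownedByOthers > 0) EvictAndCreate else Overloaded503
    }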

Does that answer the question?


>
> Thanks!
>
> Tzu-Chiao
>
> On Sat, Aug 18, 2018 at 5:55 AM Tyson Norris wrote:
>
> > Ugh my reply formatting got removed!!! Trying this again with some >>
> >
> > On Aug 17, 2018, at 2:45 PM, Tyson Norris wrote:
> >
> >
> > If the failover of the singleton is too long (I think it will be based on
> > cluster size, oldest node becomes the singleton host iirc), I think we need
> > to consider how containers can launch in the meantime. A first step might
> > be to test out the singleton behavior in the cluster of various sizes.
> >
> >
> > I agree this bit of design is crucial, a few thoughts:
> > Pre-warm wouldn't help here, the ContainerRouters only know warm
> > containers. Pre-warming is managed by the ContainerManager.
> >
> >
> > >> Ah right
> >
> >
> >
> > Considering a fail-over scenario: We could consider sharing the state via
> > EventSourcing. That is: All state lives inside of frequently snapshotted
> > events and thus can be shared between multiple instances of the
> > ContainerManager seamlessly. Alternatively, we could also think about only
> > working on persisted state. That way, a cold-standby model could fly. We
> > should make sure that the state is not "slightly stale" but rather both
> > instances see the same state at any point in time. I believe on that
> > cold-path of generating new containers, we can live with the extra-latency
> > of persisting what we're doing as the path will still be dominated by the
> > container creation latency.
> >
> >
> >
> > >> Wasn’t clear if you meant not using ClusterSingleton? To be clear, in the
> > ClusterSingleton case there are 2 issues:
> > - time it takes for akka ClusterSingletonManager to realize it needs to
> > start a new actor
> > - time it takes for the new actor to assume a usable state
> >
> > EventSourcing (or ext persistence) may help with the latter, but we will
> > need to be sure the former is tolerable to start with.
> > Here is an example test from akka source that may be useful (multi-jvm,
> > but all local):
> >
> >
> > https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala

Re: Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Markus Thömmes
Hi Dave,
On Sat, Aug 18, 2018 at 5:01 PM David P Grove wrote:

>
> [ Discussion about cluster singleton or not for the ContainerManager]
>
> fwiw, I believe for Kubernetes we do not need to attempt to deal with fault
> tolerance for the ContainerManager state ourselves.  We can use labels to
> replicate all the persistent metadata for a container (prewarm or not, the
> ContainerRouter it is assigned to) in the Kube objects representing the
> pods in Kube's etcd metadata server.  If we need to restart a
> ContainerManager, the new instance can come up "instantly" and start
> servicing requests while recovering the state of the previous instance via
> queries against etcd to discover the pre-existing containers it owned.
>

Note that there is also state it has about ContainerRouters (how many are
there and which own which containers). We could make that queryable as
well, so as soon as a fallback happens, the fallback component queries the
state of all routers to get into a consistent state.

I agree we should replicate as little state as possible and in the
Kubernetes case, we already have state about containers through pods and
their labels.
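
For illustration, a rough sketch of that recovery path (label keys and the fabric8
client usage are assumptions, not an agreed-upon scheme): a freshly started
ContainerManager lists its pods by label and rebuilds its inventory from the label
values.

    import io.fabric8.kubernetes.client.DefaultKubernetesClient
    import scala.collection.JavaConverters._

    val kube = new DefaultKubernetesClient()

    // Find all action containers this deployment owns, purely from pod labels.
    val pods = kube.pods()
      .inNamespace("openwhisk")
      .withLabel("openwhisk/role", "action-container")
      .list()
      .getItems
      .asScala

    // Rebuild the manager's view: container -> (prewarm marker, assigned router).
    val recovered: Map[String, (Option[String], Option[String])] = pods.map { pod =>
      val labels = Option(pod.getMetadata.getLabels).map(_.asScala).getOrElse(Map.empty[String, String])
      pod.getMetadata.getName -> (labels.get("openwhisk/prewarm"), labels.get("openwhisk/router"))
    }.toMap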


>
> We'll need to validate the performance of this is acceptable (should be,
> since it is just some asynchronous labeling operations when (a) the
> container is created and (b) on the initial transition from stemcell to
> warm), but it is going to be pretty simple to implement and makes good
> usage of the underlying platform's capabilities.
>

Agreed, good point.


>
> --dave
>


Re: Proposal on a future architecture of OpenWhisk

2018-08-19 Thread Markus Thömmes
Hi Tyson,

On Fri, Aug 17, 2018 at 11:45 PM Tyson Norris wrote:

>
> If the failover of the singleton is too long (I think it will be based on
> cluster size, oldest node becomes the singleton host iirc), I think we need
> to consider how containers can launch in the meantime. A first step might
> be to test out the singleton behavior in the cluster of various sizes.
>
>
> I agree this bit of design is crucial, a few thoughts:
> Pre-warm wouldn't help here, the ContainerRouters only know warm
> containers. Pre-warming is managed by the ContainerManager.
>
> Ah right
>
>
> Considering a fail-over scenario: We could consider sharing the state via
> EventSourcing. That is: All state lives inside of frequently snapshotted
> events and thus can be shared between multiple instances of the
> ContainerManager seamlessly. Alternatively, we could also think about only
> working on persisted state. That way, a cold-standby model could fly. We
> should make sure that the state is not "slightly stale" but rather both
> instances see the same state at any point in time. I believe on that
> cold-path of generating new containers, we can live with the extra-latency
> of persisting what we're doing as the path will still be dominated by the
> container creation latency.
>
> Wasn’t clear if you meant not using ClusterSingleton? To be clear, in the
> ClusterSingleton case there are 2 issues:
> - time it takes for akka ClusterSingletonManager to realize it needs to
> start a new actor
> - time it takes for the new actor to assume a usable state
>
> EventSourcing (or ext persistence) may help with the latter, but we will
> need to be sure the former is tolerable to start with.
> Here is an example test from akka source that may be useful (multi-jvm,
> but all local):
>
> https://github.com/akka/akka/blob/009214ae07708e8144a279e71d06c4a504907e31/akka-cluster-tools/src/multi-jvm/scala/akka/cluster/singleton/ClusterSingletonManagerChaosSpec.scala
>
> Some things to consider, that I don’t know details of:
> - will the size of the cluster affect the singleton behavior in case of
> failure? (I think so, but not sure to what extent); in the simple test
> above it takes ~6s for the replacement singleton to begin startup, but if
> we have 100s of nodes, I’m not sure how much time it will take. (I don’t
> think this should be hard to test, but I haven’t done it)
> - in case of hard crash, what is the singleton behavior? In graceful jvm
> termination, I know the cluster behavior is good, but there is always this
> question about how downing nodes will be handled. If this critical piece of
> the system relies on akka cluster functionality, we will need to make sure
> that the singleton can be reconstituted, both in case of graceful
> termination (restart/deployment events) and non-graceful termination (hard
> vm crash, hard container crash). This is ignoring more complicated cases
> of extended network partitions, which will also have bad effects on many of
> the downstream systems.
>
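
For reference, this is roughly the akka-cluster-tools wiring a ClusterSingleton-based
ContainerManager needs (the ContainerManager stub here is a placeholder), and it is
this fail-over behavior that a test like the ClusterSingletonManagerChaosSpec linked
above exercises:

    import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
    import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings,
      ClusterSingletonProxy, ClusterSingletonProxySettings}

    // Stand-in for the real ContainerManager actor under discussion.
    class ContainerManager extends Actor {
      def receive: Receive = { case _ => () } // allocate/remove containers, etc.
    }

    // Requires akka-cluster-tools and akka.actor.provider = "cluster" in the config.
    val system = ActorSystem("whisk")

    // Exactly one ContainerManager in the cluster; restarted on the oldest node after failure.
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps = Props(new ContainerManager),
        terminationMessage = PoisonPill,
        settings = ClusterSingletonManagerSettings(system)),
      name = "containerManager")

    // Every node (e.g. each ContainerRouter) talks to it through a proxy that
    // re-resolves the singleton's location after a fail-over.
    val containerManager = system.actorOf(
      ClusterSingletonProxy.props(
        singletonManagerPath = "/user/containerManager",
        settings = ClusterSingletonProxySettings(system)),
      name = "containerManagerProxy")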

I don't think we need to be eager to consider akka-cluster to be set in
stone here. The singleton in my mind doesn't need to be clustered at all.
Say we have a fully shared state through persistence or event-sourcing and
a hot-standby model, couldn't we implement the fallback through routing in
front of the active/passive ContainerManager pair? Once one goes
unreachable, fall back to the other.
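
A bare-bones sketch of what "fully shared state" via event-sourcing could look like
with akka-persistence (event and state types are invented for illustration): both
the active and the passive ContainerManager can recover the same state from the
journal, so the fallback only needs the routing switch described above.

    import akka.persistence.{PersistentActor, SnapshotOffer}

    sealed trait Event
    final case class ContainerCreated(id: String, action: String, router: String) extends Event
    final case class ContainerRemoved(id: String) extends Event

    final case class ManagerState(containers: Map[String, ContainerCreated] = Map.empty) {
      def updated(e: Event): ManagerState = e match {
        case c: ContainerCreated  => copy(containers = containers + (c.id -> c))
        case ContainerRemoved(id) => copy(containers = containers - id)
      }
    }

    class ContainerManagerJournal extends PersistentActor {
      override def persistenceId: String = "container-manager"
      private var state = ManagerState()

      override def receiveRecover: Receive = {
        case SnapshotOffer(_, snapshot: ManagerState) => state = snapshot
        case e: Event                                 => state = state.updated(e)
      }

      override def receiveCommand: Receive = {
        case e: Event =>
          persist(e) { persisted =>
            state = state.updated(persisted)
            if (lastSequenceNr % 100 == 0) saveSnapshot(state) // keep standby recovery short
          }
      }
    }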


>
>
>
> Handover time as you say is crucial, but I'd say as it only impacts
> container creation, we could live with, let's say, 5 seconds of
> failover-downtime on this path? What's your experience been on singleton
> failover? How long did it take?
>
>
> Seconds in the simplest case, so I think we need to test it in a scaled
> case (100s of cluster nodes), as well as the hard crash case (where not
> downing the node may affect the cluster state).
>
>
>
>
> On Aug 16, 2018, at 11:01 AM, Tyson Norris wrote:
>
> A couple comments on singleton:
> - use of cluster singleton will introduce a new single point of failure
> - from the time of singleton node failure to singleton resurrection on a
> different instance, there will be an outage from the point of view of any
> ContainerRouter that does not already have a warm+free container to service
> an activation
> - resurrecting the singleton will require transferring or rebuilding the
> state when recovery occurs - in my experience this was tricky, and requires
> replicating the data (which will be slightly stale, but better than
> rebuilding from nothing); I don’t recall the handover delay (to transfer
> singleton to a new akka cluster node) when I tried last, but I think it was
> not as fast as I hoped it would be.
>
> I don’t have a great suggestion for the singleton failure case, but
> would like to consider this carefully, and discuss the ramifications (which
> may or may not be tolerable) before pursuing this particular aspect of the
> design.
>
>
> On prioritization:
> - if concurrency is