Re: Invoker HA on Mesos

Tyson Norris Tue, 27 Mar 2018 19:54:31 -0700

PR is opened here: https://github.com/apache/incubator-openwhisk/pull/3497

On Mar 27, 2018, at 3:25 PM, Tyson Norris
<tnor...@adobe.com.INVALID<mailto:tnor...@adobe.com.INVALID>> wrote:

On Mar 27, 2018, at 2:28 PM, David P Grove
<gro...@us.ibm.com<mailto:gro...@us.ibm.com>> wrote:

Tyson Norris <tnor...@adobe.com.INVALID<mailto:tnor...@adobe.com.INVALID>>
wrote on 03/27/2018 04:33:48 PM:

We’ve been discussing how to handle mesos framework HA in the
Invoker, and I created a proposal on the wiki to discuss.

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FOPENWHISK%2FClustered%2BSingleton&data=02%7C01%7Ctnorris%40adobe.com%7C68fc29c7e2e140b6bae008d5942a1e91%7Cfa7b1b5a7b34438794aed2c178decee1%7C0%7C0%7C636577830978350589&sdata=PGiGXPzmRMesUar6qTLPmyMNG%2FH6gRE2Xur05C3peRE%3D&reserved=0
+Invoker+for+HA+on+Mesos

In general, the idea is to allow a single cluster-wide/single
ContainerPool to operate, while providing a reasonable failover
behavior in case of its unexpected death.

To accomplish this, the proposal is to allow parts of the
ContainerPool (freePool and prewarmPool) to be replicated to other
(passive) invoker instances, and to allow the replicated container
meta data to be used by ContainerFactories to resurrect containers
for use in case a failure occurs.

This does a couple things, like removing the notion of resource
scheduling from the Controller (since there is only ever 1 invoker),
and allows the ContainerPool to operate with a holistic view of the
cluster, useful for whole-cluster ContainerFactory impls like
MesosContainerFactory.

I’m curious if the kubernetes folks will also find this useful?

Hi Tyson,

Thanks for writing this up!

A couple of thoughts.
1. Using Akka Distributed Data to actively track the set of
containers to support failure recovery seems like a lot of overhead. For
Kubernetes, we are labeling all the action containers with their owning
invoker using Kubernetes labels. So, when an invoker crashes and gets
replaced, one could recover all of its prewarmed & freepool containers with
a simple query against the Kubernetes API server. No need to track the set
actively; Kube is already doing that via the labels. Anything similar to
Kube labels in Mesos?

Do you have an example of the labels working? I guess the labels are changed
over time through the lifecycle of the container?
Nothing I know of similar in mesos, although it could be done in the
MesosContainerFactory for similar purposes.

I think it is partly a question of where to place container re-association
logic (in the ContainerPool? ContainerFactory? Both?), and partly how to
implement ContainerPool behavior.
Currently MesosContainerFactory operates as an embedded mesos client that must
behave as a singleton with failover, so ContainerPool needs to do the same

As for the overhead, I’m not sure it is, or isn’t. Currently it is using
WriteLocal writeConsistency to publish updates to the pools, and only on
particular combinations (prewarm+PrewarmedData, and free+WarmedData)

I’ll get the PR up shortly.

2. We've been exploring running with a smaller number of invokers
than worker nodes and cluster-wide scheduling using the
KubernetesContainerFactory + invokerAgent. However, I don't believe at
production scale a single Invoker for an entire cluster is going to be
viable. Especially with the current architecture where the action
parameters get streamed through the invoker and the action results get
streamed back through the invoker. I believe that is going to bottleneck
how many containers a single Invoker can manage.

Yeah I’m wondering the same thing. For now, operating only one (or a few)
controllers has the same issue right?

For mesos, we can operate "multiple invoker clusters”, following the same
approach outlined, but instead of using a single topic invoker0, we have
multiple topics invoker1..invokern, (similar to today), but where each topic is
consumed by multiple invokers (in active/passive mode).

Separately, related to performance:
- we still plan to allow concurrency so things that are required to be fast
(blocking activations, and ones that signal they tolerate it) should leverage
this, and things that can tolerate additional latency should not
- if the ContainerPool operated cleanup on a more GC-like semantic, external
clients (other invokers, or the controller) would be able to use existing
running containers (at least where concurrency is tolerated). When all clients
of a container have completed (plus some time), it could be garbage collected
by the ContainerPool (similar to how freePool containers currently linger).
This could unburden some activation processing from the current invoker
workflow.

Of course, your mileage may vary: I’m not sure how well any of that works in a
case where you don’t support concurrent activations, or for cases where the
number of actions/containers exceeds the number of clients (i.e. where
everything is only ever used once).

Thanks
Tyson

--dave

Re: Invoker HA on Mesos

Reply via email to