Dear community, I've been searching the internet for quite a while to find out what the best architecture is to provide HA for a Spark client.
We run an application that connects to a standalone Spark cluster and caches a big chunk of data for subsequent intensive calculations. To achieve HA we need to run several instances of the application on different hosts.

Initially I explored the option of reusing (i.e. sharing) the same set of executors between the SparkContext instances of all running applications, and found it impossible: every application that creates a SparkContext has to spawn its own executors. Externalizing the executors' memory cache and sharing it via Tachyon is only a partial solution, since each application's executors still occupy their own set of CPU cores.

Spark-jobserver is another possibility. It manages the SparkContext itself and accepts job requests from multiple clients against the same context, which is brilliant. However, the jobserver then becomes a new single point of failure.

Now I am exploring whether it's possible to submit the application in YARN cluster mode and connect to the driver from multiple clients.

Is there anything I am missing? Any suggestion is highly appreciated!

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-client-high-availability-tp10088.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
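For concreteness, the pattern each application instance currently follows is roughly the sketch below (master address, app name, and input path are made-up placeholders, not our real setup). Because each instance builds its own SparkContext, each one spawns a separate set of executors, and the cached data is duplicated per instance:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://master:7077") // standalone cluster master (placeholder address)
      .setAppName("caching-app")
    val sc = new SparkContext(conf)

    // Cache the big chunk of data once; subsequent jobs reuse it --
    // but only within this application's own executors.
    val data = sc.textFile("hdfs:///data/big-dataset") // hypothetical path
      .persist(StorageLevel.MEMORY_ONLY)
    data.count() // materialize the cache

    // ... serve intensive calculations against `data` ...
  }
}
```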
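To make the jobserver option concrete, multiple clients sharing one jobserver-managed context looks roughly like this (a sketch based on the spark-jobserver README; the host, port, context name, and example app/class are assumptions):

```shell
# Create a long-lived named context once; it holds the cached data
curl -d "" 'http://localhost:8090/contexts/shared-ctx?num-cpu-cores=4&memory-per-node=512m'

# Any number of clients can then submit jobs against that same context
curl -d 'input.string = "a b c a b"' \
  'http://localhost:8090/jobs?appName=my-app&classPath=spark.jobserver.WordCountExample&context=shared-ctx&sync=true'
```

The catch, as noted above, is that the jobserver process itself is now the single point of failure.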