Janos Matyas created ZEPPELIN-3020:
--------------------------------------
Summary: Add support to run Spark interpreter on a Kubernetes cluster
Key: ZEPPELIN-3020
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3020
Project: Zeppelin
Issue Type: New Feature
Reporter: Janos Matyas
The goal of this PR is to execute Spark notebooks on Kubernetes in cluster
mode, so that the Spark driver runs inside the Kubernetes cluster. It is based
on https://github.com/apache-spark-on-k8s/spark.

Zeppelin uses `spark-submit` to start RemoteInterpreterServer, which executes
notebooks on Spark. Kubernetes-specific `spark-submit` parameters, such as the
driver, executor, init container, and shuffle images, should be set in the
SPARK_SUBMIT_OPTIONS environment variable.

If the Spark interpreter is configured with a Kubernetes-specific Spark master
URL (k8s://https....), RemoteInterpreterServer is launched inside a Spark
driver pod on Kubernetes, so the Zeppelin server has to be able to connect to
that remote server. In a Kubernetes cluster, the best solution for this is to
create a Kubernetes service for RemoteInterpreterServer. This is the reason for
SparkK8RemoteInterpreterManagerProcess, which extends
RemoteInterpreterManagerProcess: it creates the Kubernetes service, which maps
the port of RemoteInterpreterServer in the driver pod, and connects to this
service once the Spark driver pod is in the Running state.
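
The flow described above can be illustrated with a minimal sketch. It is not
the code from the PR: it is written in Python for brevity, and the function
names, the `-svc` service-name suffix, the `interpreter` port name, and the
label selector are all hypothetical. It only shows the three decisions
involved: detecting a `k8s://` master URL, describing a Service that maps the
RemoteInterpreterServer port in the driver pod, and waiting for the Running
phase before connecting.

```python
from typing import Dict

K8S_MASTER_PREFIX = "k8s://"


def is_k8s_master(master: str) -> bool:
    """Return True if the Spark master URL targets Kubernetes."""
    return master.strip().startswith(K8S_MASTER_PREFIX)


def choose_manager(master: str) -> str:
    """Pick the process manager class name based on the master URL."""
    if is_k8s_master(master):
        return "SparkK8RemoteInterpreterManagerProcess"
    return "RemoteInterpreterManagerProcess"


def build_service_manifest(driver_pod_name: str,
                           interpreter_port: int,
                           driver_labels: Dict[str, str],
                           namespace: str = "default") -> dict:
    """Build a Kubernetes Service manifest (as a dict) for the driver pod.

    The Service name suffix, port name, and labels are assumptions made
    for this sketch, not values taken from the PR.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": driver_pod_name + "-svc",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            # Must match the labels on the Spark driver pod.
            "selector": dict(driver_labels),
            "ports": [{
                "name": "interpreter",  # hypothetical port name
                "protocol": "TCP",
                "port": interpreter_port,        # port exposed by the Service
                "targetPort": interpreter_port,  # RemoteInterpreterServer port
            }],
        },
    }


def ready_to_connect(pod_phase: str) -> bool:
    """Zeppelin should connect only once the driver pod is Running."""
    return pod_phase == "Running"
```

A caller would poll the driver pod's phase, and once `ready_to_connect`
returns True, connect to the Service name and port from the manifest rather
than to the pod address directly.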