GitHub user echarles opened a pull request: https://github.com/apache/spark/pull/20451
[SPARK-23146][WIP] Support client mode for Kubernetes cluster backend

## What changes were proposed in this pull request?

The changes add support for the Kubernetes resource manager in client mode (on top of the existing cluster mode).

## How was this patch tested?

The initial changes were made on top of the latest commits in the spark-k8s fork (https://github.com/apache-spark-on-k8s/spark) and have been tested on AWS with real data processing. In an effort to merge the latest features back to apache master, I am opening these untested changes here, subject to feedback and discussion. Documentation will be updated once the code has been discussed, but in the meantime [there is a dense design document](https://github.com/apache-spark-on-k8s/userdocs/pull/25/files) that can be read to learn more about the changes. In- and out-of-cluster considerations, such as dependencies and HDFS access, are discussed there.

Given the current design and implementation, an open question I have is how we want to configure the path of the Kubernetes config in the OutCluster case. The options are:

1. Force the user to specify the path and fail if this property is not given.
2. If `/var/run/secrets/kubernetes.io/serviceaccount/token` is absent (it is present for InCluster), fall back to the given property, or, if no property has been given, fall back to `$HOME/.kube/config` (in this latter case there is no separate cacert or key file; those details are all bundled in the single `$HOME/.kube/config` file).

The tests so far have been done with separate config, cacert and key files (I expect the single config file should not cause any issue).

A last important point is how we move forward with this for the merge. For better client mode coverage, it would be useful to also bring in https://github.com/apache-spark-on-k8s/spark/pull/540 downstream, which covers not only Kerberos but also the Hadoop steps needed to mount the Hadoop conf so the Driver/Executors can connect to HDFS.
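Option 2 above could be sketched roughly as follows. This is only an illustration of the proposed fallback order, not the PR's actual code; `KubeConfigResolver`, `resolve` and the injectable `exists` check are hypothetical names:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch of option 2: decide where the Kubernetes client
// configuration should come from when running outside the cluster.
object KubeConfigResolver {
  // This token file is present when Spark runs inside a pod (InCluster).
  val ServiceAccountToken = "/var/run/secrets/kubernetes.io/serviceaccount/token"

  // `exists` is injectable so the resolution order can be exercised
  // without a real cluster or filesystem layout.
  def resolve(
      configuredPath: Option[String],
      exists: String => Boolean = p => Files.exists(Paths.get(p))): String = {
    if (exists(ServiceAccountToken)) {
      // The service account token is mounted: use in-cluster configuration.
      "in-cluster"
    } else {
      // OutCluster: prefer the user-supplied property, otherwise fall back
      // to the single bundled kubeconfig file under the user's home.
      configuredPath.getOrElse(sys.props("user.home") + "/.kube/config")
    }
  }
}
```

With this ordering an explicit property always wins outside the cluster, and a bare `$HOME/.kube/config` (with embedded cacert/key data) is the last resort.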
Also, to avoid a mess in a future merge, I list here the changes I had to deal with when applying the patch to the apache master repo:

+ the submitsteps package is named steps
+ no OptionRequirements class (used in SparkKubernetesClientFactory)
+ no ExecutorLocalDirVolumeProvider in ExecutorPodFactory
+ no APISERVER_AUTH_DRIVER_MOUNTED_CONF_PREFIX in config.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/datalayer-contrib/spark-k8s k8s-client-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20451.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20451

----

commit 26a0126d63fd9ead60ede029a3e7b8e95d34492a
Author: Eric Charles <eric@...>
Date: 2018-01-31T07:45:41Z

    [WIP] initial changes for the client mode support