[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238782502 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" + --- End diff -- When I was testing this again today I was getting an error from just doing a plain `get` because for some reason the code was taking the first branch even though `kubeContext` was `None` so I changed to using an explicit `isDefined` instead of negating `isEmpty` and added `getOrElse` with a fallback just to avoid that --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238779965 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" + + s" from users K8S config file") + +// Start from an auto-configured config with the desired context +// Fabric 8 uses null to indicate that the users current context should be used so if no +// explicit setting pass null +val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null))) --- End diff -- Yes and I was referring to the K8S config file :) And yes the fact that we would propagate `spark.kubernetes.context` into the pod shouldn't be an issue because there won't be any K8S config file for it to interact with inside the pod as in-pod K8S config should be from the service account token that gets injected into the pod --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238484145 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) --- End diff -- Fixed in latest commit
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238483901 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory { val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) -// TODO [SPARK-25887] Create builder in a way that respects configurable context -val config = new ConfigBuilder() +// Allow for specifying a context used to auto-configure from the users K8S config file +val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c)) +logInfo(s"Auto-configuring K8S client using " + + s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" + + s" from users K8S config file") + +// Start from an auto-configured config with the desired context +// Fabric 8 uses null to indicate that the users current context should be used so if no +// explicit setting pass null +val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null))) --- End diff -- If the context does not exist then Fabric 8 falls back to other ways of auto-configuring itself (e.g. service account). Fabric 8 skips any file based auto-configuration if there is no K8S config file present (https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L436-L459). Since we don't propagate the submission client's config file into the driver pods, no auto-configuration from a config file will be attempted in the driver because there won't be a config file present.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r237447038 --- Diff: docs/running-on-kubernetes.md --- @@ -298,6 +298,16 @@ the Spark application. ## Kubernetes Features +### Configuration File + +Your Kubernetes config file typically lives under `.kube/config` in your home directory or in a location specified by the `KUBECONFIG` environment variable. Spark on Kubernetes will attempt to use this file to do an initial auto-configuration of the Kubernetes client used to interact with the Kubernetes cluster. A variety of Spark configuration properties are provided that allow further customising the client configuration e.g. using an alternative authentication method. --- End diff -- To be frank I'm not really sure why there are different config options for client vs cluster mode and this may have changed with some of the cleanup that @vanzin has been doing lately to simplify the configuration code. Personally I have never needed to use any of the additional configuration properties in either client/cluster mode as the auto-configuration from my K8S config file has always been sufficient. At worst I've needed to set `KUBECONFIG` to select the correct config file for the cluster I want to submit to. Note that the core behaviour (the auto-configuration) has always existed implicitly in the K8S backend but was just not called out explicitly previously in the docs. This PR primarily just makes it more explicit and flexible for users who have multiple contexts in their config files. WRT `spark.master` Spark in general requires that to always be set and will use that to override whatever is present in the K8S config file regardless.
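As a concrete illustration of the behaviour described above, a submission that selects a specific config file and context might look like the following sketch (the config file path, cluster URL, and context name here are hypothetical placeholders, and `spark.kubernetes.context` is the property this PR introduces):

```
# Select which K8S config file the client auto-configures from
export KUBECONFIG="$HOME/.kube/dev-cluster-config"

# Submit, optionally overriding which context within that file is used
spark-submit \
  --master k8s://https://dev-cluster.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.context=dev-context \
  ...
```

As noted in the comment, `spark.master` is always required and overrides whatever master URL the selected context contains.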
[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22904 @liyinan926 Any chance of rounding up some other folks to get this reviewed and merged?
[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23017 Rebased to catch up with master and adapt for @vanzin's improvements to Docker build context from PR #23019
[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22904 Rebased to bring up to date with master and adapt to @vanzin's changes from PR #23019
[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r237441889 --- Diff: docs/running-on-kubernetes.md --- @@ -19,9 +19,9 @@ Please see [Spark Security](security.html) and the specific advice below before ## User Identity -Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives. This means that the resulting images will be running the Spark processes as `root` inside the container. On unsecured clusters this may provide an attack vector for privilege escalation and container breakout. Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID. +Images built from the project provided Dockerfiles contain a default [`USER`](https://docs.docker.com/engine/reference/builder/#user) directive with a default UID of `185`. This means that the resulting images will be running the Spark processes as this UID inside the container. Security conscious deployments should consider providing custom images with `USER` directives specifying their desired unprivileged UID and GID. The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables. Users building their own images with the provided `docker-image-tool.sh` script can use the `-u ` option to specify the desired UID. --- End diff -- Yes you can, the `USER` directive allows you to specify both a UID and a GID by separating them with a `:`. However as noted here changing the GID is likely to have side effects that are difficult for Spark as a project to deal with hence the note about ensuring your UID belongs to the `root` group. If end users need more complex setups then they are free to create their own custom images. 
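For illustration, a custom image following this advice could end with a `USER` directive specifying both UID and GID separated by `:` (the base image reference and UID below are placeholders; as noted in the comment, changing the GID away from the root group has side effects, so keeping GID `0` is the safer choice):

```
# Hypothetical custom image extending a project-built Spark image
FROM my-registry/spark:my-tag

# Run as an unprivileged UID while keeping the root group (GID 0)
# so the Spark executables remain runnable
USER 185:0
```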
[GitHub] spark issue #23053: [SPARK-25957][K8S] Make building alternate language bind...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23053 Probably also want to update `docs/running-on-kubernetes.md` to make it clear that you now have to opt into building the additional language bindings
[GitHub] spark issue #23053: [SPARK-25957][K8S] Add ability to skip building optional...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23053 @ramaddepally Sorry, clearly not reading straight today!
[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23017 Have now added the doc updates to this so I think this is ready for final review and merging
[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r234970716 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala --- @@ -17,13 +17,13 @@ package org.apache.spark.deploy.k8s.integrationtest import org.scalatest.concurrent.Eventually -import scala.collection.JavaConverters._ import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT} +import org.scalatest.Tag --- End diff -- I used tags just because that's what the existing scripting did. I thought there was probably a Maven property to do the same but using tags was the fastest path for me.
[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23053#discussion_r234968357 --- Diff: bin/docker-image-tool.sh --- @@ -41,6 +41,18 @@ function image_ref { echo "$image" } +function docker_push { + local image_name="$1" + if [ ! -z $(docker images -q "$(image_ref ${image_name})") ]; then +docker push "$(image_ref ${image_name})" +if [ $? -ne 0 ]; then + error "Failed to push $image_name Docker image." --- End diff -- Script should really fail i.e. `exit` if a push fails
[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23053#discussion_r234967639 --- Diff: bin/docker-image-tool.sh --- @@ -102,33 +114,37 @@ function build { error "Failed to build Spark JVM Docker image, please refer to Docker build output for details." fi - docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ --t $(image_ref spark-py) \ --f "$PYDOCKERFILE" . -if [ $? -ne 0 ]; then - error "Failed to build PySpark Docker image, please refer to Docker build output for details." + if [ "${PYDOCKERFILE}" != "skip" ]; then +docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ + -t $(image_ref spark-py) \ + -f "$PYDOCKERFILE" . + if [ $? -ne 0 ]; then +error "Failed to build PySpark Docker image, please refer to Docker build output for details." --- End diff -- Script should bail out here, i.e. `exit` if the `docker build` fails
[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23053#discussion_r234968130 --- Diff: bin/docker-image-tool.sh --- @@ -102,33 +114,37 @@ function build { error "Failed to build Spark JVM Docker image, please refer to Docker build output for details." fi - docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ --t $(image_ref spark-py) \ --f "$PYDOCKERFILE" . -if [ $? -ne 0 ]; then - error "Failed to build PySpark Docker image, please refer to Docker build output for details." + if [ "${PYDOCKERFILE}" != "skip" ]; then +docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ + -t $(image_ref spark-py) \ + -f "$PYDOCKERFILE" . + if [ $? -ne 0 ]; then +error "Failed to build PySpark Docker image, please refer to Docker build output for details." + fi + else +echo "Skipped building PySpark docker image." + fi + + if [ "${RDOCKERFILE}" != "skip" ]; then +if [ -d "${SPARK_HOME}/R/lib" ]; then + docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ +-t $(image_ref spark-r) \ +-f "$RDOCKERFILE" . + if [ $? -ne 0 ]; then +error "Failed to build SparkR Docker image, please refer to Docker build output for details." --- End diff -- Again, script should bail out i.e. `exit` if the image build fails
[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23053#discussion_r234968048 --- Diff: bin/docker-image-tool.sh --- @@ -41,6 +41,18 @@ function image_ref { echo "$image" } +function docker_push { + local image_name="$1" + if [ ! -z $(docker images -q "$(image_ref ${image_name})") ]; then +docker push "$(image_ref ${image_name})" +if [ $? -ne 0 ]; then + error "Failed to push $image_name Docker image." +fi + else +echo "$(image_ref ${image_name}) image not found. Skipping push for this image." --- End diff -- Currently the script could incorrectly go down this path if an image build fails because it doesn't exit in that case
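The fail-fast behaviour requested across these review comments can be sketched with a small wrapper function (the helper name `run_or_die` is hypothetical; the actual script checks `$?` after each `docker` invocation inline):

```shell
#!/bin/sh
# Run a command and exit the script immediately on failure, so that a
# failed image build can never be silently followed by a push attempt.
run_or_die() {
  "$@"
  status=$?
  if [ $status -ne 0 ]; then
    echo "Command failed ($status): $*" >&2
    exit $status
  fi
}

# Hypothetical usage inside the script's build function:
#   run_or_die docker build $NOCACHEARG -t "$(image_ref spark-py)" -f "$PYDOCKERFILE" .
```

With this pattern the "image not found, skipping push" branch can only be reached when the image genuinely was not built, not when its build failed.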
[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r234146162 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile --- @@ -53,5 +54,9 @@ COPY data /opt/spark/data ENV SPARK_HOME /opt/spark WORKDIR /opt/spark/work-dir +RUN chmod g+w /opt/spark/work-dir ENTRYPOINT [ "/opt/entrypoint.sh" ] + +# Specify the User that the actual main process will run as +USER ${spark_uid} --- End diff -- Done
[GitHub] spark pull request #23013: [SPARK-25023] More detailed security guidance for...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23013#discussion_r234144540 --- Diff: docs/running-on-kubernetes.md --- @@ -15,7 +15,19 @@ container images and entrypoints.** # Security Security in Spark is OFF by default. This could mean you are vulnerable to attack by default. -Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark. +Please see [Spark Security](security.html) and the specific advice below before running Spark. + +## User Identity + +Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives. This means that the resulting images will be running the Spark processes as `root` inside the container. On unsecured clusters this may provide an attack vector for privilege escalation and container breakout. Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID. --- End diff -- I would like this PR to go in first as this will also want backporting as-is to branch-2.4. Then in PR #23017 I will update the docs to explain the new defaults, how to customise etc. before that gets merged into master
[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r234143917 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala --- @@ -17,13 +17,13 @@ package org.apache.spark.deploy.k8s.integrationtest import org.scalatest.concurrent.Eventually -import scala.collection.JavaConverters._ import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT} +import org.scalatest.Tag --- End diff -- As I was just wanting to debug this specific failing test it was useful to run it on its own repeatedly as I investigated the issue. I backed out other changes I made to enable this (like not cleaning up the test's driver pod) so I will back this out as well
[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23013 @srowen I'm happy with it
[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23017

> noted test issue. let's kick off test though

@felixcheung This is now resolved, please kick off a retest when you get a chance
[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23017 Resolved the issue with the client mode test. The test itself was actually badly written in that it used the Spark images but overrode the entry point which avoided the logic that sets up the `/etc/passwd` entry for the container UID. Without this running there is no home directory for the user so the Ivy setup fails as a result. By changing the test to not override the entry point the test can run successfully. Also made a couple of minor changes to this test to make it easier to debug.
[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r233560448 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -30,6 +30,10 @@ set -e # If there is no passwd entry for the container UID, attempt to create one if [ -z "$uidentry" ] ; then if [ -w /etc/passwd ] ; then +# TODO Should we allow providing an environment variable to set the desired username? --- End diff -- @vanzin Well I am mostly following advice via @skonto from https://docs.okd.io/latest/creating_images/guidelines.html#use-uid which says:

> For an image to support running as an arbitrary user, directories and files that may be written to by processes in the image should be owned by the root group and be read/writable by that group. Files to be executed should also have group execute permissions.
>
> Adding the following to your Dockerfile sets the directory and file permissions to allow users in the root group to access them in the built image:
> ...
> Because the container user is always a member of the root group, the container user can read and write these files. The root group does not have any special permissions (unlike the root user) so there are no security concerns with this arrangement. In addition, the processes running in the container must not listen on privileged ports (ports below 1024), since they are not running as a privileged user.

This is also where this code snippet/pattern originates: from the folks who originally developed the container images for Spark on K8S
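The elided Dockerfile step in that OpenShift guideline can be sketched as a small helper (the function name is hypothetical): give the root group ownership of the directories Spark writes to and mirror the owner's permissions onto the group.

```shell
#!/bin/sh
# Make a directory tree usable by any UID in the root group, following the
# OpenShift image guidelines quoted above: root group ownership plus group
# permissions matching the owner's.
make_root_group_accessible() {
  dir="$1"
  chgrp -R 0 "$dir" 2>/dev/null || true  # needs root; best-effort in this sketch
  chmod -R g=u "$dir"
}
```

Any arbitrary UID the container is started with is then able to read and write those paths, since it is always a member of the root group.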
[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/23017#discussion_r233385383 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -30,6 +30,10 @@ set -e # If there is no passwd entry for the container UID, attempt to create one if [ -z "$uidentry" ] ; then if [ -w /etc/passwd ] ; then +# TODO Should we allow providing an environment variable to set the desired username? --- End diff -- Not the case. `USER` directives take effect at the point at which they occur in the `Dockerfile`, so since the directive is placed after the `ENTRYPOINT` directive the entry point script still runs as `root`, allowing adding UID entries and any other privileged setup operations that are needed prior to the container's main process running and having its UID set appropriately. If you manually drop into the resulting container you can see that your UID is the specified non-privileged UID but that an `/etc/passwd` entry was successfully added for you by the entry point script
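The pattern being discussed can be sketched as follows. This is a simplified, hypothetical version of what the entry point does, not the exact script; the `SPARK_USER_NAME` variable is the one floated in the TODO above, with `anonymous` as a fallback username.

```shell
#!/bin/sh
# Build an /etc/passwd line for an arbitrary container UID so that tools
# relying on getpwuid() / the JVM's user.home find a valid home directory.
passwd_entry() {
  uid="$1"; gid="$2"; home="$3"
  echo "${SPARK_USER_NAME:-anonymous}:x:${uid}:${gid}:anonymous uid:${home}:/bin/false"
}

# In the entry point this runs before the main process starts, while
# /etc/passwd is still writable (hypothetical guard shown):
#   if [ -z "$(getent passwd "$(id -u)")" ] && [ -w /etc/passwd ]; then
#     passwd_entry "$(id -u)" "$(id -g)" "$SPARK_HOME" >> /etc/passwd
#   fi
```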
[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23013 @tgravescs Re: Point 1 I have a separate PR #22904 which makes some improvements to the docs around that point
[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23013

@mccheah I have tried to keep it minimal and just point to the official K8S docs. Obviously there is a balance to be had between high-level warnings and detailed advice. K8S is still a relatively new technology to the wider industry regardless of its maturity as a whole so we should be highlighting the obvious pitfalls.

@tgravescs
1. The K8S API uses SSL by default; primarily the Fabric 8 library auto-configures itself from the user's K8S config file which will typically set up all the necessary authentication. All the various authentication properties are used to override that initial auto-configuration and are unnecessary for most users.
2. Not sure
3. I haven't used client mode much so can't comment
4. Kerberos is already covered in more detail in the main security doc
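Since the client auto-configures from the same file `kubectl` uses, one way to check what it will pick up before submitting is to inspect the effective kubeconfig (commands shown for illustration; the output depends entirely on your environment):

```
kubectl config current-context   # the context the client will use by default
kubectl config view --minify     # the cluster/user settings for that context
```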
[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/23017

For those with more knowledge of client mode here is the specific error seen in the integration tests:

```
Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: ?/.ivy2/local
	at org.apache.ivy.util.Checks.checkAbsolute(Checks.java:48)
	at org.apache.ivy.plugins.repository.file.FileRepository.setBaseDir(FileRepository.java:135)
	at org.apache.ivy.plugins.repository.file.FileRepository.<init>(FileRepository.java:44)
	at org.apache.spark.deploy.SparkSubmitUtils$.createRepoResolvers(SparkSubmit.scala:1063)
	at org.apache.spark.deploy.SparkSubmitUtils$.buildIvySettings(SparkSubmit.scala:1149)
	at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:51)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

This looks to me like Spark/Ivy is discovering the user's home directory incorrectly. Have done some digging into the code paths but have not spotted what exactly is wrong yet.
[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/23017

[WIP][SPARK-26015][K8S] Set a default UID for Spark on K8S Images

## What changes were proposed in this pull request?

Adds USER directives to the Dockerfiles which are configurable via build argument (`spark_uid`) for easy customisation. A `-u` flag is added to `bin/docker-image-tool.sh` to make it easy to customise this e.g.

```
> bin/docker-image-tool.sh -r rvesse -t uid -u 185 build
> bin/docker-image-tool.sh -r rvesse -t uid push
```

If no UID is explicitly specified it defaults to `185` - this is per @skonto's suggestion to align with the OpenShift standard reserved UID for Java apps (https://lists.openshift.redhat.com/openshift-archives/users/2016-March/msg00283.html)

Notes:
- We have to make the `WORKDIR` writable by the root group or otherwise jobs will fail with `AccessDeniedException`
- We have to specify `HOME` to set it to the `WORKDIR` to avoid an Ivy error caused by `user.home` not having a valid value otherwise

To Do:
- [ ] Debug and resolve issue with client mode test
- [ ] Consider whether to always propagate `SPARK_USER_NAME` to environment of driver and executor pods so `entrypoint.sh` can insert that into `/etc/passwd` entry
- [ ] Rebase once PR #23013 is merged and update documentation accordingly

## How was this patch tested?

Built the Docker images with the new Dockerfiles that include the `USER` directives. Ran the Spark on K8S integration tests against the new images. All pass except client mode which I am currently debugging further. Also manually dropped myself into the resulting container images via `docker run` and checked `id -u` output to see that UID is as expected. Tried customising the UID from the default via the new `-u` argument to `docker-image-tool.sh` and again checked the resulting image for the correct runtime UID.
cc @felixcheung @skonto @vanzin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rvesse/spark SPARK-26015

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23017.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23017

commit 92e22f76bbe61035dcde54c5fb2990cb7099a98d
Author: Rob Vesse
Date: 2018-11-12T13:44:03Z

[SPARK-26015][K8S] Set a default UID for Spark on K8S Images

Adds USER directives to the Dockerfiles which is configurable via build argument for easy customisation. A -u flag is added to bin/docker-image-tool.sh to make it easy to customise this.
[GitHub] spark pull request #23013: [SPARK-25023] More detailed security guidance for...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/23013

[SPARK-25023] More detailed security guidance for K8S

## What changes were proposed in this pull request?

Highlights specific security issues to be aware of with Spark on K8S and recommends K8S mechanisms that should be used to secure clusters.

## How was this patch tested?

N/A - Documentation only

CC @felixcheung @tgravescs @skonto

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rvesse/spark SPARK-25023

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23013.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23013

commit 5f2fe6ef686f0efb7d7b369008b7cdc9a7e8e8a2
Author: Rob Vesse
Date: 2018-11-12T10:20:14Z

[SPARK-25023] More detailed security guidance for K8S

Highlights specific security issues to be aware of with Spark on K8S and recommends K8S mechanisms that should be used to secure clusters.
[GitHub] spark pull request #22959: [SPARK-25876][k8s] Simplify kubernetes configurat...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22959#discussion_r231838596 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala --- @@ -112,125 +72,139 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf]( def getOption(key: String): Option[String] = sparkConf.getOption(key) } +private[spark] class KubernetesDriverConf( +sparkConf: SparkConf, +val appId: String, +val mainAppResource: MainAppResource, +val mainClass: String, +val appArgs: Array[String], +val pyFiles: Seq[String]) + extends KubernetesConf(sparkConf) { + + override val resourceNamePrefix: String = { +val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None +custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName)) + } + + override def labels: Map[String, String] = { +val presetLabels = Map( + SPARK_APP_ID_LABEL -> appId, + SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE) +val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( + sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX) + +presetLabels.keys.foreach { key => + require( +!driverCustomLabels.contains(key), +s"Label with key $key is not allowed as it is reserved for Spark bookkeeping operations.") +} + +driverCustomLabels ++ presetLabels + } + + override def environment: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_ENV_PREFIX) + } + + override def annotations: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_ANNOTATION_PREFIX) + } + + override def secretNamesToMountPaths: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_SECRETS_PREFIX) + } + + override def secretEnvNamesToKeyRefs: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_SECRET_KEY_REF_PREFIX) + } + + override def volumes: 
Seq[KubernetesVolumeSpec] = { +KubernetesVolumeUtils.parseVolumesWithPrefix(sparkConf, KUBERNETES_DRIVER_VOLUMES_PREFIX) + } +} + +private[spark] class KubernetesExecutorConf( +sparkConf: SparkConf, +val appId: String, +val executorId: String, +val driverPod: Option[Pod]) + extends KubernetesConf(sparkConf) { + + override val resourceNamePrefix: String = { +get(KUBERNETES_EXECUTOR_POD_NAME_PREFIX).getOrElse( + KubernetesConf.getResourceNamePrefix(appName)) + } + + override def labels: Map[String, String] = { +val presetLabels = Map( + SPARK_EXECUTOR_ID_LABEL -> executorId, + SPARK_APP_ID_LABEL -> appId, + SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE) + +val executorCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( + sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX) + +presetLabels.keys.foreach { key => + require( +!executorCustomLabels.contains(key), +s"Custom executor labels cannot contain $key as it is reserved for Spark.") +} + +executorCustomLabels ++ presetLabels + } + + override def environment: Map[String, String] = sparkConf.getExecutorEnv.toMap + + override def annotations: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_ANNOTATION_PREFIX) + } + + override def secretNamesToMountPaths: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_SECRETS_PREFIX) + } + + override def secretEnvNamesToKeyRefs: Map[String, String] = { +KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_SECRET_KEY_REF_PREFIX) + } + + override def volumes: Seq[KubernetesVolumeSpec] = { +KubernetesVolumeUtils.parseVolumesWithPrefix(sparkConf, KUBERNETES_EXECUTOR_VOLUMES_PREFIX) + } + +} + private[spark] object KubernetesConf { def createDriverConf( sparkConf: SparkConf, - appName: String, - appResourceNamePrefix: String, appId: String, mainAppResource: MainAppResource, mainClass: String, appArgs: Array[String], - maybePyFiles: Option[String], - hadoopConfDir: 
Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = { -val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( - sparkConf, KUBE
[GitHub] spark issue #22959: [SPARK-25876][k8s] Simplify kubernetes configuration typ...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22959 First glance this looks like a lot of nice simplification, will take a proper look over this tomorrow
[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22911#discussion_r231195413 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala --- @@ -126,20 +134,53 @@ private[spark] class KerberosConfDriverFeatureStep( HadoopBootstrapUtil.bootstrapSparkUserPod( kubeTokenManager.getCurrentUser.getShortUserName, hadoopBasedSparkPod)) + +if (keytab.isDefined) { + val podWitKeytab = new PodBuilder(kerberizedPod.pod) --- End diff -- Typo - `Wit` -> `With`
[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22904 @mccheah I have made the requested changes, can I get another review please?
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r230318413 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -23,6 +23,18 @@ import org.apache.spark.internal.config.ConfigBuilder private[spark] object Config extends Logging { + val KUBERNETES_CONTEXT = +ConfigBuilder("spark.kubernetes.context") + .doc("The desired context from your K8S config file used to configure the K8S " + +"client for interacting with the cluster. Useful if your config file has " + +"multiple clusters or user identities defined. The client library used " + +"locates the config file via the KUBECONFIG environment variable or by defaulting " + +"to .kube/config under your home directory. If not specified then your current " + +"context is used. You can always override specific aspects of the config file " + +"provided configuration using other Spark on K8S configuration options.") + .stringConf + .createWithDefault(null) --- End diff -- Refactored to use `Option`
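The `Option` refactor mentioned here can be sketched as follows. Note this is a hypothetical stand-in for Spark's internal `ConfigBuilder`/`ConfigEntry` machinery — `OptionalEntry` and `readFrom` are invented names used only to illustrate the `createOptional`-style behaviour of surfacing `None` rather than a `null` default:

```scala
// Hypothetical stand-in for ConfigBuilder(...).stringConf.createOptional;
// not the real org.apache.spark.internal.config API.
final case class OptionalEntry(key: String, doc: String) {
  // Reading an optional entry yields None when the key is unset or blank,
  // so callers never have to deal with a null default value.
  def readFrom(conf: Map[String, String]): Option[String] =
    conf.get(key).filter(_.trim.nonEmpty)
}

val KUBERNETES_CONTEXT = OptionalEntry(
  "spark.kubernetes.context",
  "The desired context from your K8S config file used to configure the K8S client.")
```

With this shape, `sparkConf.get(KUBERNETES_CONTEXT)` returns `Option[String]`, and blank values are filtered out at read time instead of at every call site.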
[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 @mccheah I have updated the comment to reference the follow up issue and opened a PR for that as #22904. Can we go ahead and merge now?
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/22904 [SPARK-25887][K8S] Configurable K8S context support ## What changes were proposed in this pull request? This enhancement allows for specifying the desired context to use for the initial K8S client auto-configuration. This allows users to more easily access alternative K8S contexts without having to first explicitly change their current context via kubectl. ## How was this patch tested? Explicitly set my K8S context to a context pointing to a non-existent cluster, then launched Spark jobs with explicitly specified contexts via the new `spark.kubernetes.context` configuration property. Example Output: ``` > kubectl config current-context minikube > minikube status minikube: Stopped cluster: kubectl: > ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.context=docker-for-desktop --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4 18/10/31 11:57:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 18/10/31 11:57:51 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using context docker-for-desktop from users K8S config file 18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state: pod name: spark-pi-1540987071845-driver namespace: default labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver pod uid: 32462cac-dd04-11e8-b6c6-0251 creation time: 2018-10-31T11:57:52Z service account name: default volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv node name: N/A start time: N/A phase: Pending container status: N/A 18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new state: pod name: spark-pi-1540987071845-driver namespace: default labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver pod uid: 32462cac-dd04-11e8-b6c6-0251 creation time: 2018-10-31T11:57:52Z service account name: default volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv node name: docker-for-desktop start time: N/A phase: Pending container status: N/A ... 
18/10/31 11:58:03 INFO LoggingPodStatusWatcherImpl: State changed, new state: pod name: spark-pi-1540987071845-driver namespace: default labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, spark-role -> driver pod uid: 32462cac-dd04-11e8-b6c6-0251 creation time: 2018-10-31T11:57:52Z service account name: default volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv node name: docker-for-desktop start time: 2018-10-31T11:57:52Z phase: Succeeded container status: container name: spark-kubernetes-driver container image: rvesse/spark:debian container state: terminated container started at: 2018-10-31T11:57:54Z container finished at: 2018-10-31T11:58:02Z exit code: 0 termination reason: Completed ``` Without the `spark.kubernetes.context` setting this will fail because the current context - `minikube` - is pointing to a non-running cluster e.g. ``` > ./spark-submit --master k8s://https://localhost:6443 --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=2 --conf spark.kubernetes.container.image=rvesse/spark:debian local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4 18/10/31 12:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/10/31 12:02:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file 18/10/31 12:02:31 WARN WatchConnectionManager: Exec Failure javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at sun.security.ssl.Handshaker.fatalSE(H
[GitHub] spark pull request #22805: [SPARK-25809][K8S][TEST] New K8S integration test...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r229630011 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -63,6 +66,8 @@ private[spark] object SparkKubernetesClientFactory { .getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX") val dispatcher = new Dispatcher( ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher")) + +// TODO Create builder in a way that respects configurable context --- End diff -- SPARK-25887, I'll update the comment to reference it explicitly
[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 Did a bunch more testing on our internal K8S clusters today after rebasing this onto master. I am now happy that this is ready for final review and merging so I have removed the `[WIP]` tag
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r229400780 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala --- @@ -42,6 +42,9 @@ private[spark] object SparkKubernetesClientFactory { sparkConf: SparkConf, defaultServiceAccountToken: Option[File], defaultServiceAccountCaCert: Option[File]): KubernetesClient = { + +// TODO Support configurable context + --- End diff -- Note to reviewers: in testing this I noticed one issue. When you have multiple contexts defined in your `KUBECONFIG` file, this PR lets the integration test machinery use an arbitrary context, but the submission client will always use your current context. Therefore, if your current context does not match the requested context, the integration tests will fail or return false positives because the client and the test harness will use different contexts. I have filed SPARK-25887 to address this, since the client change seemed like it should be a separate issue and PR. I will provide a PR for that tomorrow.
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r229395036 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala --- @@ -28,16 +30,15 @@ private[spark] trait IntegrationTestBackend { } private[spark] object IntegrationTestBackendFactory { - val deployModeConfigKey = "spark.kubernetes.test.deployMode" - def getTestBackend: IntegrationTestBackend = { -val deployMode = Option(System.getProperty(deployModeConfigKey)) - .getOrElse("minikube") -if (deployMode == "minikube") { - MinikubeTestBackend -} else { - throw new IllegalArgumentException( -"Invalid " + deployModeConfigKey + ": " + deployMode) +val deployMode = Option(System.getProperty(CONFIG_KEY_DEPLOY_MODE)) + .getOrElse(BACKEND_MINIKUBE) +deployMode match { + case BACKEND_MINIKUBE => MinikubeTestBackend + case BACKEND_CLOUD => new KubeConfigBackend(System.getProperty(CONFIG_KEY_KUBE_CONFIG_CONTEXT)) --- End diff -- The Fabric 8 client treats `null` as meaning use the user's current context from their `KUBECONFIG` file - https://github.com/fabric8io/kubernetes-client/blob/v4.1.0/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L467
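Because Fabric 8's `Config.autoConfigure(String)` uses `null` to mean "use the current kubeconfig context", the Spark side can keep an `Option[String]` everywhere and convert only at the client boundary. A minimal sketch — the `autoConfigure` call itself is omitted so the snippet has no Fabric 8 dependency, and `contextOrNull` is an illustrative helper name, not Spark code:

```scala
// Convert Spark's Option[String] context to the null convention that
// io.fabric8.kubernetes.client.Config.autoConfigure(String) expects:
// null => use the user's current context from their KUBECONFIG file.
def contextOrNull(kubeContext: Option[String]): String =
  kubeContext.filter(_.nonEmpty).orNull
```

Filtering out empty strings before `orNull` keeps a blank `spark.kubernetes.context` setting equivalent to "unset".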
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228470066 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala --- @@ -27,4 +27,36 @@ object Utils extends Logging { val resource = createResource try f.apply(resource) finally resource.close() } + + def checkAndGetK8sMasterUrl(rawMasterURL: String): String = { --- End diff -- Now fixed
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228470004 --- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh --- @@ -71,19 +71,36 @@ if [[ $IMAGE_TAG == "N/A" ]]; then IMAGE_TAG=$(uuidgen); cd $UNPACKED_SPARK_TGZ - if [[ $DEPLOY_MODE == cloud ]] ; - then -$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build -if [[ $IMAGE_REPO == gcr.io* ]] ; -then - gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG -else - $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push -fi - else -# -m option for minikube. -$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG build - fi + + case $DEPLOY_MODE in +cloud|cloud-url) + # Build images + $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build + + # Push images appropriately + if [[ $IMAGE_REPO == gcr.io* ]] ; + then +gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG + else +$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push + fi + ;; + + docker-for-desktop) --- End diff -- Now corrected
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228469510 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala --- @@ -27,4 +27,36 @@ object Utils extends Logging { val resource = createResource try f.apply(resource) finally resource.close() } + + def checkAndGetK8sMasterUrl(rawMasterURL: String): String = { --- End diff -- Yes, will fix in next commit
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228467937 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala --- @@ -28,16 +30,16 @@ private[spark] trait IntegrationTestBackend { } private[spark] object IntegrationTestBackendFactory { - val deployModeConfigKey = "spark.kubernetes.test.deployMode" - def getTestBackend: IntegrationTestBackend = { -val deployMode = Option(System.getProperty(deployModeConfigKey)) - .getOrElse("minikube") -if (deployMode == "minikube") { - MinikubeTestBackend -} else { - throw new IllegalArgumentException( -"Invalid " + deployModeConfigKey + ": " + deployMode) +val deployMode = Option(System.getProperty(CONFIG_KEY_DEPLOY_MODE)) + .getOrElse(BACKEND_MINIKUBE) +deployMode match { + case BACKEND_MINIKUBE => MinikubeTestBackend + case BACKEND_CLOUD => new KubeConfigBackend(null) --- End diff -- Yes, done this in the latest commit
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228467705 --- Diff: resource-managers/kubernetes/integration-tests/README.md --- @@ -41,12 +71,127 @@ The Spark code to test is handed to the integration test system via a tarball. H * `--spark-tgz ` - set `` to point to a tarball containing the Spark distribution to test. -TODO: Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree. +This Tarball should be created by first running `dev/make-distribution.sh` passing the `--tgz` flag and `-Pkubernetes` as one of the +options to ensure that Kubernetes support is included in the distribution. For more details on building a runnable distribution please +see the [Building Spark](https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution) documentation. + +**TODO:** Don't require the packaging of the built Spark artifacts into this tarball, just read them out of the current tree. ## Customizing the Namespace and Service Account -* `--namespace ` - set `` to the namespace in which the tests should be run. -* `--service-account ` - set `` to the name of the Kubernetes service account to -use in the namespace specified by the `--namespace`. The service account is expected to have permissions to get, list, watch, +If no namespace is specified then a temporary namespace will be created and deleted during the test run. Similarly if no service +account is specified then the `default` service account for the namespace will be used. + +Using the `--namespace ` flag sets `` to the namespace in which the tests should be run. If this is supplied +then the tests assume this namespace exists in the K8S cluster and will not attempt to create it. Additionally this namespace must +have an appropriately authorized service account which can be customised via the `--service-account` flag. 
+ +The `--service-account ` flag sets `` to the name of the Kubernetes service account to +use in the namespace specified by the `--namespace` flag. The service account is expected to have permissions to get, list, watch, and create pods. For clusters with RBAC turned on, it's important that the right permissions are granted to the service account in the namespace through an appropriate role and role binding. A reference RBAC configuration is provided in `dev/spark-rbac.yaml`. + +# Running the Test Directly + +If you prefer to run just the integration tests directly then you can customise the behaviour via properties passed to Maven using the +`-Dproperty=value` option e.g. --- End diff -- Fixed
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228467766 --- Diff: resource-managers/kubernetes/integration-tests/README.md --- @@ -13,15 +13,45 @@ The simplest way to run the integration tests is to install and run Minikube, th dev/dev-run-integration-tests.sh The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should -run with a minimum of 3 CPUs and 4G of memory: +run with a minimum of 4 CPUs and 6G of memory: -minikube start --cpus 3 --memory 4096 +minikube start --cpus 4 --memory 6144 You can download Minikube [here](https://github.com/kubernetes/minikube/releases). # Integration test customization -Configuration of the integration test runtime is done through passing different arguments to the test script. The main useful options are outlined below. +Configuration of the integration test runtime is done through passing different arguments to the test script. +The main useful options are outlined below. + +## Using a different backend + +The integration test backend i.e. the K8S cluster used for testing is controlled by the `--deploy-mode` option. By default this +is set to `minikube`, the available backends are their perquisites are as follows. --- End diff -- Fixed
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228467591 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/cloud/KubeConfigBackend.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.integrationtest.backend.cloud + +import java.nio.file.Paths + +import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient} +import org.apache.spark.deploy.k8s.integrationtest.TestConstants +import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend +import org.apache.spark.internal.Logging + +private[spark] class KubeConfigBackend(var context: String) + extends IntegrationTestBackend with Logging { + // If no context supplied see if one was specified in the system properties supplied + // to the tests + if (context == null) { +context = System.getProperty(TestConstants.CONFIG_KEY_KUBE_CONFIG_CONTEXT) + } + logInfo(s"K8S Integration tests will run against " + +s"${if (context != null) s"context ${context}" else "default context"} " + +s" from users K8S config file") + + private var defaultClient: DefaultKubernetesClient = _ + + override def initialize(): Unit = { +// Auto-configure K8S client from K8S config file + System.setProperty(Config.KUBERNETES_AUTH_TRYKUBECONFIG_SYSTEM_PROPERTY, "true"); --- End diff -- Looked over what Fabric 8 client was doing behind the scenes and realised the values I was setting explicitly are the defaults anyway so removed this unnecessary manipulation
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r228467650 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/cloud/KubeConfigBackend.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.integrationtest.backend.cloud + +import java.nio.file.Paths + +import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient} +import org.apache.spark.deploy.k8s.integrationtest.TestConstants +import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend +import org.apache.spark.internal.Logging + +private[spark] class KubeConfigBackend(var context: String) --- End diff -- Refactored to avoid this
[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 @liyinan926 I will rebase and squash appropriately once PR #22820 is merged
[GitHub] spark pull request #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client vers...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22820#discussion_r228458947 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -157,7 +157,9 @@ private[spark] object KubernetesUtils { }.getOrElse(Seq(("container state", "N/A"))) } - def formatTime(time: Time): String = { -if (time != null) time.getTime else "N/A" + def formatTime(time: String): String = { +// TODO: Investigate form `time` comes in as there was a DataType +// change from Time to String --- End diff -- e.g. a sample from the logs from my PR where I had made a similar change: ``` creation time: 2018-10-24T17:17:03Z ```
[GitHub] spark pull request #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client vers...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22820#discussion_r228458281 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -157,7 +157,9 @@ private[spark] object KubernetesUtils { }.getOrElse(Seq(("container state", "N/A"))) } - def formatTime(time: Time): String = { -if (time != null) time.getTime else "N/A" + def formatTime(time: String): String = { +// TODO: Investigate form `time` comes in as there was a DataType +// change from Time to String --- End diff -- The string returned is ISO8601 format so no need to do anything special here
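A sketch of what the String-based `formatTime` can look like after the client bump, assuming the 4.x client hands back ISO8601 timestamp strings (such as `2018-10-24T17:17:03Z`) and may still return `null` for fields that are not yet populated:

```scala
// Null-safe formatting of a Fabric 8 4.x timestamp: the client now returns
// ISO8601 strings directly, so no conversion is needed beyond guarding
// against null/empty values for unpopulated fields (e.g. unscheduled pods).
def formatTime(time: String): String =
  if (time != null && time.nonEmpty) time else "N/A"
```

The empty-string guard is an extra defensive assumption on top of the null check shown in the quoted diff.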
[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 Ran successfully against one of our dev K8S clusters today: ![screen shot 2018-10-25 at 17 39 40](https://user-images.githubusercontent.com/2104864/47516269-fd2e1480-d87c-11e8-80f9-2325edffe066.png) Will continue on to testing against one of our larger production clusters
[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 @ifilonenko I have done the generalisation today since it was fairly trivial and it actually resolves a number of concerns about the first-pass implementation. @skonto I have restored the cloud backend as requested; I ended up with two separate cloud backends: one using the original code that simply takes a master URL, and another that takes a desired K8S context and auto-configures itself from the K8S config file and the specified context.
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r227835021 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala --- @@ -0,0 +1,68 @@ +package org.apache.spark.deploy.k8s.integrationtest.backend.docker + +import java.nio.file.Paths + +import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient} +import org.apache.spark.deploy.k8s.integrationtest.ProcessUtils +import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend + +private[spark] object DockerForDesktopBackend extends IntegrationTestBackend { --- End diff -- With the latest commits the only state maintained is the K8S client, which is reusable; by adopting a newer Fabric 8 client library version there is no longer any need to modify the on-disk state via `kubectl`
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r227833846 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala --- @@ -0,0 +1,68 @@ +package org.apache.spark.deploy.k8s.integrationtest.backend.docker --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 @skonto Yep, I plan to do that tomorrow. At least for my `minikube` instance I found 4g insufficient, and a couple of tests would fail because their pods didn't get scheduled. 8g is probably over-allocating; somewhere in between likely works as well.
[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22805 @srowen Yes, there are a lot of assumptions made by the integration tests that are not documented anywhere, which I figured out by digging in the code and POMs. Broadly speaking, to run the integration tests right now you need the following:

- A runnable distribution of Spark built
- `minikube` installed, configured with at least 4 CPUs and 8 GB memory, and started

Probably the next thing I will do tomorrow on this is to improve the documentation. There is a `README.md` in the integration testing module but it omits various details I have encountered while working with these tests recently.
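The prerequisites above boil down to a couple of commands. A hedged sketch — the `make-distribution.sh` flags and the minikube resource values are illustrative and may need adjusting for your environment:

```shell
# Build a runnable distribution of Spark with Kubernetes support
# (flags are illustrative; check the project docs for the full set)
./dev/make-distribution.sh --tgz -Pkubernetes

# Start minikube with enough resources for the test pods to schedule
minikube start --cpus 4 --memory 8192
```

Without a runnable distribution the Docker images have nothing to package, and with an under-resourced minikube the test pods simply never schedule, as noted above.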
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22805#discussion_r227400699 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala --- @@ -0,0 +1,68 @@ +package org.apache.spark.deploy.k8s.integrationtest.backend.docker + +import java.nio.file.Paths + +import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient} +import org.apache.spark.deploy.k8s.integrationtest.ProcessUtils +import org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend + +private[spark] object DockerForDesktopBackend extends IntegrationTestBackend { + + private val KUBECTL_STARTUP_TIMEOUT_SECONDS = 15 + + private var defaultClient: DefaultKubernetesClient = _ + private var initialContext = "" + + private def getCurrentContext: String = { +val outputs = executeKubectl("config", "current-context") +assert(outputs.size == 1, "Unexpected amount of output from kubectl config current-context") +outputs.head + } + + private def setContext(context: String): Unit = { +val outputs = executeKubectl("config", "use-context", context) +assert(outputs.size == 1, "Unexpected amount of output from kubectl config use-context") +val errors = outputs.filter(_.startsWith("error")) +assert(errors.size == 0, s"Received errors from kubectl: ${errors.head}") + } + + override def initialize(): Unit = { +// Switch context if necessary +// TODO: If we were using Fabric 8 client 3.1.0 then we could +// instead just use the overload of autoConfigure() that takes the +// desired context avoiding the need to interact with kubectl at all +initialContext = getCurrentContext --- End diff -- Would people be opposed to bumping the K8S client to 3.1.0 (or even a more recent 3.1.x release) as this would simplify the code significantly and avoid any need to shell out to `kubectl`? 
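For reference, the save/switch/restore dance that the backend currently performs by shelling out can be sketched in shell. This is a simplified sketch: the function names are illustrative and error handling is reduced to exit-code checks.

```shell
# Save the user's current kubectl context, switch to the desired one,
# and restore the original afterwards - mirroring what the Scala code
# does via ProcessUtils and kubectl.
save_and_switch_context() {
  local desired="$1"
  INITIAL_CONTEXT=$(kubectl config current-context) || return 1
  if [ "$INITIAL_CONTEXT" != "$desired" ]; then
    kubectl config use-context "$desired" || return 1
  fi
}

restore_context() {
  if [ -n "$INITIAL_CONTEXT" ]; then
    kubectl config use-context "$INITIAL_CONTEXT"
  fi
}
```

A newer Fabric 8 client exposes an `autoConfigure` overload that takes the desired context directly, which would make this whole dance unnecessary — that is the point of the question above.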
[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/22805 [WIP][SPARK-25809][K8S][TEST] New K8S integration testing backends

## What changes were proposed in this pull request?

Currently the K8S integration tests are hardcoded to use a `minikube` based backend. `minikube` is VM based, so it can be resource hungry, and it also doesn't cope well with certain networking setups (for example, with the Cisco AnyConnect software VPN `minikube` is unusable as it detects its own IP incorrectly). This PR adds a new K8S integration testing backend that allows for using the Kubernetes support in [Docker for Desktop](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/). This is WIP towards generalising to be able to run the integration tests against an arbitrary Kubernetes cluster.

To Do:
- [ ] General Kubernetes cluster backend
- [ ] Documentation on Kubernetes integration testing

## How was this patch tested?

Ran integration tests with Docker for Desktop and all passed: ![screen shot 2018-10-23 at 14 19 56](https://user-images.githubusercontent.com/2104864/47363460-c5816a00-d6ce-11e8-9c15-56b34698e797.png)

Suggested Reviewers: @ifilonenko @srowen

You can merge this pull request into a Git repository by running: $ git pull https://github.com/rvesse/spark SPARK-25809 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22805.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22805

commit ea3cfd670ec7a7fbb5bfe2b4df56afd67900df9b Author: Rob Vesse Date: 2018-10-23T13:11:16Z [SPARK-25809][K8S] Docker for desktop K8S integration backend Adds a new K8S integration testing backend that allows for using the Kubernetes support in Docker for Desktop
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227373891 --- Diff: bin/docker-image-tool.sh --- @@ -79,7 +79,7 @@ function build { fi # Verify that Spark has actually been built/is a runnable distribution - #Â i.e. the Spark JARs that the Docker files will place into the image are present --- End diff -- It wasn't as if I used a non-breaking space intentionally; I just used my OS's default editor for shell scripts!
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227274978 --- Diff: bin/docker-image-tool.sh --- @@ -79,7 +79,7 @@ function build { fi # Verify that Spark has actually been built/is a runnable distribution - #Â i.e. the Spark JARs that the Docker files will place into the image are present --- End diff -- Obvious question but why are we still using ASCII encoding for anything? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r226983785 --- Diff: bin/docker-image-tool.sh --- @@ -79,7 +79,7 @@ function build { fi # Verify that Spark has actually been built/is a runnable distribution - #Â i.e. the Spark JARs that the Docker files will place into the image are present --- End diff -- @dongjoon-hyun This was edited in Xcode on OS X so almost certainly defaulting to UTF-8 encoding --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22748 Rebased onto master, should be ready for merging --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22681: [SPARK-25682][k8s] Package example jars in same target f...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22681 I actually just got bitten by this today while trying to run K8S integration tests with custom images. The integration tests assume the runnable distribution layout for `/opt/spark/examples` in the image, so if you have built your images from a working copy the examples JAR is in the wrong place and the integration tests will fail. Tracking this down is made extra hard by the integration tests deleting pods after the tests run, so you have to catch the pod status and grab its logs before they get cleaned up. I have patched the tests locally to make this configurable, but it looks like this change should render that unnecessary. I can still submit that patch if it sounds like it would be useful.
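The "grab the logs before cleanup" step described above can be scripted. A sketch — the `spark-role=driver` label is the one the K8S backend applies to driver pods, but the function name and namespace handling here are illustrative:

```shell
# Capture the driver pod's logs to a file before the test harness
# deletes the pod.
capture_driver_logs() {
  local ns="${1:-default}"
  local pod
  pod=$(kubectl get pods -n "$ns" -l spark-role=driver \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl logs -n "$ns" "$pod" > "${pod}.log"
}
```

Running this in a loop alongside the test suite gives you something to inspect even after the pods are gone.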
[GitHub] spark pull request #22681: [SPARK-25682][k8s] Package example jars in same t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22681#discussion_r226289881 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile --- @@ -18,6 +18,7 @@ FROM openjdk:8-alpine ARG spark_jars=jars +ARG example_jars=examples/jars --- End diff -- Also the K8S integration tests currently assume that the examples JAR is present since they rely upon running code from the examples --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22748#discussion_r225867655 --- Diff: bin/docker-image-tool.sh --- @@ -78,20 +91,38 @@ function build { docker build $NOCACHEARG "${BUILD_ARGS[@]}" \ -t $(image_ref spark) \ -f "$BASEDOCKERFILE" . + if [ $? -ne 0 ]; then +error "Failed to build base Docker image, please refer to Docker build output for details" --- End diff -- Fixed this and other instances thereof --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
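The fix referred to above amounts to checking each `docker` invocation's exit status and aborting via a shared `error` helper. A minimal sketch of the pattern — not the script's exact code:

```shell
# Print a message to stderr and abort - mirrors the error() helper
# used by docker-image-tool.sh.
error() {
  echo "$@" 1>&2
  exit 1
}

# Run a command (stand-in for `docker build ...`) and fail fast on a
# non-zero exit status instead of silently continuing.
checked_build() {
  "$@"
  if [ $? -ne 0 ]; then
    error "Failed to build Docker image, please refer to Docker build output for details"
  fi
}
```

Without this check, a failed base image build would previously let the Python/R image builds proceed against a stale base image.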
[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22748#discussion_r225867566 --- Diff: bin/docker-image-tool.sh --- @@ -44,28 +44,41 @@ function image_ref { function build { local BUILD_ARGS local IMG_PATH + local JARS if [ ! -f "$SPARK_HOME/RELEASE" ]; then # Set image build arguments accordingly if this is a source repo and not a distribution archive. IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles +JARS=assembly/target/scala-$SPARK_SCALA_VERSION/jars BUILD_ARGS=( ${BUILD_PARAMS} --build-arg img_path=$IMG_PATH --build-arg - spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars + spark_jars=$JARS --build-arg k8s_tests=resource-managers/kubernetes/integration-tests/tests ) else -# Not passed as an argument to docker, but used to validate the Spark directory. +# Not passed as arguments to docker, but used to validate the Spark directory. IMG_PATH="kubernetes/dockerfiles" +JARS=jars BUILD_ARGS=(${BUILD_PARAMS}) fi + # Verify that the Docker image content directory is present if [ ! -d "$IMG_PATH" ]; then error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark." fi + + # Verify that Spark has actually been built/is a runnable distribution + #Â i.e. the Spark JARs that the Docker files will place into the image are present + local TOTAL_JARS=$(ls $JARS/spark-* | wc -l) + TOTAL_JARS=$(( $TOTAL_JARS )) + if [ "${TOTAL_JARS}" -eq 0 ]; then +error "Cannot find Spark JARs. This script assumes that Apache Spark has first been built locally." --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
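The JAR presence check in the diff above boils down to counting `spark-*` files in the expected JARs directory. A self-contained sketch using a temporary directory in place of the real distribution layout (the real script looks in `jars/` or `assembly/target/scala-*/jars`; the JAR name here is illustrative):

```shell
# Simulate a built distribution with one Spark JAR in a temp dir
JARS=$(mktemp -d)
touch "$JARS/spark-core_2.11-2.4.0.jar"

# Count matching JARs; redirect ls errors so an empty dir yields 0
TOTAL_JARS=$(ls "$JARS"/spark-* 2>/dev/null | wc -l)
TOTAL_JARS=$(( TOTAL_JARS ))
if [ "$TOTAL_JARS" -eq 0 ]; then
  echo "Cannot find Spark JARs. Build Spark locally first." >&2
  exit 1
fi
echo "Found $TOTAL_JARS Spark JAR(s)"
```

The arithmetic expansion normalises the whitespace that `wc -l` emits on some platforms before the numeric comparison.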
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22748 > There seems to be overlapping logic between this PR and #22681 Yes sorry, I was having issues with the script while working on something unrelated and hadn't realised your integration tests were also proposing changes to this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integr...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22608#discussion_r225614226 --- Diff: bin/docker-image-tool.sh --- @@ -71,18 +71,29 @@ function build { --build-arg base_img=$(image_ref spark) ) - local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"} - local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"} - local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"} + local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/Dockerfile"} + local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/python/Dockerfile"} + local KDOCKERFILE=${KDOCKERFILE:-"$IMG_PATH/test/dockerfiles/spark/kerberos/Dockerfile"} + local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/R/Dockerfile"} + # Spark Base docker build $NOCACHEARG "${BUILD_ARGS[@]}" \ -t $(image_ref spark) \ -f "$BASEDOCKERFILE" . + # PySpark docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \ -t $(image_ref spark-py) \ -f "$PYDOCKERFILE" . + # The following are optional docker builds for Kerberos Testing + docker pull ifilonenko/hadoop-base:latest --- End diff -- Isn't the explicit pull unnecessary? Trying to build the new Kerberos image which is `FROM` this will implicitly pull it anyway --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22748 Suggested reviewers: @mccheah @liyinan926 @skonto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/22748 [SPARK-25745][K8S] Improve docker-image-tool.sh script

## What changes were proposed in this pull request?

Adds error checking and handling to `docker` invocations, ensuring the script terminates early in the event of any errors. This avoids subtle errors that can occur — e.g. if the base image fails to build, the Python/R images can end up being built from outdated base images — and makes it more explicit to the user that something went wrong. Additionally, the provided `Dockerfiles` assume that Spark was first built locally or is a runnable distribution; however, the script didn't previously enforce this. The script will now check the JARs folder to ensure that Spark JARs actually exist and, if not, aborts early, reminding the user that they need to build locally first.

## How was this patch tested?

- Tested with a `mvn clean` working copy and verified that the script now terminates early
- Tested with bad `Dockerfiles` that fail to build to see that early termination occurred

You can merge this pull request into a Git repository by running: $ git pull https://github.com/rvesse/spark SPARK-25745 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22748.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22748

commit beb4083d8351030ac2b07bf6aab68d3b0f555ac9 Author: Rob Vesse Date: 2018-10-16T12:48:46Z [SPARK-25745][K8S] Improve docker-image-tool.sh script - Actually check for failures from Docker commands and terminate early - Validate that Spark JARs are actually present before attempting the Docker build
[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22146#discussion_r223620758 --- Diff: docs/running-on-kubernetes.md --- @@ -799,4 +815,168 @@ specific to Spark on Kubernetes. This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3. + + spark.kubernetes.driver.podTemplateFile + (none) + + Specify the local file that contains the driver [pod template](#pod-template). For example + spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml` + + + + spark.kubernetes.executor.podTemplateFile + (none) + + Specify the local file that contains the executor [pod template](#pod-template). For example + spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml` + + + + + Pod template properties + +See the below table for the full list of pod specifications that will be overwritten by spark. + +### Pod Metadata + + +Pod metadata keyModified valueDescription + + name + Value of spark.kubernetes.driver.pod.name + +The driver pod name will be overwritten with either the configured or default value of +spark.kubernetes.driver.pod.name. The executor pod names will be unaffected. + + + + namespace + Value of spark.kubernetes.namespace + +Spark makes strong assumptions about the driver and executor namespaces. Both driver and executor namespaces will +be replaced by either the configured or default spark conf value. + + + + labels + Adds the labels from spark.kubernetes.{driver,executor}.label.* + +Spark will add additional labels specified by the spark configuration. + + + + annotations + Adds the annotations from spark.kubernetes.{driver,executor}.annotation.* + +Spark will add additional labels specified by the spark configuration. 
+ + + + +### Pod Spec + + +Pod spec keyModified valueDescription + + imagePullSecrets + Adds image pull secrets from spark.kubernetes.container.image.pullSecrets + +Additional pull secrets will be added from the spark configuration to both executor pods. + + + + nodeSelector + Adds node selectors from spark.kubernetes.node.selector.* + +Additional node selectors will be added from the spark configuration to both executor pods. + + + + restartPolicy + "never" + +Spark assumes that both drivers and executors never restart. + + + + serviceAccount + Value of spark.kubernetes.authenticate.driver.serviceAccountName + +Spark will override serviceAccount with the value of the spark configuration for only +driver pods, and only if the spark configuration is specified. Executor pods will remain unaffected. + + + + serviceAccountName + Value of spark.kubernetes.authenticate.driver.serviceAccountName + +Spark will override serviceAccountName with the value of the spark configuration for only +driver pods, and only if the spark configuration is specified. Executor pods will remain unaffected. + + + + volumes + Adds volumes from spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName].mount.path + +Spark will add volumes as specified by the spark conf, as well as additional volumes necessary for passing --- End diff -- +1 As I've commented elsewhere it is easily possible to create invalid specs with this feature because Spark will create certain volumes, config maps etc with known name patterns that users need to avoid --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22146 @mccheah I was taking that as a given --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r221539921 --- Diff: docs/security.md --- @@ -729,6 +729,15 @@ so that non-local processes can authenticate. These delegation tokens in Kuberne shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: In all cases you must define the environment variable: `HADOOP_CONF_DIR`. +It also important to note that the KDC needs to be visible from inside the containers if the user uses a local +krb5 file. + +If a user wishes to use a remote HADOOP_CONF directory, that contains the Hadoop configuration files, or +a remote krb5 file, this could be achieved by mounting a pre-defined ConfigMap and mounting the volume in the +desired location that you can point to via the appropriate configs. This method is useful for those who wish to not +rebuild their Docker images, but instead point to a ConfigMap that they could modify. This strategy is supported +via the pod-template feature. + --- End diff -- This paragraph is a great addition --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22584: [SPARK-25262][DOC][FOLLOWUP] Fix missing markup t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22584#discussion_r221538327 --- Diff: docs/running-on-kubernetes.md --- @@ -799,7 +799,7 @@ specific to Spark on Kubernetes. spark.kubernetes.local.dirs.tmpfs - false + false --- End diff -- Thanks for fixing, apologies for introducing the error in the first place --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/21669 @vanzin I think in the current implementation of this PR the Kerberos login happens inside the driver pod, which is running inside the K8S cluster. The old design from the Spark on K8S fork did the login on the submission client and then simply distributed the delegation tokens from there. Something closer to the old design, with the login happening in the submission client, would be simpler and have fewer gotchas.
[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...
Github user rvesse closed the pull request at: https://github.com/apache/spark/pull/22256 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22256: [SPARK-25262][K8S] Better support configurability of Spa...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22256 Closed in favour of #22323 which has been merged --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215695909 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -212,6 +212,60 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_KERBEROS_PROXY_USER = +ConfigBuilder("spark.kubernetes.kerberos.proxyUser") + .doc("Specify the proxy user " + +"for HadoopUGI login for the Driver + Executors") + .internal() + .stringConf + .createWithDefault("false") + + val KUBERNETES_KERBEROS_KRB5_FILE = +ConfigBuilder("spark.kubernetes.kerberos.krb5location") + .doc("Specify the location of the krb5 file " + --- End diff -- Would be worth being explicit here that the KDC defined needs to be visible from inside the containers which may not be the case in some network setups. For example we have clusters with multiple networks and K8S is bound to a different network than the network that our default Kerberos configuration points to. So I had to copy and customise the default KDC for Kerberos login to get this running during my testing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
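One quick way to verify the visibility constraint described above is to probe the KDC port from a throwaway pod. A hedged sketch, where `kdc.example.internal` is a placeholder for whatever your `krb5.conf` actually points at:

```shell
# Probe TCP port 88 (Kerberos) on the KDC from inside the cluster;
# if this times out, driver/executor pods will not reach the KDC either.
kubectl run krb5-check --rm -i --restart=Never --image=busybox -- \
  nc -z -w 5 kdc.example.internal 88
```

On multi-network clusters like the one described above, this check failing is the first hint that a customised krb5 configuration is needed.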
[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r215692843 --- Diff: docs/security.md --- @@ -722,6 +722,62 @@ with encryption, at least. The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported will be created. +## Secure Interaction with Kubernetes + +When talking to Hadoop-based services behind Kerberos, it was noted that Spark needs to obtain delegation tokens +so that non-local processes can authenticate. These delegation tokens in Kubernetes are stored in Secrets that are +shared by the Driver and its Executors. As such, there are three ways of submitting a kerberos job: + --- End diff -- Might be worth making it explicit here that for any of the following examples to work `HADOOP_CONF_DIR` must be defined in the submission environment otherwise the K8S backend skips all the HDFS steps including Kerberos setup Also shouldn't the Running on Kubernetes docs also be updated to mention this feature, even if only to link users across to this doc? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
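The constraint above means a Kerberised submission must export `HADOOP_CONF_DIR` first. An illustrative sketch — the API server address, example class, jar path, and version are placeholders, not values from this PR:

```shell
# Without HADOOP_CONF_DIR set in the submission environment, the K8S
# backend skips the HDFS/Kerberos setup steps entirely.
export HADOOP_CONF_DIR=/etc/hadoop/conf

bin/spark-submit \
  --master k8s://https://k8s-apiserver.example.internal:6443 \
  --deploy-mode cluster \
  --name hdfs-test \
  --class org.apache.spark.examples.HdfsTest \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar /tmp/data
```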
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22215 I think this is pretty much ready to merge; can folks take another look when they get a chance?
[GitHub] spark issue #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs ba...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22323 All comments so far have been addressed; can we kick off the PR builder on this now?
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215625508 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -22,6 +22,7 @@ import java.util.UUID import io.fabric8.kubernetes.api.model.{ContainerBuilder, HasMetadata, PodBuilder, VolumeBuilder, VolumeMountBuilder} import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpecificConf, KubernetesRoleSpecificConf, SparkPod} +import org.apache.spark.deploy.k8s.Config._ --- End diff -- Left as-is --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215625448 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -225,6 +225,15 @@ private[spark] object Config extends Logging { "Ensure that major Python version is either Python2 or Python3") .createWithDefault("2") + val KUBERNETES_LOCAL_DIRS_TMPFS = +ConfigBuilder("spark.kubernetes.local.dirs.tmpfs") + .doc("If set to true then emptyDir volumes created to back spark.local.dirs will have " + +"their medium set to Memory so that they will be created as tmpfs (i.e. RAM) backed " + +"volumes. This may improve performance but scratch space usage will count towards " + +"your pods memory limit so you may wish to request more memory.") +.booleanConf --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215625299 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -45,6 +47,10 @@ private[spark] class LocalDirsFeatureStep( new VolumeBuilder() .withName(s"spark-local-dir-${index + 1}") .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs --- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215625636 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. --- End diff -- One suggestion used, other sentence tweaked to read better and insert a comma --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215625362 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. + +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Sparks local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties. 
--- End diff -- Done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215338145 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. + +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Sparks local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties. 
--- End diff -- Ok, I'll update the proposed doc changes to mention both --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22323#discussion_r215187426 --- Diff: docs/running-on-kubernetes.md --- @@ -215,6 +215,19 @@ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai The configuration properties for mounting volumes into the executor pods use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. For a complete list of available options for each supported type of volumes, please refer to the [Spark Properties](#spark-properties) section below. +## Local Storage + +Spark uses temporary scratch space to spill data to disk during shuffles and other operations. When using Kubernetes as the resource manager the pods will be created with an [emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) volume mounted for each directory listed in `SPARK_LOCAL_DIRS`. If no directories are explicitly specified then a default directory is created and configured appropriately. + +`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod. + +### Using RAM for local storage + +As `emptyDir` volumes use the nodes backing storage for ephemeral storage this default behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network having lots of executors doing IO to this remote storage may actually degrade performance. + +In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Sparks local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests via the normal `spark.driver.memory` and `spark.executor.memory` configuration properties. 
--- End diff -- @liyinan926 Yes it is in the case of K8S, per the [documentation](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir): > However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot and **any files you write will count against your Containerâs memory limit**. Emphasis added by me, since the container memory requests and limits are driven by `spark.*.memory` this is the appropriate setting to change. Changing the memory overhead would also serve to increase these limits but if the user has a rough idea of how much memory they need asking for it explicitly is easier. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/21669 @ifilonenko I think the issue with the `UnixUsername` might be avoided by exporting `HADOOP_USER_NAME` as an environment variable in the pod spec, set to the same value as `SPARK_USER`. This should cause the Hadoop UGI machinery to simply take the provided username rather than attempting to obtain it from the container, which fails because the user does not exist as a Unix user in `/etc/passwd`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
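The suggestion above can be sketched as follows — a hedged illustration in Python operating on a container's env list represented as plain dicts (the helper name is invented for illustration; the real change would be made in the Scala pod-building code):

```python
def with_hadoop_user(container_env, spark_user):
    """Return the container env list with HADOOP_USER_NAME set to the same
    value as SPARK_USER, so the Hadoop UGI machinery takes the provided
    username instead of looking up a (non-existent) Unix user in /etc/passwd.
    """
    # Drop any pre-existing HADOOP_USER_NAME entry to avoid duplicates
    env = [e for e in container_env if e["name"] != "HADOOP_USER_NAME"]
    env.append({"name": "HADOOP_USER_NAME", "value": spark_user})
    return env
```

Here `HADOOP_USER_NAME` is the standard Hadoop environment variable for overriding the login user when security is not enabled.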
[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22256#discussion_r214623272 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep( .orElse(conf.getOption("spark.local.dir")) .getOrElse(defaultLocalDir) .split(",") + private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS) override def configurePod(pod: SparkPod): SparkPod = { val localDirVolumes = resolvedLocalDirs .zipWithIndex .map { case (localDir, index) => -new VolumeBuilder() - .withName(s"spark-local-dir-${index + 1}") - .withNewEmptyDir() - .endEmptyDir() - .build() +val name = s"spark-local-dir-${index + 1}" +// To allow customisation of local dirs backing volumes we should avoid creating +// emptyDir volumes if the volume is already defined in the pod spec +hasVolume(pod, name) match { + case true => +// For pre-existing volume definitions just re-use the volume +pod.pod.getSpec().getVolumes().asScala.find(v => v.getName.equals(name)).get + case false => +// Create new emptyDir volume +new VolumeBuilder() + .withName(name) + .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs + case false => null// Default - use nodes backing storage +}) + .endEmptyDir() + .build() +} } + val localDirVolumeMounts = localDirVolumes .zip(resolvedLocalDirs) .map { case (localDirVolume, localDirPath) => -new VolumeMountBuilder() - .withName(localDirVolume.getName) - .withMountPath(localDirPath) - .build() +hasVolumeMount(pod, localDirVolume.getName, localDirPath) match { + case true => +// For pre-existing volume mounts just re-use the mount +pod.container.getVolumeMounts().asScala + .find(m => m.getName.equals(localDirVolume.getName) + && m.getMountPath.equals(localDirPath)) + .get + case false => +// Create new volume mount +new VolumeMountBuilder() + .withName (localDirVolume.getName) + 
.withMountPath (localDirPath) + .build() +} + } + +// Check for conflicting volume mounts +for (m: VolumeMount <- localDirVolumeMounts) { + if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 0) { +throw new SparkException(s"Conflicting volume mounts defined, pod template attempted to " + + "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at an alternative path " + + "then the expected ${m.getPath}") } +} + val podWithLocalDirVolumes = new PodBuilder(pod.pod) .editSpec() -.addToVolumes(localDirVolumes: _*) + // Don't want to re-add volumes that already existed in the incoming spec + // as duplicate definitions will lead to K8S API errors +.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, v.getName)): _*) --- End diff -- @mccheah @ifilonenko OK, I have opened PR #22323 with just the `tmpfs` enabling changes --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...
GitHub user rvesse opened a pull request: https://github.com/apache/spark/pull/22323 [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S ## What changes were proposed in this pull request? The default behaviour of Spark on K8S currently is to create `emptyDir` volumes to back `SPARK_LOCAL_DIRS`. In some environments e.g. diskless compute nodes this may actually hurt performance because these are backed by the Kubelet's node storage which on a diskless node will typically be some remote network storage. Even if this is enterprise grade storage connected via a high speed interconnect the way Spark uses these directories as scratch space (lots of relatively small short lived files) has been observed to cause serious performance degradation. Therefore we would like to provide the option to use K8S's ability to instead back these `emptyDir` volumes with `tmpfs`. Therefore this PR adds a configuration option that enables `SPARK_LOCAL_DIRS` to be backed by Memory backed `emptyDir` volumes rather than the default. Documentation is added to describe both the default behaviour plus this new option and its implications. One of which is that scratch space then counts towards your pods memory limits and therefore users will need to adjust their memory requests accordingly. *NB* - This is an alternative version of PR #22256 reduced to just the `tmpfs` piece ## How was this patch tested? 
Ran with this option in our diskless compute environments to verify functionality You can merge this pull request into a Git repository by running: $ git pull https://github.com/rvesse/spark SPARK-25262-tmpfs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22323.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22323 commit 544f132f2bf61d05b16d93b086e5a776824b70c5 Author: Rob Vesse Date: 2018-08-28T13:52:07Z [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S Adds a configuration option that enables SPARK_LOCAL_DIRS to be backed by Memory backed emptyDir volumes rather than the default which is whatever the kubelet's node storage happens to be --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22215#discussion_r214614277 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala --- @@ -151,13 +152,15 @@ private[spark] class ExecutorPodsLifecycleManager( private def exitReasonMessage(podState: FinalPodState, execId: Long, exitCode: Int) = { val pod = podState.pod +val reason = Option(pod.getStatus.getReason) +val message = Option(pod.getStatus.getMessage) s""" |The executor with id $execId exited with exit code $exitCode. - |The API gave the following brief reason: ${pod.getStatus.getReason} - |The API gave the following message: ${pod.getStatus.getMessage} + |The API gave the following brief reason: ${reason.getOrElse("")} --- End diff -- Fixed in latest commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22215 @mccheah Thanks for the review; I have made the change you suggested to use N/A instead of an empty string. I have left the indentation as tabs for now: as I said in a previous comment, this was just what the existing code used, and I am happy to change it if others also want the change to spaces made --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22215#discussion_r214612510 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala --- @@ -60,4 +64,81 @@ private[spark] object KubernetesUtils { } def parseMasterUrl(url: String): String = url.substring("k8s://".length) + + def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : String = { +// Use more loggable format if value is null or empty +val indentStr = "\t" * indent --- End diff -- I just preserved the original code's choice here; I would happily change to spaces if preferred --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
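To make the behaviour of the helper under discussion concrete, here is a hedged Python approximation of `formatPairsBundle` (a sketch only — the real implementation is the Scala method in `KubernetesUtils`; the N/A substitution reflects the change agreed elsewhere in this thread):

```python
def format_pairs_bundle(pairs, indent=1):
    """Render key/value pairs one per line with tab indentation, substituting
    'N/A' for null or empty values so container status logs stay readable."""
    indent_str = "\t" * indent
    return "".join(
        f"\n{indent_str}{key}: {value if value else 'N/A'}" for key, value in pairs
    )
```

For example, a pair whose value is `None` (e.g. a pod status with no reason set) is rendered as `reason: N/A` rather than a dangling empty field.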
[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22256#discussion_r214416634 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep( .orElse(conf.getOption("spark.local.dir")) .getOrElse(defaultLocalDir) .split(",") + private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS) override def configurePod(pod: SparkPod): SparkPod = { val localDirVolumes = resolvedLocalDirs .zipWithIndex .map { case (localDir, index) => -new VolumeBuilder() - .withName(s"spark-local-dir-${index + 1}") - .withNewEmptyDir() - .endEmptyDir() - .build() +val name = s"spark-local-dir-${index + 1}" +// To allow customisation of local dirs backing volumes we should avoid creating +// emptyDir volumes if the volume is already defined in the pod spec +hasVolume(pod, name) match { + case true => +// For pre-existing volume definitions just re-use the volume +pod.pod.getSpec().getVolumes().asScala.find(v => v.getName.equals(name)).get + case false => +// Create new emptyDir volume +new VolumeBuilder() + .withName(name) + .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs + case false => null// Default - use nodes backing storage +}) + .endEmptyDir() + .build() +} } + val localDirVolumeMounts = localDirVolumes .zip(resolvedLocalDirs) .map { case (localDirVolume, localDirPath) => -new VolumeMountBuilder() - .withName(localDirVolume.getName) - .withMountPath(localDirPath) - .build() +hasVolumeMount(pod, localDirVolume.getName, localDirPath) match { + case true => +// For pre-existing volume mounts just re-use the mount +pod.container.getVolumeMounts().asScala + .find(m => m.getName.equals(localDirVolume.getName) + && m.getMountPath.equals(localDirPath)) + .get + case false => +// Create new volume mount +new VolumeMountBuilder() + .withName (localDirVolume.getName) + 
.withMountPath (localDirPath) + .build() +} + } + +// Check for conflicting volume mounts +for (m: VolumeMount <- localDirVolumeMounts) { + if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 0) { +throw new SparkException(s"Conflicting volume mounts defined, pod template attempted to " + + "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at an alternative path " + + "then the expected ${m.getPath}") } +} + val podWithLocalDirVolumes = new PodBuilder(pod.pod) .editSpec() -.addToVolumes(localDirVolumes: _*) + // Don't want to re-add volumes that already existed in the incoming spec + // as duplicate definitions will lead to K8S API errors +.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, v.getName)): _*) --- End diff -- Ok, will do that Monday FYI I notice @onursatici has now made some similar tweaks in his latest commit - https://github.com/apache/spark/pull/22146/commits/a4fde0cdc4dc5b64fd3f888244656371eb76f837 - notice several feature steps there now have `editOrNewX()` or `addToX()` so that they combine with rather than overriding the template --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22256#discussion_r214277672 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep( .orElse(conf.getOption("spark.local.dir")) .getOrElse(defaultLocalDir) .split(",") + private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS) override def configurePod(pod: SparkPod): SparkPod = { val localDirVolumes = resolvedLocalDirs .zipWithIndex .map { case (localDir, index) => -new VolumeBuilder() - .withName(s"spark-local-dir-${index + 1}") - .withNewEmptyDir() - .endEmptyDir() - .build() +val name = s"spark-local-dir-${index + 1}" +// To allow customisation of local dirs backing volumes we should avoid creating +// emptyDir volumes if the volume is already defined in the pod spec +hasVolume(pod, name) match { + case true => +// For pre-existing volume definitions just re-use the volume +pod.pod.getSpec().getVolumes().asScala.find(v => v.getName.equals(name)).get + case false => +// Create new emptyDir volume +new VolumeBuilder() + .withName(name) + .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs + case false => null// Default - use nodes backing storage +}) + .endEmptyDir() + .build() +} } + val localDirVolumeMounts = localDirVolumes .zip(resolvedLocalDirs) .map { case (localDirVolume, localDirPath) => -new VolumeMountBuilder() - .withName(localDirVolume.getName) - .withMountPath(localDirPath) - .build() +hasVolumeMount(pod, localDirVolume.getName, localDirPath) match { + case true => +// For pre-existing volume mounts just re-use the mount +pod.container.getVolumeMounts().asScala + .find(m => m.getName.equals(localDirVolume.getName) + && m.getMountPath.equals(localDirPath)) + .get + case false => +// Create new volume mount +new VolumeMountBuilder() + .withName (localDirVolume.getName) + 
.withMountPath (localDirPath) + .build() +} + } + +// Check for conflicting volume mounts +for (m: VolumeMount <- localDirVolumeMounts) { + if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 0) { +throw new SparkException(s"Conflicting volume mounts defined, pod template attempted to " + + "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at an alternative path " + + "then the expected ${m.getPath}") } +} + val podWithLocalDirVolumes = new PodBuilder(pod.pod) .editSpec() -.addToVolumes(localDirVolumes: _*) + // Don't want to re-add volumes that already existed in the incoming spec + // as duplicate definitions will lead to K8S API errors +.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, v.getName)): _*) --- End diff -- > This is a stance that as far I'm aware, we specifically chose not to take in the pod template feature. If one is using the pod template feature then Spark won't provide any guarantees that the pod it makes will be well-formed. When spark submit deploys the pod to the cluster the API will return a clear enough error informing the user to make the appropriate corrections to their pod template. Sure, but we still need to be realistic about how the template feature will be used. It is supposed to enable power users to customise the pods for their environments. If there is an area like this where there is a clear use case to allow customisation, we should be enabling that rather than saying sorry, we're going to generate invalid pods regardless. Obviously the power user is assuming the risk of creating a pod template that meaningfully combines with Spark's generated pod to yield a valid runtime environment. Clearly my stance here is controversial and likely needs a broader discussion on the dev list. I can reduce this PR to just the config to enable `tmpfs`.
[GitHub] spark issue #22256: [SPARK-25262][K8S] Better support configurability of Spa...
Github user rvesse commented on the issue: https://github.com/apache/spark/pull/22256 @skonto I haven't done anything specific for the size limit ATM. From the K8S docs, `tmpfs` backed `emptyDir` usage counts towards your container's memory limits, so you can just set `spark.executor.memory` to a larger amount as needed. From the discussion you linked, you can explicitly set a size limit for the volume, but I wanted to avoid adding multiple configuration options if possible. Since this PR allows for template defined volumes to be used, you could define a volume in your template with the `sizeLimit` applied to it once the pod templates PR is available --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
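The alternative mentioned above — declaring the size limit in a pod template volume instead of a new Spark config — can be sketched as follows, again as plain dicts (the field names `medium` and `sizeLimit` are the standard K8S `emptyDir` fields; the helper itself is illustrative):

```python
def tmpfs_volume_with_limit(name, size_limit):
    """An emptyDir volume as a pod template might declare it: tmpfs-backed,
    with an explicit sizeLimit rather than relying only on the container's
    memory limit."""
    return {
        "name": name,
        "emptyDir": {"medium": "Memory", "sizeLimit": size_limit},
    }
```

A template declaring `tmpfs_volume_with_limit("spark-local-dir-1", "1Gi")` would cap that scratch volume at 1Gi independently of `spark.executor.memory`.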
[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22256#discussion_r213954156 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala --- @@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep( .orElse(conf.getOption("spark.local.dir")) .getOrElse(defaultLocalDir) .split(",") + private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS) override def configurePod(pod: SparkPod): SparkPod = { val localDirVolumes = resolvedLocalDirs .zipWithIndex .map { case (localDir, index) => -new VolumeBuilder() - .withName(s"spark-local-dir-${index + 1}") - .withNewEmptyDir() - .endEmptyDir() - .build() +val name = s"spark-local-dir-${index + 1}" +// To allow customisation of local dirs backing volumes we should avoid creating +// emptyDir volumes if the volume is already defined in the pod spec +hasVolume(pod, name) match { + case true => +// For pre-existing volume definitions just re-use the volume +pod.pod.getSpec().getVolumes().asScala.find(v => v.getName.equals(name)).get + case false => +// Create new emptyDir volume +new VolumeBuilder() + .withName(name) + .withNewEmptyDir() +.withMedium(useLocalDirTmpFs match { + case true => "Memory" // Use tmpfs + case false => null// Default - use nodes backing storage +}) + .endEmptyDir() + .build() +} } + val localDirVolumeMounts = localDirVolumes .zip(resolvedLocalDirs) .map { case (localDirVolume, localDirPath) => -new VolumeMountBuilder() - .withName(localDirVolume.getName) - .withMountPath(localDirPath) - .build() +hasVolumeMount(pod, localDirVolume.getName, localDirPath) match { + case true => +// For pre-existing volume mounts just re-use the mount +pod.container.getVolumeMounts().asScala + .find(m => m.getName.equals(localDirVolume.getName) + && m.getMountPath.equals(localDirPath)) + .get + case false => +// Create new volume mount +new VolumeMountBuilder() + .withName (localDirVolume.getName) + 
.withMountPath (localDirPath) + .build() +} + } + +// Check for conflicting volume mounts +for (m: VolumeMount <- localDirVolumeMounts) { + if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 0) { +throw new SparkException(s"Conflicting volume mounts defined, pod template attempted to " + + "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at an alternative path " + + "then the expected ${m.getPath}") } +} + val podWithLocalDirVolumes = new PodBuilder(pod.pod) .editSpec() -.addToVolumes(localDirVolumes: _*) + // Don't want to re-add volumes that already existed in the incoming spec + // as duplicate definitions will lead to K8S API errors +.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, v.getName)): _*) --- End diff -- The implementation is still additive in that it will add to existing elements in the pod spec as needed but respect what is already present. If your pod spec contains duplicate volumes/volume mounts then K8S will reject it as invalid e.g. ``` The Pod "rvesse-test" is invalid: spec.volumes[1].name: Duplicate value: "spark-local-dirs-1" ``` Therefore it is necessary to explicitly avoid duplicating things already present in the template. If the aim is to replace adding further config options with the pod template feature then the existing builders do need to be more intelligent in what they do to avoid generating invalid pod specs. This is regardless of whether the template feature is opinionated about validation: even if the template feature doesn't do validation, Spark code itself should be ensuring that it generates valid specs as far as it is able to. Obviously it can't detect every possible invalid spec that it might generate if the templates aren't being validated, but it can avoid introducing easily avoidable invalid specs.
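The additive merge described above — keep everything the template defines and add only the generated volumes that are not already present — can be sketched as a hedged Python illustration (plain dicts rather than the fabric8 `PodBuilder`; the helper name is invented for illustration):

```python
def merge_volumes(template_volumes, generated_volumes):
    """Combine template-defined and Spark-generated volumes additively.

    Template volumes are kept as-is; generated volumes are added only when
    the template does not already define a volume with the same name, since
    duplicate volume names make the pod spec invalid to the K8S API.
    """
    existing = {v["name"] for v in template_volumes}
    return list(template_volumes) + [
        v for v in generated_volumes if v["name"] not in existing
    ]
```

With a template that already defines `spark-local-dir-1`, merging in generated volumes for `spark-local-dir-1` and `spark-local-dir-2` yields exactly one definition of each, avoiding the `Duplicate value` rejection quoted above.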