[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238782502
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

When I was testing this again today I was getting an error from just doing a 
plain `get` because, for some reason, the code was taking the first branch even 
though `kubeContext` was `None`.  So I changed it to use an explicit `isDefined` 
check instead of negating `isEmpty`, and added `getOrElse` with a fallback value 
just to avoid that error.
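
For reference, here is a minimal, self-contained sketch of the `Option` handling described above; the `kubeContext` value is just a stand-in for the real `sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)` lookup:

```scala
object KubeContextLoggingSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
    val kubeContext: Option[String] = args.headOption.filter(_.nonEmpty)

    // Explicit isDefined plus getOrElse never calls .get on a None
    val message = "Auto-configuring K8S client using " +
      (if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context") +
      " from users K8S config file"

    println(message)
  }
}
```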


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238779965
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context should be used so if no
+// explicit setting pass null
+val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

Yes, and I was referring to the K8S config file :) And yes, the fact that we 
would propagate `spark.kubernetes.context` into the pod shouldn't be an issue 
because there won't be any K8S config file for it to interact with inside the 
pod; the in-pod K8S configuration should come from the service account token 
that gets injected into the pod.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238484145
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
--- End diff --

Fixed in latest commit


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-03 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238483901
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+  s" from users K8S config file")
+
+// Start from an auto-configured config with the desired context
+// Fabric 8 uses null to indicate that the users current context should be used so if no
+// explicit setting pass null
+val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

If the context does not exist then Fabric 8 falls back to other ways of 
auto-configuring itself (e.g. the service account).

Fabric 8 skips any file-based auto-configuration if there is no K8S config file 
present 
(https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L436-L459).

Since we don't propagate the submission client's config file into the driver 
pods, no auto-configuration from a config file will be attempted in the driver 
because there won't be a config file present.
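
To illustrate that behaviour, here is a minimal sketch (not the actual Spark code) of building a Fabric 8 client from an optionally specified context.  Passing `null` to `Config.autoConfigure` means "use the current context", and when no config file is present at all Fabric 8 falls back to its other auto-configuration mechanisms such as the in-pod service account:

```scala
import io.fabric8.kubernetes.client.{Config, ConfigBuilder, DefaultKubernetesClient}

object KubeClientSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
    val kubeContext: Option[String] = args.headOption.filter(_.nonEmpty)

    // null => current context from the K8S config file, or whatever other
    // auto-configuration (e.g. service account) Fabric 8 can find
    val config = new ConfigBuilder(Config.autoConfigure(kubeContext.orNull)).build()

    val client = new DefaultKubernetesClient(config)
    try {
      println(s"Configured against master: ${config.getMasterUrl}")
    } finally {
      client.close()
    }
  }
}
```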


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-11-29 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r237447038
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -298,6 +298,16 @@ the Spark application.
 
 ## Kubernetes Features
 
+### Configuration File
+
+Your Kubernetes config file typically lives under `.kube/config` in your home directory or in a location specified by the `KUBECONFIG` environment variable.  Spark on Kubernetes will attempt to use this file to do an initial auto-configuration of the Kubernetes client used to interact with the Kubernetes cluster.  A variety of Spark configuration properties are provided that allow further customising the client configuration e.g. using an alternative authentication method.
--- End diff --

To be frank I'm not really sure why there are different config options for 
client vs cluster mode and this may have changed with some of the cleanup that 
@vanzin has been doing lately to simplify the configuration code.

Personally I have never needed to use any of the additional configuration 
properties in either client/cluster mode as the auto-configuration from my K8S 
config file has always been sufficient.  At worst I've needed to set 
`KUBECONFIG` to select the correct config file for the cluster I want to submit 
to.

Note that the core behaviour (the auto-configuration) has always existed 
implicitly in the K8S backend but was just not called out explicitly previously 
in the docs.  This PR primarily just makes it more explicit and flexible for 
users who have multiple contexts in their config files.

WRT `spark.master`, Spark in general requires that to always be set, and it will 
be used to override whatever master URL is present in the K8S config file regardless.


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-11-29 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22904
  
@liyinan926 Any chance of rounding up some other folks to get this reviewed 
and merged?


---




[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...

2018-11-29 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23017
  
Rebased to catch up with master and adapt for @vanzin's improvements to 
Docker build context from PR #23019 


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-11-29 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22904
  
Rebased to bring up to date with master and adapt to @vanzin's changes from 
PR #23019 


---




[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...

2018-11-29 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r237441889
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -19,9 +19,9 @@ Please see [Spark Security](security.html) and the 
specific advice below before
 
 ## User Identity
 
-Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives.  This means that the resulting images will be running the Spark processes as `root` inside the container.  On unsecured clusters this may provide an attack vector for privilege escalation and container breakout.  Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID.
+Images built from the project provided Dockerfiles contain a default [`USER`](https://docs.docker.com/engine/reference/builder/#user) directive with a default UID of `185`.  This means that the resulting images will be running the Spark processes as this UID inside the container. Security conscious deployments should consider providing custom images with `USER` directives specifying their desired unprivileged UID and GID.  The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables.  Users building their own images with the provided `docker-image-tool.sh` script can use the `-u` option to specify the desired UID.
--- End diff --

Yes you can: the `USER` directive allows you to specify both a UID and a 
GID by separating them with a `:`.  However, as noted here, changing the GID is 
likely to have side effects that are difficult for Spark as a project to deal 
with, hence the note about ensuring your UID belongs to the `root` group.  If 
end users need more complex setups then they are free to create their own 
custom images.


---




[GitHub] spark issue #23053: [SPARK-25957][K8S] Make building alternate language bind...

2018-11-21 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23053
  
Probably also want to update `docs/running-on-kubernetes.md` to make it 
clear that you now have to opt into building the additional language bindings


---




[GitHub] spark issue #23053: [SPARK-25957][K8S] Add ability to skip building optional...

2018-11-20 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23053
  
@ramaddepally Sorry, clearly not reading straight today!


---




[GitHub] spark issue #23017: [SPARK-26015][K8S] Set a default UID for Spark on K8S Im...

2018-11-20 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23017
  
Have now added the doc updates to this, so I think this is ready for final 
review and merging


---




[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...

2018-11-20 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r234970716
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala
 ---
@@ -17,13 +17,13 @@
 package org.apache.spark.deploy.k8s.integrationtest
 
 import org.scalatest.concurrent.Eventually
-import scala.collection.JavaConverters._
 
 import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT}
+import org.scalatest.Tag
--- End diff --

I used tags just because that's what the existing scripting did.  I thought 
there was probably a Maven property to do the same, but using tags was the 
fastest path for me.


---




[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...

2018-11-20 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23053#discussion_r234968357
  
--- Diff: bin/docker-image-tool.sh ---
@@ -41,6 +41,18 @@ function image_ref {
   echo "$image"
 }
 
+function docker_push {
+  local image_name="$1"
+  if [ ! -z $(docker images -q "$(image_ref ${image_name})") ]; then
+docker push "$(image_ref ${image_name})"
+if [ $? -ne 0 ]; then
+  error "Failed to push $image_name Docker image."
--- End diff --

Script should really fail i.e. `exit` if a push fails


---




[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...

2018-11-20 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23053#discussion_r234967639
  
--- Diff: bin/docker-image-tool.sh ---
@@ -102,33 +114,37 @@ function build {
 error "Failed to build Spark JVM Docker image, please refer to Docker 
build output for details."
   fi
 
-  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
--t $(image_ref spark-py) \
--f "$PYDOCKERFILE" .
-if [ $? -ne 0 ]; then
-  error "Failed to build PySpark Docker image, please refer to Docker 
build output for details."
+  if [ "${PYDOCKERFILE}" != "skip" ]; then
+docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
+  -t $(image_ref spark-py) \
+  -f "$PYDOCKERFILE" .
+  if [ $? -ne 0 ]; then
+error "Failed to build PySpark Docker image, please refer to 
Docker build output for details."
--- End diff --

Script should bail out here, i.e. `exit` if the `docker build` fails


---




[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...

2018-11-20 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23053#discussion_r234968048
  
--- Diff: bin/docker-image-tool.sh ---
@@ -41,6 +41,18 @@ function image_ref {
   echo "$image"
 }
 
+function docker_push {
+  local image_name="$1"
+  if [ ! -z $(docker images -q "$(image_ref ${image_name})") ]; then
+docker push "$(image_ref ${image_name})"
+if [ $? -ne 0 ]; then
+  error "Failed to push $image_name Docker image."
+fi
+  else
+echo "$(image_ref ${image_name}) image not found. Skipping push for 
this image."
--- End diff --

Currently the script could incorrectly go down this path if an image build 
fails because it doesn't exit in that case


---




[GitHub] spark pull request #23053: [SPARK-25957][K8S] Add ability to skip building o...

2018-11-20 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23053#discussion_r234968130
  
--- Diff: bin/docker-image-tool.sh ---
@@ -102,33 +114,37 @@ function build {
 error "Failed to build Spark JVM Docker image, please refer to Docker 
build output for details."
   fi
 
-  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
--t $(image_ref spark-py) \
--f "$PYDOCKERFILE" .
-if [ $? -ne 0 ]; then
-  error "Failed to build PySpark Docker image, please refer to Docker 
build output for details."
+  if [ "${PYDOCKERFILE}" != "skip" ]; then
+docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
+  -t $(image_ref spark-py) \
+  -f "$PYDOCKERFILE" .
+  if [ $? -ne 0 ]; then
+error "Failed to build PySpark Docker image, please refer to 
Docker build output for details."
+  fi
+  else
+echo "Skipped building PySpark docker image."
+  fi
+
+  if [ "${RDOCKERFILE}" != "skip" ]; then
+if [ -d "${SPARK_HOME}/R/lib" ]; then
+  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
+-t $(image_ref spark-r) \
+-f "$RDOCKERFILE" .
+  if [ $? -ne 0 ]; then
+error "Failed to build SparkR Docker image, please refer to Docker 
build output for details."
--- End diff --

Again, script should bail out i.e. `exit` if the image build fails


---




[GitHub] spark pull request #23017: [SPARK-26015][K8S] Set a default UID for Spark on...

2018-11-16 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r234146162
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -53,5 +54,9 @@ COPY data /opt/spark/data
 ENV SPARK_HOME /opt/spark
 
 WORKDIR /opt/spark/work-dir
+RUN chmod g+w /opt/spark/work-dir
 
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
+
+# Specify the User that the actual main process will run as
+USER ${spark_uid}
--- End diff --

Done


---




[GitHub] spark pull request #23013: [SPARK-25023] More detailed security guidance for...

2018-11-16 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23013#discussion_r234144540
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -15,7 +15,19 @@ container images and entrypoints.**
 # Security
 
 Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
-Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+Please see [Spark Security](security.html) and the specific advice below before running Spark.
+
+## User Identity
+
+Images built from the project provided Dockerfiles do not contain any [`USER`](https://docs.docker.com/engine/reference/builder/#user) directives.  This means that the resulting images will be running the Spark processes as `root` inside the container.  On unsecured clusters this may provide an attack vector for privilege escalation and container breakout.  Therefore security conscious deployments should consider providing custom images with `USER` directives specifying an unprivileged UID and GID.
--- End diff --

I would like this PR to go in first as this will also want backporting 
as-is to branch-2.4.

Then in PR #23017 I will update the docs to explain the new defaults, how 
to customise them etc. before that gets merged into master.


---




[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...

2018-11-16 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r234143917
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala
 ---
@@ -17,13 +17,13 @@
 package org.apache.spark.deploy.k8s.integrationtest
 
 import org.scalatest.concurrent.Eventually
-import scala.collection.JavaConverters._
 
 import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT}
+import org.scalatest.Tag
--- End diff --

As I just wanted to debug this specific failing test, it was useful to 
run it on its own repeatedly while I investigated the issue.  I backed out the 
other changes I made to enable this (like not cleaning up the test's driver pod) 
so I will back this out as well.


---




[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-15 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23013
  
@srowen I'm happy with it


---




[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...

2018-11-15 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23017
  
> noted test issue. let's kick off test though

@felixcheung This is now resolved, please kick off a retest when you get 
chance


---




[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...

2018-11-15 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23017
  
Resolved the issue with the client mode test.  The test itself was actually 
badly written in that it used the Spark images but overrode the entry point, 
which avoided the logic that sets up the `/etc/passwd` entry for the container 
UID.  Without this running there is no home directory for the user, so the Ivy 
setup fails as a result.  By changing the test to not override the entry point 
the test can run successfully.  Also made a couple of minor changes to this 
test to make it easier to debug.


---




[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...

2018-11-14 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r233560448
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -30,6 +30,10 @@ set -e
 # If there is no passwd entry for the container UID, attempt to create one
 if [ -z "$uidentry" ] ; then
 if [ -w /etc/passwd ] ; then
+# TODO Should we allow providing an environment variable to set the desired username?
--- End diff --

@vanzin Well I am mostly following advice via @skonto from 
https://docs.okd.io/latest/creating_images/guidelines.html#use-uid which says:

> For an image to support running as an arbitrary user, directories and 
files that may be written to by processes in the image should be owned by the 
root group and be read/writable by that group. Files to be executed should also 
have group execute permissions.
>
>Adding the following to your Dockerfile sets the directory and file 
permissions to allow users in the root group to access them in the built image:
> ...
>Because the container user is always a member of the root group, the 
container user can read and write these files. The root group does not have any 
special permissions (unlike the root user) so there are no security concerns 
with this arrangement. In addition, the processes running in the container must 
not listen on privileged ports (ports below 1024), since they are not running 
as a privileged user.

This is also where this code snippet/pattern originates, from the folks who 
originally developed the container images for Spark on K8S.


---




[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...

2018-11-14 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/23017#discussion_r233385383
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -30,6 +30,10 @@ set -e
 # If there is no passwd entry for the container UID, attempt to create one
 if [ -z "$uidentry" ] ; then
 if [ -w /etc/passwd ] ; then
+# TODO Should we allow providing an environment variable to set the desired username?
--- End diff --

Not the case.  `USER` directives take effect at the point at which they 
occur in the `Dockerfile`, so since the directive is placed after the 
`ENTRYPOINT` directive the entry point script still runs as `root`, allowing it 
to add UID entries and perform any other privileged setup operations that are 
needed prior to the container's main process running and having its UID set 
appropriately.

If you manually drop into the resulting container you can see that your UID 
is the specified non-privileged UID but that an `/etc/passwd` entry was 
successfully added for you by the entry point script.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-13 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23013
  
@tgravescs Re: Point 1 I have a separate PR #22904 which makes some 
improvements to the docs around that point


---




[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-13 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23013
  
@mccheah I have tried to keep it minimal and just point to the official K8S 
docs.  Obviously there is a balance to be had between high level warnings and 
detailed advice.  K8S is still a relatively new technology to the wider 
industry regardless of its maturity as a whole so we should be highlighting the 
obvious pitfalls.

@tgravescs 

1. The K8S API uses SSL by default; primarily the Fabric 8 library 
auto-configures itself from the user's K8S config file, which will typically set 
up all the necessary authentication.  All the various authentication properties 
are used to override that initial auto-configuration and are unnecessary for 
most users.
2. Not sure
3. I haven't used client mode much so can't comment
4. Kerberos is already covered in more details in the main security doc


---




[GitHub] spark issue #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spark on K...

2018-11-12 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/23017
  
For those with more knowledge of client mode here is the specific error 
seen in the integration tests:

```
Exception in thread "main" java.lang.IllegalArgumentException: basedir must 
be absolute: ?/.ivy2/local
at org.apache.ivy.util.Checks.checkAbsolute(Checks.java:48)
at 
org.apache.ivy.plugins.repository.file.FileRepository.setBaseDir(FileRepository.java:135)
at 
org.apache.ivy.plugins.repository.file.FileRepository.(FileRepository.java:44)
at 
org.apache.spark.deploy.SparkSubmitUtils$.createRepoResolvers(SparkSubmit.scala:1063)
at 
org.apache.spark.deploy.SparkSubmitUtils$.buildIvySettings(SparkSubmit.scala:1149)
at 
org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:51)
at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

This looks to me like Spark/Ivy is discovering the user's home directory 
incorrectly.  Have done some digging into the code paths but have not spotted 
what exactly is wrong yet.
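
As a side note, a tiny sketch (not part of the PR) of why the `?/.ivy2/local` path shows up: if the container UID has no `/etc/passwd` entry the JVM may report `user.home` as `?`, and Ivy then builds its default local repository path from that non-absolute value:

```scala
object UserHomeSketch {
  def main(args: Array[String]): Unit = {
    // With no /etc/passwd entry for the current UID the JVM can end up
    // reporting user.home as "?", which is exactly what the Ivy error shows
    val home = sys.props.getOrElse("user.home", "?")
    val ivyLocalRepo = s"$home/.ivy2/local"
    println(s"user.home=$home, Ivy local repo would be $ivyLocalRepo")
  }
}
```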


---




[GitHub] spark pull request #23017: [WIP][SPARK-26015][K8S] Set a default UID for Spa...

2018-11-12 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/23017

[WIP][SPARK-26015][K8S] Set a default UID for Spark on K8S Images

## What changes were proposed in this pull request?

Adds `USER` directives to the Dockerfiles, configurable via a build 
argument (`spark_uid`) for easy customisation.  A `-u` flag is added to 
`bin/docker-image-tool.sh` to make it easy to customise this e.g.

```
> bin/docker-image-tool.sh -r rvesse -t uid -u 185 build
> bin/docker-image-tool.sh -r rvesse -t uid push
```

If no UID is explicitly specified it defaults to `185` - this is per 
@skonto's suggestion to align with the OpenShift standard reserved UID for Java 
apps (https://lists.openshift.redhat.com/openshift-archives/users/2016-March/msg00283.html)

Notes:
- We have to make the `WORKDIR` writable by the root group or otherwise 
jobs will fail with `AccessDeniedException`
- We have to specify `HOME` to set it to the `WORKDIR` to avoid an Ivy 
error caused by `user.home` not having a valid value otherwise

To Do:
- [ ] Debug and resolve issue with client mode test
- [ ] Consider whether to always propagate `SPARK_USER_NAME` to environment 
of driver and executor pods so `entrypoint.sh` can insert that into 
`/etc/passwd` entry
- [ ] Rebase once PR #23013 is merged and update documentation accordingly

## How was this patch tested?

Built the Docker images with the new Dockerfiles that include the `USER` 
directives.  Ran the Spark on K8S integration tests against the new images.  
All pass except client mode which I am currently debugging further.

Also manually dropped myself into the resulting container images via 
`docker run` and checked `id -u` output to see that UID is as expected.

Tried customising the UID from the default via the new `-u` argument to 
`docker-image-tool.sh` and again checked the resulting image for the correct 
runtime UID.

cc @felixcheung @skonto @vanzin 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvesse/spark SPARK-26015

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23017.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23017


commit 92e22f76bbe61035dcde54c5fb2990cb7099a98d
Author: Rob Vesse 
Date:   2018-11-12T13:44:03Z

[SPARK-26015][K8S] Set a default UID for Spark on K8S Images

Adds USER directives to the Dockerfiles which is configurable via build
argument for easy customisation.  A -u flag is added to
bin/docker-image-tool.sh to make it easy to customise this.




---




[GitHub] spark pull request #23013: [SPARK-25023] More detailed security guidance for...

2018-11-12 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/23013

[SPARK-25023] More detailed security guidance for K8S

## What changes were proposed in this pull request?

Highlights specific security issues to be aware of with Spark on K8S and 
recommends K8S mechanisms that should be used to secure clusters.

## How was this patch tested?

N/A - Documentation only

CC @felixcheung @tgravescs @skonto 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvesse/spark SPARK-25023

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23013.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23013


commit 5f2fe6ef686f0efb7d7b369008b7cdc9a7e8e8a2
Author: Rob Vesse 
Date:   2018-11-12T10:20:14Z

[SPARK-25023] More detailed security guidance for K8S

Highlights specific security issues to be aware of with Spark on K8S and
recommends K8S mechanisms that should be used to secure clusters.




---




[GitHub] spark pull request #22959: [SPARK-25876][k8s] Simplify kubernetes configurat...

2018-11-08 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22959#discussion_r231838596
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ---
@@ -112,125 +72,139 @@ private[spark] case class KubernetesConf[T <: 
KubernetesRoleSpecificConf](
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
+private[spark] class KubernetesDriverConf(
+sparkConf: SparkConf,
+val appId: String,
+val mainAppResource: MainAppResource,
+val mainClass: String,
+val appArgs: Array[String],
+val pyFiles: Seq[String])
+  extends KubernetesConf(sparkConf) {
+
+  override val resourceNamePrefix: String = {
+val custom = if (Utils.isTesting) 
get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
+custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName))
+  }
+
+  override def labels: Map[String, String] = {
+val presetLabels = Map(
+  SPARK_APP_ID_LABEL -> appId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs(
+  sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX)
+
+presetLabels.keys.foreach { key =>
+  require(
+!driverCustomLabels.contains(key),
+s"Label with key $key is not allowed as it is reserved for Spark 
bookkeeping operations.")
+}
+
+driverCustomLabels ++ presetLabels
+  }
+
+  override def environment: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_DRIVER_ENV_PREFIX)
+  }
+
+  override def annotations: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_DRIVER_ANNOTATION_PREFIX)
+  }
+
+  override def secretNamesToMountPaths: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_DRIVER_SECRETS_PREFIX)
+  }
+
+  override def secretEnvNamesToKeyRefs: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_DRIVER_SECRET_KEY_REF_PREFIX)
+  }
+
+  override def volumes: Seq[KubernetesVolumeSpec] = {
+KubernetesVolumeUtils.parseVolumesWithPrefix(sparkConf, 
KUBERNETES_DRIVER_VOLUMES_PREFIX)
+  }
+}
+
+private[spark] class KubernetesExecutorConf(
+sparkConf: SparkConf,
+val appId: String,
+val executorId: String,
+val driverPod: Option[Pod])
+  extends KubernetesConf(sparkConf) {
+
+  override val resourceNamePrefix: String = {
+get(KUBERNETES_EXECUTOR_POD_NAME_PREFIX).getOrElse(
+  KubernetesConf.getResourceNamePrefix(appName))
+  }
+
+  override def labels: Map[String, String] = {
+val presetLabels = Map(
+  SPARK_EXECUTOR_ID_LABEL -> executorId,
+  SPARK_APP_ID_LABEL -> appId,
+  SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE)
+
+val executorCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs(
+  sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX)
+
+presetLabels.keys.foreach { key =>
+  require(
+!executorCustomLabels.contains(key),
+s"Custom executor labels cannot contain $key as it is reserved for 
Spark.")
+}
+
+executorCustomLabels ++ presetLabels
+  }
+
+  override def environment: Map[String, String] = 
sparkConf.getExecutorEnv.toMap
+
+  override def annotations: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_EXECUTOR_ANNOTATION_PREFIX)
+  }
+
+  override def secretNamesToMountPaths: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_EXECUTOR_SECRETS_PREFIX)
+  }
+
+  override def secretEnvNamesToKeyRefs: Map[String, String] = {
+KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, 
KUBERNETES_EXECUTOR_SECRET_KEY_REF_PREFIX)
+  }
+
+  override def volumes: Seq[KubernetesVolumeSpec] = {
+KubernetesVolumeUtils.parseVolumesWithPrefix(sparkConf, 
KUBERNETES_EXECUTOR_VOLUMES_PREFIX)
+  }
+
+}
+
 private[spark] object KubernetesConf {
   def createDriverConf(
   sparkConf: SparkConf,
-  appName: String,
-  appResourceNamePrefix: String,
   appId: String,
   mainAppResource: MainAppResource,
   mainClass: String,
   appArgs: Array[String],
-  maybePyFiles: Option[String],
-  hadoopConfDir: Option[String]): 
KubernetesConf[KubernetesDriverSpecificConf] = {
-val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs(
-  sparkConf, KUBE

[GitHub] spark issue #22959: [SPARK-25876][k8s] Simplify kubernetes configuration typ...

2018-11-07 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22959
  
At first glance this looks like a lot of nice simplification; I will take a 
proper look over this tomorrow


---




[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...

2018-11-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22911#discussion_r231195413
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStep.scala
 ---
@@ -126,20 +134,53 @@ private[spark] class KerberosConfDriverFeatureStep(
   HadoopBootstrapUtil.bootstrapSparkUserPod(
 kubeTokenManager.getCurrentUser.getShortUserName,
 hadoopBasedSparkPod))
+
+if (keytab.isDefined) {
+  val podWitKeytab = new PodBuilder(kerberizedPod.pod)
--- End diff --

Typo - `Wit` -> `With`


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-11-06 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22904
  
@mccheah I have made the requested changes, can I get another review please?


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-11-02 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r230318413
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -23,6 +23,18 @@ import org.apache.spark.internal.config.ConfigBuilder
 
 private[spark] object Config extends Logging {
 
+  val KUBERNETES_CONTEXT =
+ConfigBuilder("spark.kubernetes.context")
+  .doc("The desired context from your K8S config file used to configure the K8S " +
+"client for interacting with the cluster.  Useful if your config file has " +
+"multiple clusters or user identities defined.  The client library used " +
+"locates the config file via the KUBECONFIG environment variable or by defaulting " +
+"to .kube/config under your home directory.  If not specified then your current " +
+"context is used.  You can always override specific aspects of the config file " +
+"provided configuration using other Spark on K8S configuration options.")
+  .stringConf
+  .createWithDefault(null)
--- End diff --

Refactored to use `Option`

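For readers following along, here is a sketch of what an `Option`-valued entry looks like with Spark's internal `ConfigBuilder` (using `createOptional` instead of `createWithDefault(null)`, so `sparkConf.get(KUBERNETES_CONTEXT)` yields an `Option[String]`); the object name and shortened doc text are illustrative only:

```scala
package org.apache.spark.deploy.k8s

import org.apache.spark.internal.config.ConfigBuilder

private[spark] object ConfigSketch {
  // createOptional means callers get back Option[String] rather than a
  // possibly-null String, avoiding the createWithDefault(null) pattern above
  val KUBERNETES_CONTEXT =
    ConfigBuilder("spark.kubernetes.context")
      .doc("The desired context from your K8S config file used to configure the K8S " +
        "client for interacting with the cluster.")
      .stringConf
      .createOptional
}
```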

---




[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-11-01 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
@mccheah I have updated the comment to reference the follow up issue and 
opened a PR for that as #22904.  Can we go ahead and merge now?


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-10-31 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/22904

[SPARK-25887][K8S] Configurable K8S context support

## What changes were proposed in this pull request?

This enhancement allows for specifying the desired context to use for
the initial K8S client auto-configuration.  This allows users to more
easily access alternative K8S contexts without having to first
explicitly change their current context via kubectl.

## How was this patch tested?

Explicitly set my K8S context to a context pointing to a non-existent 
cluster, then launched Spark jobs with explicitly specified contexts via the 
new `spark.kubernetes.context` configuration property.

Example Output:

```
> kubectl config current-context
minikube
> minikube status
minikube: Stopped
cluster: 
kubectl: 
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode 
cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
spark.executor.instances=2 --conf spark.kubernetes.context=docker-for-desktop 
--conf spark.kubernetes.container.image=rvesse/spark:debian 
local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 11:57:51 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/10/31 11:57:51 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using context docker-for-desktop from users K8S config file
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new 
state: 
 pod name: spark-pi-1540987071845-driver
 namespace: default
 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, 
spark-role -> driver
 pod uid: 32462cac-dd04-11e8-b6c6-0251
 creation time: 2018-10-31T11:57:52Z
 service account name: default
 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
 node name: N/A
 start time: N/A
 phase: Pending
 container status: N/A
18/10/31 11:57:52 INFO LoggingPodStatusWatcherImpl: State changed, new 
state: 
 pod name: spark-pi-1540987071845-driver
 namespace: default
 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, 
spark-role -> driver
 pod uid: 32462cac-dd04-11e8-b6c6-0251
 creation time: 2018-10-31T11:57:52Z
 service account name: default
 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
 node name: docker-for-desktop
 start time: N/A
 phase: Pending
 container status: N/A
...
18/10/31 11:58:03 INFO LoggingPodStatusWatcherImpl: State changed, new 
state: 
 pod name: spark-pi-1540987071845-driver
 namespace: default
 labels: spark-app-selector -> spark-2c4abc226ed3415986eb602bd13f3582, 
spark-role -> driver
 pod uid: 32462cac-dd04-11e8-b6c6-0251
 creation time: 2018-10-31T11:57:52Z
 service account name: default
 volumes: spark-local-dir-1, spark-conf-volume, default-token-glpfv
 node name: docker-for-desktop
 start time: 2018-10-31T11:57:52Z
 phase: Succeeded
 container status: 
 container name: spark-kubernetes-driver
     container image: rvesse/spark:debian
 container state: terminated
 container started at: 2018-10-31T11:57:54Z
 container finished at: 2018-10-31T11:58:02Z
 exit code: 0
 termination reason: Completed
```

Without the `spark.kubernetes.context` setting this will fail because the 
current context - `minikube` - is pointing to a non-running cluster e.g.

```
> ./spark-submit --master k8s://https://localhost:6443 --deploy-mode 
cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
spark.executor.instances=2 --conf 
spark.kubernetes.container.image=rvesse/spark:debian 
local:///opt/spark/examples/jars/spark-examples_2.11-3.0.0-SNAPSHOT.jar 4
18/10/31 12:02:30 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
18/10/31 12:02:30 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
18/10/31 12:02:31 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshak

[GitHub] spark pull request #22805: [SPARK-25809][K8S][TEST] New K8S integration test...

2018-10-31 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r229630011
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -63,6 +66,8 @@ private[spark] object SparkKubernetesClientFactory {
   
.getOption(s"$kubernetesAuthConfPrefix.$CLIENT_CERT_FILE_CONF_SUFFIX")
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
+
+// TODO Create builder in a way that respects configurable context
--- End diff --

SPARK-25887, I'll update the comment to reference it explicitly


---




[GitHub] spark issue #22805: [SPARK-25809][K8S][TEST] New K8S integration testing bac...

2018-10-30 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
Did a bunch more testing on our internal K8S clusters today after rebasing 
this onto master. I am now happy that this is ready for final review and 
merging so I have removed the `[WIP]` tag


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-30 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r229400780
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -42,6 +42,9 @@ private[spark] object SparkKubernetesClientFactory {
   sparkConf: SparkConf,
   defaultServiceAccountToken: Option[File],
   defaultServiceAccountCaCert: Option[File]): KubernetesClient = {
+
+// TODO Support configurable context
+
--- End diff --

Note to reviewers.  In testing this I noticed one issue: when you have 
multiple contexts defined in your `KUBECONFIG` file, this PR adds the ability 
for the integration test machinery to use an arbitrary context but the 
submission client will always use your current context.  Therefore if your 
current context does not match the requested context the integration tests will 
fail/return false positives because the client and the test harness will use 
different contexts.

I have filed SPARK-25887 for addressing this since the client change seemed 
like it should be a separate issue and PR.  I will provide a PR for that 
tomorrow.


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-30 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r229395036
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala
 ---
@@ -28,16 +30,15 @@ private[spark] trait IntegrationTestBackend {
 }
 
 private[spark] object IntegrationTestBackendFactory {
-  val deployModeConfigKey = "spark.kubernetes.test.deployMode"
-
   def getTestBackend: IntegrationTestBackend = {
-val deployMode = Option(System.getProperty(deployModeConfigKey))
-  .getOrElse("minikube")
-if (deployMode == "minikube") {
-  MinikubeTestBackend
-} else {
-  throw new IllegalArgumentException(
-"Invalid " + deployModeConfigKey + ": " + deployMode)
+val deployMode = Option(System.getProperty(CONFIG_KEY_DEPLOY_MODE))
+  .getOrElse(BACKEND_MINIKUBE)
+deployMode match {
+  case BACKEND_MINIKUBE => MinikubeTestBackend
+  case BACKEND_CLOUD => new KubeConfigBackend(System.getProperty(CONFIG_KEY_KUBE_CONFIG_CONTEXT))
--- End diff --

The Fabric 8 client treats `null` as meaning use the user's current context 
from their `KUBECONFIG` file - 
https://github.com/fabric8io/kubernetes-client/blob/v4.1.0/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L467


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228470066
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala
 ---
@@ -27,4 +27,36 @@ object Utils extends Logging {
 val resource = createResource
 try f.apply(resource) finally resource.close()
   }
+
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
--- End diff --

Now fixed


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228470004
  
--- Diff: 
resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh
 ---
@@ -71,19 +71,36 @@ if [[ $IMAGE_TAG == "N/A" ]];
 then
   IMAGE_TAG=$(uuidgen);
   cd $UNPACKED_SPARK_TGZ
-  if [[ $DEPLOY_MODE == cloud ]] ;
-  then
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
-if  [[ $IMAGE_REPO == gcr.io* ]] ;
-then
-  gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG
-else
-  $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
-fi
-  else
-# -m option for minikube.
-$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG build
-  fi
+
+  case $DEPLOY_MODE in
+cloud|cloud-url)
+  # Build images
+  $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
+
+  # Push images appropriately
+  if [[ $IMAGE_REPO == gcr.io* ]] ;
+  then
+gcloud docker -- push $IMAGE_REPO/spark:$IMAGE_TAG
+  else
+$UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG push
+  fi
+  ;;
+
+ docker-for-desktop)
--- End diff --

Now corrected


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228469510
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala
 ---
@@ -27,4 +27,36 @@ object Utils extends Logging {
 val resource = createResource
 try f.apply(resource) finally resource.close()
   }
+
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
--- End diff --

Yes, will fix in next commit


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228467937
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala
 ---
@@ -28,16 +30,16 @@ private[spark] trait IntegrationTestBackend {
 }
 
 private[spark] object IntegrationTestBackendFactory {
-  val deployModeConfigKey = "spark.kubernetes.test.deployMode"
-
   def getTestBackend: IntegrationTestBackend = {
-val deployMode = Option(System.getProperty(deployModeConfigKey))
-  .getOrElse("minikube")
-if (deployMode == "minikube") {
-  MinikubeTestBackend
-} else {
-  throw new IllegalArgumentException(
-"Invalid " + deployModeConfigKey + ": " + deployMode)
+val deployMode = Option(System.getProperty(CONFIG_KEY_DEPLOY_MODE))
+  .getOrElse(BACKEND_MINIKUBE)
+deployMode match {
+  case BACKEND_MINIKUBE => MinikubeTestBackend
+  case BACKEND_CLOUD => new KubeConfigBackend(null)
--- End diff --

Yes, done this in the latest commit


---




[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228467705
  
--- Diff: resource-managers/kubernetes/integration-tests/README.md ---
@@ -41,12 +71,127 @@ The Spark code to test is handed to the integration 
test system via a tarball. H
 
 * `--spark-tgz ` - set `` to point to a tarball 
containing the Spark distribution to test.
 
-TODO: Don't require the packaging of the built Spark artifacts into this 
tarball, just read them out of the current tree.
+This Tarball should be created by first running `dev/make-distribution.sh` 
passing the `--tgz` flag and `-Pkubernetes` as one of the
+options to ensure that Kubernetes support is included in the distribution. 
 For more details on building a runnable distribution please
+see the [Building 
Spark](https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution)
 documentation.
+
+**TODO:** Don't require the packaging of the built Spark artifacts into 
this tarball, just read them out of the current tree.
 
 ## Customizing the Namespace and Service Account
 
-* `--namespace ` - set `` to the namespace in which 
the tests should be run.
-* `--service-account ` - set `` to the name of the Kubernetes service account to
-use in the namespace specified by the `--namespace`. The service account 
is expected to have permissions to get, list, watch,
+If no namespace is specified then a temporary namespace will be created 
and deleted during the test run.  Similarly if no service
+account is specified then the `default` service account for the namespace 
will be used.
+
+Using the `--namespace ` flag sets `` to the 
namespace in which the tests should be run.  If this is supplied 
+then the tests assume this namespace exists in the K8S cluster and will 
not attempt to create it.  Additionally this namespace must 
+have an appropriately authorized service account which can be customised 
via the `--service-account` flag.
+
+The `--service-account ` flag sets `` to the name of the Kubernetes service account to
+use in the namespace specified by the `--namespace` flag. The service 
account is expected to have permissions to get, list, watch,
 and create pods. For clusters with RBAC turned on, it's important that the 
right permissions are granted to the service account
 in the namespace through an appropriate role and role binding. A reference 
RBAC configuration is provided in `dev/spark-rbac.yaml`.
+
+# Running the Test Directly
+
+If you prefer to run just the integration tests directly then you can 
customise the behaviour via properties passed to Maven using the
+`-Dproperty=value` option e.g.
--- End diff --

Fixed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228467766
  
--- Diff: resource-managers/kubernetes/integration-tests/README.md ---
@@ -13,15 +13,45 @@ The simplest way to run the integration tests is to 
install and run Minikube, th
 dev/dev-run-integration-tests.sh
 
 The minimum tested version of Minikube is 0.23.0. The kube-dns addon must 
be enabled. Minikube should
-run with a minimum of 3 CPUs and 4G of memory:
+run with a minimum of 4 CPUs and 6G of memory:
 
-minikube start --cpus 3 --memory 4096
+minikube start --cpus 4 --memory 6144
 
 You can download Minikube 
[here](https://github.com/kubernetes/minikube/releases).
 
 # Integration test customization
 
-Configuration of the integration test runtime is done through passing 
different arguments to the test script. The main useful options are outlined 
below.
+Configuration of the integration test runtime is done through passing 
different arguments to the test script. 
+The main useful options are outlined below.
+
+## Using a different backend
+
+The integration test backend i.e. the K8S cluster used for testing is 
controlled by the `--deploy-mode` option.  By default this
+is set to `minikube`, the available backends are their perquisites are as 
follows.
--- End diff --

Fixed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228467591
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/cloud/KubeConfigBackend.scala
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest.backend.cloud
+
+import java.nio.file.Paths
+
+import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient}
+import org.apache.spark.deploy.k8s.integrationtest.TestConstants
+import 
org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
+import org.apache.spark.internal.Logging
+
+private[spark] class KubeConfigBackend(var context: String)
+  extends IntegrationTestBackend with Logging {
+  // If no context supplied see if one was specified in the system 
properties supplied
+  // to the tests
+  if (context == null) {
+context = 
System.getProperty(TestConstants.CONFIG_KEY_KUBE_CONFIG_CONTEXT)
+  }
+  logInfo(s"K8S Integration tests will run against " +
+s"${if (context != null) s"context ${context}" else "default context"} 
" +
+s" from users K8S config file")
+
+  private var defaultClient: DefaultKubernetesClient = _
+
+  override def initialize(): Unit = {
+// Auto-configure K8S client from K8S config file
+
System.setProperty(Config.KUBERNETES_AUTH_TRYKUBECONFIG_SYSTEM_PROPERTY, 
"true");
--- End diff --

I looked over what the Fabric 8 client was doing behind the scenes and realised 
the values I was setting explicitly are the defaults anyway, so I removed this 
unnecessary manipulation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r228467650
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/cloud/KubeConfigBackend.scala
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest.backend.cloud
+
+import java.nio.file.Paths
+
+import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient}
+import org.apache.spark.deploy.k8s.integrationtest.TestConstants
+import 
org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
+import org.apache.spark.internal.Logging
+
+private[spark] class KubeConfigBackend(var context: String)
--- End diff --

Refactored to avoid this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...

2018-10-26 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
@liyinan926 I will rebase and squash appropriately once PR #22820 is merged


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client vers...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22820#discussion_r228458947
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -157,7 +157,9 @@ private[spark] object KubernetesUtils {
   }.getOrElse(Seq(("container state", "N/A")))
   }
 
-  def formatTime(time: Time): String = {
-if (time != null) time.getTime else "N/A"
+  def formatTime(time: String): String = {
+// TODO: Investigate form `time` comes in as there was a DataType
+// change from Time to String
--- End diff --

e.g. a sample from the logs of my PR where I had made a similar change:

```
creation time: 2018-10-24T17:17:03Z
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22820: [SPARK-25828][K8S] Bumping Kubernetes-Client vers...

2018-10-26 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22820#discussion_r228458281
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -157,7 +157,9 @@ private[spark] object KubernetesUtils {
   }.getOrElse(Seq(("container state", "N/A")))
   }
 
-  def formatTime(time: Time): String = {
-if (time != null) time.getTime else "N/A"
+  def formatTime(time: String): String = {
+// TODO: Investigate form `time` comes in as there was a DataType
+// change from Time to String
--- End diff --

The string returned is in ISO 8601 format so there is no need to do anything special here.
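
For reference, a minimal sketch (not part of the PR) confirming that timestamps in 
this format parse directly with `java.time`, should any handling ever be needed:

```scala
// Reference only: the creation timestamps logged above are ISO-8601 instants,
// so java.time parses them without any custom formatter.
import java.time.Instant

object TimestampCheck {
  def main(args: Array[String]): Unit = {
    val created = Instant.parse("2018-10-24T17:17:03Z")
    println(created.toEpochMilli) // epoch millis, usable for ordering/comparison
  }
}
```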


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...

2018-10-25 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
Ran successfully against one of our dev K8S clusters today:

![screen shot 2018-10-25 at 17 39 
40](https://user-images.githubusercontent.com/2104864/47516269-fd2e1480-d87c-11e8-80f9-2325edffe066.png)

Will continue on to testing against one of our larger production clusters


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...

2018-10-24 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
@ifilonenko I have done the generalisation today since it was fairly 
trivial and it actually resolves a number of concerns about the first pass 
implementation.

@skonto I have restored the cloud backend as requested. I ended up with two 
separate cloud backends: one using the original code that simply takes a 
master URL, and another that takes a desired K8S context and auto-configures 
itself from the K8S config file using the specified context.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-24 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r227835021
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala
 ---
@@ -0,0 +1,68 @@
+package org.apache.spark.deploy.k8s.integrationtest.backend.docker
+
+import java.nio.file.Paths
+
+import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient}
+import org.apache.spark.deploy.k8s.integrationtest.ProcessUtils
+import 
org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
+
+private[spark] object DockerForDesktopBackend extends 
IntegrationTestBackend {
--- End diff --

With the latest commits the only state maintained is the K8S client, which 
is reusable. By adopting a newer Fabric 8 client library version there is no 
longer any need to modify the on-disk state via `kubectl`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-24 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r227833846
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala
 ---
@@ -0,0 +1,68 @@
+package org.apache.spark.deploy.k8s.integrationtest.backend.docker
--- End diff --

Fixed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...

2018-10-23 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
@skonto Yep, I plan to do that tomorrow.

At least for my `minikube` instance I found 4g insufficient and a couple of 
tests would fail because their pods didn't get scheduled.  8g is probably 
over-allocating; somewhere in-between likely works as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration testin...

2018-10-23 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22805
  
@srowen Yes, there are a lot of assumptions made by the integration tests 
that are not documented anywhere and which I only figured out by digging 
through the code and POMs.

Broadly speaking right now to run integration tests you need the following:

- A runnable distribution of Spark built
- `minikube` installed, configured with at least 4 CPUs and 8 GB memory and 
started

Probably the next thing I will do tomorrow on this is to improve the 
documentation.  There is a `README.md` in the integration testing module but it 
omits various details I have encountered working with these recently.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-23 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22805#discussion_r227400699
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/docker/DockerForDesktopBackend.scala
 ---
@@ -0,0 +1,68 @@
+package org.apache.spark.deploy.k8s.integrationtest.backend.docker
+
+import java.nio.file.Paths
+
+import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient}
+import org.apache.spark.deploy.k8s.integrationtest.ProcessUtils
+import 
org.apache.spark.deploy.k8s.integrationtest.backend.IntegrationTestBackend
+
+private[spark] object DockerForDesktopBackend extends 
IntegrationTestBackend {
+
+  private val KUBECTL_STARTUP_TIMEOUT_SECONDS = 15
+
+  private var defaultClient: DefaultKubernetesClient = _
+  private var initialContext = ""
+
+  private def getCurrentContext: String = {
+val outputs = executeKubectl("config", "current-context")
+assert(outputs.size == 1, "Unexpected amount of output from kubectl 
config current-context")
+outputs.head
+  }
+
+  private def setContext(context: String): Unit = {
+val outputs = executeKubectl("config", "use-context", context)
+assert(outputs.size == 1, "Unexpected amount of output from kubectl 
config use-context")
+val errors = outputs.filter(_.startsWith("error"))
+assert(errors.size == 0, s"Received errors from kubectl: 
${errors.head}")
+  }
+
+  override def initialize(): Unit = {
+// Switch context if necessary
+// TODO: If we were using Fabric 8 client 3.1.0 then we could
+// instead just use the overload of autoConfigure() that takes the
+// desired context avoiding the need to interact with kubectl at all
+initialContext = getCurrentContext
--- End diff --

Would people be opposed to bumping the K8S client to 3.1.0 (or even a more 
recent 3.1.x release)?  This would simplify the code significantly and avoid 
any need to shell out to `kubectl`.
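
A rough sketch of what that could look like, assuming the context-aware 
`Config.autoConfigure(context)` overload available in 3.1.x (passing `null` 
falls back to the user's current context):

```scala
// Sketch only: auto-configure a client for a named context straight from the
// user's K8S config file, with no need to shell out to kubectl.
import io.fabric8.kubernetes.client.{Config, DefaultKubernetesClient}

object KubeConfigClients {
  // context may be null, in which case the current context is used
  def forContext(context: String): DefaultKubernetesClient = {
    val config: Config = Config.autoConfigure(context)
    new DefaultKubernetesClient(config)
  }
}
```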


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22805: [WIP][SPARK-25809][K8S][TEST] New K8S integration...

2018-10-23 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/22805

[WIP][SPARK-25809][K8S][TEST] New K8S integration testing backends

## What changes were proposed in this pull request?

Currently the K8S integration tests are hardcoded to use a `minikube` based 
backend.  `minikube` is VM based so it can be resource hungry and it also 
doesn't cope well with certain networking setups (for example, when using the 
Cisco AnyConnect software VPN `minikube` is unusable as it detects its own IP 
incorrectly).

This PR adds a new K8S integration testing backend that allows for using 
the Kubernetes support in [Docker for 
Desktop](https://blog.docker.com/2018/07/kubernetes-is-now-available-in-docker-desktop-stable-channel/).
  This is WIP towards generalising this so that the integration tests can be 
run against an arbitrary Kubernetes cluster.

To Do:

- [ ] General Kubernetes cluster backend
- [ ] Documentation on Kubernetes integration testing

## How was this patch tested?

Ran integration tests with Docker for Desktop and all passed:

![screen shot 2018-10-23 at 14 19 
56](https://user-images.githubusercontent.com/2104864/47363460-c5816a00-d6ce-11e8-9c15-56b34698e797.png)

Suggested Reviewers: @ifilonenko @srowen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvesse/spark SPARK-25809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22805.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22805


commit ea3cfd670ec7a7fbb5bfe2b4df56afd67900df9b
Author: Rob Vesse 
Date:   2018-10-23T13:11:16Z

[SPARK-25809][K8S] Docker for desktop K8S integration backend

Adds a new K8S integration testing backend that allows for using the
Kubernetes support in Docker for Desktop




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...

2018-10-23 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22782#discussion_r227373891
  
--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi
 
   # Verify that Spark has actually been built/is a runnable distribution
-  # i.e. the Spark JARs that the Docker files will place into the image 
are present
--- End diff --

It wasn't as if I used a non-breaking space intentionally, I just used my 
OS's default editor for shell scripts!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...

2018-10-23 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22782#discussion_r227274978
  
--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi
 
   # Verify that Spark has actually been built/is a runnable distribution
-  # i.e. the Spark JARs that the Docker files will place into the image 
are present
--- End diff --

Obvious question but why are we still using ASCII encoding for anything?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...

2018-10-22 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22782#discussion_r226983785
  
--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi
 
   # Verify that Spark has actually been built/is a runnable distribution
-  # i.e. the Spark JARs that the Docker files will place into the image 
are present
--- End diff --

@dongjoon-hyun This was edited in Xcode on OS X so it was almost certainly 
defaulting to UTF-8 encoding


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script

2018-10-19 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22748
  
Rebased onto master, should be ready for merging


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22681: [SPARK-25682][k8s] Package example jars in same target f...

2018-10-18 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22681
  
I actually just got bitten by this today while trying to run the K8S 
integration tests with custom images.  The integration tests assume the 
runnable distribution layout for `/opt/spark/examples` in the image, so if 
you have built your images from a working copy the examples JAR is in the 
wrong place and the integration tests will fail.

Tracking this down is made extra hard by the integration tests deleting pods 
after they run, so you have to catch the pod status and grab its logs before 
they get cleaned up.

I have patched the tests locally to make this configurable, but it looks like 
this change should render that unnecessary.  I can still submit that patch if 
it sounds like it would be useful.
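
For context, my local patch was roughly along the following lines. This is a 
sketch only, and the system property name here is my own invention rather 
than an existing Spark test property:

```scala
// Hypothetical sketch: let the integration tests resolve the examples JAR
// directory from a system property instead of assuming the runnable
// distribution layout baked into the image.
object ExamplesJarLocation {
  private val DefaultExamplesJarsDir = "/opt/spark/examples/jars"

  // "spark.kubernetes.test.examplesJarsDir" is an illustrative, made-up name
  def examplesJarsDir: String =
    sys.props.get("spark.kubernetes.test.examplesJarsDir")
      .filter(_.nonEmpty)
      .getOrElse(DefaultExamplesJarsDir)
}
```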


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22681: [SPARK-25682][k8s] Package example jars in same t...

2018-10-18 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22681#discussion_r226289881
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -18,6 +18,7 @@
 FROM openjdk:8-alpine
 
 ARG spark_jars=jars
+ARG example_jars=examples/jars
--- End diff --

Also the K8S integration tests currently assume that the examples JAR is 
present since they rely upon running code from the examples


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...

2018-10-17 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22748#discussion_r225867655
  
--- Diff: bin/docker-image-tool.sh ---
@@ -78,20 +91,38 @@ function build {
   docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
 -t $(image_ref spark) \
 -f "$BASEDOCKERFILE" .
+  if [ $? -ne 0 ]; then
+error "Failed to build base Docker image, please refer to Docker build 
output for details"
--- End diff --

Fixed this and other instances thereof


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...

2018-10-17 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22748#discussion_r225867566
  
--- Diff: bin/docker-image-tool.sh ---
@@ -44,28 +44,41 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local JARS
 
   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
 # Set image build arguments accordingly if this is a source repo and 
not a distribution archive.
 IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
+JARS=assembly/target/scala-$SPARK_SCALA_VERSION/jars
 BUILD_ARGS=(
   ${BUILD_PARAMS}
   --build-arg
   img_path=$IMG_PATH
   --build-arg
-  spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
+  spark_jars=$JARS
   --build-arg
   k8s_tests=resource-managers/kubernetes/integration-tests/tests
 )
   else
-# Not passed as an argument to docker, but used to validate the Spark 
directory.
+# Not passed as arguments to docker, but used to validate the Spark 
directory.
 IMG_PATH="kubernetes/dockerfiles"
+JARS=jars
 BUILD_ARGS=(${BUILD_PARAMS})
   fi
 
+  # Verify that the Docker image content directory is present
   if [ ! -d "$IMG_PATH" ]; then
 error "Cannot find docker image. This script must be run from a 
runnable distribution of Apache Spark."
   fi
+
+  # Verify that Spark has actually been built/is a runnable distribution
+  # i.e. the Spark JARs that the Docker files will place into the image 
are present
+  local TOTAL_JARS=$(ls $JARS/spark-* | wc -l)
+  TOTAL_JARS=$(( $TOTAL_JARS ))
+  if [ "${TOTAL_JARS}" -eq 0 ]; then
+error "Cannot find Spark JARs.  This script assumes that Apache Spark 
has first been built locally."
--- End diff --

Fixed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script

2018-10-17 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22748
  
> There seems to be overlapping logic between this PR and #22681

Yes sorry, I was having issues with the script while working on something 
unrelated and hadn't realised your integration tests were also proposing 
changes to this.  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-23257][K8S][TESTS] Kerberos Support Integr...

2018-10-16 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r225614226
  
--- Diff: bin/docker-image-tool.sh ---
@@ -71,18 +71,29 @@ function build {
 --build-arg
 base_img=$(image_ref spark)
   )
-  local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
-  local 
PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
-  local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"}
+  local 
BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/Dockerfile"}
+  local 
PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/python/Dockerfile"}
+  local 
KDOCKERFILE=${KDOCKERFILE:-"$IMG_PATH/test/dockerfiles/spark/kerberos/Dockerfile"}
+  local 
RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/main/dockerfiles/spark/bindings/R/Dockerfile"}
 
+  # Spark Base
   docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
 -t $(image_ref spark) \
 -f "$BASEDOCKERFILE" .
 
+  # PySpark
   docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
 -t $(image_ref spark-py) \
 -f "$PYDOCKERFILE" .
 
+  # The following are optional docker builds for Kerberos Testing
+  docker pull ifilonenko/hadoop-base:latest
--- End diff --

Isn't the explicit pull unnecessary?  Building the new Kerberos image, which 
is `FROM` this image, will implicitly pull it anyway


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh s...

2018-10-16 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/22748

[SPARK-25745][K8S] Improve docker-image-tool.sh script

## What changes were proposed in this pull request?

Adds error checking and handling to `docker` invocations, ensuring the 
script terminates early in the event of any errors.  This avoids subtle 
errors that can otherwise occur, e.g. if the base image fails to build then 
the Python/R images can end up being built from outdated base images.  It 
also makes it more explicit to the user that something went wrong.

Additionally, the provided `Dockerfiles` assume that Spark was first built 
locally or that this is a runnable distribution, however the script didn't 
previously enforce this.  The script will now check the JARs folder to ensure 
that the Spark JARs actually exist and, if not, aborts early reminding the 
user that they need to build locally first.

## How was this patch tested?

- Tested with a `mvn clean` working copy and verified that the script now 
terminates early
- Tested with bad `Dockerfiles` that fail to build to see that early 
termination occurred

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvesse/spark SPARK-25745

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22748.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22748


commit beb4083d8351030ac2b07bf6aab68d3b0f555ac9
Author: Rob Vesse 
Date:   2018-10-16T12:48:46Z

[SPARK-25745][K8S] Improve docker-image-tool.sh script

- Actually check for failures from Docker commands and terminate early
- Validate that Spark JARs are actually present before attempting the
  Docker build




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script

2018-10-16 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22748
  
Suggested reviewers: @mccheah @liyinan926 @skonto 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22146: [SPARK-24434][K8S] pod template files

2018-10-09 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22146#discussion_r223620758
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -799,4 +815,168 @@ specific to Spark on Kubernetes.
This sets the major Python version of the docker image used to run the 
driver and executor containers. Can either be 2 or 3. 
   
 
+
+  spark.kubernetes.driver.podTemplateFile
+  (none)
+  
+   Specify the local file that contains the driver [pod 
template](#pod-template). For example
+   
spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod-template.yaml`
+  
+
+
+  spark.kubernetes.executor.podTemplateFile
+  (none)
+  
+   Specify the local file that contains the executor [pod 
template](#pod-template). For example
+   
spark.kubernetes.executor.podTemplateFile=/path/to/executor-pod-template.yaml`
+  
+
+
+
+ Pod template properties
+
+See the below table for the full list of pod specifications that will be 
overwritten by spark.
+
+### Pod Metadata
+
+
+Pod metadata keyModified 
valueDescription
+
+  name
+  Value of spark.kubernetes.driver.pod.name
+  
+The driver pod name will be overwritten with either the configured or 
default value of
+spark.kubernetes.driver.pod.name. The executor pod names 
will be unaffected.
+  
+
+
+  namespace
+  Value of spark.kubernetes.namespace
+  
+Spark makes strong assumptions about the driver and executor 
namespaces. Both driver and executor namespaces will
+be replaced by either the configured or default spark conf value.
+  
+
+
+  labels
+  Adds the labels from 
spark.kubernetes.{driver,executor}.label.*
+  
+Spark will add additional labels specified by the spark configuration.
+  
+
+
+  annotations
+  Adds the annotations from 
spark.kubernetes.{driver,executor}.annotation.*
+  
+Spark will add additional labels specified by the spark configuration.
+  
+
+
+
+### Pod Spec
+
+
+Pod spec keyModified valueDescription
+
+  imagePullSecrets
+  Adds image pull secrets from 
spark.kubernetes.container.image.pullSecrets
+  
+Additional pull secrets will be added from the spark configuration to 
both executor pods.
+  
+
+
+  nodeSelector
+  Adds node selectors from 
spark.kubernetes.node.selector.*
+  
+Additional node selectors will be added from the spark configuration 
to both executor pods.
+  
+
+
+  restartPolicy
+  "never"
+  
+Spark assumes that both drivers and executors never restart.
+  
+
+
+  serviceAccount
+  Value of 
spark.kubernetes.authenticate.driver.serviceAccountName
+  
+Spark will override serviceAccount with the value of the 
spark configuration for only
+driver pods, and only if the spark configuration is specified. 
Executor pods will remain unaffected.
+  
+
+
+  serviceAccountName
+  Value of 
spark.kubernetes.authenticate.driver.serviceAccountName
+  
+Spark will override serviceAccountName with the value of 
the spark configuration for only
+driver pods, and only if the spark configuration is specified. 
Executor pods will remain unaffected.
+  
+
+
+  volumes
+  Adds volumes from 
spark.kubernetes.{driver,executor}.volumes.[VolumeType].[VolumeName].mount.path
+  
+Spark will add volumes as specified by the spark conf, as well as 
additional volumes necessary for passing
--- End diff --

+1

As I've commented elsewhere, it is easily possible to create invalid specs 
with this feature because Spark will create certain volumes, config maps etc. 
with known name patterns that users need to avoid


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22146: [SPARK-24434][K8S] pod template files

2018-10-03 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22146
  
@mccheah I was taking that as a given


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-10-01 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r221539921
  
--- Diff: docs/security.md ---
@@ -729,6 +729,15 @@ so that non-local processes can authenticate. These 
delegation tokens in Kuberne
 shared by the Driver and its Executors. As such, there are three ways of 
submitting a kerberos job: 
 
 In all cases you must define the environment variable: `HADOOP_CONF_DIR`.
+It also important to note that the KDC needs to be visible from inside the 
containers if the user uses a local
+krb5 file. 
+
+If a user wishes to use a remote HADOOP_CONF directory, that contains the 
Hadoop configuration files, or 
+a remote krb5 file, this could be achieved by mounting a pre-defined 
ConfigMap and mounting the volume in the
+desired location that you can point to via the appropriate configs. This 
method is useful for those who wish to not
+rebuild their Docker images, but instead point to a ConfigMap that they 
could modify. This strategy is supported
+via the pod-template feature. 
+
--- End diff --

This paragraph is a great addition


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22584: [SPARK-25262][DOC][FOLLOWUP] Fix missing markup t...

2018-10-01 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22584#discussion_r221538327
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -799,7 +799,7 @@ specific to Spark on Kubernetes.
 
 
   spark.kubernetes.local.dirs.tmpfs
-  false
+  false
--- End diff --

Thanks for fixing, apologies for introducing the error in the first place


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-09-10 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/21669
  
@vanzin I think in the current implementation of this PR the Kerberos login 
is happening inside the driver pod which is running inside the K8S cluster.

The old design from the Spark on K8S fork did the login on the submission 
client and then simply distributed the delegation tokens from there.

Something closer to the old design, with the login happening in the 
submission client, would be simpler and have fewer gotchas.
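
To illustrate what I mean, here is a minimal sketch of the submission-side 
approach, assuming the submission client has the keytab and Hadoop 
configuration available. This is not the code from either implementation:

```scala
// Sketch only: log in on the submission client and collect HDFS delegation
// tokens there, so only the tokens (not the krb5/KDC setup) need to be
// shipped to the driver and executors, e.g. via a Kubernetes Secret.
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object SubmissionSideKerberos {
  def obtainTokens(principal: String, keytab: String): Credentials = {
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
    val creds = new Credentials()
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // Using the principal as the token renewer is a simplification here
        FileSystem.get(new Configuration()).addDelegationTokens(principal, creds)
      }
    })
    creds
  }
}
```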


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...

2018-09-07 Thread rvesse
Github user rvesse closed the pull request at:

https://github.com/apache/spark/pull/22256


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22256: [SPARK-25262][K8S] Better support configurability of Spa...

2018-09-07 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22256
  
Closed in favour of #22323 which has been merged


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215695909
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -212,6 +212,60 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_PROXY_USER =
+ConfigBuilder("spark.kubernetes.kerberos.proxyUser")
+  .doc("Specify the proxy user " +
+"for HadoopUGI login for the Driver + Executors")
+  .internal()
+  .stringConf
+  .createWithDefault("false")
+
+  val KUBERNETES_KERBEROS_KRB5_FILE =
+ConfigBuilder("spark.kubernetes.kerberos.krb5location")
+  .doc("Specify the location of the krb5 file " +
--- End diff --

It would be worth being explicit here that the KDC defined needs to be 
visible from inside the containers, which may not be the case in some network 
setups.

For example, we have clusters with multiple networks where K8S is bound to a 
different network than the one our default Kerberos configuration points to.  
So I had to copy and customise the default KDC configuration for the Kerberos 
login to get this running during my testing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21669: [SPARK-23257][K8S] Kerberos Support for Spark on ...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r215692843
  
--- Diff: docs/security.md ---
@@ -722,6 +722,62 @@ with encryption, at least.
 The Kerberos login will be periodically renewed using the provided 
credentials, and new delegation
 tokens for supported will be created.
 
+## Secure Interaction with Kubernetes
+
+When talking to Hadoop-based services behind Kerberos, it was noted that 
Spark needs to obtain delegation tokens
+so that non-local processes can authenticate. These delegation tokens in 
Kubernetes are stored in Secrets that are 
+shared by the Driver and its Executors. As such, there are three ways of 
submitting a kerberos job: 
+
--- End diff --

It might be worth making it explicit here that for any of the following 
examples to work `HADOOP_CONF_DIR` must be defined in the submission 
environment, otherwise the K8S backend skips all the HDFS steps including the 
Kerberos setup.

Also, shouldn't the Running on Kubernetes docs be updated to mention this 
feature, even if only to link users across to this doc?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-09-06 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22215
  
I think this is pretty much ready to merge. Can folks take another look when 
they get a chance?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs ba...

2018-09-06 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22323
  
All comments so far addressed, can we kick off the PR builder on this now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215625508
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -22,6 +22,7 @@ import java.util.UUID
 import io.fabric8.kubernetes.api.model.{ContainerBuilder, HasMetadata, 
PodBuilder, VolumeBuilder, VolumeMountBuilder}
 
 import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesDriverSpecificConf, KubernetesRoleSpecificConf, SparkPod}
+import org.apache.spark.deploy.k8s.Config._
--- End diff --

Left as-is


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215625448
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -225,6 +225,15 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_LOCAL_DIRS_TMPFS =
+ConfigBuilder("spark.kubernetes.local.dirs.tmpfs")
+  .doc("If set to true then emptyDir volumes created to back 
spark.local.dirs will have " +
+"their medium set to Memory so that they will be created as tmpfs 
(i.e. RAM) backed " +
+"volumes. This may improve performance but scratch space usage 
will count towards " +
+"your pods memory limit so you may wish to request more memory.")
+.booleanConf
--- End diff --

Fixed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215625299
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -45,6 +47,10 @@ private[spark] class LocalDirsFeatureStep(
 new VolumeBuilder()
   .withName(s"spark-local-dir-${index + 1}")
   .withNewEmptyDir()
+.withMedium(useLocalDirTmpFs match {
+  case true => "Memory" // Use tmpfs
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215625636
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@ 
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
 
 The configuration properties for mounting volumes into the executor pods 
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. 
For a complete list of available options for each supported type of volumes, 
please refer to the [Spark Properties](#spark-properties) section below. 
 
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles 
and other operations.  When using Kubernetes as the resource manager the pods 
will be created with an 
[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) 
volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no 
directories are explicitly specified then a default directory is created and 
configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do 
not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the nodes backing storage for ephemeral storage 
this default behaviour may not be appropriate for some compute environments.  
For example if you have diskless nodes with remote storage mounted over a 
network having lots of executors doing IO to this remote storage may actually 
degrade performance.
--- End diff --

One suggestion used, the other sentence tweaked to read better and a comma 
inserted


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-06 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215625362
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@ 
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
 
 The configuration properties for mounting volumes into the executor pods 
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. 
For a complete list of available options for each supported type of volumes, 
please refer to the [Spark Properties](#spark-properties) section below. 
 
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles 
and other operations.  When using Kubernetes as the resource manager the pods 
will be created with an 
[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) 
volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no 
directories are explicitly specified then a default directory is created and 
configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do 
not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the nodes backing storage for ephemeral storage 
this default behaviour may not be appropriate for some compute environments.  
For example if you have diskless nodes with remote storage mounted over a 
network having lots of executors doing IO to this remote storage may actually 
degrade performance.
+
+In this case it may be desirable to set 
`spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause 
the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes.  
When configured like this Sparks local storage usage will count towards your 
pods memory usage therefore you may wish to increase your memory requests via 
the normal `spark.driver.memory` and `spark.executor.memory` configuration 
properties.
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-05 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215338145
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@ 
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
 
 The configuration properties for mounting volumes into the executor pods 
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. 
For a complete list of available options for each supported type of volumes, 
please refer to the [Spark Properties](#spark-properties) section below. 
 
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles 
and other operations.  When using Kubernetes as the resource manager the pods 
will be created with an 
[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) 
volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no 
directories are explicitly specified then a default directory is created and 
configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do 
not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the nodes backing storage for ephemeral storage 
this default behaviour may not be appropriate for some compute environments.  
For example if you have diskless nodes with remote storage mounted over a 
network having lots of executors doing IO to this remote storage may actually 
degrade performance.
+
+In this case it may be desirable to set 
`spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause 
the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes.  
When configured like this Sparks local storage usage will count towards your 
pods memory usage therefore you may wish to increase your memory requests via 
the normal `spark.driver.memory` and `spark.executor.memory` configuration 
properties.
--- End diff --

Ok, I'll update the proposed doc changes to mention both


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-05 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22323#discussion_r215187426
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@ 
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
 
 The configuration properties for mounting volumes into the executor pods 
use prefix `spark.kubernetes.executor.` instead of `spark.kubernetes.driver.`. 
For a complete list of available options for each supported type of volumes, 
please refer to the [Spark Properties](#spark-properties) section below. 
 
+## Local Storage
+
+Spark uses temporary scratch space to spill data to disk during shuffles 
and other operations.  When using Kubernetes as the resource manager the pods 
will be created with an 
[emptyDir](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) 
volume mounted for each directory listed in `SPARK_LOCAL_DIRS`.  If no 
directories are explicitly specified then a default directory is created and 
configured appropriately.
+
+`emptyDir` volumes use the ephemeral storage feature of Kubernetes and do 
not persist beyond the life of the pod.
+
+### Using RAM for local storage
+
+As `emptyDir` volumes use the nodes backing storage for ephemeral storage 
this default behaviour may not be appropriate for some compute environments.  
For example if you have diskless nodes with remote storage mounted over a 
network having lots of executors doing IO to this remote storage may actually 
degrade performance.
+
+In this case it may be desirable to set 
`spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause 
the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes.  
When configured like this Sparks local storage usage will count towards your 
pods memory usage therefore you may wish to increase your memory requests via 
the normal `spark.driver.memory` and `spark.executor.memory` configuration 
properties.
--- End diff --

@liyinan926 Yes it is in the case of K8S, per the 
[documentation](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir):

> However, you can set the emptyDir.medium field to "Memory" to tell 
Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead. While 
tmpfs is very fast, be aware that unlike disks, tmpfs is cleared on node reboot 
and **any files you write will count against your Container’s memory limit**.

Emphasis added by me.  Since the container memory requests and limits are 
driven by `spark.*.memory`, this is the appropriate setting to change.  
Changing the memory overhead would also serve to increase these limits, but 
if the user has a rough idea of how much memory they need then asking for it 
explicitly is easier.
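
For illustration, enabling the option together with an explicitly increased 
memory request might look like the following (the memory values are just 
examples):

```scala
// Illustrative only: turn on tmpfs-backed local dirs and size the memory
// requests so the RAM used as scratch space is covered by the pod's
// memory request/limit.
import org.apache.spark.SparkConf

object TmpfsLocalDirsExample {
  val conf: SparkConf = new SparkConf()
    .set("spark.kubernetes.local.dirs.tmpfs", "true")
    .set("spark.driver.memory", "3g")    // example value
    .set("spark.executor.memory", "6g")  // example value, sized to include scratch space
}
```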


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-09-03 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/21669
  
@ifilonenko I think the issue with the `UnixUsername` might be avoided by 
exporting `HADOOP_USER_NAME` as an environment variable in the pod spec, set 
to the same value as `SPARK_USER`.  This should cause the Hadoop UGI 
machinery to simply take the provided username rather than attempting to 
obtain it from the container, which fails because the user does not exist as 
a Unix user in `/etc/passwd`
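
A minimal sketch of the kind of container spec tweak I mean, using the 
Fabric 8 builder API (the helper name is just for illustration):

```scala
// Sketch: export HADOOP_USER_NAME alongside SPARK_USER so the Hadoop UGI
// machinery takes the supplied name instead of looking the user up in the
// container's /etc/passwd.
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder}

object HadoopUserNameEnv {
  def withHadoopUserName(container: Container, sparkUser: String): Container =
    new ContainerBuilder(container)
      .addNewEnv()
        .withName("HADOOP_USER_NAME")
        .withValue(sparkUser)
      .endEnv()
      .build()
}
```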


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...

2018-09-03 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22256#discussion_r214623272
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep(
 .orElse(conf.getOption("spark.local.dir"))
 .getOrElse(defaultLocalDir)
 .split(",")
+  private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS)
 
   override def configurePod(pod: SparkPod): SparkPod = {
 val localDirVolumes = resolvedLocalDirs
   .zipWithIndex
   .map { case (localDir, index) =>
-new VolumeBuilder()
-  .withName(s"spark-local-dir-${index + 1}")
-  .withNewEmptyDir()
-  .endEmptyDir()
-  .build()
+val name = s"spark-local-dir-${index + 1}"
+// To allow customisation of local dirs backing volumes we should 
avoid creating
+// emptyDir volumes if the volume is already defined in the pod 
spec
+hasVolume(pod, name) match {
+  case true =>
+// For pre-existing volume definitions just re-use the volume
+pod.pod.getSpec().getVolumes().asScala.find(v => 
v.getName.equals(name)).get
+  case false =>
+// Create new emptyDir volume
+new VolumeBuilder()
+  .withName(name)
+  .withNewEmptyDir()
+.withMedium(useLocalDirTmpFs match {
+  case true => "Memory" // Use tmpfs
+  case false => null// Default - use nodes backing 
storage
+})
+  .endEmptyDir()
+  .build()
+}
   }
+
 val localDirVolumeMounts = localDirVolumes
   .zip(resolvedLocalDirs)
   .map { case (localDirVolume, localDirPath) =>
-new VolumeMountBuilder()
-  .withName(localDirVolume.getName)
-  .withMountPath(localDirPath)
-  .build()
+hasVolumeMount(pod, localDirVolume.getName, localDirPath) match {
+  case true =>
+// For pre-existing volume mounts just re-use the mount
+pod.container.getVolumeMounts().asScala
+  .find(m => m.getName.equals(localDirVolume.getName)
+ && m.getMountPath.equals(localDirPath))
+  .get
+  case false =>
+// Create new volume mount
+new VolumeMountBuilder()
+  .withName (localDirVolume.getName)
+  .withMountPath (localDirPath)
+  .build()
+}
+  }
+
+// Check for conflicting volume mounts
+for (m: VolumeMount <- localDirVolumeMounts) {
+  if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 
0) {
+throw new SparkException(s"Conflicting volume mounts defined, pod 
template attempted to " +
+  "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at 
an alternative path " +
+  "then the expected ${m.getPath}")
   }
+}
+
 val podWithLocalDirVolumes = new PodBuilder(pod.pod)
   .editSpec()
-.addToVolumes(localDirVolumes: _*)
+ // Don't want to re-add volumes that already existed in the 
incoming spec
+ // as duplicate definitions will lead to K8S API errors
+.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, 
v.getName)): _*)
--- End diff --

@mccheah @ifilonenko OK, I have opened PR #22323 with just the `tmpfs`-enabling changes.


---




[GitHub] spark pull request #22323: [SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be t...

2018-09-03 Thread rvesse
GitHub user rvesse opened a pull request:

https://github.com/apache/spark/pull/22323

[SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S

## What changes were proposed in this pull request?

The default behaviour of Spark on K8S is currently to create `emptyDir` volumes to back `SPARK_LOCAL_DIRS`. In some environments, e.g. diskless compute nodes, this may actually hurt performance because these volumes are backed by the Kubelet's node storage, which on a diskless node is typically some remote network storage.

Even if this is enterprise-grade storage connected via a high-speed interconnect, the way Spark uses these directories as scratch space (lots of relatively small, short-lived files) has been observed to cause serious performance degradation. Therefore we would like the option to use K8S's ability to back these `emptyDir` volumes with `tmpfs` instead. This PR adds a configuration option that enables `SPARK_LOCAL_DIRS` to be backed by memory-backed `emptyDir` volumes rather than the default.
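
For reference, a minimal sketch of how each `SPARK_LOCAL_DIRS` volume is constructed when the option is enabled, based on the `withMedium("Memory")` change quoted in the review comments elsewhere in this thread (not the exact PR code):

```scala
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder}

// Sketch: build the i-th SPARK_LOCAL_DIRS volume, optionally tmpfs (Memory) backed
def localDirVolume(index: Int, useTmpFs: Boolean): Volume =
  new VolumeBuilder()
    .withName(s"spark-local-dir-${index + 1}")
    .withNewEmptyDir()
      // "Memory" asks the Kubelet for a tmpfs mount; null keeps the node's default backing storage
      .withMedium(if (useTmpFs) "Memory" else null)
    .endEmptyDir()
    .build()
```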

Documentation is added to describe both the default behaviour and this new option and its implications, one of which is that scratch space then counts towards your pod's memory limits, so users will need to adjust their memory requests accordingly.

*NB* - This is an alternative version of PR #22256 reduced to just the 
`tmpfs` piece

## How was this patch tested?

Ran with this option in our diskless compute environments to verify 
functionality

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rvesse/spark SPARK-25262-tmpfs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22323


commit 544f132f2bf61d05b16d93b086e5a776824b70c5
Author: Rob Vesse 
Date:   2018-08-28T13:52:07Z

[SPARK-25262][K8S] Allow SPARK_LOCAL_DIRS to be tmpfs backed on K8S

Adds a configuration option that enables SPARK_LOCAL_DIRS to be backed
by Memory backed emptyDir volumes rather than the default which is
whatever the kubelet's node storage happens to be




---




[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...

2018-09-03 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22215#discussion_r214614277
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala
 ---
@@ -151,13 +152,15 @@ private[spark] class ExecutorPodsLifecycleManager(
 
   private def exitReasonMessage(podState: FinalPodState, execId: Long, 
exitCode: Int) = {
 val pod = podState.pod
+val reason = Option(pod.getStatus.getReason)
+val message = Option(pod.getStatus.getMessage)
 s"""
|The executor with id $execId exited with exit code $exitCode.
-   |The API gave the following brief reason: ${pod.getStatus.getReason}
-   |The API gave the following message: ${pod.getStatus.getMessage}
+   |The API gave the following brief reason: ${reason.getOrElse("")}
--- End diff --

Fixed in latest commit


---




[GitHub] spark issue #22215: [SPARK-25222][K8S] Improve container status logging

2018-09-03 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22215
  
@mccheah Thanks for the review; I have made the change you suggested to use `N/A` instead of an empty string.

I have left the indentation as tabs for now. As I said in a previous comment, this was just what the existing code used, and I am happy to change it to spaces if others also want that change made.


---




[GitHub] spark pull request #22215: [SPARK-25222][K8S] Improve container status loggi...

2018-09-03 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22215#discussion_r214612510
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 ---
@@ -60,4 +64,81 @@ private[spark] object KubernetesUtils {
   }
 
   def parseMasterUrl(url: String): String = url.substring("k8s://".length)
+
+  def formatPairsBundle(pairs: Seq[(String, String)], indent: Int = 1) : 
String = {
+// Use more loggable format if value is null or empty
+val indentStr = "\t" * indent
--- End diff --

I just preserved the original code's choice here; I would happily change to spaces if preferred.


---




[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...

2018-08-31 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22256#discussion_r214416634
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep(
 .orElse(conf.getOption("spark.local.dir"))
 .getOrElse(defaultLocalDir)
 .split(",")
+  private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS)
 
   override def configurePod(pod: SparkPod): SparkPod = {
 val localDirVolumes = resolvedLocalDirs
   .zipWithIndex
   .map { case (localDir, index) =>
-new VolumeBuilder()
-  .withName(s"spark-local-dir-${index + 1}")
-  .withNewEmptyDir()
-  .endEmptyDir()
-  .build()
+val name = s"spark-local-dir-${index + 1}"
+// To allow customisation of local dirs backing volumes we should 
avoid creating
+// emptyDir volumes if the volume is already defined in the pod 
spec
+hasVolume(pod, name) match {
+  case true =>
+// For pre-existing volume definitions just re-use the volume
+pod.pod.getSpec().getVolumes().asScala.find(v => 
v.getName.equals(name)).get
+  case false =>
+// Create new emptyDir volume
+new VolumeBuilder()
+  .withName(name)
+  .withNewEmptyDir()
+.withMedium(useLocalDirTmpFs match {
+  case true => "Memory" // Use tmpfs
+  case false => null// Default - use nodes backing 
storage
+})
+  .endEmptyDir()
+  .build()
+}
   }
+
 val localDirVolumeMounts = localDirVolumes
   .zip(resolvedLocalDirs)
   .map { case (localDirVolume, localDirPath) =>
-new VolumeMountBuilder()
-  .withName(localDirVolume.getName)
-  .withMountPath(localDirPath)
-  .build()
+hasVolumeMount(pod, localDirVolume.getName, localDirPath) match {
+  case true =>
+// For pre-existing volume mounts just re-use the mount
+pod.container.getVolumeMounts().asScala
+  .find(m => m.getName.equals(localDirVolume.getName)
+ && m.getMountPath.equals(localDirPath))
+  .get
+  case false =>
+// Create new volume mount
+new VolumeMountBuilder()
+  .withName (localDirVolume.getName)
+  .withMountPath (localDirPath)
+  .build()
+}
+  }
+
+// Check for conflicting volume mounts
+for (m: VolumeMount <- localDirVolumeMounts) {
+  if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 
0) {
+throw new SparkException(s"Conflicting volume mounts defined, pod 
template attempted to " +
+  "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at 
an alternative path " +
+  "then the expected ${m.getPath}")
   }
+}
+
 val podWithLocalDirVolumes = new PodBuilder(pod.pod)
   .editSpec()
-.addToVolumes(localDirVolumes: _*)
+ // Don't want to re-add volumes that already existed in the 
incoming spec
+ // as duplicate definitions will lead to K8S API errors
+.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, 
v.getName)): _*)
--- End diff --

Ok, will do that Monday.

FYI I notice @onursatici has now made some similar tweaks in his latest commit - 
https://github.com/apache/spark/pull/22146/commits/a4fde0cdc4dc5b64fd3f888244656371eb76f837
 - notice that several feature steps there now use `editOrNewX()` or `addToX()` so that they combine with, rather than override, the template.
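
To illustrate the pattern being referenced, a minimal sketch (the volume name is purely illustrative, and this is not code from either PR) of adding to a template-supplied pod spec with `editOrNewSpec()` rather than replacing it:

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder, VolumeBuilder}
import org.apache.spark.deploy.k8s.SparkPod

// Sketch: add a scratch volume while preserving whatever spec the template already defined
def addScratchVolume(pod: SparkPod): Pod = {
  val scratch = new VolumeBuilder()
    .withName("spark-local-dir-1")   // illustrative name
    .withNewEmptyDir().endEmptyDir()
    .build()
  new PodBuilder(pod.pod)
    .editOrNewSpec()                 // edit the template's spec rather than overwriting it
      .addToVolumes(scratch)
    .endSpec()
    .build()
}
```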


---




[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...

2018-08-31 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22256#discussion_r214277672
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep(
 .orElse(conf.getOption("spark.local.dir"))
 .getOrElse(defaultLocalDir)
 .split(",")
+  private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS)
 
   override def configurePod(pod: SparkPod): SparkPod = {
 val localDirVolumes = resolvedLocalDirs
   .zipWithIndex
   .map { case (localDir, index) =>
-new VolumeBuilder()
-  .withName(s"spark-local-dir-${index + 1}")
-  .withNewEmptyDir()
-  .endEmptyDir()
-  .build()
+val name = s"spark-local-dir-${index + 1}"
+// To allow customisation of local dirs backing volumes we should 
avoid creating
+// emptyDir volumes if the volume is already defined in the pod 
spec
+hasVolume(pod, name) match {
+  case true =>
+// For pre-existing volume definitions just re-use the volume
+pod.pod.getSpec().getVolumes().asScala.find(v => 
v.getName.equals(name)).get
+  case false =>
+// Create new emptyDir volume
+new VolumeBuilder()
+  .withName(name)
+  .withNewEmptyDir()
+.withMedium(useLocalDirTmpFs match {
+  case true => "Memory" // Use tmpfs
+  case false => null// Default - use nodes backing 
storage
+})
+  .endEmptyDir()
+  .build()
+}
   }
+
 val localDirVolumeMounts = localDirVolumes
   .zip(resolvedLocalDirs)
   .map { case (localDirVolume, localDirPath) =>
-new VolumeMountBuilder()
-  .withName(localDirVolume.getName)
-  .withMountPath(localDirPath)
-  .build()
+hasVolumeMount(pod, localDirVolume.getName, localDirPath) match {
+  case true =>
+// For pre-existing volume mounts just re-use the mount
+pod.container.getVolumeMounts().asScala
+  .find(m => m.getName.equals(localDirVolume.getName)
+ && m.getMountPath.equals(localDirPath))
+  .get
+  case false =>
+// Create new volume mount
+new VolumeMountBuilder()
+  .withName (localDirVolume.getName)
+  .withMountPath (localDirPath)
+  .build()
+}
+  }
+
+// Check for conflicting volume mounts
+for (m: VolumeMount <- localDirVolumeMounts) {
+  if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 
0) {
+throw new SparkException(s"Conflicting volume mounts defined, pod 
template attempted to " +
+  "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at 
an alternative path " +
+  "then the expected ${m.getPath}")
   }
+}
+
 val podWithLocalDirVolumes = new PodBuilder(pod.pod)
   .editSpec()
-.addToVolumes(localDirVolumes: _*)
+ // Don't want to re-add volumes that already existed in the 
incoming spec
+ // as duplicate definitions will lead to K8S API errors
+.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, 
v.getName)): _*)
--- End diff --

> This is a stance that as far I'm aware, we specifically chose not to take 
in the pod template feature. If one is using the pod template feature then 
Spark won't provide any guarantees that the pod it makes will be well-formed. 
When spark submit deploys the pod to the cluster the API will return a clear 
enough error informing the user to make the appropriate corrections to their 
pod template.

Sure, but we still need to be realistic about how the template feature will be used. It is supposed to enable power users to customise the pods for their environments. If there is an area like this with a clear use case for customisation, we should be enabling that rather than saying, sorry, we're going to generate invalid pods regardless. Obviously the power user assumes the risk of creating a pod template that meaningfully combines with Spark's generated pod to yield a valid runtime environment.

Clearly my stance here is controversial and likely needs a broader 
discussion on the dev list.

I can reduce this PR to just the config to enable `tmpfs`-backed `emptyDir` volumes.

[GitHub] spark issue #22256: [SPARK-25262][K8S] Better support configurability of Spa...

2018-08-30 Thread rvesse
Github user rvesse commented on the issue:

https://github.com/apache/spark/pull/22256
  
@skonto I haven't done anything specific for the size limit ATM.  From the 
K8S docs `tmpfs` backed `emptyDir` usage counts towards your containers memory 
limits so you can just set `spark.executor.memory` to a larger amount as needed.

From the discussion you linked you can explicitly set a size limit for the 
volume but I wanted to avoid adding multiple configuration options if possible. 
 Since this PR allows for template defined volumes to be used you could define 
a volume in your template with the `sizeLimit` applied to it once you have pod 
templates PR available


---




[GitHub] spark pull request #22256: [SPARK-25262][K8S] Better support configurability...

2018-08-30 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22256#discussion_r213954156
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 ---
@@ -37,41 +40,99 @@ private[spark] class LocalDirsFeatureStep(
 .orElse(conf.getOption("spark.local.dir"))
 .getOrElse(defaultLocalDir)
 .split(",")
+  private val useLocalDirTmpFs = conf.get(KUBERNETES_LOCAL_DIRS_TMPFS)
 
   override def configurePod(pod: SparkPod): SparkPod = {
 val localDirVolumes = resolvedLocalDirs
   .zipWithIndex
   .map { case (localDir, index) =>
-new VolumeBuilder()
-  .withName(s"spark-local-dir-${index + 1}")
-  .withNewEmptyDir()
-  .endEmptyDir()
-  .build()
+val name = s"spark-local-dir-${index + 1}"
+// To allow customisation of local dirs backing volumes we should 
avoid creating
+// emptyDir volumes if the volume is already defined in the pod 
spec
+hasVolume(pod, name) match {
+  case true =>
+// For pre-existing volume definitions just re-use the volume
+pod.pod.getSpec().getVolumes().asScala.find(v => 
v.getName.equals(name)).get
+  case false =>
+// Create new emptyDir volume
+new VolumeBuilder()
+  .withName(name)
+  .withNewEmptyDir()
+.withMedium(useLocalDirTmpFs match {
+  case true => "Memory" // Use tmpfs
+  case false => null// Default - use nodes backing 
storage
+})
+  .endEmptyDir()
+  .build()
+}
   }
+
 val localDirVolumeMounts = localDirVolumes
   .zip(resolvedLocalDirs)
   .map { case (localDirVolume, localDirPath) =>
-new VolumeMountBuilder()
-  .withName(localDirVolume.getName)
-  .withMountPath(localDirPath)
-  .build()
+hasVolumeMount(pod, localDirVolume.getName, localDirPath) match {
+  case true =>
+// For pre-existing volume mounts just re-use the mount
+pod.container.getVolumeMounts().asScala
+  .find(m => m.getName.equals(localDirVolume.getName)
+ && m.getMountPath.equals(localDirPath))
+  .get
+  case false =>
+// Create new volume mount
+new VolumeMountBuilder()
+  .withName (localDirVolume.getName)
+  .withMountPath (localDirPath)
+  .build()
+}
+  }
+
+// Check for conflicting volume mounts
+for (m: VolumeMount <- localDirVolumeMounts) {
+  if (hasConflictingVolumeMount(pod, m.getName, m.getMountPath).size > 
0) {
+throw new SparkException(s"Conflicting volume mounts defined, pod 
template attempted to " +
+  "mount SPARK_LOCAL_DIRS volume ${m.getName} multiple times or at 
an alternative path " +
+  "then the expected ${m.getPath}")
   }
+}
+
 val podWithLocalDirVolumes = new PodBuilder(pod.pod)
   .editSpec()
-.addToVolumes(localDirVolumes: _*)
+ // Don't want to re-add volumes that already existed in the 
incoming spec
+ // as duplicate definitions will lead to K8S API errors
+.addToVolumes(localDirVolumes.filter(v => !hasVolume(pod, 
v.getName)): _*)
--- End diff --

The implementation is still additive: it will add to existing elements in the pod spec as needed while respecting what is already present.

If your pod spec contains duplicate volumes/volume mounts then K8S will reject it as invalid, e.g.

```
The Pod "rvesse-test" is invalid: spec.volumes[1].name: Duplicate value: "spark-local-dirs-1"
```

Therefore it is necessary to explicitly avoid duplicating things already present in the template.

If the aim is to replace adding further config options with the pod template feature, then the existing builders do need to be more intelligent about what they generate in order to avoid producing invalid pod specs. This holds regardless of whether the template feature is opinionated about validation: even if the template feature doesn't validate anything, Spark code itself should ensure that it generates valid specs as far as it is able to. Obviously it can't detect every possible invalid spec that might result when templates aren't validated, but it can avoid introducing easily avoidable ones.
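
For context, the quoted diff calls `hasVolume` and `hasVolumeMount` helpers that are not shown in the hunk; a minimal sketch of what such helpers might look like (my reconstruction for illustration, not the exact PR code):

```scala
import scala.collection.JavaConverters._
import org.apache.spark.deploy.k8s.SparkPod

// Does the incoming pod spec already define a volume with this name?
def hasVolume(pod: SparkPod, name: String): Boolean =
  pod.pod.getSpec.getVolumes.asScala.exists(_.getName == name)

// Does the container already mount that volume at the expected path?
def hasVolumeMount(pod: SparkPod, name: String, path: String): Boolean =
  pod.container.getVolumeMounts.asScala
    .exists(m => m.getName == name && m.getMountPath == path)
```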


---
