[GitHub] spark issue #22392: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-09-13 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/22392
  
Jenkins, test this please


---




[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-07-19 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21067
  
> ReadWriteOnce storage can only be attached to one node.

This is well known. Using an RWO volume for fencing here would work, but that 
setup is not representative of all users. It breaks down once you include 
checkpointing to object storage (S3), HDFS, or ReadWriteMany volumes like NFS; 
in all of those cases there will be a correctness problem. 

For folks that need it right away, the same restart feature can be realized 
safely with an approach like the 
[spark-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) 
without any of this hassle. So why are we trying to fit this into Spark with 
caveats around how volumes must be used to ensure fencing? That seems more 
error-prone and harder to explain, and I can't see the gain from it. A better 
way forward is proposing to the k8s community a new option for Jobs that lets 
us get fencing from the k8s apiserver through deterministic names (a sketch of 
that idea follows). 
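
To make the deterministic-names idea concrete, here is a minimal sketch (not 
from this PR) of how apiserver name uniqueness can provide fencing, using the 
fabric8 client the K8s backend already depends on. The pod name, namespace, 
and image below are hypothetical placeholders.

```scala
import io.fabric8.kubernetes.api.model.PodBuilder
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException}

object DeterministicNameFencing {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()
    // The name is derived deterministically from the job identity, so every
    // retry of the same job computes the same name.
    val driverPod = new PodBuilder()
      .withNewMetadata()
        .withName("spark-pi-driver") // hypothetical job-derived name
        .withNamespace("default")
      .endMetadata()
      .withNewSpec()
        .addNewContainer()
          .withName("driver")
          .withImage("spark:latest") // hypothetical image
        .endContainer()
      .endSpec()
      .build()
    try {
      // The apiserver enforces name uniqueness within a namespace, so at most
      // one create() succeeds; that is the fencing property discussed above.
      client.pods().inNamespace("default").create(driverPod)
      println("This instance owns the driver slot.")
    } catch {
      case e: KubernetesClientException if e.getCode == 409 =>
        // 409 Conflict: a driver pod with this name already exists.
        println("Another driver holds the slot; not starting a duplicate.")
    } finally {
      client.close()
    }
  }
}
```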


---




[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-07-12 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21067
  
> After a short/configurable delay the driver pod state changed to Unknown
> and the Job controller initiated a new spark driver.

This is dangerous behavior. The old Spark driver can still be perfectly
functional and running within the cluster even though its state is marked
Unknown, and it could still be making progress with its own executors.
A network connection to the K8s master is not a prerequisite for pods to
continue running.

On Thu, Jul 12, 2018, 7:57 AM Lucas Kacher  wrote:

> @baluchicken <https://github.com/baluchicken>, did that test involve
> using checkpointing in a shared location?



---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-06 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21583
  
@ifilonenko, can you elaborate on what the issue is and what version of 
Docker is needed?


---




[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-07-06 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21067
  
I don't think the current approach will suffice. Correctness is important 
here, especially for folks using Spark Streaming. I understand that we're 
proposing the use of backoff limits, but there is **no guarantee** that a job 
controller **won't** spin up 2 driver pods when we ask for 1. That is by 
definition how the job controller works: it is greedy and works towards the 
desired number of completions. For example, in the case of a network partition, 
the job controller logic in the Kubernetes master will not differentiate between:

1. Losing contact with the driver pod temporarily
2. Finding no driver pod and starting a new one

This is why I've proposed using a StatefulSet in the past. However, getting 
termination semantics with a StatefulSet will be more work. I don't think we 
should sacrifice correctness in this layer, as it would surprise the 
application author, who now has to reason about whether the operation they 
are performing is idempotent.

Can we have a proposal and understand all the subtleties before trying to 
change this behavior? For example, if we end up with more than one driver for a 
single job, I'd like to ensure that only one of them is making progress (for 
example, by using a lease in ZK; a sketch follows). 
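
To illustrate the ZK-lease idea, here is a minimal sketch using Apache 
Curator's LeaderLatch; the quorum address, lock path, and runDriver body are 
hypothetical placeholders, not anything Spark ships.

```scala
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.LeaderLatch
import org.apache.curator.retry.ExponentialBackoffRetry

object DriverLease {
  def main(args: Array[String]): Unit = {
    // Hypothetical ZK quorum and lock path; both would come from configuration.
    val zk = CuratorFrameworkFactory.newClient(
      "zk-0.zk:2181", new ExponentialBackoffRetry(1000, 3))
    zk.start()

    val latch = new LeaderLatch(zk, "/spark/jobs/spark-pi-1234/driver-lock")
    latch.start()
    // Blocks until this driver instance holds the lease. If the job controller
    // ever spins up two drivers, only the lease holder proceeds past this line.
    latch.await()
    try {
      runDriver() // placeholder for the actual driver logic
    } finally {
      latch.close()
      zk.close()
    }
  }

  private def runDriver(): Unit = ()
}
```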




---




[GitHub] spark pull request #21652: [SPARK-24551][K8S] Add integration tests for secr...

2018-07-02 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21652#discussion_r199608693
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala
 ---
@@ -74,10 +76,12 @@ private[spark] class KubernetesSuite extends 
SparkFunSuite
 testBackend = IntegrationTestBackendFactory.getTestBackend
 testBackend.initialize()
 kubernetesTestComponents = new 
KubernetesTestComponents(testBackend.getKubernetesClient)
+createTestSecret()
--- End diff --

I'm wondering if we should move these tests to a separate class. This file 
is growing in size, and separating them out a bit would make the code easier 
to understand.


---




[GitHub] spark pull request #21652: [SPARK-24551][K8S] Add integration tests for secr...

2018-07-02 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21652#discussion_r199608853
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala
 ---
@@ -150,6 +179,28 @@ private[spark] class KubernetesSuite extends 
SparkFunSuite
   })
   }
 
+  test("Run SparkPi with env and mount secrets.") {
+sparkAppConf
+  .set(s"spark.kubernetes.driver.secrets.$ENV_SECRET_NAME", 
SECRET_MOUNT_PATH)
+  .set(s"spark.kubernetes.driver.secretKeyRef.USERNAME", 
s"$ENV_SECRET_NAME:username")
+  .set(s"spark.kubernetes.driver.secretKeyRef.PASSWORD", 
s"$ENV_SECRET_NAME:password")
+  .set(s"spark.kubernetes.executor.secrets.$ENV_SECRET_NAME", 
SECRET_MOUNT_PATH)
+  .set(s"spark.kubernetes.executor.secretKeyRef.USERNAME", 
s"$ENV_SECRET_NAME:username")
+  .set(s"spark.kubernetes.executor.secretKeyRef.PASSWORD", 
s"$ENV_SECRET_NAME:password")
+
+runSparkPiAndVerifyCompletion(
+  driverPodChecker = (driverPod: Pod) => {
+doBasicDriverPodCheck(driverPod)
+  checkSecrets(driverPod)
--- End diff --

nit: indentation


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-07-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21652
  
jenkins, retest this please


---




[GitHub] spark issue #21697: [SPARK-24711][K8S] Fix tags for integration tests

2018-07-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21697
  
This is good! Same comment that @srowen had on moving the version up to the 
parent POM and referencing it here.


---




spark git commit: [SPARK-24428][K8S] Fix unused code

2018-07-02 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master 42815548c -> 85fe1297e


[SPARK-24428][K8S] Fix unused code

## What changes were proposed in this pull request?

Remove code that is misleading and is a leftover from a previous implementation.

## How was this patch tested?
Manually.

Author: Stavros Kontopoulos 

Closes #21462 from skonto/fix-k8s-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/85fe1297
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/85fe1297
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/85fe1297

Branch: refs/heads/master
Commit: 85fe1297e35bcff9cf86bd53fee615e140ee5bfb
Parents: 4281554
Author: Stavros Kontopoulos 
Authored: Mon Jul 2 13:08:16 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Mon Jul 2 13:08:16 2018 -0700

--
 .../scala/org/apache/spark/deploy/k8s/Constants.scala   |  6 --
 .../cluster/k8s/KubernetesClusterManager.scala  |  2 --
 .../docker/src/main/dockerfiles/spark/entrypoint.sh | 12 +---
 3 files changed, 5 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/85fe1297/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
index 69bd03d..5ecdd3a 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
@@ -25,9 +25,6 @@ private[spark] object Constants {
   val SPARK_POD_DRIVER_ROLE = "driver"
   val SPARK_POD_EXECUTOR_ROLE = "executor"
 
-  // Annotations
-  val SPARK_APP_NAME_ANNOTATION = "spark-app-name"
-
   // Credentials secrets
   val DRIVER_CREDENTIALS_SECRETS_BASE_DIR =
 "/mnt/secrets/spark-kubernetes-credentials"
@@ -50,17 +47,14 @@ private[spark] object Constants {
   val DEFAULT_BLOCKMANAGER_PORT = 7079
   val DRIVER_PORT_NAME = "driver-rpc-port"
   val BLOCK_MANAGER_PORT_NAME = "blockmanager"
-  val EXECUTOR_PORT_NAME = "executor"
 
   // Environment Variables
-  val ENV_EXECUTOR_PORT = "SPARK_EXECUTOR_PORT"
   val ENV_DRIVER_URL = "SPARK_DRIVER_URL"
   val ENV_EXECUTOR_CORES = "SPARK_EXECUTOR_CORES"
   val ENV_EXECUTOR_MEMORY = "SPARK_EXECUTOR_MEMORY"
   val ENV_APPLICATION_ID = "SPARK_APPLICATION_ID"
   val ENV_EXECUTOR_ID = "SPARK_EXECUTOR_ID"
   val ENV_EXECUTOR_POD_IP = "SPARK_EXECUTOR_POD_IP"
-  val ENV_MOUNTED_CLASSPATH = "SPARK_MOUNTED_CLASSPATH"
   val ENV_JAVA_OPT_PREFIX = "SPARK_JAVA_OPT_"
   val ENV_CLASSPATH = "SPARK_CLASSPATH"
   val ENV_DRIVER_BIND_ADDRESS = "SPARK_DRIVER_BIND_ADDRESS"

http://git-wip-us.apache.org/repos/asf/spark/blob/85fe1297/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
index c6e931a..de2a52b 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
@@ -48,8 +48,6 @@ private[spark] class KubernetesClusterManager extends 
ExternalClusterManager wit
   sc: SparkContext,
   masterURL: String,
   scheduler: TaskScheduler): SchedulerBackend = {
-val executorSecretNamesToMountPaths = 
KubernetesUtils.parsePrefixedKeyValuePairs(
-  sc.conf, KUBERNETES_EXECUTOR_SECRETS_PREFIX)
 val kubernetesClient = SparkKubernetesClientFactory.createKubernetesClient(
   KUBERNETES_MASTER_INTERNAL_URL,
   Some(sc.conf.get(KUBERNETES_NAMESPACE)),

http://git-wip-us.apache.org/repos/asf/spark/blob/85fe1297/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
--
diff --git 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
index 2f4e115..8bdb0f7 100755
--- 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
+++ 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
@@ -51,12 +51,10 @@ esac
 
 

[GitHub] spark issue #21462: [SPARK-24428][K8S] Fix unused code

2018-07-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21462
  
Merging to master


---




[GitHub] spark issue #21462: [SPARK-24428][K8S] Fix unused code

2018-07-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21462
  
LGTM. We can file a separate task to track what happened to secrets 
mounting, without blocking the removal of dead code.

Thanks for the fix @skonto.


---




[GitHub] spark issue #21511: [SPARK-24491][Kubernetes] Configuration support for requ...

2018-06-25 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21511
  
+1, agreed with @liyinan926. Let's be more careful about what we add to the 
configuration options. 


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-06-25 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21583
  
@ifilonenko, is this ready to go?


---




[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-06-25 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21462#discussion_r197909183
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -45,12 +45,10 @@ shift 1
 
 SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
 env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > 
/tmp/java_opts.txt
-readarray -t SPARK_JAVA_OPTS < /tmp/java_opts.txt
-if [ -n "$SPARK_MOUNTED_CLASSPATH" ]; then
-  SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_MOUNTED_CLASSPATH"
-fi
-if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then
-  cp -R "$SPARK_MOUNTED_FILES_DIR/." .
+readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
--- End diff --

+1, that makes sense. Those env vars are set by the launcher at submission 
time. 


---




[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-06-25 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21462#discussion_r197909535
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
 ---
@@ -46,8 +46,6 @@ private[spark] class KubernetesClusterManager extends 
ExternalClusterManager wit
   sc: SparkContext,
   masterURL: String,
   scheduler: TaskScheduler): SchedulerBackend = {
-val executorSecretNamesToMountPaths = 
KubernetesUtils.parsePrefixedKeyValuePairs(
-  sc.conf, KUBERNETES_EXECUTOR_SECRETS_PREFIX)
--- End diff --

Do we not have secrets -> mountpaths support right now? @mccheah 
@liyinan926 


---




[GitHub] spark pull request #21279: [SPARK-24219][k8s] Improve the docker building sc...

2018-06-25 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21279#discussion_r197906642
  
--- Diff: bin/docker-image-tool.sh ---
@@ -44,15 +44,37 @@ function image_ref {
 function build {
   local BUILD_ARGS
   local IMG_PATH
+  local TMPFOLDER
 
   if [ ! -f "$SPARK_HOME/RELEASE" ]; then
 # Set image build arguments accordingly if this is a source repo and 
not a distribution archive.
+local 
JARS="${SPARK_HOME}/assembly/target/scala-${SPARK_SCALA_VERSION}/jars"
+TMPFOLDER=`mktemp -q -d examples.XX`
+if [ $? -ne 0 ]; then
+  echo "Cannot create temp folder, exiting..."
+  exit 1
+fi
+
+mkdir -p "${TMPFOLDER}/jars"
+cp "${SPARK_HOME}"/examples/target/scala*/jars/* "${TMPFOLDER}/jars"
--- End diff --

Do we need the temp folder? Can we just pass the path 
`"${SPARK_HOME}"/examples/target/scala*/jars/*` to the Dockerfile instead and 
let it pick up the jars from there?


---




spark git commit: [SPARK-24547][K8S] Allow for building spark on k8s docker images without cache and don't forget to push spark-py container.

2018-06-20 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master 3f4bda728 -> 15747cfd3


[SPARK-24547][K8S] Allow for building spark on k8s docker images without cache 
and don't forget to push spark-py container.

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-24547

TL;DR from JIRA issue:

- The first time I generated images for 2.4.0, Docker was using its cache, so 
when running jobs, old jars were still in the Docker image. This produces 
errors like this in the executors:

`java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local 
class incompatible: stream classdesc serialVersionUID = 6155820641931972169, 
local class serialVersionUID = -3720498261147521051`

- The second problem was that the spark container was pushed, but the spark-py 
container wasn't. This was just forgotten in the initial PR.

- A third problem, which I ran into because I had an older Docker, was 
https://github.com/apache/spark/pull/21551, so I have not included a fix for 
that in this ticket.

## How was this patch tested?

I've tested it on my own Spark on k8s deployment.

Author: Ray Burgemeestre 

Closes #21555 from rayburgemeestre/SPARK-24547.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/15747cfd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/15747cfd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/15747cfd

Branch: refs/heads/master
Commit: 15747cfd3246385ffb23e19e28d2e4effa710bf6
Parents: 3f4bda7
Author: Ray Burgemeestre 
Authored: Wed Jun 20 17:09:37 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Wed Jun 20 17:09:37 2018 -0700

--
 bin/docker-image-tool.sh | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/15747cfd/bin/docker-image-tool.sh
--
diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh
index a871ab5..a3f1bcf 100755
--- a/bin/docker-image-tool.sh
+++ b/bin/docker-image-tool.sh
@@ -70,17 +70,18 @@ function build {
   local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
   local 
PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
 
-  docker build "${BUILD_ARGS[@]}" \
+  docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
 -t $(image_ref spark) \
 -f "$BASEDOCKERFILE" .
 
-docker build "${BINDING_BUILD_ARGS[@]}" \
+  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
 -t $(image_ref spark-py) \
 -f "$PYDOCKERFILE" .
 }
 
 function push {
   docker push "$(image_ref spark)"
+  docker push "$(image_ref spark-py)"
 }
 
 function usage {
@@ -99,6 +100,7 @@ Options:
   -r repo Repository address.
   -t tag  Tag to apply to the built image, or to identify the image to be 
pushed.
   -m  Use minikube's Docker daemon.
+  -n  Build docker image with --no-cache
 
 Using minikube when building images will do so directly into minikube's Docker 
daemon.
 There is no need to push the images into minikube in that case, they'll be 
automatically
@@ -127,7 +129,8 @@ REPO=
 TAG=
 BASEDOCKERFILE=
 PYDOCKERFILE=
-while getopts f:mr:t: option
+NOCACHEARG=
+while getopts f:mr:t:n option
 do
  case "${option}"
  in
@@ -135,6 +138,7 @@ do
  p) PYDOCKERFILE=${OPTARG};;
  r) REPO=${OPTARG};;
  t) TAG=${OPTARG};;
+ n) NOCACHEARG="--no-cache";;
  m)
if ! which minikube 1>/dev/null; then
  error "Cannot find minikube."





[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...

2018-06-20 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21555
  
Will follow up. Thanks all for the comments! Merging to master.


---




[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...

2018-06-20 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21555
  
@mccheah, that is a good point, but I agree with @ifilonenko that we can do 
it in a subsequent PR. I'm thinking we merge this as-is, and I can try to get a 
follow-up PR here for the Dockerfile refactor.


---




[GitHub] spark issue #21555: [SPARK-24547][K8S] Allow for building spark on k8s docke...

2018-06-19 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21555
  
LGTM. Will merge when tests pass. Thanks!


---




[GitHub] spark issue #21551: [K8S] Fix issue in 'docker-image-tool.sh'

2018-06-19 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21551
  
Phew, this was a bad one. Thanks for the fix all.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...

2018-06-14 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21366
  
If last round's comments are addressed, LGTM from me. The important behavior 
to check is the snapshotting, and creating replacement executors based on the 
captured snapshot.


---




[GitHub] spark issue #20697: [SPARK-23010][k8s] Initial checkin of k8s integration te...

2018-06-08 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20697
  
Thanks @mccheah and @ssuchter! Added a deprecation notice to the repo in 
https://github.com/apache-spark-on-k8s/spark-integration


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-06-07 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21366
  
If we did miss the watch event as you said, the next poll should capture the 
fact that a pod we thought existed no longer exists in the API. That is the 
level-triggered approach: we see the expected state diverge from the current 
state and take action.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-06-07 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21366
  
Yes, combining watches with polling is common in controllers. Shared informers 
are the good, stable abstraction controllers use to avoid polling in most 
cases. For now, however, if we just use the watch as a trigger to pull all the 
data we're interested in and then run our control loop, that's a fine middle 
ground (sketch below).
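
A minimal sketch of that middle ground, assuming the fabric8 client and the 
`spark-role=executor` label the backend puts on executor pods; `desiredTotal` 
and `requestExecutors` are placeholders:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient
import scala.collection.JavaConverters._

class ExecutorReconciler(desiredTotal: () => Int) {
  private val client = new DefaultKubernetesClient()

  // Level-triggered control loop, invoked from a watch event *or* a timer
  // tick. It ignores the event payload and re-lists everything, so a missed
  // watch event only delays convergence until the next trigger.
  def reconcile(): Unit = {
    val executors = client.pods()
      .withLabel("spark-role", "executor")
      .list().getItems.asScala
    val running = executors.count(_.getStatus.getPhase == "Running")
    val missing = desiredTotal() - running
    if (missing > 0) {
      requestExecutors(missing)
    }
  }

  private def requestExecutors(n: Int): Unit = {
    // Placeholder: build and create n executor pods here.
  }
}
```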


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-06-07 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21366
  
Sorry, not yet. I explained to @mccheah what I meant about preferring the 
level-triggered approach: always being able to rebuild state even if some 
events are lost, and resyncing completely on events. If those things look 
good, @mccheah, please go ahead with the merge.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-06-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21366
  
I was planning to do one last pass over it today/tomorrow. I can merge 
after that.


---




spark git commit: [SPARK-24232][K8S] Add support for secret env vars

2018-05-31 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master cc976f6cb -> 21e1fc7d4


[SPARK-24232][K8S] Add support for secret env vars

## What changes were proposed in this pull request?

* Allows referring to a secret as an env var.
* Introduces new config properties of the form: 
spark.kubernetes.{driver,executor}.secretKeyRef.ENV_NAME=name:key
  (ENV_NAME is case sensitive.)

* Updates docs.
* Adds required unit tests.

## How was this patch tested?
Manually tested: confirmed that the secrets exist in the driver's and executor's 
container env, and the job finished successfully.
First created a secret with the following yaml:
```
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
data:
  username: c3RhdnJvcwo=
  password: Mzk1MjgkdmRnN0pi

---

$ echo -n 'stavros' | base64
c3RhdnJvcw==
$ echo -n '39528$vdg7Jb' | base64
Mzk1MjgkdmRnN0pi
```
Run a job as follows:
```./bin/spark-submit \
  --master k8s://http://localhost:9000 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=skonto/spark:k8envs3 \
  --conf 
spark.kubernetes.driver.secretKeyRef.MY_USERNAME=test-secret:username \
  --conf 
spark.kubernetes.driver.secretKeyRef.My_password=test-secret:password \
  --conf 
spark.kubernetes.executor.secretKeyRef.MY_USERNAME=test-secret:username \
  --conf 
spark.kubernetes.executor.secretKeyRef.My_password=test-secret:password \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar 
1
```

Secret loaded correctly at the driver container:
![image](https://user-images.githubusercontent.com/7945591/40174346-7fee70c8-59dd-11e8-8705-995a5472716f.png)

Also if I log into the exec container:

kubectl exec -it spark-pi-1526555613156-exec-1 bash
bash-4.4# env

> SPARK_EXECUTOR_MEMORY=1g
> SPARK_EXECUTOR_CORES=1
> LANG=C.UTF-8
> HOSTNAME=spark-pi-1526555613156-exec-1
> SPARK_APPLICATION_ID=spark-application-1526555618626
> **MY_USERNAME=stavros**
>
> JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
> KUBERNETES_PORT_443_TCP_PROTO=tcp
> KUBERNETES_PORT_443_TCP_ADDR=10.100.0.1
> JAVA_VERSION=8u151
> KUBERNETES_PORT=tcp://10.100.0.1:443
> PWD=/opt/spark/work-dir
> HOME=/root
> SPARK_LOCAL_DIRS=/var/data/spark-b569b0ae-b7ef-4f91-bcd5-0f55535d3564
> KUBERNETES_SERVICE_PORT_HTTPS=443
> KUBERNETES_PORT_443_TCP_PORT=443
> SPARK_HOME=/opt/spark
> SPARK_DRIVER_URL=spark://CoarseGrainedSchedulerspark-pi-1526555613156-driver-svc.default.svc:7078
> KUBERNETES_PORT_443_TCP=tcp://10.100.0.1:443
> SPARK_EXECUTOR_POD_IP=9.0.9.77
> TERM=xterm
> SPARK_EXECUTOR_ID=1
> SHLVL=1
> KUBERNETES_SERVICE_PORT=443
> SPARK_CONF_DIR=/opt/spark/conf
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin
> JAVA_ALPINE_VERSION=8.151.12-r0
> KUBERNETES_SERVICE_HOST=10.100.0.1
> **My_password=39528$vdg7Jb**
> _=/usr/bin/env
>

Author: Stavros Kontopoulos 

Closes #21317 from skonto/k8s-fix-env-secrets.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/21e1fc7d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/21e1fc7d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/21e1fc7d

Branch: refs/heads/master
Commit: 21e1fc7d4aed688d7b685be6ce93f76752159c98
Parents: cc976f6
Author: Stavros Kontopoulos 
Authored: Thu May 31 14:28:33 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Thu May 31 14:28:33 2018 -0700

--
 docs/running-on-kubernetes.md   | 22 
 .../org/apache/spark/deploy/k8s/Config.scala|  2 +
 .../spark/deploy/k8s/KubernetesConf.scala   | 11 +++-
 .../k8s/features/EnvSecretsFeatureStep.scala| 57 +++
 .../k8s/submit/KubernetesDriverBuilder.scala| 11 +++-
 .../cluster/k8s/KubernetesExecutorBuilder.scala | 12 +++-
 .../spark/deploy/k8s/KubernetesConfSuite.scala  | 12 +++-
 .../features/BasicDriverFeatureStepSuite.scala  |  2 +
 .../BasicExecutorFeatureStepSuite.scala |  3 +
 ...rKubernetesCredentialsFeatureStepSuite.scala |  3 +
 .../DriverServiceFeatureStepSuite.scala |  6 ++
 .../features/EnvSecretsFeatureStepSuite.scala   | 59 
 .../features/KubernetesFeaturesTestUtils.scala  |  7 ++-
 .../features/LocalDirsFeatureStepSuite.scala|  1 +
 .../features/MountSecretsFeatureStepSuite.scala |  1 +
 .../spark/deploy/k8s/submit/ClientSuite.scala   |  1 +
 .../submit/KubernetesDriverBuilderSuite.scala   | 13 -
 .../k8s/KubernetesExecutorBuilderSuite.scala| 11 +++-
 18 files changed, 222 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/21e1fc7d/docs/running-on-kubernetes.md
--

[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars

2018-05-31 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21317
  
Merging to master


---




[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-31 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r192206460
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -602,4 +608,20 @@ specific to Spark on Kubernetes.

spark.kubernetes.executor.secrets.spark-secret=/etc/secrets.
   
 
+
--- End diff --

LGTM, thanks @skonto; will merge once tests pass.


---




[GitHub] spark pull request #21317: [SPARK-24232][k8s] Add support for secret env var...

2018-05-31 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21317#discussion_r192168016
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -602,4 +608,20 @@ specific to Spark on Kubernetes.

spark.kubernetes.executor.secrets.spark-secret=/etc/secrets.
   
 
+
--- End diff --

Please add a link to the docs here. 
https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables


---




[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21462#discussion_r191936265
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -121,8 +121,8 @@ This URI is the location of the example jar that is 
already in the Docker image.
 
 If your application's dependencies are all hosted in remote locations like 
HDFS or HTTP servers, they may be referred to
 by their appropriate remote URIs. Also, application dependencies can be 
pre-mounted into custom-built Docker images.
-Those dependencies can be added to the classpath by referencing them with 
`local://` URIs and/or setting the
-`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The 
`local://` scheme is also required when referring to
--- End diff --

We should confirm that the new init-container-less implementation retains 
the capability of doing that - maybe through an e2e test.


---




[GitHub] spark pull request #21462: [SPARK-24428][K8S] Fix unused code

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21462#discussion_r191930265
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -121,8 +121,8 @@ This URI is the location of the example jar that is 
already in the Docker image.
 
 If your application's dependencies are all hosted in remote locations like 
HDFS or HTTP servers, they may be referred to
 by their appropriate remote URIs. Also, application dependencies can be 
pre-mounted into custom-built Docker images.
-Those dependencies can be added to the classpath by referencing them with 
`local://` URIs and/or setting the
-`SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. The 
`local://` scheme is also required when referring to
--- End diff --

Did we remove the ability to put jars somewhere in the image and add them 
to the classpath at runtime? We had this before with 
https://github.com/apache/spark/pull/20193. 

cc @mccheah 


---




[GitHub] spark issue #21462: [SPARK-24428][K8S] Fix unused code

2018-05-30 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21462
  
jenkins, ok to test


---




[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191865387
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -154,6 +154,24 @@ private[spark] object Config extends Logging {
   .checkValue(interval => interval > 0, s"Logging interval must be a 
positive time value.")
   .createWithDefaultString("1s")
 
+  val KUBERNETES_EXECUTOR_API_POLLING_INTERVAL =
+ConfigBuilder("spark.kubernetes.executor.apiPollingInterval")
+  .doc("Interval between polls against the Kubernetes API server to 
inspect the " +
+"state of executors.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .checkValue(interval => interval > 0, s"API server polling interval 
must be a" +
+" positive time value.")
+  .createWithDefaultString("30s")
+
+  val KUBERNETES_EXECUTOR_EVENT_PROCESSING_INTERVAL =
--- End diff --

I think this option is hard to reason about and relies on understanding an 
implementation detail (the event queue). Why not just pick a default and leave 
it at that? In what scenario would a user need to choose this value?
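
For reference, the user-facing knob introduced in this diff would be tuned as 
in the sketch below (the key and its 30s default are taken from the diff 
above); the event-processing interval could, per this comment, stay internal 
with a fixed default.

```scala
import org.apache.spark.SparkConf

// Sketch: relaxing the executor-state polling interval from its 30s default.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.apiPollingInterval", "60s")
```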


---




[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191868513
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsEventQueue.scala
 ---
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import io.fabric8.kubernetes.api.model.Pod
+
+private[spark] trait ExecutorPodsEventQueue {
--- End diff --

just curious - is there no event queue mechanism within Spark itself for 
reuse here? Somewhat tangentially, there looks to be 
[EventLoop](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/EventLoop.scala)
 in `org.apache.spark.util`.


---




[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191866216
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
+
+import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesConf
+import org.apache.spark.internal.Logging
+
+private[spark] class ExecutorPodsAllocator(
+conf: SparkConf,
+executorBuilder: KubernetesExecutorBuilder,
+kubernetesClient: KubernetesClient,
+eventQueue: ExecutorPodsEventQueue) extends Logging {
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val podAllocationDelay = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+
+  private val driverPod = kubernetesClient.pods()
+.withName(kubernetesDriverPodName)
+.get()
+
+  // Use sets of ids instead of counters to be able to handle duplicate 
events.
+
+  // Executor IDs that have been requested from Kubernetes but are not 
running yet.
+  private val pendingExecutors = mutable.Set.empty[Long]
+
+  // We could use CoarseGrainedSchedulerBackend#totalRegisteredExecutors 
here for tallying the
+  // executors that are running. But, here we choose instead to maintain 
all state within this
+  // class from the perspective of the k8s API. Therefore whether or not 
this scheduler loop
+  // believes an executor is running is dictated by the K8s API rather 
than Spark's RPC events.
+  // We may need to consider where these perspectives may differ and which 
perspective should
+  // take precedence.
+  private val runningExecutors = mutable.Set.empty[Long]
+
+  def start(applicationId: String): Unit = {
+eventQueue.addSubscriber(podAllocationDelay) { updatedPods =>
+  processUpdatedPodEvents(applicationId, updatedPods)
+}
+  }
+
+  def setTotalExpectedExecutors(total: Int): Unit = 
totalExpectedExecutors.set(total)
+
+  private def processUpdatedPodEvents(applicationId: String, updatedPods: 
Seq[Pod]): Unit = {
+updatedPods.foreach { updatedPod =>
+  val execId = 
updatedPod.getMetadata.getLabels.get(SPARK_EXECUTOR_ID_LABEL).toLong
+  val phase = updatedPod.getStatus.getPhase.toLowerCase
+  phase match {
+case "running" =>
+  pendingExecutors -= execId
+  runningExecutors += execId
+case "failed" | "succeeded" | "error" =>
+  pendingExecutors -= execId
+  runningExecutors -= execId
+  }
+}
+
+val currentRunningExecutors = runningExecutors.size
+val currentTotalExpectedExecutors = totalExpectedExecutors.get
+if (pendingExecutors.isEmpty && currentRunningExecutors < 
currentTotalExpectedExecutors) {
+  val numExecutorsToAllocate = math.min(
+currentTotalExpectedExecutors - currentRunningExecutors, 
podAllocationSize)
+  logInfo(s"Going to request $numExecutorsToAllocate executors from 
Kubernetes.")
+  val newExecutorIds = mutable.Buffer.empty[Long]
+  

[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191869618
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleEventHandler.scala
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import com.google.common.cache.{Cache, CacheBuilder}
+import io.fabric8.kubernetes.api.model.Pod
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.JavaConverters._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.scheduler.ExecutorExited
+import org.apache.spark.util.Utils
+
+private[spark] class ExecutorPodsLifecycleEventHandler(
+conf: SparkConf,
+executorBuilder: KubernetesExecutorBuilder,
+kubernetesClient: KubernetesClient,
+podsEventQueue: ExecutorPodsEventQueue,
+// Use a best-effort to track which executors have been removed 
already. It's not generally
+// job-breaking if we remove executors more than once but it's ideal 
if we make an attempt
+// to avoid doing so. Expire cache entries so that this data structure 
doesn't grow beyond
+// bounds.
+removedExecutorsCache: Cache[java.lang.Long, java.lang.Long]) {
+
+  import ExecutorPodsLifecycleEventHandler._
+
+  private val eventProcessingInterval = 
conf.get(KUBERNETES_EXECUTOR_EVENT_PROCESSING_INTERVAL)
+
+  def start(schedulerBackend: KubernetesClusterSchedulerBackend): Unit = {
+podsEventQueue.addSubscriber(eventProcessingInterval) { updatedPods =>
+  updatedPods.foreach { updatedPod =>
+processUpdatedPod(schedulerBackend, updatedPod)
+  }
+}
+  }
+
+  private def processUpdatedPod(
+  schedulerBackend: KubernetesClusterSchedulerBackend, updatedPod: 
Pod) = {
+val execId = 
updatedPod.getMetadata.getLabels.get(SPARK_EXECUTOR_ID_LABEL).toLong
+if (isDeleted(updatedPod)) {
+  removeExecutorFromSpark(schedulerBackend, updatedPod, execId)
+} else {
+  updatedPod.getStatus.getPhase.toLowerCase match {
+// TODO (SPARK-24135) - handle more classes of errors
+case "error" | "failed" | "succeeded" =>
+  // If deletion failed on a previous try, we can try again if 
resync informs us the pod
+  // is still around.
+  // Delete as best attempt - duplicate deletes will throw an 
exception but the end state
+  // of getting rid of the pod is what matters.
+  Utils.tryLogNonFatalError {
+kubernetesClient
+  .pods()
+  .withName(updatedPod.getMetadata.getName)
+  .delete()
+  }
+  removeExecutorFromSpark(schedulerBackend, updatedPod, execId)
+case _ =>
+  }
+}
+  }
+
+  private def removeExecutorFromSpark(
--- End diff --

Why not just `removeExecutor`?


---




[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191871593
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
+
+import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesConf
+import org.apache.spark.internal.Logging
+
+private[spark] class ExecutorPodsAllocator(
+conf: SparkConf,
+executorBuilder: KubernetesExecutorBuilder,
+kubernetesClient: KubernetesClient,
+eventQueue: ExecutorPodsEventQueue) extends Logging {
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val podAllocationDelay = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+
+  private val driverPod = kubernetesClient.pods()
+.withName(kubernetesDriverPodName)
+.get()
+
+  // Use sets of ids instead of counters to be able to handle duplicate 
events.
+
+  // Executor IDs that have been requested from Kubernetes but are not 
running yet.
+  private val pendingExecutors = mutable.Set.empty[Long]
+
+  // We could use CoarseGrainedSchedulerBackend#totalRegisteredExecutors 
here for tallying the
+  // executors that are running. But, here we choose instead to maintain 
all state within this
+  // class from the perspective of the k8s API. Therefore whether or not 
this scheduler loop
+  // believes an executor is running is dictated by the K8s API rather 
than Spark's RPC events.
+  // We may need to consider where these perspectives may differ and which 
perspective should
+  // take precedence.
+  private val runningExecutors = mutable.Set.empty[Long]
+
+  def start(applicationId: String): Unit = {
+eventQueue.addSubscriber(podAllocationDelay) { updatedPods =>
+  processUpdatedPodEvents(applicationId, updatedPods)
+}
+  }
+
+  def setTotalExpectedExecutors(total: Int): Unit = 
totalExpectedExecutors.set(total)
+
+  private def processUpdatedPodEvents(applicationId: String, updatedPods: 
Seq[Pod]): Unit = {
--- End diff --

We're actually processing pods here, right? Not the events themselves from 
the watch. 


---




[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use the Kubernetes API to popu...

2018-05-30 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r191865834
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster.k8s
+
+import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
+
+import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
+import io.fabric8.kubernetes.client.KubernetesClient
+import scala.collection.mutable
+
+import org.apache.spark.{SparkConf, SparkException}
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.KubernetesConf
+import org.apache.spark.internal.Logging
+
+private[spark] class ExecutorPodsAllocator(
+conf: SparkConf,
+executorBuilder: KubernetesExecutorBuilder,
+kubernetesClient: KubernetesClient,
+eventQueue: ExecutorPodsEventQueue) extends Logging {
+
+  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
+
+  private val totalExpectedExecutors = new AtomicInteger(0)
+
+  private val podAllocationSize = 
conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
+
+  private val podAllocationDelay = 
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+
+  private val kubernetesDriverPodName = conf
+.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(throw new SparkException("Must specify the driver pod 
name"))
+
+  private val driverPod = kubernetesClient.pods()
+.withName(kubernetesDriverPodName)
+.get()
+
+  // Use sets of ids instead of counters to be able to handle duplicate 
events.
--- End diff --

nit: hanging comment? 


---




[GitHub] spark issue #21317: [SPARK-24232][k8s] Add support for secret env vars

2018-05-17 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21317
  
@skonto, looks like a test runner issue here. 
@ssuchter, can you PTAL?


---




[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-16 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21279
  
oops, I didn't notice this. Sorry @jerryshao, will take a look in the next 
day or two.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-10 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21238
  
SG. @liyinan926, let's revisit this if we hear from 2.3 users.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-10 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21238
  
@mccheah, wdyt? I just haven't heard from any 2.3 users here. If you 
think it's useful for 2.3.1 and low risk, then please feel free to propose a 
cherry-pick. 


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-10 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21238
  
Maintenance releases most often carry stability fixes. We could maybe 
backport this, since it's not a new feature but an omission from before. But if 
it is going to take some effort, given all the refactors that have gone in 
since, I think we should think twice about whether we need to. 



---




spark git commit: [SPARK-24137][K8S] Mount local directories as empty dir volumes.

2018-05-10 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master f4fed0512 -> 6282fc64e


[SPARK-24137][K8S] Mount local directories as empty dir volumes.

## What changes were proposed in this pull request?

Drastically improves performance and prevents Spark applications from failing 
because they write too much data to the Docker image's file system. The 
file-system directories that back emptyDir volumes are generally larger 
and more performant.

## How was this patch tested?

This has been in use in the prototype version of Kubernetes support, but was 
lost in the transition here.

Author: mcheah 

Closes #21238 from mccheah/mount-local-dirs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6282fc64
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6282fc64
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6282fc64

Branch: refs/heads/master
Commit: 6282fc64e32fc2f70e79ace14efd4922e4535dbb
Parents: f4fed05
Author: mcheah 
Authored: Thu May 10 11:36:41 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Thu May 10 11:36:41 2018 -0700

--
 .../main/scala/org/apache/spark/SparkConf.scala |   5 +-
 .../k8s/features/LocalDirsFeatureStep.scala |  77 +
 .../k8s/submit/KubernetesDriverBuilder.scala|  10 +-
 .../cluster/k8s/KubernetesExecutorBuilder.scala |   9 +-
 .../features/LocalDirsFeatureStepSuite.scala| 111 +++
 .../submit/KubernetesDriverBuilderSuite.scala   |  13 ++-
 .../k8s/KubernetesExecutorBuilderSuite.scala|  12 +-
 7 files changed, 223 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6282fc64/core/src/main/scala/org/apache/spark/SparkConf.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala 
b/core/src/main/scala/org/apache/spark/SparkConf.scala
index 129956e..dab4095 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -454,8 +454,9 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable 
with Logging with Seria
*/
   private[spark] def validateSettings() {
 if (contains("spark.local.dir")) {
-  val msg = "In Spark 1.0 and later spark.local.dir will be overridden by 
the value set by " +
-"the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and 
LOCAL_DIRS in YARN)."
+  val msg = "Note that spark.local.dir will be overridden by the value set 
by " +
+"the cluster manager (via SPARK_LOCAL_DIRS in 
mesos/standalone/kubernetes and LOCAL_DIRS" +
+" in YARN)."
   logWarning(msg)
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6282fc64/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
new file mode 100644
index 000..70b3073
--- /dev/null
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import java.nio.file.Paths
+import java.util.UUID
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, HasMetadata, 
PodBuilder, VolumeBuilder, VolumeMountBuilder}
+
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesDriverSpecificConf, KubernetesRoleSpecificConf, SparkPod}
+
+private[spark] class LocalDirsFeatureStep(
+conf: KubernetesConf[_ <: KubernetesRoleSpecificConf],
+defaultLocalDir: String = s"/var/data/spark-${UUID.randomUUID}")
+  extends KubernetesFeatureConfigStep {
+
+  // 
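(The archive truncates the diff here. For context, a sketch of how the step 
plausibly continues - resolving the configured local dirs and backing each one 
with an emptyDir volume. Treat the body below as a reconstruction under those 
assumptions, not a verbatim quote of the merged code.)

```scala
  // Resolve local dirs from SPARK_LOCAL_DIRS / spark.local.dir, falling back
  // to the randomized default path in the constructor above.
  private val resolvedLocalDirs = Option(conf.sparkConf.getenv("SPARK_LOCAL_DIRS"))
    .orElse(conf.sparkConf.getOption("spark.local.dir"))
    .getOrElse(defaultLocalDir)
    .split(",")

  override def configurePod(pod: SparkPod): SparkPod = {
    // One emptyDir volume per local dir, mounted into the container and
    // exported through SPARK_LOCAL_DIRS so Spark picks the paths up.
    val localDirVolumes = resolvedLocalDirs.zipWithIndex.map { case (_, index) =>
      new VolumeBuilder()
        .withName(s"spark-local-dir-${index + 1}")
        .withNewEmptyDir()
        .endEmptyDir()
        .build()
    }
    val localDirVolumeMounts = localDirVolumes.zip(resolvedLocalDirs).map {
      case (localDirVolume, localDirPath) =>
        new VolumeMountBuilder()
          .withName(localDirVolume.getName)
          .withMountPath(localDirPath)
          .build()
    }
    val podWithVolumes = new PodBuilder(pod.pod)
      .editSpec()
        .addToVolumes(localDirVolumes: _*)
        .endSpec()
      .build()
    val containerWithMounts = new ContainerBuilder(pod.container)
      .addNewEnv()
        .withName("SPARK_LOCAL_DIRS")
        .withValue(resolvedLocalDirs.mkString(","))
        .endEnv()
      .addToVolumeMounts(localDirVolumeMounts: _*)
      .build()
    SparkPod(podWithVolumes, containerWithMounts)
  }
```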

[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-10 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21238
  
LGTM. Merging to master. 
Thanks @mccheah 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-09 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21092
  
jenkins, retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

2018-05-09 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21092
  
jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21238: [SPARK-24137][K8s] Mount local directories as emp...

2018-05-09 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21238#discussion_r187116886
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -455,7 +455,8 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
   private[spark] def validateSettings() {
 if (contains("spark.local.dir")) {
   val msg = "In Spark 1.0 and later spark.local.dir will be overridden 
by the value set by " +
-"the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and 
LOCAL_DIRS in YARN)."
+"the cluster manager (via SPARK_LOCAL_DIRS in 
mesos/standalone/kubernetes and LOCAL_DIRS" +
--- End diff --

oops, I deleted a comment here accidentally. @rxin said that we could 
remove this warning about Spark 1.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186879387
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
+  && pod.getMetadata.getDeletionTimestamp == null =>
   val executorId = getExecutorId(pod)
-  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
-  if (podIP != null) {
-executorPodsByIPs.remove(podIP)
+  failedInitExecutors.add(executorId)
+  if (failedInitExecutors.size >= executorMaxInitErrors) {
+val errorMessage = s"Aborting Spark application because 
$executorMaxInitErrors" +
+  s" executors failed to start. The maximum number of allowed 
startup failures is" +
+  s" $executorMaxInitErrors. Please contact your cluster 
administrator or increase" +
+  s" your setting of 
${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}."
+logError(errorMessage)
+
KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread()
--- End diff --

The controllers have rate-limiting queues with exponential backoff. In the 
past, we've had issues (https://github.com/kubernetes/kubernetes/issues/30628, 
https://github.com/kubernetes/kubernetes/issues/27634 and many more) where a 
misconfigured queue has caused controllers to spew pods and retry. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186869138
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
--- End diff --

The termination reason is also a good source of info. kubectl looks at a 
set of these and turns it into what you see in the describe output - so, 
[similar 
logic](https://github.com/kubernetes/kubernetes/blob/6b94e872c63eeea2ed4fdc510c008e4ff9713953/pkg/printers/internalversion/printers.go#L547-L573)
 could be exercised. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186861330
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
+  && pod.getMetadata.getDeletionTimestamp == null =>
   val executorId = getExecutorId(pod)
-  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
-  if (podIP != null) {
-executorPodsByIPs.remove(podIP)
+  failedInitExecutors.add(executorId)
+  if (failedInitExecutors.size >= executorMaxInitErrors) {
+val errorMessage = s"Aborting Spark application because 
$executorMaxInitErrors" +
+  s" executors failed to start. The maximum number of allowed 
startup failures is" +
+  s" $executorMaxInitErrors. Please contact your cluster 
administrator or increase" +
+  s" your setting of 
${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}."
+logError(errorMessage)
+
KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread()
+throw new SparkException(errorMessage)
   }
+  handleFailedPod(action, pod, podName, podIP)
 
-  val executorExitReason = if (action == Action.ERROR) {
-logWarning(s"Received error event of executor pod $podName. 
Reason: " +
-  pod.getStatus.getReason)
-executorExitReasonOnError(pod)
-  } else if (action == Action.DELETED) {
-logWarning(s"Received delete event of executor pod $podName. 
Reason: " +
-  pod.getStatus.getReason)
-executorExitReasonOnDelete(pod)
-  } else {
-throw new IllegalStateException(
-  s"Unknown action that should only be DELETED or ERROR: 
$action")
-  }
-  podsWithKnownExitReasons.put(pod.getMetadata.getName, 
executorExitReason)
-
-  if 
(!disconnectedPodsByExecutorIdPendingRemoval.containsKey(executorId)) {
-log.warn(s"Executor with id $executorId was not marked as 
disconnected, but the " +
-  s"watch received an event of type $action for this executor. 
The executor may " +
-  "have failed to start in the first place and never 
registered with the driver.")
-  }
-  disconnectedPodsByExecutorIdPendingRemoval.put(executorId, pod)
+case Action.DELETED | Action.ERROR =>
+  handleFailedPod(action, pod, podName, podIP)
 
 case _ => logDebug(s"Received event of executor pod $podName: " + 
action)
   }
 }
 
+private def handleFailedPod(action: Action, pod: Pod, podName: String, 
podIP: String) = {
+  val executorId = getExecutorId(pod)
+  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
+  if (podIP != null) {
+executorPodsByIPs.remove(podIP)
+  }
+
+  val executorExitReason = if (action == Action.ERROR) {
+logWarning(s"Received error event of executor pod $podName. 
Reason: " +
+  pod.getStatus.getReason)
+executorExitReasonOnError(pod)
+  } else if (action == Action.DELETED) {
+logWarning(s"Received delete event of executor pod $podName. 
Reason: " +
+  pod.getStatus.getReason)
+executorExitReasonOnDelete(pod)
+  } else if (action == Action.MODIFIED) {
+executorExitReasonOnInitError(pod)
--- End diff --

There isn't a doc, but I'm putting together an initial list. We can grow it 
as we discover more during operations.

[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186857969
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
+  && pod.getMetadata.getDeletionTimestamp == null =>
   val executorId = getExecutorId(pod)
-  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
-  if (podIP != null) {
-executorPodsByIPs.remove(podIP)
+  failedInitExecutors.add(executorId)
+  if (failedInitExecutors.size >= executorMaxInitErrors) {
+val errorMessage = s"Aborting Spark application because 
$executorMaxInitErrors" +
+  s" executors failed to start. The maximum number of allowed 
startup failures is" +
+  s" $executorMaxInitErrors. Please contact your cluster 
administrator or increase" +
+  s" your setting of 
${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}."
+logError(errorMessage)
+
KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread()
--- End diff --

Always re-requesting executors has the potential to overwhelm etcd with 
failed pods, especially for long-running jobs. This seems overly conservative 
but a reasonable place to start - fail the entire application upon N failures. 
With dynamic allocation, it should be possible to check that at least 
`spark.dynamicAllocation.minExecutors` are alive and make a decision 
accordingly.  
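
A tiny sketch of that decision, with entirely hypothetical names - the point 
is only that the abort condition would consult the dynamic-allocation floor 
rather than a flat counter:

```scala
// Hypothetical policy helper; none of these names exist in the codebase.
object InitFailurePolicy {
  // Abort only once init failures have exhausted the retry budget AND the
  // number of live executors has dropped below the dynamic-allocation floor.
  def shouldAbort(
      liveExecutors: Int,
      failedInitExecutors: Int,
      maxInitErrors: Int,
      minExecutors: Int): Boolean = {
    failedInitExecutors >= maxInitErrors && liveExecutors < minExecutors
  }
}
```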


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-05-08 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21260
  
jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186643210
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
+  && pod.getMetadata.getDeletionTimestamp == null =>
   val executorId = getExecutorId(pod)
-  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
-  if (podIP != null) {
-executorPodsByIPs.remove(podIP)
+  failedInitExecutors.add(executorId)
+  if (failedInitExecutors.size >= executorMaxInitErrors) {
+val errorMessage = s"Aborting Spark application because 
$executorMaxInitErrors" +
+  s" executors failed to start. The maximum number of allowed 
startup failures is" +
+  s" $executorMaxInitErrors. Please contact your cluster 
administrator or increase" +
+  s" your setting of 
${KUBERNETES_EXECUTOR_MAX_INIT_ERRORS.key}."
+logError(errorMessage)
+
KubernetesClusterSchedulerBackend.this.scheduler.sc.stopInNewThread()
+throw new SparkException(errorMessage)
   }
+  handleFailedPod(action, pod, podName, podIP)
 
-  val executorExitReason = if (action == Action.ERROR) {
-logWarning(s"Received error event of executor pod $podName. 
Reason: " +
-  pod.getStatus.getReason)
-executorExitReasonOnError(pod)
-  } else if (action == Action.DELETED) {
-logWarning(s"Received delete event of executor pod $podName. 
Reason: " +
-  pod.getStatus.getReason)
-executorExitReasonOnDelete(pod)
-  } else {
-throw new IllegalStateException(
-  s"Unknown action that should only be DELETED or ERROR: 
$action")
-  }
-  podsWithKnownExitReasons.put(pod.getMetadata.getName, 
executorExitReason)
-
-  if 
(!disconnectedPodsByExecutorIdPendingRemoval.containsKey(executorId)) {
-log.warn(s"Executor with id $executorId was not marked as 
disconnected, but the " +
-  s"watch received an event of type $action for this executor. 
The executor may " +
-  "have failed to start in the first place and never 
registered with the driver.")
-  }
-  disconnectedPodsByExecutorIdPendingRemoval.put(executorId, pod)
+case Action.DELETED | Action.ERROR =>
+  handleFailedPod(action, pod, podName, podIP)
 
 case _ => logDebug(s"Received event of executor pod $podName: " + 
action)
   }
 }
 
+private def handleFailedPod(action: Action, pod: Pod, podName: String, 
podIP: String) = {
+  val executorId = getExecutorId(pod)
+  logDebug(s"Executor pod $podName at IP $podIP was at $action.")
+  if (podIP != null) {
+executorPodsByIPs.remove(podIP)
+  }
+
+  val executorExitReason = if (action == Action.ERROR) {
+logWarning(s"Received error event of executor pod $podName. 
Reason: " +
+  pod.getStatus.getReason)
+executorExitReasonOnError(pod)
+  } else if (action == Action.DELETED) {
+logWarning(s"Received delete event of executor pod $podName. 
Reason: " +
+  pod.getStatus.getReason)
+executorExitReasonOnDelete(pod)
+  } else if (action == Action.MODIFIED) {
+executorExitReasonOnInitError(pod)
--- End diff --

I think `Action.MODIFIED` can be fired for a lot of other reasons. I was 
thinking that we should just use cont

[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186642656
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -561,6 +561,13 @@ specific to Spark on Kubernetes.
 This is distinct from spark.executor.cores: it is only 
used and takes precedence over spark.executor.cores for specifying 
the executor pod cpu request if set. Task 
 parallelism, e.g., number of tasks an executor can run concurrently is 
not affected by this.
 
+
+  spark.kubernetes.executor.maxInitFailures
+  10
+  
+Maximum number of times executors are allowed to fail with an 
Init:Error state before failing the application. Note that Init:Error failures 
should not be caused by Spark itself because Spark does not attach 
init-containers to pods. Init-containers can be attached by the cluster itself. 
Users should check with their cluster administrator if these kinds of failures 
to start the executor pod occur frequently.
--- End diff --

Also, I would change the description here, because injecting init-containers 
through mutating webhooks is not something that's all that common. It should 
also be linked to 
https://kubernetes.io/docs/admin/extensible-admission-controllers/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186641837
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -561,6 +561,13 @@ specific to Spark on Kubernetes.
 This is distinct from spark.executor.cores: it is only 
used and takes precedence over spark.executor.cores for specifying 
the executor pod cpu request if set. Task 
 parallelism, e.g., number of tasks an executor can run concurrently is 
not affected by this.
 
+
+  spark.kubernetes.executor.maxInitFailures
+  10
+  
+Maximum number of times executors are allowed to fail with an 
Init:Error state before failing the application. Note that Init:Error failures 
should not be caused by Spark itself because Spark does not attach 
init-containers to pods. Init-containers can be attached by the cluster itself. 
Users should check with their cluster administrator if these kinds of failures 
to start the executor pod occur frequently.
--- End diff --

This is very specific and covers exactly one kind of error. I'd like this 
property to cover all initialization errors.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21241: [SPARK-24135][K8s] Resilience to init-container e...

2018-05-08 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21241#discussion_r186642338
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
 ---
@@ -320,50 +322,83 @@ private[spark] class 
KubernetesClusterSchedulerBackend(
 override def eventReceived(action: Action, pod: Pod): Unit = {
   val podName = pod.getMetadata.getName
   val podIP = pod.getStatus.getPodIP
-
+  val podPhase = pod.getStatus.getPhase
   action match {
-case Action.MODIFIED if (pod.getStatus.getPhase == "Running"
+case Action.MODIFIED if (podPhase == "Running"
 && pod.getMetadata.getDeletionTimestamp == null) =>
   val clusterNodeName = pod.getSpec.getNodeName
   logInfo(s"Executor pod $podName ready, launched at 
$clusterNodeName as IP $podIP.")
   executorPodsByIPs.put(podIP, pod)
 
-case Action.DELETED | Action.ERROR =>
+case Action.MODIFIED if (podPhase == "Init:Error" || podPhase == 
"Init:CrashLoopBackoff")
--- End diff --

Maybe have a separate set for all the error states we want to check. Having 
one place would make this easier to change in future.
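
Concretely, something like the sketch below - names are made up, and note 
that Kubernetes itself capitalizes the waiting reason as `CrashLoopBackOff`:

```scala
// Sketch: keep the fatal init-failure phases in one place so the watch's
// match guards stay readable and easy to extend.
object ExecutorInitFailures {
  val Phases: Set[String] = Set("Init:Error", "Init:CrashLoopBackOff")

  def isInitFailure(podPhase: String): Boolean = Phases.contains(podPhase)
}
```

The guard in `eventReceived` then reduces to 
`case Action.MODIFIED if ExecutorInitFailures.isInitFailure(podPhase) && 
pod.getMetadata.getDeletionTimestamp == null => ...`.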


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-05-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21202
  
jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21202: [SPARK-24129] [K8S] Add option to pass --build-arg's to ...

2018-05-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21202
  
LGTM pending test run


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-30 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21095
  
The testing is not a blocker for the review. When I said "tests for 
non-hostpath type volumes", I meant that we want to cover more than just 
hostpath mounts with the initial PR - otherwise we might end up with something 
too specific. Sorry if that wasn't clear from my comment. I think doing 
`EmptyDir` mounts alongside hostpath (with as much code sharing as possible, 
see the sketch below) would be a good idea since it's the most common volume 
type. With those two, I think we can push this forward and have the e2e tests 
developed side by side.
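
To illustrate the code sharing meant here, a sketch using the fabric8 
builders the K8s backend already depends on (names and paths are 
placeholders):

```scala
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder, VolumeMount, VolumeMountBuilder}

object VolumeSketch {
  // The two volume types differ only in their source; the mount side is
  // identical and can be shared.
  def hostPathVolume(name: String, hostPath: String): Volume =
    new VolumeBuilder()
      .withName(name)
      .withNewHostPath().withPath(hostPath).endHostPath()
      .build()

  def emptyDirVolume(name: String): Volume =
    new VolumeBuilder()
      .withName(name)
      .withNewEmptyDir().endEmptyDir()
      .build()

  def mountFor(volume: Volume, containerPath: String): VolumeMount =
    new VolumeMountBuilder()
      .withName(volume.getName)
      .withMountPath(containerPath)
      .build()
}
```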


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-26 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21095
  
@madanadit, sorry about the long turnaround time here. I made a cursory 
pass over it - and it looks good. Before getting deeper into the code, I think 
we need more exhaustive unit testing for non-hostpath type volumes, and a 
couple of [e2e 
tests](https://github.com/apache-spark-on-k8s/spark-integration). Once those 
are done, I'm also happy to do a more detailed review here.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-18 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21095
  
@madanadit, thanks for following up with the PR and for your [design 
doc](https://docs.google.com/document/d/15-mk7UnOYNTXoF6EKaVlelWYc9DTrTXrYoodwDuAwY4/edit#heading=h.8jlem461uvwv).
 I had a few comments on the doc which I've added for discussion. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes

2018-04-18 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21095
  
jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-18 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21092
  
Thanks for taking this on @ifilonenko. Left some initial comments on the PR 
without going too much in depth - since as you noted, it's WIP.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182523365
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala
 ---
@@ -29,9 +30,11 @@ private[spark] class KubernetesExecutorBuilder(
   def buildFromFeatures(
 kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]): 
SparkPod = {
 val baseFeatures = Seq(provideBasicStep(kubernetesConf))
-val allFeatures = if 
(kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
-  baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
-} else baseFeatures
+val maybeRoleSecretNamesStep = if 
(kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
+  Some(provideSecretsStep(kubernetesConf)) } else None
+val allFeatures: Seq[KubernetesFeatureConfigStep] =
--- End diff --

It does not need any changes/arg passing during executor pod construction?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182522479
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
   ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
 }
 
-val driverContainer = new ContainerBuilder(pod.container)
+val withoutArgsDriverContainer: ContainerBuilder = new 
ContainerBuilder(pod.container)
--- End diff --

The previous name seemed clearer to me. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182517091
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -62,6 +69,14 @@ case "$SPARK_K8S_CMD" in
   "$@"
 )
 ;;
+  driver-py)
+CMD=(
+  "$SPARK_HOME/bin/spark-submit"
+  --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+  --deploy-mode client
+  "$@" $PYSPARK_PRIMARY $PYSPARK_SECONDARY
--- End diff --

Can we have more descriptive names for `PYSPARK_PRIMARY` and 
`PYSPARK_SECONDARY`? Maybe `PYSPARK_MAINAPP` and `PYSPARK_ARGS`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182518301
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
 ---
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ARG base_img
+FROM $base_img
+WORKDIR /
+COPY python /opt/spark/python
+RUN apk add --no-cache python && \
+python -m ensurepip && \
+rm -r /usr/lib/python*/ensurepip && \
+pip install --upgrade pip setuptools && \
+rm -r /root/.cache
+ENV PYTHON_VERSION 2.7.13
--- End diff --

If we set this, are we implicitly imposing a contract on the base image to 
have this particular version of python installed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182520090
  
--- Diff: 
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilderSuite.scala
 ---
@@ -89,6 +97,29 @@ class KubernetesDriverBuilderSuite extends SparkFunSuite 
{
   SECRETS_STEP_TYPE)
   }
 
+  test("Apply Python step if main resource is python.") {
+val conf = KubernetesConf(
--- End diff --

Unrelated to this PR, but @mccheah, should we have something like the 
fluent/builder pattern here for `KubernetesConf` since it's grown to quite a 
few params. I'm happy to take a stab at it if we agree that's a good direction.
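
To make the suggestion concrete, a rough sketch of the fluent shape 
(entirely hypothetical - no such builder exists in the codebase today):

```scala
// Hypothetical: call sites spell out only the params they care about instead
// of positionally passing a long list of Maps.
final case class KubernetesConfBuilder(
    labels: Map[String, String] = Map.empty,
    annotations: Map[String, String] = Map.empty,
    secretNamesToMountPaths: Map[String, String] = Map.empty) {
  def withLabels(v: Map[String, String]): KubernetesConfBuilder = copy(labels = v)
  def withAnnotations(v: Map[String, String]): KubernetesConfBuilder = copy(annotations = v)
  def withSecrets(v: Map[String, String]): KubernetesConfBuilder = copy(secretNamesToMountPaths = v)
}

val builder = KubernetesConfBuilder().withSecrets(Map("foo" -> "/etc/secrets/driver"))
```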


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21092#discussion_r182524173
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
 ---
@@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
 .addToRequests("memory", driverMemoryQuantity)
 .addToLimits("memory", driverMemoryQuantity)
 .endResources()
-  .addToArgs("driver")
+  .addToArgs(driverDockerContainer)
   .addToArgs("--properties-file", SPARK_CONF_PATH)
   .addToArgs("--class", conf.roleSpecificConf.mainClass)
-  // The user application jar is merged into the spark.jars list and 
managed through that
-  // property, so there is no need to reference it explicitly here.
-  .addToArgs(SparkLauncher.NO_RESOURCE)
-  .addToArgs(conf.roleSpecificConf.appArgs: _*)
-  .build()
 
+val driverContainer =
+  if (driverDockerContainer == "driver-py") {
--- End diff --

Wondering if we can detect whether it's a Python application in a better way 
here - probably using the built-up Spark conf? One possible shape is sketched 
below.
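
For example, something along these lines (hypothetical names; `driver-py` is 
the entrypoint string from the diff above):

```scala
// Hypothetical: derive the entrypoint from the primary resource instead of
// threading a magic string through the step.
object DriverEntrypoint {
  def forMainResource(mainAppResource: Option[String]): String =
    if (mainAppResource.exists(_.endsWith(".py"))) "driver-py" else "driver"
}
```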


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-18 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21092
  
cc @holdenk 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-13 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21032
  
@madanadit @liyinan926, can we do a short doc with all the types of volumes 
and config options and run that through the rest of the community?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[1/3] spark git commit: [SPARK-22839][K8S] Refactor to unify driver and executor pod builder APIs

2018-04-13 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master 0323e6146 -> a83ae0d9b


http://git-wip-us.apache.org/repos/asf/spark/blob/a83ae0d9/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountSecretsFeatureStepSuite.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountSecretsFeatureStepSuite.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountSecretsFeatureStepSuite.scala
new file mode 100644
index 000..9d02f56
--- /dev/null
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountSecretsFeatureStepSuite.scala
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import io.fabric8.kubernetes.api.model.PodBuilder
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.{KubernetesConf, 
KubernetesExecutorSpecificConf, SecretVolumeUtils, SparkPod}
+
+class MountSecretsFeatureStepSuite extends SparkFunSuite {
+
+  private val SECRET_FOO = "foo"
+  private val SECRET_BAR = "bar"
+  private val SECRET_MOUNT_PATH = "/etc/secrets/driver"
+
+  test("mounts all given secrets") {
+val baseDriverPod = SparkPod.initialPod()
+val secretNamesToMountPaths = Map(
+  SECRET_FOO -> SECRET_MOUNT_PATH,
+  SECRET_BAR -> SECRET_MOUNT_PATH)
+val sparkConf = new SparkConf(false)
+val kubernetesConf = KubernetesConf(
+  sparkConf,
+  KubernetesExecutorSpecificConf("1", new PodBuilder().build()),
+  "resource-name-prefix",
+  "app-id",
+  Map.empty,
+  Map.empty,
+  secretNamesToMountPaths,
+  Map.empty)
+
+val step = new MountSecretsFeatureStep(kubernetesConf)
+val driverPodWithSecretsMounted = step.configurePod(baseDriverPod).pod
+val driverContainerWithSecretsMounted = 
step.configurePod(baseDriverPod).container
+
+Seq(s"$SECRET_FOO-volume", s"$SECRET_BAR-volume").foreach { volumeName =>
+  assert(SecretVolumeUtils.podHasVolume(driverPodWithSecretsMounted, 
volumeName))
+}
+Seq(s"$SECRET_FOO-volume", s"$SECRET_BAR-volume").foreach { volumeName =>
+  assert(SecretVolumeUtils.containerHasVolume(
+driverContainerWithSecretsMounted, volumeName, SECRET_MOUNT_PATH))
+}
+  }
+}

http://git-wip-us.apache.org/repos/asf/spark/blob/a83ae0d9/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
index 6a50159..c1b203e 100644
--- 
a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
+++ 
b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala
@@ -16,22 +16,17 @@
  */
 package org.apache.spark.deploy.k8s.submit
 
-import scala.collection.JavaConverters._
-
-import com.google.common.collect.Iterables
 import io.fabric8.kubernetes.api.model._
 import io.fabric8.kubernetes.client.{KubernetesClient, Watch}
 import io.fabric8.kubernetes.client.dsl.{MixedOperation, 
NamespaceListVisitFromServerGetDeleteRecreateWaitApplicable, PodResource}
 import org.mockito.{ArgumentCaptor, Mock, MockitoAnnotations}
 import org.mockito.Mockito.{doReturn, verify, when}
-import org.mockito.invocation.InvocationOnMock
-import org.mockito.stubbing.Answer
 import org.scalatest.BeforeAndAfter
 import org.scalatest.mockito.MockitoSugar._
 
 import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpec, 
KubernetesDriverSpecificConf, SparkPod}
 import org.apache.spark.deploy.k8s.Constants._
-import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
 
 class ClientSuite extends SparkFunSuite with BeforeAndAfter {
 
@@ -39,6 +34,74 @@ class ClientSuite extends 

[2/3] spark git commit: [SPARK-22839][K8S] Refactor to unify driver and executor pod builder APIs

2018-04-13 Thread foxish
http://git-wip-us.apache.org/repos/asf/spark/blob/a83ae0d9/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DependencyResolutionStep.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DependencyResolutionStep.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DependencyResolutionStep.scala
deleted file mode 100644
index 43de329..000
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DependencyResolutionStep.scala
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.spark.deploy.k8s.submit.steps
-
-import java.io.File
-
-import io.fabric8.kubernetes.api.model.ContainerBuilder
-
-import org.apache.spark.deploy.k8s.Constants._
-import org.apache.spark.deploy.k8s.KubernetesUtils
-import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
-
-/**
- * Step that configures the classpath, spark.jars, and spark.files for the 
driver given that the
- * user may provide remote files or files with local:// schemes.
- */
-private[spark] class DependencyResolutionStep(
-sparkJars: Seq[String],
-sparkFiles: Seq[String]) extends DriverConfigurationStep {
-
-  override def configureDriver(driverSpec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
-val resolvedSparkJars = KubernetesUtils.resolveFileUrisAndPath(sparkJars)
-val resolvedSparkFiles = KubernetesUtils.resolveFileUrisAndPath(sparkFiles)
-
-val sparkConf = driverSpec.driverSparkConf.clone()
-if (resolvedSparkJars.nonEmpty) {
-  sparkConf.set("spark.jars", resolvedSparkJars.mkString(","))
-}
-if (resolvedSparkFiles.nonEmpty) {
-  sparkConf.set("spark.files", resolvedSparkFiles.mkString(","))
-}
-val resolvedDriverContainer = if (resolvedSparkJars.nonEmpty) {
-  new ContainerBuilder(driverSpec.driverContainer)
-.addNewEnv()
-  .withName(ENV_MOUNTED_CLASSPATH)
-  .withValue(resolvedSparkJars.mkString(File.pathSeparator))
-  .endEnv()
-.build()
-} else {
-  driverSpec.driverContainer
-}
-
-driverSpec.copy(
-  driverContainer = resolvedDriverContainer,
-  driverSparkConf = sparkConf)
-  }
-}

http://git-wip-us.apache.org/repos/asf/spark/blob/a83ae0d9/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigurationStep.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigurationStep.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigurationStep.scala
deleted file mode 100644
index 17614e0..000
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigurationStep.scala
+++ /dev/null
@@ -1,30 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.spark.deploy.k8s.submit.steps
-
-import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
-
-/**
- * Represents a step in configuring the Spark driver pod.
- */
-private[spark] trait DriverConfigurationStep {
-
-  /**
-   * Apply some transformation to the previous state 

[3/3] spark git commit: [SPARK-22839][K8S] Refactor to unify driver and executor pod builder APIs

2018-04-13 Thread foxish
[SPARK-22839][K8S] Refactor to unify driver and executor pod builder APIs

## What changes were proposed in this pull request?

Breaks down the construction of driver pods and executor pods in a way that 
uses a common abstraction for both spark-submit creating the driver and 
KubernetesClusterSchedulerBackend creating the executor. Encourages more code 
reuse and is more legible than the older approach.

The high-level design is discussed in more detail on the JIRA ticket. This pull 
request is the implementation of that design with some minor changes in the 
implementation details.

No user-facing behavior should break as a result of this change.

## How was this patch tested?

Migrated all unit tests from the old submission steps architecture to the new 
architecture. Integration tests should not have to change and pass given that 
this shouldn't change any outward behavior.

Author: mcheah 

Closes #20910 from mccheah/spark-22839-incremental.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a83ae0d9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a83ae0d9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a83ae0d9

Branch: refs/heads/master
Commit: a83ae0d9bc1b8f4909b9338370efe4020079bea7
Parents: 0323e61
Author: mcheah 
Authored: Fri Apr 13 08:43:58 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Fri Apr 13 08:43:58 2018 -0700

--
 .../org/apache/spark/deploy/k8s/Config.scala|   2 +-
 .../spark/deploy/k8s/KubernetesConf.scala   | 184 ++
 .../spark/deploy/k8s/KubernetesDriverSpec.scala |  31 +++
 .../spark/deploy/k8s/KubernetesUtils.scala  |  11 -
 .../deploy/k8s/MountSecretsBootstrap.scala  |  72 --
 .../org/apache/spark/deploy/k8s/SparkPod.scala  |  34 +++
 .../k8s/features/BasicDriverFeatureStep.scala   | 136 ++
 .../k8s/features/BasicExecutorFeatureStep.scala | 179 ++
 ...DriverKubernetesCredentialsFeatureStep.scala | 216 
 .../k8s/features/DriverServiceFeatureStep.scala |  97 
 .../features/KubernetesFeatureConfigStep.scala  |  71 ++
 .../k8s/features/MountSecretsFeatureStep.scala  |  62 +
 .../k8s/submit/DriverConfigOrchestrator.scala   | 145 ---
 .../submit/KubernetesClientApplication.scala|  80 +++---
 .../k8s/submit/KubernetesDriverBuilder.scala|  56 +
 .../k8s/submit/KubernetesDriverSpec.scala   |  47 
 .../steps/BasicDriverConfigurationStep.scala| 163 
 .../submit/steps/DependencyResolutionStep.scala |  61 -
 .../submit/steps/DriverConfigurationStep.scala  |  30 ---
 .../steps/DriverKubernetesCredentialsStep.scala | 245 ---
 .../submit/steps/DriverMountSecretsStep.scala   |  38 ---
 .../steps/DriverServiceBootstrapStep.scala  | 104 
 .../cluster/k8s/ExecutorPodFactory.scala| 227 -
 .../cluster/k8s/KubernetesClusterManager.scala  |  12 +-
 .../k8s/KubernetesClusterSchedulerBackend.scala |  20 +-
 .../cluster/k8s/KubernetesExecutorBuilder.scala |  41 
 .../spark/deploy/k8s/KubernetesConfSuite.scala  | 175 +
 .../spark/deploy/k8s/KubernetesUtilsTest.scala  |  36 ---
 .../features/BasicDriverFeatureStepSuite.scala  | 153 
 .../BasicExecutorFeatureStepSuite.scala | 179 ++
 ...rKubernetesCredentialsFeatureStepSuite.scala | 174 +
 .../DriverServiceFeatureStepSuite.scala | 227 +
 .../features/KubernetesFeaturesTestUtils.scala  |  61 +
 .../features/MountSecretsFeatureStepSuite.scala |  58 +
 .../spark/deploy/k8s/submit/ClientSuite.scala   | 216 
 .../submit/DriverConfigOrchestratorSuite.scala  | 131 --
 .../submit/KubernetesDriverBuilderSuite.scala   | 102 
 .../BasicDriverConfigurationStepSuite.scala | 122 -
 .../steps/DependencyResolutionStepSuite.scala   |  69 --
 .../DriverKubernetesCredentialsStepSuite.scala  | 153 
 .../steps/DriverMountSecretsStepSuite.scala |  49 
 .../steps/DriverServiceBootstrapStepSuite.scala | 180 --
 .../cluster/k8s/ExecutorPodFactorySuite.scala   | 195 ---
 ...KubernetesClusterSchedulerBackendSuite.scala |  37 +--
 .../k8s/KubernetesExecutorBuilderSuite.scala|  75 ++
 45 files changed, 2482 insertions(+), 2274 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a83ae0d9/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 

[GitHub] spark issue #20910: [SPARK-22839] [K8s] Refactor to unify driver and executo...

2018-04-13 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20910
  
Merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-12 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21032
  
Yes, that is what I was indicating as well. We should have a design for all
different volume types. Extending this PR as the design evolves is a good
way to go about that.

On Thu, Apr 12, 2018, 11:37 AM Yinan Li <notificati...@github.com> wrote:

> I think we should have a more general solution so it works for other types
> of volumes. Particularly, we should have a design discussion on the format
> of values of such a property so it works for any type of volumes. I 
suggest
> we hold on on this PR until we setup a general solution. @foxish
> <https://github.com/foxish> @felixcheung <https://github.com/felixcheung>
> .
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21032#issuecomment-380903531>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/AA3U51kqvGuyzur2h9KXvBYaWcZU6ME5ks5tn571gaJpZM4TOyZd>
> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21032#discussion_r181176057
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -117,6 +117,13 @@ private[spark] object Config extends Logging {
   .stringConf
   .createWithDefault("spark")
 
+  val KUBERNETES_EXECUTOR_VOLUMES =
--- End diff --

Also, do we need a symmetric option for the driver?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21032: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-04-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/21032#discussion_r181172885
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -117,6 +117,13 @@ private[spark] object Config extends Logging {
   .stringConf
   .createWithDefault("spark")
 
+  val KUBERNETES_EXECUTOR_VOLUMES =
--- End diff --

If we are adding this option, it should be a more general API - allowing 
PVs, hostpath volumes and emptyDir volumes to be mounted. In the near future, 
hostpath volumes will be superseded by local PVs - 
https://github.com/kubernetes/kubernetes/issues/7562


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21032: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-12 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/21032
  
Jenkins, ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20910: [SPARK-22839] [K8s] Refactor to unify driver and executo...

2018-04-12 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20910
  
@mccheah I think it's fine as is now. We can take care of moving 
abstractions between the submission client and the driver in a future PR if 
necessary. Just the Scala style issue needs taking care of, and then this LGTM.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20910: [SPARK-22839] [K8s] Refactor to unify driver and ...

2018-04-10 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20910#discussion_r180535990
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesSpec.scala
 ---
@@ -14,17 +14,18 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.spark.deploy.k8s.submit.steps
+package org.apache.spark.deploy.k8s
 
-import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+import io.fabric8.kubernetes.api.model.HasMetadata
 
-/**
- * Represents a step in configuring the Spark driver pod.
- */
-private[spark] trait DriverConfigurationStep {
+private[spark] case class KubernetesSpec(
--- End diff --

This class is named as though it applies to driver and executor 
construction. Maybe `KubernetesDriverSpec`? It's also a bit unclear to me what 
purpose this abstraction serves as opposed to the way 
`KubernetesExecutorBuilder` goes about building the pod.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20989: [SPARK-23529][K8s] Support mounting hostPath volumes for...

2018-04-05 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20989
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-23668][K8S] Add config option for passing through k8s Pod.spec.imagePullSecrets

2018-04-04 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master a35523653 -> cccaaa14a


[SPARK-23668][K8S] Add config option for passing through k8s 
Pod.spec.imagePullSecrets

## What changes were proposed in this pull request?

Pass through the `imagePullSecrets` option to the k8s pod in order to allow 
users to access private image registries.

See 
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/

## How was this patch tested?

Unit tests + manual testing.

Manual testing procedure:
1. Have private image registry.
2. Spark-submit application with no 
`spark.kubernetes.container.image.pullSecrets` set. Do `kubectl describe pod 
...`. See the error message:
```
Error syncing pod, skipping: failed to "StartContainer" for 
"spark-kubernetes-driver" with ErrImagePull: "rpc error: code = 2 desc = Error: 
Status 400 trying to pull repository ...: \"{\\n  \\\"errors\\\" : [ {\\n
\\\"status\\\" : 400,\\n\\\"message\\\" : \\\"Unsupported docker v1 
repository request for '...'\\\"\\n  } ]\\n}\""
```
3. Create a secret: `kubectl create secret docker-registry ...`
4. Spark-submit with `spark.kubernetes.container.image.pullSecrets` set to the new 
secret. See that the deployment was successful.

Author: Andrew Korzhuev 
Author: Andrew Korzhuev 

Closes #20811 from andrusha/spark-23668-image-pull-secrets.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cccaaa14
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cccaaa14
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cccaaa14

Branch: refs/heads/master
Commit: cccaaa14ad775fb981e501452ba2cc06ff5c0f0a
Parents: a355236
Author: Andrew Korzhuev 
Authored: Wed Apr 4 12:30:52 2018 -0700
Committer: Anirudh Ramanathan 
Committed: Wed Apr 4 12:30:52 2018 -0700

--
 .../org/apache/spark/deploy/k8s/Config.scala|  7 
 .../spark/deploy/k8s/KubernetesUtils.scala  | 13 +++
 .../steps/BasicDriverConfigurationStep.scala|  7 +++-
 .../cluster/k8s/ExecutorPodFactory.scala|  4 +++
 .../spark/deploy/k8s/KubernetesUtilsTest.scala  | 36 
 .../BasicDriverConfigurationStepSuite.scala |  8 -
 .../cluster/k8s/ExecutorPodFactorySuite.scala   |  5 +++
 7 files changed, 78 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cccaaa14/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index 405ea47..82f6c71 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -54,6 +54,13 @@ private[spark] object Config extends Logging {
   .checkValues(Set("Always", "Never", "IfNotPresent"))
   .createWithDefault("IfNotPresent")
 
+  val IMAGE_PULL_SECRETS =
+ConfigBuilder("spark.kubernetes.container.image.pullSecrets")
+  .doc("Comma separated list of the Kubernetes secrets used " +
+"to access private image registries.")
+  .stringConf
+  .createOptional
+
   val KUBERNETES_AUTH_DRIVER_CONF_PREFIX =
   "spark.kubernetes.authenticate.driver"
   val KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX =

http://git-wip-us.apache.org/repos/asf/spark/blob/cccaaa14/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
index 5bc0701..5b2bb81 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala
@@ -16,6 +16,8 @@
  */
 package org.apache.spark.deploy.k8s
 
+import io.fabric8.kubernetes.api.model.LocalObjectReference
+
 import org.apache.spark.SparkConf
 import org.apache.spark.util.Utils
 
@@ -35,6 +37,17 @@ private[spark] object KubernetesUtils {
 sparkConf.getAllWithPrefix(prefix).toMap
   }
 
+  /**
+   * Parses comma-separated list of imagePullSecrets into K8s-understandable 
format
+   */
+  def parseImagePullSecrets(imagePullSecrets: Option[String]): 
List[LocalObjectReference] = {
+imagePullSecrets match {
+  case 
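
(The archived diff is truncated here, mid-`match`.) For reference, a minimal 
sketch of what such a comma-separated parser could look like, assuming 
fabric8's `LocalObjectReferenceBuilder`; the merged helper may differ in 
detail:

```
import io.fabric8.kubernetes.api.model.{LocalObjectReference, LocalObjectReferenceBuilder}

// Sketch only: split the comma-separated secret names, trim whitespace, and
// wrap each name in the LocalObjectReference the pod spec expects.
def parseImagePullSecrets(imagePullSecrets: Option[String]): List[LocalObjectReference] = {
  imagePullSecrets match {
    case Some(secretsCommaSeparated) =>
      secretsCommaSeparated.split(",").map(_.trim).map { name =>
        new LocalObjectReferenceBuilder().withName(name).build()
      }.toList
    case None => Nil
  }
}
```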

[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
LGTM.
Merging to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
@mccheah - heads up, this will likely lead to a rebase on 
https://github.com/apache/spark/pull/20910.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
@andrusha, no worries. I think this is going to be rather important for 
folks with private registries. Thanks for following up on it!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
Jenkins, test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-04 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
Jenkins, test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-04-03 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r178982578
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -54,6 +54,12 @@ private[spark] object Config extends Logging {
   .checkValues(Set("Always", "Never", "IfNotPresent"))
   .createWithDefault("IfNotPresent")
 
+  val IMAGE_PULL_SECRET =
+ConfigBuilder("spark.kubernetes.container.image.pullSecret")
--- End diff --

Also should be named `image.pullSecrets` to stay consistent with the field 
we're setting on the pods.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-04-03 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r178982439
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -54,6 +54,12 @@ private[spark] object Config extends Logging {
   .checkValues(Set("Always", "Never", "IfNotPresent"))
   .createWithDefault("IfNotPresent")
 
+  val IMAGE_PULL_SECRET =
+ConfigBuilder("spark.kubernetes.container.image.pullSecret")
--- End diff --

Based on a conversation with the node team, this is meant to be an array of 
possibly many secret names, since one could have multiple secrets specified 
for different registries, spread across multiple k8s secrets.

So, I'd suggest we implement the full API: this should be capable of 
accepting a comma-separated list of one or more secret names.
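
For illustration, the value could then be set like any other Spark conf; 
the secret names below are hypothetical:

```
import org.apache.spark.SparkConf

// Hypothetical secret names; each would be a docker-registry secret
// created beforehand with `kubectl create secret docker-registry ...`.
val conf = new SparkConf()
  .set("spark.kubernetes.container.image.pullSecrets", "gcr-secret,dockerhub-secret")
```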


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-03 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
LGTM after one comment. Thanks for fixing the style issues @andrusha 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20811: [SPARK-23668][K8S] Add config option for passing ...

2018-04-03 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/20811#discussion_r178924651
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -54,6 +54,12 @@ private[spark] object Config extends Logging {
   .checkValues(Set("Always", "Never", "IfNotPresent"))
   .createWithDefault("IfNotPresent")
 
+  val IMAGE_PULL_SECRET =
+ConfigBuilder("spark.kubernetes.imagePullSecret")
--- End diff --

Given there's an option `spark.kubernetes.container.image.pullPolicy`, we 
should make this consistent as `spark.kubernetes.container.image.pullSecret`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-03 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
Jenkins, test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



spark git commit: [SPARK-23285][K8S] Add a config property for specifying physical executor cores

2018-04-02 Thread foxish
Repository: spark
Updated Branches:
  refs/heads/master 6151f29f9 -> fe2b7a456


[SPARK-23285][K8S] Add a config property for specifying physical executor cores

## What changes were proposed in this pull request?

As mentioned in SPARK-23285, this PR introduces a new configuration property 
`spark.kubernetes.executor.request.cores` for specifying the physical CPU cores 
requested for each executor pod. This is to avoid changing the semantics of 
`spark.executor.cores` and `spark.task.cpus` and their role in task scheduling, 
task parallelism, dynamic resource allocation, etc. The new configuration 
property only determines the physical CPU cores available to an executor. An 
executor can still run multiple tasks simultaneously by using appropriate 
values for `spark.executor.cores` and `spark.task.cpus`.
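
To make the split concrete, a hedged example (the values below are 
illustrative only, not defaults):

```
import org.apache.spark.SparkConf

// Illustrative values only. The pod requests half a physical CPU from
// Kubernetes, while Spark's scheduler still sees 4 cores per executor and
// can run up to 4 / spark.task.cpus tasks on it concurrently.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")                       // scheduling slots
  .set("spark.kubernetes.executor.request.cores", "500m") // pod CPU request
```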

## How was this patch tested?

Unit tests.

felixcheung srowen jiangxb1987 jerryshao mccheah foxish

Author: Yinan Li <y...@google.com>
Author: Yinan Li <liyinan...@gmail.com>

Closes #20553 from liyinan926/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe2b7a45
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe2b7a45
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe2b7a45

Branch: refs/heads/master
Commit: fe2b7a4568d65a62da6e6eb00fff05f248b4332c
Parents: 6151f29
Author: Yinan Li <y...@google.com>
Authored: Mon Apr 2 12:20:55 2018 -0700
Committer: Anirudh Ramanathan <ramanath...@google.com>
Committed: Mon Apr 2 12:20:55 2018 -0700

--
 docs/running-on-kubernetes.md   | 15 ---
 .../org/apache/spark/deploy/k8s/Config.scala|  6 +
 .../cluster/k8s/ExecutorPodFactory.scala| 12 ++---
 .../cluster/k8s/ExecutorPodFactorySuite.scala   | 27 
 4 files changed, 53 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fe2b7a45/docs/running-on-kubernetes.md
--
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 975b28d..9c46449 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -549,14 +549,23 @@ specific to Spark on Kubernetes.
   spark.kubernetes.driver.limit.cores
   (none)
   
-Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
+Specify a hard cpu 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for the driver pod.
   
 
 
+  spark.kubernetes.executor.request.cores
+  (none)
+  
+Specify the cpu request for each executor pod. Values conform to the 
Kubernetes 
[convention](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu).
 
+Example values include 0.1, 500m, 1.5, 5, etc., with the definition of cpu 
units documented in [CPU 
units](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#cpu-units).
   
+This is distinct from spark.executor.cores: it is only used 
and takes precedence over spark.executor.cores for specifying the 
executor pod cpu request if set. Task 
+parallelism, e.g., number of tasks an executor can run concurrently is not 
affected by this.
+
+
   spark.kubernetes.executor.limit.cores
   (none)
   
-Specify the hard CPU 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
+Specify a hard cpu 
[limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container)
 for each executor pod launched for the Spark Application.
   
 
 
@@ -593,4 +602,4 @@ specific to Spark on Kubernetes.
spark.kubernetes.executor.secrets.spark-secret=/etc/secrets.
   
 
-
\ No newline at end of file
+

http://git-wip-us.apache.org/repos/asf/spark/blob/fe2b7a45/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
--
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index da34a7e..405ea47 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -91,6 +91,12 @@ private[spark] object Config extends L

[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-04-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20811
  
@andrusha, can you address the style check failure that @felixcheung 
pointed out? This should be good to merge after that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20553: [SPARK-23285][K8S] Add a config property for specifying ...

2018-04-02 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20553
  
Merging once tests pass. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


