[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2018-01-02 Thread echarles
Github user echarles commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r159282180
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

@yangw1234 Adding `libc6-compat` to the spark-k8s base image works for me. 
I read and write Parquet without issue. Maybe you had it only on the `driver` 
image, or made other changes that override the correct behavior (just a random 
guess)?
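
For reference, a minimal sketch of that kind of change to the alpine base
image (the packages besides `libc6-compat` are illustrative; the real
spark-base Dockerfile may install a different set):

    FROM openjdk:8-alpine

    # libc6-compat provides a glibc compatibility shim on musl-based Alpine,
    # so binaries linked against glibc (e.g. native Parquet/compression
    # libraries) can load. bash and tini are illustrative extras.
    RUN apk upgrade --no-cache && \
        apk add --no-cache bash tini libc6-compat && \
        mkdir -p /opt/spark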





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-18 Thread yangw1234
Github user yangw1234 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r157666054
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

@foxish The libc6-compat package did not fix our issue, because libiomp5.so 
cannot be loaded.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-18 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r157643149
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

@yangw1234, can you confirm that libc6-compat fixed the issue you had?





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-13 Thread erikerlandson
Github user erikerlandson commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r156841175
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

I'm in favor of using centos. The compat layer makes me a bit nervous. It 
may work fine, but it smells like the kind of thing that could cause 
inscrutable binary incompatibility bugs.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19717





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-11 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r156176319
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,33 @@ of the most common options to set are:
 or in your default properties file.
   
 
+
+  spark.driver.memoryOverhead
+  driverMemory * 0.10, with minimum of 384 
+  
+The amount of off-heap memory (in megabytes) to be allocated per 
driver in cluster mode. This is
+memory that accounts for things like VM overheads, interned strings, 
other native overheads, etc.
+This tends to grow with the container size (typically 6-10%). This 
option is supported in Yarn
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-11 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r156176301
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -1146,6 +1146,27 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 }
   }
 
+  test("check Kubernetes master URL") {
+val k8sMasterURLHttps = Utils.checkAndGetK8sMasterUrl("k8s://https://host:port")
+assert(k8sMasterURLHttps == "k8s:https://host:port")
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-11 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r156172100
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,33 @@ of the most common options to set are:
 or in your default properties file.
   
 
+
+  spark.driver.memoryOverhead
+  driverMemory * 0.10, with minimum of 384 
+  
+The amount of off-heap memory (in megabytes) to be allocated per 
driver in cluster mode. This is
+memory that accounts for things like VM overheads, interned strings, 
other native overheads, etc.
+This tends to grow with the container size (typically 6-10%). This 
option is supported in Yarn
--- End diff --

YARN. I'd also simplify:

"This option is currently supported on YARN and Kubernetes."

Same below.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-11 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r156172335
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -1146,6 +1146,27 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 }
   }
 
+  test("check Kubernetes master URL") {
+val k8sMasterURLHttps = Utils.checkAndGetK8sMasterUrl("k8s://https://host:port")
+assert(k8sMasterURLHttps == "k8s:https://host:port")
--- End diff --

`===` (also in other places)
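
For context, ScalaTest's `===` reports both sides of a failed comparison,
while a plain Boolean `==` inside `assert` does not. The quoted test with
the suggested change applied would read roughly:

    test("check Kubernetes master URL") {
      // === yields a descriptive failure message showing both values
      val k8sMasterURLHttps = Utils.checkAndGetK8sMasterUrl("k8s://https://host:port")
      assert(k8sMasterURLHttps === "k8s:https://host:port")
    }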





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979284
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979118
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,42 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL. Prefix
+   * "k8s:" is appended to the resolved URL as the prefix is used by 
KubernetesClusterManager
+   * in canCreate to determine if the KubernetesClusterManager should be 
used.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+
+// To handle master URLs, e.g., k8s://host:port.
+if (!masterWithoutK8sPrefix.contains("://")) {
--- End diff --

This is used to differentiate `https://host:port` vs. `host:port`. So 
`contains` is the right method here.
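
To make the distinction concrete, a self-contained sketch of the resolution
behavior being discussed (not the exact Spark implementation):

    // Sketch of the master URL resolution; details may differ from the real
    // Utils.checkAndGetK8sMasterUrl.
    def resolveK8sMasterUrl(rawMasterURL: String): String = {
      require(rawMasterURL.startsWith("k8s://"),
        "Kubernetes master URL must start with k8s://.")
      val withoutPrefix = rawMasterURL.substring("k8s://".length)
      // "host:port" has no scheme anywhere in the string, while
      // "https://host:port" does, hence contains("://").
      val resolved =
        if (!withoutPrefix.contains("://")) s"https://$withoutPrefix" else withoutPrefix
      s"k8s:$resolved"
    }

    // resolveK8sMasterUrl("k8s://host:port")         == "k8s:https://host:port"
    // resolveK8sMasterUrl("k8s://https://host:port") == "k8s:https://host:port"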





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155979008
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -294,6 +298,16 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }
 }
 
+if (clusterManager == KUBERNETES) {
+  args.master = Utils.checkAndGetK8sMasterUrl(args.master)
+  // Make sure YARN is included in our build if we're trying to use it
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956779
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,42 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL. Prefix
+   * "k8s:" is appended to the resolved URL as the prefix is used by 
KubernetesClusterManager
+   * in canCreate to determine if the KubernetesClusterManager should be 
used.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+
+// To handle master URLs, e.g., k8s://host:port.
+if (!masterWithoutK8sPrefix.contains("://")) {
--- End diff --

startsWith instead of contains?





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155957034
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956719
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -294,6 +298,16 @@ object SparkSubmit extends CommandLineUtils with 
Logging {
   }
 }
 
+if (clusterManager == KUBERNETES) {
+  args.master = Utils.checkAndGetK8sMasterUrl(args.master)
+  // Make sure YARN is included in our build if we're trying to use it
--- End diff --

this should say KUBERNETES





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155956360
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

We could release with the Dockerfile as-is (since the source or lineage 
doesn't have a licensing issue). We just need to figure out the process of 
releasing the "convenience" Docker image "binaries" on Docker Hub. There's 
probably a way to do it, but let's table that for now.






[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-09 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155935022
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

Got it. Should we defer updating the licensing information until we have 
more clarity on how we will maintain and release the Docker images?





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155934917
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

I was referring to the openjdk Docker image:
https://github.com/docker-library/openjdk/blob/b4f29ba829765552239bd18f272fcdaf09eca259/LICENSE

Of course, the OpenJDK binaries themselves are not under the MIT license. 
I'm actually not quite sure how "releasing" the Docker image works from a 
licensing point of view...





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155908121
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +117,46 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .checkValue(interval => interval > 0, s"Logging interval must be a 
positive time value.")
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155908117
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,31 @@ of the most common options to set are:
 or in your default properties file.
   
 
+
+  spark.driver.memoryOverhead
+  driverMemory * 0.10, with minimum of 384 
+  
+The amount of off-heap memory (in megabytes) to be allocated per 
driver in cluster mode. This is
+memory that accounts for things like VM overheads, interned strings, 
other native overheads, etc.
+This tends to grow with the container size (typically 6-10%).
--- End diff --

Done.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155900585
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +117,46 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .checkValue(interval => interval > 0, s"Logging interval must be a 
positive time value.")
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
--- End diff --

nit: `private[spark]` is redundant in this object.
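
A minimal illustration of the nit: members of a `private[spark]` object are
already capped at that visibility, so the member-level modifier is redundant
(simplified example with made-up values, not the real Config object):

    package org.apache.spark.deploy.k8s

    private[spark] object ConfigExample {
      // Already invisible outside org.apache.spark via the enclosing object.
      val jarsDownloadLocation: String = "/tmp/spark-jars"
      // The modifier here adds nothing on top of the object's visibility.
      private[spark] val filesDownloadLocation: String = "/tmp/spark-files"
    }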





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155900650
  
--- Diff: docs/configuration.md ---
@@ -157,13 +157,31 @@ of the most common options to set are:
 or in your default properties file.
   
 
+
+  spark.driver.memoryOverhead
+  driverMemory * 0.10, with minimum of 384 
+  
+The amount of off-heap memory (in megabytes) to be allocated per 
driver in cluster mode. This is
+memory that accounts for things like VM overheads, interned strings, 
other native overheads, etc.
+This tends to grow with the container size (typically 6-10%).
--- End diff --

Should mention that not all cluster managers support this option, since 
this is now in the common configuration doc. Same below.
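
As a worked sketch of the default quoted in the doc text above
(`driverMemory * 0.10, with minimum of 384`):

    // Illustrative only: max(10% of driver memory, 384) megabytes.
    def defaultMemoryOverheadMiB(driverMemoryMiB: Long): Long =
      math.max((driverMemoryMiB * 0.10).toLong, 384L)

    // defaultMemoryOverheadMiB(4096) == 409   (10% of a 4 GiB driver)
    // defaultMemoryOverheadMiB(1024) == 384   (the 384 MB floor applies)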





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155898310
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Ah, I may have forgotten to actually make the change. Anyway, it's done now.

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155896644
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

I don't follow. The code still says `Option.empty` where Matt asked for it 
to be replaced with `None`.
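
For context, the two forms are equivalent; `None` is simply the idiomatic
literal for an empty Option:

    // Both produce an empty Option[String]; None is the conventional spelling.
    val viaEmpty: Option[String] = Option.empty[String]
    val viaNone: Option[String] = None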



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155858142
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Yes, it will eventually look like 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155857389
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -380,6 +380,10 @@ class SparkContext(config: SparkConf) extends Logging {
 "Deployment to YARN is not supported directly by SparkContext. 
Please use spark-submit.")
 }
 
+if (master.startsWith("k8s") && deployMode == "client") {
--- End diff --

Moved the check into `KubernetesClusterManager`.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155853123
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java 
---
@@ -114,7 +114,7 @@
 { QUEUE },
 { REPOSITORIES },
 { STATUS },
-{ TOTAL_EXECUTOR_CORES },
+{ TOTAL_EXECUTOR_CORES }
--- End diff --

Reverted.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155853090
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -668,7 +668,9 @@ private[spark] object SparkConf extends Logging {
 MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
   AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
 LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
-  AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", 
"2.3"))
+  AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", 
"2.3")),
+"spark.driver.memoryOverhead" -> Seq(
--- End diff --

Yes, we do need one. I combined `spark.yarn.executor.memoryOverhead` and 
`spark.kubernetes.executor.memoryOverhead` into `spark.executor.memoryOverhead`.
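
A self-contained sketch of the deprecation mapping being discussed (the real
SparkConf uses its ConfigEntry constants, e.g. `DRIVER_MEMORY_OVERHEAD.key`,
rather than bare strings, and the alternate keys shown are assumptions):

    // Hypothetical: map each new key to the deprecated key(s) it replaces,
    // recording the version in which the old key was deprecated.
    case class AlternateConfig(key: String, version: String)

    val configsWithAlternatives: Map[String, Seq[AlternateConfig]] = Map(
      "spark.driver.memoryOverhead" -> Seq(
        AlternateConfig("spark.yarn.driver.memoryOverhead", "2.3")),
      "spark.executor.memoryOverhead" -> Seq(
        AlternateConfig("spark.yarn.executor.memoryOverhead", "2.3")))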





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155849351
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

Done in 
https://github.com/apache/spark/pull/19717/commits/67bc847e4f71bd6974735ac43090a0df266bc5f3.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155848638
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

I think we can do what's done in 
https://github.com/apache-spark-on-k8s/spark/pull/550 by adding libc6-compat. 
From what I read, libc6 and glibc seem to be the same version of libc. See 
http://www.linux-m68k.org/faq/glibcinfo.html.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155847061
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java 
---
@@ -114,7 +114,7 @@
 { QUEUE },
 { REPOSITORIES },
 { STATUS },
-{ TOTAL_EXECUTOR_CORES },
+{ TOTAL_EXECUTOR_CORES }
--- End diff --

Don't change this.





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155847828
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Doesn't look done.



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155845669
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -380,6 +380,10 @@ class SparkContext(config: SparkConf) extends Logging {
 "Deployment to YARN is not supported directly by SparkContext. 
Please use spark-submit.")
 }
 
+if (master.startsWith("k8s") && deployMode == "client") {
--- End diff --

Can you do this in `KubernetesClusterManager` instead?





[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155845314
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -668,7 +668,9 @@ private[spark] object SparkConf extends Logging {
 MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM.key -> Seq(
   AlternateConfig("spark.reducer.maxReqSizeShuffleToMem", "2.3")),
 LISTENER_BUS_EVENT_QUEUE_CAPACITY.key -> Seq(
-  AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", 
"2.3"))
+  AlternateConfig("spark.scheduler.listenerbus.eventqueue.size", 
"2.3")),
+"spark.driver.memoryOverhead" -> Seq(
--- End diff --

`DRIVER_MEMORY_OVERHEAD.key`.

Don't you also need one for the executor in k8s?
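
i.e. roughly (sketch; assumes an `EXECUTOR_MEMORY_OVERHEAD` entry is defined next to `DRIVER_MEMORY_OVERHEAD` in `config/package.scala`):

```
// Entries in SparkConf.configsWithAlternatives (sketch):
DRIVER_MEMORY_OVERHEAD.key -> Seq(
  AlternateConfig("spark.yarn.driver.memoryOverhead", "2.3")),
EXECUTOR_MEMORY_OVERHEAD.key -> Seq(
  AlternateConfig("spark.yarn.executor.memoryOverhead", "2.3"))
```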


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155846548
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

When we send in the integration tests, they will need to specify images, so 
we're going to have to decide sooner or later.

I think using CentOS is fine, or fixing the base image to support glibc? 
Can't the Docker image be configured to use glibc despite using Alpine?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155809447
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

/cc @foxish @mccheah thoughts on finding and testing an alternative base 
image vs. removing it from this PR?  


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-08 Thread jason-dai
Github user jason-dai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155787931
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

I don't think it makes sense if the reference docker image does not work 
for common use cases because of the glibc issues (see discussions at 
https://github.com/apache-spark-on-k8s/spark/issues/559 and 
https://github.com/apache-spark-on-k8s/spark/issues/326). Now that we are in 
the process of merging the feature into Spark, I think we should fix these issues 
(or simply remove the reference image) before the merge.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-07 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155711631
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

@jason-dai Can you follow up on 
https://github.com/apache-spark-on-k8s/spark/issues/326 directly?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-07 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155710752
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

The docker files in this PR are for reference only for now, and are by no 
means the official docker images moving forward. We will address concerns 
around the base image as part of SPARK-22647.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-07 Thread jason-dai
Github user jason-dai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155710219
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

I wonder if there are any follow-ups on this particular issue 
(apache-spark-on-k8s#326).


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-07 Thread yangw1234
Github user yangw1234 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155707713
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

The base image **openjdk:8-alpine** uses **musl libc** instead of 
**glibc**, so certain libraries (like Intel MKL for native BLAS support in MLlib) 
cannot run in the container.  
There is also a report of an OpenJDK compatibility issue on Alpine 
(https://github.com/apache-spark-on-k8s/spark/issues/326). 
Should we change to a base image that supports glibc?
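
For what it's worth, a sketch of the compat-layer route on top of the current base image (`libc6-compat` is Alpine's glibc shim; whether it covers MKL's `libiomp5.so` would need testing):

```
FROM openjdk:8-alpine

# glibc compatibility shim for native libraries built against glibc
RUN apk add --no-cache libc6-compat
```

The other route is a glibc-based image such as `openjdk:8-jdk-slim`, which sidesteps musl entirely.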


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155314730
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -388,6 +388,33 @@ class SparkSubmitSuite
 conf.get("spark.ui.enabled") should be ("false")
   }
 
+  test("handles k8s cluster mode") {
+val clArgs = Seq(
+  "--deploy-mode", "cluster",
+  "--master", "k8s://host:port",
+  "--executor-memory", "5g",
+  "--class", "org.SomeClass",
+  "--driver-memory", "4g",
+  "--conf", "spark.kubernetes.namespace=spark",
+  "--conf", "spark.kubernetes.driver.docker.image=bar",
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155313063
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -388,6 +388,33 @@ class SparkSubmitSuite
 conf.get("spark.ui.enabled") should be ("false")
   }
 
+  test("handles k8s cluster mode") {
+val clArgs = Seq(
+  "--deploy-mode", "cluster",
+  "--master", "k8s://host:port",
+  "--executor-memory", "5g",
+  "--class", "org.SomeClass",
+  "--driver-memory", "4g",
+  "--conf", "spark.kubernetes.namespace=spark",
+  "--conf", "spark.kubernetes.driver.docker.image=bar",
--- End diff --

Should also remove 
https://github.com/apache/spark/pull/19717/files#diff-63a5d817d2d45ae24de577f6a1bd80f9R595?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155311310
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -388,6 +388,33 @@ class SparkSubmitSuite
 conf.get("spark.ui.enabled") should be ("false")
   }
 
+  test("handles k8s cluster mode") {
+val clArgs = Seq(
+  "--deploy-mode", "cluster",
+  "--master", "k8s://host:port",
+  "--executor-memory", "5g",
+  "--class", "org.SomeClass",
+  "--driver-memory", "4g",
+  "--conf", "spark.kubernetes.namespace=spark",
+  "--conf", "spark.kubernetes.driver.docker.image=bar",
--- End diff --

`--kubernetes-namespace` has been removed in this PR.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155311302
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -41,6 +41,10 @@ package object config {
 .bytesConf(ByteUnit.MiB)
 .createWithDefaultString("1g")
 
+  private[spark] val DRIVER_MEMORY_OVERHEAD = 
ConfigBuilder("spark.driver.memoryOverhead")
--- End diff --

Yes, it's added under `spark.driver.memory`. See 
https://github.com/apache-spark-on-k8s/spark/blob/0e8ca012dc6d5dfd5645f45b6d1bb84e1e9e72a5/docs/configuration.md.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155303789
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -388,6 +388,33 @@ class SparkSubmitSuite
 conf.get("spark.ui.enabled") should be ("false")
   }
 
+  test("handles k8s cluster mode") {
+val clArgs = Seq(
+  "--deploy-mode", "cluster",
+  "--master", "k8s://host:port",
+  "--executor-memory", "5g",
+  "--class", "org.SomeClass",
+  "--driver-memory", "4g",
+  "--conf", "spark.kubernetes.namespace=spark",
+  "--conf", "spark.kubernetes.driver.docker.image=bar",
--- End diff --

Should we also test the arg "--kubernetes-namespace"?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155305418
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -55,14 +63,26 @@ private[spark] object Config extends Logging {
   val CA_CERT_FILE_CONF_SUFFIX = "caCertFile"
 
   val KUBERNETES_SERVICE_ACCOUNT_NAME =
-ConfigBuilder(s"$APISERVER_AUTH_DRIVER_CONF_PREFIX.serviceAccountName")
+
ConfigBuilder(s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.serviceAccountName")
   .doc("Service account that is used when running the driver pod. The 
driver pod uses " +
 "this service account when requesting executor pods from the API 
server. If specific " +
 "credentials are given for the driver pod to use, the driver will 
favor " +
 "using those credentials instead.")
   .stringConf
   .createOptional
 
+  val KUBERNETES_DRIVER_LIMIT_CORES =
+ConfigBuilder("spark.kubernetes.driver.limit.cores")
+  .doc("Specify the hard cpu limit for the driver pod")
+  .stringConf
+  .createOptional
+
+  val KUBERNETES_EXECUTOR_LIMIT_CORES =
--- End diff --

Seems this config is never used?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155300795
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -41,6 +41,10 @@ package object config {
 .bytesConf(ByteUnit.MiB)
 .createWithDefaultString("1g")
 
+  private[spark] val DRIVER_MEMORY_OVERHEAD = 
ConfigBuilder("spark.driver.memoryOverhead")
--- End diff --

nit: should also add doc for this config here.
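
E.g. something like (sketch; the doc wording is illustrative, borrowed from the YARN description):

```
private[spark] val DRIVER_MEMORY_OVERHEAD =
  ConfigBuilder("spark.driver.memoryOverhead")
    .doc("The amount of off-heap memory to be allocated per driver in cluster mode, " +
      "in MiB unless otherwise specified.")
    .bytesConf(ByteUnit.MiB)
    .createOptional
```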


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-06 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155306429
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +130,46 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .checkValue(interval => interval > 0, s"Logging interval must be a 
positive time value.")
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
+ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
+  .doc("Location to download jars to in the driver and executors. When 
using" +
+" spark-submit, this directory must be empty and will be mounted 
as an empty directory" +
+" volume on the driver and executor pod.")
+  .stringConf
+  .createWithDefault("/var/spark-data/spark-jars")
+
+  private[spark] val FILES_DOWNLOAD_LOCATION =
+ConfigBuilder("spark.kubernetes.mountDependencies.filesDownloadDir")
+  .doc("Location to download files to in the driver and executors. 
When using" +
+" spark-submit, this directory must be empty and will be mounted 
as an empty directory" +
+" volume on the driver and executor pods.")
+  .stringConf
+  .createWithDefault("/var/spark-data/spark-files")
+
+  val KUBERNETES_AUTH_SUBMISSION_CONF_PREFIX =
+"spark.kubernetes.authenticate.submission"
+
   val KUBERNETES_NODE_SELECTOR_PREFIX = "spark.kubernetes.node.selector."
+
+  val KUBERNETES_DRIVER_LABEL_PREFIX = "spark.kubernetes.driver.label."
+  val KUBERNETES_DRIVER_ANNOTATION_PREFIX = 
"spark.kubernetes.driver.annotation."
+
+  val KUBERNETES_EXECUTOR_LABEL_PREFIX = "spark.kubernetes.executor.label."
--- End diff --

This prefix is not used either.
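
For comparison, this is the shape in which the driver-side prefix is consumed by the submission client (sketch; presumably the executor side should mirror it once wired up):

```
// spark.kubernetes.executor.label.team=data  ->  Map("team" -> "data")
val executorCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
  sparkConf,
  KUBERNETES_EXECUTOR_LABEL_PREFIX)
```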


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155148270
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155125699
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155054674
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155047041
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

OK, did some changes to make all supported formats work. See 
https://github.com/apache/spark/pull/19717/commits/51844cc60753456cdf8c2d34a660c239a3c8aa13.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155042677
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

(see 
https://spark.apache.org/docs/latest/running-on-mesos.html#using-a-mesos-master-url)


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155042592
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

This decision is to match how Mesos works with ZooKeeper, as these strings 
are valid IIRC: `mesos://zk://host1:2181,host2:2181,host3:2181/mesos`. That's 
just the precedent we are using, but we can use a different convention.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155041880
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155041617
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

Are those URIs set in stone? This seems more readable to me:

```
scala> new URI("k8s:http://foo:1234")
res0: java.net.URI = k8s:http://foo:1234

scala> res0.getScheme
res1: String = k8s

scala> res0.getSchemeSpecificPart
res2: String = http://foo:1234
```

It also allows parsing using the `URI` APIs instead of doing string 
manipulation.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155040383
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

All of the following should be supported and resolved as follows:

- `k8s://host:port` -> `https://host:port`
- `k8s://https://host:port` -> `https://host:port`
- `k8s://http://host:port` -> `http://host:port`

Think we just need to use whatever code is necessary to get to this state.
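
A sketch of that resolution via the `URI` API suggested above (helper name is illustrative):

```
import java.net.URI

def resolveK8sMasterUrl(rawMasterURL: String): String = {
  require(rawMasterURL.startsWith("k8s://"), "Kubernetes master URL must start with k8s://.")
  // "k8s://host:port" -> "host:port"; "k8s://https://host:port" -> "https://host:port"
  val inner = new URI(rawMasterURL).getSchemeSpecificPart.stripPrefix("//")
  if (inner.startsWith("http://") || inner.startsWith("https://")) {
    inner
  } else {
    s"https://$inner"  // no scheme given: default to TLS
  }
}
```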


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155039780
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155039435
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155039001
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155038714
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

With the changes, URLs like `k8s://host:port` are no longer valid, as the 
part `host:port` is parsed as a URI whose scheme is `host`. Instead, 
`k8s:///host:port` would have to be used. I'm not super familiar with URIs, so I'm 
not sure if this is desirable. /cc @mccheah @foxish.
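
For reference, a minimal sketch of the parsing behavior described above, using 
plain `java.net.URI` (an illustration, not code from the PR):

```scala
import java.net.URI

object UriSchemeDemo extends App {
  // In "host:5443", everything before the first ':' parses as the URI
  // scheme, so "host" comes back as a scheme rather than a hostname.
  val ambiguous = new URI("host:5443")
  println(ambiguous.getScheme) // host
  println(ambiguous.getHost)   // null

  // Only a leading "//" marks the authority component, in which case the
  // host and port parse as one would expect.
  val withAuthority = new URI("//host:5443")
  println(withAuthority.getScheme) // null
  println(withAuthority.getHost)   // host
  println(withAuthority.getPort)   // 5443
}
```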


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155022020
  
--- Diff: docs/running-on-yarn.md ---
@@ -234,18 +234,11 @@ To use a custom metrics.properties for the 
application master and executors, upd
 The amount of off-heap memory (in megabytes) to be allocated per 
executor. This is memory that accounts for things like VM overheads, interned 
strings, other native overheads, etc. This tends to grow with the executor size 
(typically 6-10%).
   
 
-
-  spark.yarn.driver.memoryOverhead
--- End diff --

Ah, done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155016979
  
--- Diff: docs/running-on-yarn.md ---
@@ -234,18 +234,11 @@ To use a custom metrics.properties for the 
application master and executors, upd
 The amount of off-heap memory (in megabytes) to be allocated per 
executor. This is memory that accounts for things like VM overheads, interned 
strings, other native overheads, etc. This tends to grow with the executor size 
(typically 6-10%).
   
 
-
-  spark.yarn.driver.memoryOverhead
--- End diff --

Look for `configsWithAlternatives` in `SparkConf.scala`.
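
The mechanism there is a map from a new config key to the deprecated keys it 
replaces; below is a self-contained sketch of the idea (the names mirror 
`SparkConf.scala`, but the details are approximations, not the actual Spark 
source, which also logs a deprecation warning and supports value translation):

```scala
case class AlternateConfig(key: String, deprecatedSince: String)

// New key -> the old keys that should still be honored for it.
val configsWithAlternatives = Map[String, Seq[AlternateConfig]](
  "spark.driver.memoryOverhead" -> Seq(
    AlternateConfig("spark.yarn.driver.memoryOverhead", "2.3")))

// Reads of the new key fall back to whichever old key the user still sets.
def readWithFallback(settings: Map[String, String], key: String): Option[String] =
  settings.get(key).orElse(
    configsWithAlternatives.getOrElse(key, Nil)
      .flatMap(alt => settings.get(alt.key))
      .headOption)

// A job still setting the old YARN key keeps working under the new name:
val userConf = Map("spark.yarn.driver.memoryOverhead" -> "512")
assert(readWithFallback(userConf, "spark.driver.memoryOverhead").contains("512"))
```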


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155016194
  
--- Diff: docs/running-on-yarn.md ---
@@ -234,18 +234,11 @@ To use a custom metrics.properties for the 
application master and executors, upd
 The amount of off-heap memory (in megabytes) to be allocated per 
executor. This is memory that accounts for things like VM overheads, interned 
strings, other native overheads, etc. This tends to grow with the executor size 
(typically 6-10%).
   
 
-
-  spark.yarn.driver.memoryOverhead
--- End diff --

Is there an example PR for deprecating a config property that I can follow?


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155013035
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+if (masterWithoutK8sPrefix.startsWith("https://")) {
+  masterWithoutK8sPrefix
+} else if (masterWithoutK8sPrefix.startsWith("http://")) {
+  logWarning("Kubernetes master URL uses HTTP instead of HTTPS.")
+  masterWithoutK8sPrefix
+} else {
--- End diff --

Yeah, the code only applies if no scheme exists. Fixed.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155012873
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r155007004
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.UUID
+
+import com.google.common.primitives.Longs
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix = {
+val uuid = UUID.nameUUIDFromBytes(Longs.toByteArray(launchTime))
+s"$appName-$uuid".toLowerCase.replaceAll("\\.", "-")
+  }
+
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
+  }
+  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154955367
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
   

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154952728
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -590,6 +600,11 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 |  the node running the Application 
Master via the Secure
 |  Distributed Cache, for renewing the 
login tickets and the
 |  delegation tokens periodically.
+|
+| Kubernetes only:
+|  --kubernetes-namespace NS   The namespace in the Kubernetes 
cluster within which the
--- End diff --

+1, this can be removed.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154952559
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +139,60 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
+ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
+  .doc("Location to download jars to in the driver and executors. When 
using" +
+" spark-submit, this directory must be empty and will be mounted 
as an empty directory" +
+" volume on the driver and executor pod.")
+  .stringConf
+  .createWithDefault("/var/spark-data/spark-jars")
--- End diff --

We do need this directory to be writable, and we can definitely document that. 
We are running things as root inside the container in the reference images; we 
could instead create and use a more restricted user, but that would bake an 
assumption into our images. @ash211, can you comment on your setup?

K8s also offers a way to change this on the fly: using 
[PodSecurityPolicy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups),
one can set the `runAsUser` config so containers run as a particular user. So 
one can create a user "spark" on their nodes and have a pod security policy 
enforce that all containers run as that user, i.e. our defaulting to root 
doesn't prevent customization in that regard.
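
For illustration, such a policy could look roughly like the sketch below (the 
policy name and UID range are made-up values; on clusters of this era the API 
group is `extensions/v1beta1`):

```yaml
# Sketch of a PodSecurityPolicy that forces containers to run as a fixed
# non-root UID; the name and UID range are illustrative assumptions.
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: spark-run-as-user
spec:
  privileged: false
  runAsUser:
    rule: MustRunAs    # reject pods whose UID falls outside the range below
    ranges:
    - min: 1000        # e.g. the UID of a dedicated "spark" user on the nodes
      max: 1000
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
```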


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154875878
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.UUID
+
+import com.google.common.primitives.Longs
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix = {
+val uuid = UUID.nameUUIDFromBytes(Longs.toByteArray(launchTime))
+s"$appName-$uuid".toLowerCase.replaceAll("\\.", "-")
+  }
+
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
+  }
+  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154870951
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
--- End diff --

Can we change this String representation to a `URI` to make the check below 
more robust?
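
A rough sketch of what that could look like (a suggestion-level illustration, 
not the code in this PR):

```scala
import java.net.URI

// Parse the remainder once as a URI and branch on its scheme, instead of
// doing raw string-prefix checks. Illustrative only.
def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
  require(rawMasterURL.startsWith("k8s://"),
    "Kubernetes master URL must start with k8s://.")
  val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
  Option(new URI(masterWithoutK8sPrefix).getScheme) match {
    case Some("https") => masterWithoutK8sPrefix
    case Some("http") =>
      // a real implementation would warn about plaintext HTTP here
      masterWithoutK8sPrefix
    case None => s"https://$masterWithoutK8sPrefix" // no scheme: assume HTTPS
    case Some(other) =>
      throw new IllegalArgumentException(s"Invalid Kubernetes master scheme: $other")
  }
}
```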


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154874378
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154872371
  
--- Diff: docs/running-on-yarn.md ---
@@ -234,18 +234,11 @@ To use a custom metrics.properties for the 
application master and executors, upd
 The amount of off-heap memory (in megabytes) to be allocated per 
executor. This is memory that accounts for things like VM overheads, interned 
strings, other native overheads, etc. This tends to grow with the executor size 
(typically 6-10%).
   
 
-
-  spark.yarn.driver.memoryOverhead
--- End diff --

I think we should make this configuration backward compatible: users should 
still be able to use `spark.yarn.driver.memoryOverhead`, with a warning log, 
like other deprecated configurations.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-05 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154871648
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2744,6 +2744,25 @@ private[spark] object Utils extends Logging {
 }
   }
 
+  /**
+   * Check the validity of the given Kubernetes master URL and return the 
resolved URL.
+   */
+  def checkAndGetK8sMasterUrl(rawMasterURL: String): String = {
+require(rawMasterURL.startsWith("k8s://"),
+  "Kubernetes master URL must start with k8s://.")
+val masterWithoutK8sPrefix = rawMasterURL.substring("k8s://".length)
+if (masterWithoutK8sPrefix.startsWith("https://")) {
+  masterWithoutK8sPrefix
+} else if (masterWithoutK8sPrefix.startsWith("http://")) {
+  logWarning("Kubernetes master URL uses HTTP instead of HTTPS.")
+  masterWithoutK8sPrefix
+} else {
--- End diff --

Should this happen only when no scheme exists? From the code, it looks like 
this branch will also be executed if the user misconfigures the URL with a 
different scheme.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154846305
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -388,6 +388,32 @@ class SparkSubmitSuite
 conf.get("spark.ui.enabled") should be ("false")
   }
 
+  test("handles k8s cluster mode") {
+val clArgs = Seq(
+  "--deploy-mode", "cluster",
+  "--master", "k8s://host:port",
+  "--executor-memory", "5g",
+  "--class", "org.SomeClass",
+  "--kubernetes-namespace", "foo",
+  "--driver-memory", "4g",
+  "--conf", "spark.kubernetes.driver.docker.image=bar",
+  "/home/thejar.jar",
+  "arg1")
+val appArgs = new SparkSubmitArguments(clArgs)
+val (childArgs, classpath, conf, mainClass) = 
prepareSubmitEnvironment(appArgs)
+
+val childArgsMap = childArgs.grouped(2).map(a => a(0) -> a(1)).toMap
+childArgsMap.get("--primary-java-resource") should be 
(Some("file:/home/thejar.jar"))
+childArgsMap.get("--main-class") should be (Some("org.SomeClass"))
+childArgsMap.get("--arg") should be (Some("arg1"))
+mainClass should be ("org.apache.spark.deploy.k8s.submit.Client")
+classpath should have length (0)
+conf.get("spark.executor.memory") should be ("5g")
--- End diff --

Done.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154836110
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
--- End diff --

I think the biggest constraint here is labels, which cannot be longer than 
63 characters. API object names have a limit of 253 characters. Are we using 
the prefix in any label?
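
As a concrete illustration of the two limits in play (63 characters for label 
values, 253 for most object names; the helpers below are hypothetical, not 
part of the PR):

```scala
// Hypothetical helpers illustrating the two Kubernetes length limits.
val MaxLabelValueLength = 63     // label values
val MaxResourceNameLength = 253  // most object names (DNS subdomain rule)

def fitsAsLabelValue(v: String): Boolean = v.length <= MaxLabelValueLength
def fitsAsResourceName(v: String): Boolean = v.length <= MaxResourceNameLength

// A long application name can overflow a label value while still fitting
// comfortably inside a resource-name prefix:
val appName = "a" * 80
assert(!fitsAsLabelValue(appName))
assert(fitsAsResourceName(s"$appName-driver"))
```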


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154834535
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154834099
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +139,60 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
+ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
+  .doc("Location to download jars to in the driver and executors. When 
using" +
+" spark-submit, this directory must be empty and will be mounted 
as an empty directory" +
+" volume on the driver and executor pod.")
+  .stringConf
+  .createWithDefault("/var/spark-data/spark-jars")
--- End diff --

To add to @vanzin's query: do we have a default user we expect to run the 
container as (or not run as)? Or does it not matter either way and it is up to 
the user?

Also, it would be great if we documented assumptions like this in the k8s 
documentation: which paths should be available, which should be writable, 
which ports must be accessible, etc. (perhaps this is already in the docs and 
I am yet to find it).


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154833977
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
+  }
+  mayBeResource
+} else {
+  Option.empty
--- End diff --

Done.


---


[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154833969
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Done.


---


[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154833947
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154833745
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BaseDriverConfigurationStep.scala
 ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import scala.collection.JavaConverters._
+
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, 
EnvVarSourceBuilder, PodBuilder, QuantityBuilder}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Represents the initial setup required for the driver.
+ */
+private[spark] class BaseDriverConfigurationStep(
+kubernetesAppId: String,
+kubernetesResourceNamePrefix: String,
+driverLabels: Map[String, String],
+dockerImagePullPolicy: String,
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) extends DriverConfigurationStep {
+
+  private val kubernetesDriverPodName = 
submissionSparkConf.get(KUBERNETES_DRIVER_POD_NAME)
+.getOrElse(s"$kubernetesResourceNamePrefix-driver")
+
+  private val driverExtraClasspath = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_CLASS_PATH)
+
+  private val driverDockerImage = 
submissionSparkConf.get(DRIVER_DOCKER_IMAGE)
+
+  // CPU settings
+  private val driverCpuCores = 
submissionSparkConf.getOption("spark.driver.cores").getOrElse("1")
+  private val driverLimitCores = 
submissionSparkConf.get(KUBERNETES_DRIVER_LIMIT_CORES)
+
+  // Memory settings
+  private val driverMemoryMiB = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_MEMORY)
+  private val driverMemoryString = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_MEMORY.key,
+org.apache.spark.internal.config.DRIVER_MEMORY.defaultValueString)
+  private val memoryOverheadMiB = submissionSparkConf
+.get(KUBERNETES_DRIVER_MEMORY_OVERHEAD)
+.getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * driverMemoryMiB).toInt,
+  MEMORY_OVERHEAD_MIN_MIB))
+  private val driverContainerMemoryWithOverheadMiB = driverMemoryMiB + 
memoryOverheadMiB
+
+  override def configureDriver(
+  driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
+val driverExtraClasspathEnv = driverExtraClasspath.map { classPath =>
+  new EnvVarBuilder()
+.withName(ENV_SUBMIT_EXTRA_CLASSPATH)
+.withValue(classPath)
+.build()
+}
--- End diff --

custom docker images FTW!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154833630
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
--- End diff --

I can't see the rest of the context of this conversation, but a UUID will be 36-odd bytes, well within the 63-char limit?
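For reference, a quick check of that figure (a sketch in plain Scala; the 36 characters are just the standard textual UUID length, nothing computed in this PR):

    // A random UUID renders as 36 characters: 32 hex digits plus 4 hyphens,
    // e.g. "123e4567-e89b-12d3-a456-426614174000" -- comfortably under the
    // 63-character cap Kubernetes places on label values.
    val id = java.util.UUID.randomUUID().toString
    assert(id.length == 36)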


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154831875
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = 

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154815505
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverKubernetesCredentialsStep.scala
 ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.language.implicitConversions
+
+import com.google.common.io.{BaseEncoding, Files}
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, 
Secret, SecretBuilder}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Mounts Kubernetes credentials into the driver pod. The driver will use 
such mounted credentials
+ * to request executors.
+ */
+private[spark] class DriverKubernetesCredentialsStep(
+submissionSparkConf: SparkConf,
+kubernetesResourceNamePrefix: String) extends DriverConfigurationStep {
+
+  private val maybeMountedOAuthTokenFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$OAUTH_TOKEN_FILE_CONF_SUFFIX")
+  private val maybeMountedClientKeyFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX")
+  private val maybeMountedClientCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX")
+  private val maybeMountedCaCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX")
+  private val driverServiceAccount = 
submissionSparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME)
+
+  override def configureDriver(driverSpec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
+val driverSparkConf = driverSpec.driverSparkConf.clone()
+
+val oauthTokenBase64 = submissionSparkConf
+
.getOption(s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$OAUTH_TOKEN_CONF_SUFFIX")
+.map { token =>
+  
BaseEncoding.base64().encode(token.getBytes(StandardCharsets.UTF_8))
+}
+val caCertDataBase64 = safeFileConfToBase64(
+s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX",
+"Driver CA cert file provided at %s does not exist or is not a 
file.")
+val clientKeyDataBase64 = safeFileConfToBase64(
+
s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX",
+"Driver client key file provided at %s does not exist or is not a 
file.")
+val clientCertDataBase64 = safeFileConfToBase64(
--- End diff --

These types of files are typically not password-protected; Java KeyStores are. (Which reminds me that we should eventually support JKS files for these Kubernetes clients.)
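A minimal sketch of why JKS support would need more plumbing than the PEM-style files handled here (standard java.security API; the extra password setting it implies is hypothetical, not part of this PR):

    import java.io.FileInputStream
    import java.security.KeyStore

    // Unlike a PEM file, which can be read and base64-encoded as-is, a Java
    // KeyStore must be opened with its password before any entry is usable --
    // so JKS support implies a password conf alongside the file path.
    def loadJks(path: String, password: String): KeyStore = {
      val keyStore = KeyStore.getInstance("JKS")
      val in = new FileInputStream(path)
      try keyStore.load(in, password.toCharArray) finally in.close()
      keyStore
    }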


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154814332
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154816844
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
+  }
+  mayBeResource
+} else {
+  Option.empty
--- End diff --

Nit: I think `None` reads better.
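For illustration (plain Scala, nothing PR-specific):

    // The two spellings are equivalent; `None` is the idiomatic literal, while
    // `Option.empty[T]` mainly helps when the type parameter can't be inferred.
    val a: Option[String] = None
    val b: Option[String] = Option.empty[String]
    assert(a == b)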


---


[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154816823
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
+  s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-")
+  private val dockerImagePullPolicy = 
submissionSparkConf.get(DOCKER_IMAGE_PULL_POLICY)
+  private val jarsDownloadPath = 
submissionSparkConf.get(JARS_DOWNLOAD_LOCATION)
+  private val filesDownloadPath = 
submissionSparkConf.get(FILES_DOWNLOAD_LOCATION)
+
+  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
+val driverCustomLabels = ConfigurationUtils.parsePrefixedKeyValuePairs(
+  submissionSparkConf,
+  KUBERNETES_DRIVER_LABEL_PREFIX)
+require(!driverCustomLabels.contains(SPARK_APP_ID_LABEL), "Label with 
key " +
+  s"$SPARK_APP_ID_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+require(!driverCustomLabels.contains(SPARK_ROLE_LABEL), "Label with 
key " +
+  s"$SPARK_ROLE_LABEL is not allowed as it is reserved for Spark 
bookkeeping " +
+  "operations.")
+
+val allDriverLabels = driverCustomLabels ++ Map(
+  SPARK_APP_ID_LABEL -> kubernetesAppId,
+  SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE)
+
+val initialSubmissionStep = new BaseDriverConfigurationStep(
+  kubernetesAppId,
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  dockerImagePullPolicy,
+  appName,
+  mainClass,
+  appArgs,
+  submissionSparkConf)
+
+val driverAddressStep = new DriverServiceBootstrapStep(
+  kubernetesResourceNamePrefix,
+  allDriverLabels,
+  submissionSparkConf,
+  new SystemClock)
+
+val kubernetesCredentialsStep = new DriverKubernetesCredentialsStep(
+  submissionSparkConf, kubernetesResourceNamePrefix)
+
+val additionalMainAppJar = if (mainAppResource.nonEmpty) {
+   val mayBeResource = mainAppResource.get match {
+case JavaMainAppResource(resource) if resource != 
SparkLauncher.NO_RESOURCE =>
+  Some(resource)
+case _ => Option.empty
--- End diff --

Nit: I think `None` reads better.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154814488
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154814516
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/Client.scala
 ---
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import java.util.{Collections, UUID}
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import io.fabric8.kubernetes.api.model._
+import io.fabric8.kubernetes.client.KubernetesClient
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.SparkApplication
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.SparkKubernetesClientFactory
+import org.apache.spark.deploy.k8s.submit.steps.DriverConfigurationStep
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.Utils
+
+/**
+ * Encapsulates arguments to the submission client.
+ *
+ * @param mainAppResource the main application resource if any
+ * @param mainClass the main class of the application to run
+ * @param driverArgs arguments to the driver
+ */
+private[spark] case class ClientArguments(
+ mainAppResource: Option[MainAppResource],
+ mainClass: String,
+ driverArgs: Array[String])
+
+private[spark] object ClientArguments {
+
+  def fromCommandLineArgs(args: Array[String]): ClientArguments = {
+var mainAppResource: Option[MainAppResource] = None
+var mainClass: Option[String] = None
+val driverArgs = mutable.ArrayBuffer.empty[String]
+
+args.sliding(2, 2).toList.foreach {
+  case Array("--primary-java-resource", primaryJavaResource: String) =>
+mainAppResource = Some(JavaMainAppResource(primaryJavaResource))
+  case Array("--main-class", clazz: String) =>
+mainClass = Some(clazz)
+  case Array("--arg", arg: String) =>
+driverArgs += arg
+  case other =>
+val invalid = other.mkString(" ")
+throw new RuntimeException(s"Unknown arguments: $invalid")
+}
+
+require(mainClass.isDefined, "Main class must be specified via 
--main-class")
+
+ClientArguments(
+  mainAppResource,
+  mainClass.get,
+  driverArgs.toArray)
+  }
+}
+
+/**
+ * Submits a Spark application to run on Kubernetes by creating the driver 
pod and starting a
+ * watcher that monitors and logs the application status. Waits for the 
application to terminate if
+ * spark.kubernetes.submission.waitAppCompletion is true.
+ *
+ * @param submissionSteps steps that collectively configure the driver
+ * @param submissionSparkConf the submission client Spark configuration
+ * @param kubernetesClient the client to talk to the Kubernetes API server
+ * @param waitForAppCompletion a flag indicating whether the client should 
wait for the application
+ * to complete
+ * @param appName the application name
+ * @param loggingPodStatusWatcher a watcher that monitors and logs the 
application status
+ */
+private[spark] class Client(
+submissionSteps: Seq[DriverConfigurationStep],
+submissionSparkConf: SparkConf,
+kubernetesClient: KubernetesClient,
+waitForAppCompletion: Boolean,
+appName: String,
+loggingPodStatusWatcher: LoggingPodStatusWatcher) extends Logging {
+
+  private val driverJavaOptions = submissionSparkConf.get(
+org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
+
+   /**
+* Run command that initializes a DriverSpec that will be updated after 
each
+* DriverConfigurationStep in the sequence that is passed in. The final 
KubernetesDriverSpec
+* will be used to build the Driver Container, Driver Pod, and 
Kubernetes Resources
+*/
+  def run(): Unit = {
  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154814933
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
--- End diff --

UUIDs are a bit long, which leads to name-length problems, since Kubernetes requires some names to be < 63 characters. I would agree that a UUID would be better if we can get away with it.
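A rough illustration of the length concern (the app name below is made up; the "-driver-svc" suffix is an assumption about how resource names get built):

    import java.util.UUID

    // An app name plus a 36-character UUID plus a resource suffix easily
    // blows past Kubernetes' 63-character limit on some resource names.
    val appName = "my-structured-streaming-app" // hypothetical, 27 chars
    val resourceName = s"$appName-${UUID.randomUUID()}-driver-svc"
    assert(resourceName.length > 63) // 27 + 1 + 36 + 11 = 75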


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154818466
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
 ---
@@ -119,5 +139,60 @@ private[spark] object Config extends Logging {
 "must be a positive integer")
   .createWithDefault(10)
 
+  val WAIT_FOR_APP_COMPLETION =
+ConfigBuilder("spark.kubernetes.submission.waitAppCompletion")
+  .doc("In cluster mode, whether to wait for the application to finish 
before exiting the " +
+"launcher process.")
+  .booleanConf
+  .createWithDefault(true)
+
+  val REPORT_INTERVAL =
+ConfigBuilder("spark.kubernetes.report.interval")
+  .doc("Interval between reports of the current app status in cluster 
mode.")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createWithDefaultString("1s")
+
+  private[spark] val JARS_DOWNLOAD_LOCATION =
+ConfigBuilder("spark.kubernetes.mountDependencies.jarsDownloadDir")
+  .doc("Location to download jars to in the driver and executors. When 
using" +
+" spark-submit, this directory must be empty and will be mounted 
as an empty directory" +
+" volume on the driver and executor pod.")
+  .stringConf
+  .createWithDefault("/var/spark-data/spark-jars")
--- End diff --

/cc @foxish and @mccheah. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154818296
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -590,6 +600,11 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 |  the node running the Application 
Master via the Secure
 |  Distributed Cache, for renewing the 
login tickets and the
 |  delegation tokens periodically.
+|
+| Kubernetes only:
+|  --kubernetes-namespace NS   The namespace in the Kubernetes 
cluster within which the
--- End diff --

Reverted.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154813008
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -590,6 +600,11 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 |  the node running the Application 
Master via the Secure
 |  Distributed Cache, for renewing the 
login tickets and the
 |  delegation tokens periodically.
+|
+| Kubernetes only:
+|  --kubernetes-namespace NS   The namespace in the Kubernetes 
cluster within which the
--- End diff --

This was more for convenience, allowing command-line parameters for setting common Kubernetes options. We added these to be similar in spirit to the other cluster managers. We can revert them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154807233
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -590,6 +600,11 @@ private[deploy] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, S
 |  the node running the Application 
Master via the Secure
 |  Distributed Cache, for renewing the 
login tickets and the
 |  delegation tokens periodically.
+|
+| Kubernetes only:
+|  --kubernetes-namespace NS   The namespace in the Kubernetes 
cluster within which the
--- End diff --

I don't think it's absolutely necessary to have a parameter for the namespace; `--kubernetes-namespace` is not significantly shorter than `spark.kubernetes.namespace`. @mccheah @foxish WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154806055
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverKubernetesCredentialsStep.scala
 ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.language.implicitConversions
+
+import com.google.common.io.{BaseEncoding, Files}
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, 
Secret, SecretBuilder}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Mounts Kubernetes credentials into the driver pod. The driver will use 
such mounted credentials
+ * to request executors.
+ */
+private[spark] class DriverKubernetesCredentialsStep(
+submissionSparkConf: SparkConf,
+kubernetesResourceNamePrefix: String) extends DriverConfigurationStep {
+
+  private val maybeMountedOAuthTokenFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$OAUTH_TOKEN_FILE_CONF_SUFFIX")
+  private val maybeMountedClientKeyFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX")
+  private val maybeMountedClientCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX")
+  private val maybeMountedCaCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX")
+  private val driverServiceAccount = 
submissionSparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME)
+
+  override def configureDriver(driverSpec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
+val driverSparkConf = driverSpec.driverSparkConf.clone()
+
+val oauthTokenBase64 = submissionSparkConf
+
.getOption(s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$OAUTH_TOKEN_CONF_SUFFIX")
+.map { token =>
+  
BaseEncoding.base64().encode(token.getBytes(StandardCharsets.UTF_8))
+}
+val caCertDataBase64 = safeFileConfToBase64(
+s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX",
+"Driver CA cert file provided at %s does not exist or is not a 
file.")
+val clientKeyDataBase64 = safeFileConfToBase64(
+
s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX",
+"Driver client key file provided at %s does not exist or is not a 
file.")
+val clientCertDataBase64 = safeFileConfToBase64(
+
s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX",
+"Driver client cert file provided at %s does not exist or is not a 
file.")
+
+val driverSparkConfWithCredentialsLocations = 
setDriverPodKubernetesCredentialLocations(
+driverSparkConf,
+oauthTokenBase64,
+caCertDataBase64,
+clientKeyDataBase64,
+clientCertDataBase64)
+
+val kubernetesCredentialsSecret = createCredentialsSecret(
+oauthTokenBase64,
+caCertDataBase64,
+clientKeyDataBase64,
+clientCertDataBase64)
+
+val driverPodWithMountedKubernetesCredentials = 
kubernetesCredentialsSecret.map { secret =>
+  new PodBuilder(driverSpec.driverPod)
+.editOrNewSpec()
+  .addNewVolume()
+.withName(DRIVER_CREDENTIALS_SECRET_VOLUME_NAME)
+
.withNewSecret().withSecretName(secret.getMetadata.getName).endSecret()
+.endVolume()
+  .endSpec()
+.build()
+}.getOrElse(
+  

[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154802853
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/DriverConfigurationStepsOrchestrator.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.ConfigurationUtils
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.steps._
+import org.apache.spark.launcher.SparkLauncher
+import org.apache.spark.util.SystemClock
+
+/**
+ * Constructs the complete list of driver configuration steps to run to 
deploy the Spark driver.
+ */
+private[spark] class DriverConfigurationStepsOrchestrator(
+namespace: String,
+kubernetesAppId: String,
+launchTime: Long,
+mainAppResource: Option[MainAppResource],
+appName: String,
+mainClass: String,
+appArgs: Array[String],
+submissionSparkConf: SparkConf) {
+
+  // The resource name prefix is derived from the application name, making 
it easy to connect the
+  // names of the Kubernetes resources from e.g. kubectl or the Kubernetes 
dashboard to the
+  // application the user submitted. However, we can't use the application 
name in the label, as
+  // label values are considerably restrictive, e.g. must be no longer 
than 63 characters in
+  // length. So we generate a separate identifier for the app ID itself, 
and bookkeeping that
+  // requires finding "all pods for this application" should use the 
kubernetesAppId.
+  private val kubernetesResourceNamePrefix =
--- End diff --

Yes, it needs to be unique. Changed to use `UUID.nameUUIDFromBytes` based on `launchTime`. @mccheah, any concerns about this?
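Roughly the shape of the change described (a sketch under the stated assumption; the exact byte encoding of `launchTime` is a guess):

    import java.nio.charset.StandardCharsets
    import java.util.UUID

    // nameUUIDFromBytes produces a deterministic type-3 (MD5-based) UUID, so
    // the same launchTime always maps to the same 36-character identifier.
    val launchTime: Long = System.currentTimeMillis()
    val appId = UUID.nameUUIDFromBytes(
      launchTime.toString.getBytes(StandardCharsets.UTF_8)).toString
    assert(appId.length == 36)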


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-04 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r154802593
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverKubernetesCredentialsStep.scala
 ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.language.implicitConversions
+
+import com.google.common.io.{BaseEncoding, Files}
+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder, 
Secret, SecretBuilder}
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Mounts Kubernetes credentials into the driver pod. The driver will use 
such mounted credentials
+ * to request executors.
+ */
+private[spark] class DriverKubernetesCredentialsStep(
+submissionSparkConf: SparkConf,
+kubernetesResourceNamePrefix: String) extends DriverConfigurationStep {
+
+  private val maybeMountedOAuthTokenFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$OAUTH_TOKEN_FILE_CONF_SUFFIX")
+  private val maybeMountedClientKeyFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX")
+  private val maybeMountedClientCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX")
+  private val maybeMountedCaCertFile = submissionSparkConf.getOption(
+  
s"$KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX")
+  private val driverServiceAccount = 
submissionSparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME)
+
+  override def configureDriver(driverSpec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
+val driverSparkConf = driverSpec.driverSparkConf.clone()
+
+val oauthTokenBase64 = submissionSparkConf
+
.getOption(s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$OAUTH_TOKEN_CONF_SUFFIX")
+.map { token =>
+  
BaseEncoding.base64().encode(token.getBytes(StandardCharsets.UTF_8))
+}
+val caCertDataBase64 = safeFileConfToBase64(
+s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CA_CERT_FILE_CONF_SUFFIX",
+"Driver CA cert file provided at %s does not exist or is not a 
file.")
+val clientKeyDataBase64 = safeFileConfToBase64(
+
s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CLIENT_KEY_FILE_CONF_SUFFIX",
+"Driver client key file provided at %s does not exist or is not a 
file.")
+val clientCertDataBase64 = safeFileConfToBase64(
+
s"$KUBERNETES_AUTH_DRIVER_CONF_PREFIX.$CLIENT_CERT_FILE_CONF_SUFFIX",
+"Driver client cert file provided at %s does not exist or is not a 
file.")
+
+val driverSparkConfWithCredentialsLocations = 
setDriverPodKubernetesCredentialLocations(
+driverSparkConf,
+oauthTokenBase64,
+caCertDataBase64,
+clientKeyDataBase64,
+clientCertDataBase64)
+
+val kubernetesCredentialsSecret = createCredentialsSecret(
+oauthTokenBase64,
+caCertDataBase64,
+clientKeyDataBase64,
+clientCertDataBase64)
+
+val driverPodWithMountedKubernetesCredentials = 
kubernetesCredentialsSecret.map { secret =>
+  new PodBuilder(driverSpec.driverPod)
+.editOrNewSpec()
+  .addNewVolume()
+.withName(DRIVER_CREDENTIALS_SECRET_VOLUME_NAME)
+
.withNewSecret().withSecretName(secret.getMetadata.getName).endSecret()
+.endVolume()
+  .endSpec()
+.build()
+}.getOrElse(
+  
