[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r183162778

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---

```
@@ -101,17 +109,29 @@ private[spark] object KubernetesConf {
       appId: String,
       mainAppResource: Option[MainAppResource],
       mainClass: String,
-      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
+      appArgs: Array[String],
+      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
     val sparkConfWithMainAppJar = sparkConf.clone()
+    val additionalFiles = mutable.ArrayBuffer.empty[String]
     mainAppResource.foreach {
-      case JavaMainAppResource(res) =>
-        val previousJars = sparkConf
-          .getOption("spark.jars")
-          .map(_.split(","))
-          .getOrElse(Array.empty)
-        if (!previousJars.contains(res)) {
-          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
-        }
+      case JavaMainAppResource(res) =>
+        val previousJars = sparkConf
+          .getOption("spark.jars")
+          .map(_.split(","))
+          .getOrElse(Array.empty)
+        if (!previousJars.contains(res)) {
+          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
+        }
+      case nonJVM: NonJVMResource =>
+        nonJVM match {
+          case PythonMainAppResource(res) =>
+            additionalFiles += res
+            maybePyFiles.foreach { maybePyFiles =>
+              additionalFiles.appendAll(maybePyFiles.split(","))
+            }
+            sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
+            sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_APP_ARGS, appArgs.mkString(" "))
+        }
+        sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
```

--- End diff --

Very true; we will need to ensure that this does not override a value the user has already set.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
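The override concern in this thread can be sketched outside of Spark. The hypothetical Python helper below is illustrative only (the function name and the 0.1/0.4 defaults are assumptions of this sketch, not quoted from the patch): the non-JVM default is applied only when the user has not set the factor explicitly.

```python
def resolve_overhead_factor(user_conf, is_non_jvm,
                            default_jvm=0.1, default_non_jvm=0.4):
    """Pick a memory overhead factor without clobbering an explicit user setting.

    user_conf: dict of raw Spark conf strings (sketch stand-in for SparkConf).
    """
    explicit = user_conf.get("spark.kubernetes.memoryOverheadFactor")
    if explicit is not None:
        return float(explicit)  # the user's choice always wins
    # Only fall back to the larger non-JVM default when nothing was set.
    return default_non_jvm if is_non_jvm else default_jvm
```

Under this scheme, `set(MEMORY_OVERHEAD_FACTOR, 0.4)` would become a conditional default rather than an unconditional write.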
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r183162555

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---

```
@@ -54,7 +54,7 @@ private[spark] class BasicExecutorFeatureStep(
   private val memoryOverheadMiB = kubernetesConf
     .get(EXECUTOR_MEMORY_OVERHEAD)
-    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
+    .getOrElse(math.max((kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) * executorMemoryMiB).toInt,
```

--- End diff --

Yup!

---
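The fallback expression in this hunk computes the overhead as a factor of executor memory with a floor, unless the user set the overhead directly. A minimal sketch of that arithmetic; the 384 MiB floor mirrors Spark's usual minimum overhead constant, which is an assumption of this sketch rather than something quoted in the diff:

```python
def executor_memory_overhead_mib(executor_memory_mib, factor,
                                 explicit_overhead=None, minimum=384):
    """An explicit spark.executor.memoryOverhead-style setting wins;
    otherwise use max(factor * memory, floor)."""
    if explicit_overhead is not None:
        return explicit_overhead
    return max(int(factor * executor_memory_mib), minimum)
```

With the patched line, `factor` comes from the per-app `MEMORY_OVERHEAD_FACTOR` conf entry instead of a compile-time constant, which is what lets the Python path raise it to 0.4.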
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r183162517

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---

```
@@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
       .addToRequests("memory", driverMemoryQuantity)
       .addToLimits("memory", driverMemoryQuantity)
       .endResources()
-      .addToArgs("driver")
+      .addToArgs(driverDockerContainer)
       .addToArgs("--properties-file", SPARK_CONF_PATH)
       .addToArgs("--class", conf.roleSpecificConf.mainClass)
-      // The user application jar is merged into the spark.jars list and managed through that
-      // property, so there is no need to reference it explicitly here.
-      .addToArgs(SparkLauncher.NO_RESOURCE)
-      .addToArgs(conf.roleSpecificConf.appArgs: _*)
-      .build()
+    val driverContainer =
+      if (driverDockerContainer == "driver-py") {
```

--- End diff --

We currently run the Python (and, in the future, R) step only when we are leveraging a Python (or R) driver process. Otherwise the user would just specify the spark-py docker image, no? And then simply continue to run a non-Python driver process.

---
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r183078482

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---

```
@@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
       .addToRequests("memory", driverMemoryQuantity)
       .addToLimits("memory", driverMemoryQuantity)
       .endResources()
-      .addToArgs("driver")
+      .addToArgs(driverDockerContainer)
       .addToArgs("--properties-file", SPARK_CONF_PATH)
       .addToArgs("--class", conf.roleSpecificConf.mainClass)
-      // The user application jar is merged into the spark.jars list and managed through that
-      // property, so there is no need to reference it explicitly here.
-      .addToArgs(SparkLauncher.NO_RESOURCE)
-      .addToArgs(conf.roleSpecificConf.appArgs: _*)
-      .build()
+    val driverContainer =
+      if (driverDockerContainer == "driver-py") {
```

--- End diff --

The second approach is what I envisioned and tried to implement. It seems, though, that this approach (without putting too much work on the KubernetesConf) breaks the contract we defined.

---
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182567449

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---

```
@@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
       .addToRequests("memory", driverMemoryQuantity)
       .addToLimits("memory", driverMemoryQuantity)
       .endResources()
-      .addToArgs("driver")
+      .addToArgs(driverDockerContainer)
      .addToArgs("--properties-file", SPARK_CONF_PATH)
       .addToArgs("--class", conf.roleSpecificConf.mainClass)
-      // The user application jar is merged into the spark.jars list and managed through that
-      // property, so there is no need to reference it explicitly here.
-      .addToArgs(SparkLauncher.NO_RESOURCE)
-      .addToArgs(conf.roleSpecificConf.appArgs: _*)
-      .build()
+    val driverContainer =
+      if (driverDockerContainer == "driver-py") {
```

--- End diff --

We can check the appResource, but that was already done. I thought it would be overkill to check twice since it was already handled when setting `driverDockerContainer`.

---
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182567225

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala ---

```
@@ -29,9 +30,11 @@ private[spark] class KubernetesExecutorBuilder(
   def buildFromFeatures(
       kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]): SparkPod = {
     val baseFeatures = Seq(provideBasicStep(kubernetesConf))
-    val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
-      baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
-    } else baseFeatures
+    val maybeRoleSecretNamesStep = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
+      Some(provideSecretsStep(kubernetesConf)) } else None
+    val allFeatures: Seq[KubernetesFeatureConfigStep] =
```

--- End diff --

No, but there will be more features, and I thought that handling options in the definition of `allFeatures` was cleaner.

---
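The pattern being discussed, a base sequence of feature steps plus optional ones that may or may not apply, can be sketched generically. In this illustrative Python analogue (not Spark code), `None` stands in for an empty Scala `Option`, so adding a new optional step is one list entry rather than another nested `if`:

```python
def all_features(base_steps, optional_steps):
    """Compose base feature steps with optional ones, dropping absent entries."""
    return base_steps + [step for step in optional_steps if step is not None]
```

For example, a secrets step is appended only when secret mounts were configured; a future Python step would slot into the same `optional_steps` list.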
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182567081

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---

```
@@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
       ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
     }
-    val driverContainer = new ContainerBuilder(pod.container)
+    val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container)
```

--- End diff --

Yes, look below.

---
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/21092#discussion_r182566963

--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---

```
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ARG base_img
+FROM $base_img
+WORKDIR /
+COPY python /opt/spark/python
+RUN apk add --no-cache python && \
+    python -m ensurepip && \
+    rm -r /usr/lib/python*/ensurepip && \
+    pip install --upgrade pip setuptools && \
+    rm -r /root/.cache
+ENV PYTHON_VERSION 2.7.13
```

--- End diff --

That is what I brought up in the PR description, and it is why this is still a WIP. I need to investigate the proper way to determine whether we ship these containers with Python 2 or Python 3.

---
[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...
GitHub user ifilonenko opened a pull request: https://github.com/apache/spark/pull/21092

[SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s

## What changes were proposed in this pull request?

Introducing Python Bindings for PySpark.

- [ ] Running PySpark Jobs
- [ ] Increased Default Memory Overhead value
- [ ] Dependency Management for virtualenv/conda

## How was this patch tested?

This patch was tested with:

- [ ] Unit Tests
- [ ] Integration tests with [this addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run PySpark on simple pi.py example
Run completed in 4 minutes, 3 seconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

## Problematic Comments from [ifilonenko]

- [ ] Currently the Docker image is built with Python 2 --> needs to be generic for Python 2/3
- [ ] `--py-files` is distributing files properly, but an example command like

```
exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/examples/src/main/python/pi.py /opt/spark/examples/src/main/python/sort.py
```

causes errors in which `/opt/spark/examples/src/main/python/pi.py` treats `/opt/spark/examples/src/main/python/sort.py` as an argument.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21092.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21092

commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko
Date: 2018-04-16T03:23:43Z
initial architecture for PySpark w/o dockerfile work

commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko
Date: 2018-04-17T19:13:45Z
included entrypoint logic

commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko
Date: 2018-04-18T02:22:55Z
satisfying integration tests

commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko
Date: 2018-04-18T05:20:12Z
end-to-end working pyspark

commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko
Date: 2018-04-18T05:20:42Z
Merge pull request #1 from ifilonenko/py-spark
Initial architecture for PySpark w/o dependency management

---
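The `--py-files` problem described in the PR body is an argument-boundary issue: once the extra Python files are appended after the primary resource, `pi.py` receives `sort.py` as a positional argument. One way to preserve the boundary is to keep the py-files behind a flag rather than in the positional list. This is a hedged sketch with hypothetical helper names, not the eventual Spark fix:

```python
def build_submit_args(primary_resource, py_files, app_args):
    """Keep comma-joined py-files behind a flag so they are not
    mistaken for positional arguments of the primary resource."""
    args = []
    if py_files:
        args += ["--py-files", ",".join(py_files)]
    args.append(primary_resource)
    args += app_args  # only the real application arguments follow the resource
    return args
```

With this shape, `pi.py` would see only `app_args`, and `sort.py` would travel in the `--py-files` list instead.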
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r175262319

--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---

```
@@ -53,14 +53,10 @@ fi
 case "$SPARK_K8S_CMD" in
   driver)
     CMD=(
-      ${JAVA_HOME}/bin/java
-      "${SPARK_JAVA_OPTS[@]}"
-      -cp "$SPARK_CLASSPATH"
-      -Xms$SPARK_DRIVER_MEMORY
-      -Xmx$SPARK_DRIVER_MEMORY
-      -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS
-      $SPARK_DRIVER_CLASS
-      $SPARK_DRIVER_ARGS
+      "$SPARK_HOME/bin/spark-submit"
+      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+      --deploy-mode client
+      "$@"
```

--- End diff --

+1 on explicit envs.

---
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/20669

Results from integration testing:

```
Discovery starting.
Discovery completed in 123 milliseconds.
Run starting. Expected test count is: 8
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Test extraJVMProprties being present on Driver
- Run FileCheck using a Remote Data File
Run completed in 3 minutes, 22 seconds.
Total number of tests run: 8
Suites: completed 2, aborted 0
Tests: succeeded 8, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174902563

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ---

```
@@ -110,30 +109,29 @@ private[spark] class Client(
     for (nextStep <- submissionSteps) {
       currentDriverSpec = nextStep.configureDriver(currentDriverSpec)
     }
-
-    val resolvedDriverJavaOpts = currentDriverSpec
-      .driverSparkConf
-      // Remove this as the options are instead extracted and set individually below using
-      // environment variables with prefix SPARK_JAVA_OPT_.
-      .remove(org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
-      .getAll
-      .map {
-        case (confKey, confValue) => s"-D$confKey=$confValue"
-      } ++ driverJavaOptions.map(Utils.splitCommandString).getOrElse(Seq.empty)
-    val driverJavaOptsEnvs: Seq[EnvVar] = resolvedDriverJavaOpts.zipWithIndex.map {
-      case (option, index) =>
-        new EnvVarBuilder()
-          .withName(s"$ENV_JAVA_OPT_PREFIX$index")
-          .withValue(option)
-          .build()
-    }
-
+    val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map"
+    val configMap = buildConfigMap(configMapName, currentDriverSpec.driverSparkConf)
+    // The include of the ENV_VAR for "SPARK_CONF_DIR" is to allow for the
+    // Spark command builder to pickup on the Java Options present in the ConfigMap
     val resolvedDriverContainer = new ContainerBuilder(currentDriverSpec.driverContainer)
-      .addAllToEnv(driverJavaOptsEnvs.asJava)
+      .addNewEnv()
+        .withName(SPARK_CONF_DIR_ENV)
+        .withValue(SPARK_CONF_PATH)
```

--- End diff --

Do the executors require a SPARK_CONF_DIR directory to be defined as well?

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174630216

--- Diff: examples/src/main/scala/org/apache/spark/examples/SparkRemoteFileTest.scala ---

```
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples
+
+import java.io.File
+
+import org.apache.spark.SparkFiles
+import org.apache.spark.sql.SparkSession
+
+/** Usage: SparkRemoteFileTest [file] */
+object SparkRemoteFileTest {
```

--- End diff --

This tests the presence of a remote file being mounted on the executors via the spark-submit run by the driver. Should I add a Javadoc? `HDFSTest.scala` didn't include one, but I can if necessary.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174629033

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---

```
@@ -89,26 +56,16 @@ private[spark] object KubernetesUtils {
   def getOnlyRemoteFiles(uris: Iterable[String]): Iterable[String] = {
     uris.filter { uri =>
       val scheme = Utils.resolveURI(uri).getScheme
-      scheme != "file" && scheme != "local"
+      scheme != "local"
     }
   }

-  private def resolveFileUri(
-      uri: String,
-      fileDownloadPath: String,
-      assumesDownloaded: Boolean): String = {
+  private def resolveFileUri(uri: String): String = {
     val fileUri = Utils.resolveURI(uri)
-    val fileScheme = Option(fileUri.getScheme).getOrElse("file")
+    val fileScheme = Option(fileUri.getScheme).getOrElse("non-local")
```

--- End diff --

Done.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174628986

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---

```
@@ -89,26 +56,16 @@ private[spark] object KubernetesUtils {
   def getOnlyRemoteFiles(uris: Iterable[String]): Iterable[String] = {
     uris.filter { uri =>
       val scheme = Utils.resolveURI(uri).getScheme
-      scheme != "file" && scheme != "local"
+      scheme != "local"
```

--- End diff --

Done.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174628963

--- Diff: bin/docker-image-tool.sh ---

```
@@ -63,9 +63,11 @@ function build {
     error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
   fi

+  local DOCKERFILE=${DOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
```

--- End diff --

Not a problem.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174583140

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---

```
@@ -89,26 +56,16 @@ private[spark] object KubernetesUtils {
   def getOnlyRemoteFiles(uris: Iterable[String]): Iterable[String] = {
     uris.filter { uri =>
       val scheme = Utils.resolveURI(uri).getScheme
-      scheme != "file" && scheme != "local"
+      scheme != "local"
```

--- End diff --

At the moment it is not being used. Should it be removed, then?

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174582921

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ---

```
@@ -110,30 +109,29 @@ private[spark] class Client(
     for (nextStep <- submissionSteps) {
       currentDriverSpec = nextStep.configureDriver(currentDriverSpec)
     }
-
-    val resolvedDriverJavaOpts = currentDriverSpec
-      .driverSparkConf
-      // Remove this as the options are instead extracted and set individually below using
-      // environment variables with prefix SPARK_JAVA_OPT_.
-      .remove(org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS)
-      .getAll
-      .map {
-        case (confKey, confValue) => s"-D$confKey=$confValue"
-      } ++ driverJavaOptions.map(Utils.splitCommandString).getOrElse(Seq.empty)
-    val driverJavaOptsEnvs: Seq[EnvVar] = resolvedDriverJavaOpts.zipWithIndex.map {
-      case (option, index) =>
-        new EnvVarBuilder()
-          .withName(s"$ENV_JAVA_OPT_PREFIX$index")
-          .withValue(option)
-          .build()
-    }
-
+    val configMapName = s"$kubernetesResourceNamePrefix-driver-conf-map"
+    val configMap = buildConfigMap(configMapName, currentDriverSpec.driverSparkConf)
+    // The include of the ENV_VAR for "SPARK_CONF_DIR" is to allow for the
+    // Spark command builder to pickup on the Java Options present in the ConfigMap
     val resolvedDriverContainer = new ContainerBuilder(currentDriverSpec.driverContainer)
-      .addAllToEnv(driverJavaOptsEnvs.asJava)
+      .addNewEnv()
+        .withName(SPARK_CONF_DIR_ENV)
+        .withValue(SPARK_CONF_PATH)
```

--- End diff --

I am flexible on that; does anyone else have thoughts?

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174582150

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala ---

```
@@ -33,7 +33,9 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
   override def canCreate(masterURL: String): Boolean = masterURL.startsWith("k8s")

   override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
-    if (masterURL.startsWith("k8s") && sc.deployMode == "client") {
+    if (masterURL.startsWith("k8s") &&
+      sc.deployMode == "client" &&
+      !sc.conf.contains(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)) {
```

--- End diff --

I believe that logic might be beyond the scope of this PR, but I could add it if that seems appropriate.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174350660

--- Diff: bin/docker-image-tool.sh ---

```
@@ -63,9 +63,11 @@ function build {
     error "Cannot find docker image. This script must be run from a runnable distribution of Apache Spark."
   fi

+  local DOCKERFILE=${DOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
```

--- End diff --

Uhm, this technically could be a separate PR. I was initially piggy-backing off the work of Marcelo in being able to build Dockerfiles via `-f`.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174348342

--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala ---

```
@@ -108,62 +111,42 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter {
       SecondTestConfigurationStep.containerName)
   }

-  test("The client should create the secondary Kubernetes resources.") {
+  test("The client should create Kubernetes resources") {
     val submissionClient = new Client(
       submissionSteps,
       new SparkConf(false),
       kubernetesClient,
       false,
       "spark",
-      loggingPodStatusWatcher)
+      loggingPodStatusWatcher,
+      KUBERNETES_RESOURCE_PREFIX)
     submissionClient.run()
     val createdPod = createdPodArgumentCaptor.getValue
     val otherCreatedResources = createdResourcesArgumentCaptor.getAllValues
-    assert(otherCreatedResources.size === 1)
-    val createdResource = Iterables.getOnlyElement(otherCreatedResources).asInstanceOf[Secret]
-    assert(createdResource.getMetadata.getName === FirstTestConfigurationStep.secretName)
-    assert(createdResource.getData.asScala ===
-      Map(FirstTestConfigurationStep.secretKey -> FirstTestConfigurationStep.secretData))
-    val ownerReference = Iterables.getOnlyElement(createdResource.getMetadata.getOwnerReferences)
-    assert(ownerReference.getName === createdPod.getMetadata.getName)
-    assert(ownerReference.getKind === DRIVER_POD_KIND)
-    assert(ownerReference.getUid === DRIVER_POD_UID)
-    assert(ownerReference.getApiVersion === DRIVER_POD_API_VERSION)
-  }
-
-  test("The client should attach the driver container with the appropriate JVM options.") {
-    val sparkConf = new SparkConf(false)
-      .set("spark.logConf", "true")
-      .set(
-        org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS,
-        "-XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails")
-    val submissionClient = new Client(
-      submissionSteps,
-      sparkConf,
-      kubernetesClient,
-      false,
-      "spark",
-      loggingPodStatusWatcher)
-    submissionClient.run()
-    val createdPod = createdPodArgumentCaptor.getValue
+    assert(otherCreatedResources.size === 2)
+    otherCreatedResources.toArray.foreach {
```

--- End diff --

I thought it would be easier to just go through all the resources in a single loop. Do you think this way is better?

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174345430

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---

```
@@ -89,26 +56,16 @@ private[spark] object KubernetesUtils {
   def getOnlyRemoteFiles(uris: Iterable[String]): Iterable[String] = {
     uris.filter { uri =>
       val scheme = Utils.resolveURI(uri).getScheme
-      scheme != "file" && scheme != "local"
+      scheme != "local"
     }
   }

-  private def resolveFileUri(
-      uri: String,
-      fileDownloadPath: String,
-      assumesDownloaded: Boolean): String = {
+  private def resolveFileUri(uri: String): String = {
     val fileUri = Utils.resolveURI(uri)
-    val fileScheme = Option(fileUri.getScheme).getOrElse("file")
+    val fileScheme = Option(fileUri.getScheme).getOrElse("non-local")
```

--- End diff --

Valid; the `match` statement is on `_`, so it didn't matter what I put in. I can change this if need be.

---
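The `resolveFileUri` logic under discussion distinguishes container-local `local://` URIs, which map directly to a path inside the image, from everything else, which is left for spark-submit to fetch. A rough Python analogue for illustration (this is not Spark's implementation, and the scheme handling is simplified):

```python
from urllib.parse import urlparse

def resolve_file_uri(uri):
    """'local://' URIs resolve to the container-local path;
    any other URI (file, hdfs, http, bare path) is returned unchanged."""
    parsed = urlparse(uri)
    scheme = parsed.scheme or "file"  # a bare path defaults to the file scheme
    if scheme == "local":
        return parsed.path
    return uri
```

This mirrors why the `getOrElse` fallback string barely matters in the quoted diff: only the `"local"` case is matched specially, and everything else falls through.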
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174345374

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala ---

```
@@ -89,26 +56,16 @@ private[spark] object KubernetesUtils {
   def getOnlyRemoteFiles(uris: Iterable[String]): Iterable[String] = {
     uris.filter { uri =>
       val scheme = Utils.resolveURI(uri).getScheme
-      scheme != "file" && scheme != "local"
+      scheme != "local"
```

--- End diff --

Unnecessary, but it should not be allowed in our Utils since we can't resolve the location without `SparkFiles.get()`, which makes that logic useless. No?

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r174345265

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ---

```
@@ -180,6 +179,26 @@ private[spark] class Client(
       originalMetadata.setOwnerReferences(Collections.singletonList(driverPodOwnerReference))
     }
   }
+
+  // Build a Config Map that will house both the properties file and the java options file
```

--- End diff --

Forgot to add it. Will resolve this.

---
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/20669

Newest push passes all tests (with this merged I will then merge in [this](https://github.com/apache-spark-on-k8s/spark-integration/pull/42/files)):

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run FileCheck using a Remote Data File
Run completed in 2 minutes, 37 seconds.
Total number of tests run: 7
Suites: completed 2, aborted 0
Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

I welcome the opinion of the community on the strategy for passing spark.driver.extraJavaOptions to the driver, as I am currently pointing `SPARK_CONF_DIR` at the JAVA_PROPERTIES file. Open to any better suggestions.

---
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r173665102 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.submit.steps + +import java.io.StringWriter +import java.util.Properties + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec + +/** + * Create a config map with the driver configuration and attach it to the pod. This needs to + * come at the end of the driver configuration so that all modifications to the Spark config + * are reflected in the generated config map. 
+ */
+private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String)
+  extends DriverConfigurationStep {
+
+  override def configureDriver(spec: KubernetesDriverSpec): KubernetesDriverSpec = {
+    val configMapName = s"$resourceNamePrefix-driver-conf-map"
+    val configMap = buildConfigMap(configMapName, spec.driverSparkConf)
--- End diff -- I will also include `--conf spark.driver.extraJavaOptions=` in the spark-submit launched in the driver.
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20669#discussion_r173663339 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.deploy.k8s.submit.steps + +import java.io.StringWriter +import java.util.Properties + +import io.fabric8.kubernetes.api.model._ + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec + +/** + * Create a config map with the driver configuration and attach it to the pod. This needs to + * come at the end of the driver configuration so that all modifications to the Spark config + * are reflected in the generated config map. 
+ */
+private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String)
+  extends DriverConfigurationStep {
+
+  override def configureDriver(spec: KubernetesDriverSpec): KubernetesDriverSpec = {
+    val configMapName = s"$resourceNamePrefix-driver-conf-map"
+    val configMap = buildConfigMap(configMapName, spec.driverSparkConf)
--- End diff -- I agree that it should work in any order. I will restructure it as such.
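The `buildConfigMap` helper referenced in the diff presumably renders the final driver SparkConf as a single properties-file entry in the config map's data. A dependency-free sketch of that mapping is below; the object name, the `spark.properties` key, and the tuple-returning signature are assumptions for illustration, not the actual Spark helpers:

```scala
import java.io.StringWriter
import java.util.Properties

// Sketch: one config map, named after the resource prefix, whose single
// data entry is the driver configuration rendered as a properties file.
object DriverConfigMapSketch {
  val PropertiesFileName = "spark.properties"

  def configMapData(
      resourceNamePrefix: String,
      driverConf: Seq[(String, String)]): (String, Map[String, String]) = {
    val props = new Properties()
    driverConf.foreach { case (k, v) => props.setProperty(k, v) }
    val writer = new StringWriter()
    props.store(writer, null)
    val configMapName = s"$resourceNamePrefix-driver-conf-map"
    (configMapName, Map(PropertiesFileName -> writer.toString))
  }
}
```

Because this step only reads the accumulated `driverSparkConf`, running it last (or, after the restructuring discussed above, in any order relative to steps that do not mutate the conf) yields the same data.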
[GitHub] spark pull request #20697: Initial checkin of k8s integration tests.
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/20697#discussion_r171426248 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala --- @@ -0,0 +1,391 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.integrationtest + +import java.io.File +import java.nio.file.{Path, Paths} +import java.util.UUID +import java.util.regex.Pattern + +import scala.collection.JavaConverters._ +import com.google.common.io.PatternFilenameFilter +import io.fabric8.kubernetes.api.model.{Container, Pod} +import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll, FunSuite} +import org.scalatest.concurrent.{Eventually, PatienceConfiguration} +import org.scalatest.time.{Minutes, Seconds, Span} + +import org.apache.spark.deploy.k8s.integrationtest.backend.{IntegrationTestBackend, IntegrationTestBackendFactory} +import org.apache.spark.deploy.k8s.integrationtest.config._ + +private[spark] class KubernetesSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfter { + + import KubernetesSuite._ + + private var testBackend: IntegrationTestBackend = _ + private var sparkHomeDir: Path = _ + private var kubernetesTestComponents: KubernetesTestComponents = _ + private var sparkAppConf: SparkAppConf = _ + private var image: String = _ + private var containerLocalSparkDistroExamplesJar: String = _ + private var appLocator: String = _ + private var driverPodName: String = _ + + override def beforeAll(): Unit = { +// The scalatest-maven-plugin gives system properties that are referenced but not set null +// values. We need to remove the null-value properties before initializing the test backend. 
+val nullValueProperties = System.getProperties.asScala + .filter(entry => entry._2.equals("null")) + .map(entry => entry._1.toString) +nullValueProperties.foreach { key => + System.clearProperty(key) +} + +val sparkDirProp = System.getProperty("spark.kubernetes.test.unpackSparkDir") +require(sparkDirProp != null, "Spark home directory must be provided in system properties.") +sparkHomeDir = Paths.get(sparkDirProp) +require(sparkHomeDir.toFile.isDirectory, + s"No directory found for spark home specified at $sparkHomeDir.") +val imageTag = getTestImageTag +val imageRepo = getTestImageRepo +image = s"$imageRepo/spark:$imageTag" + +val sparkDistroExamplesJarFile: File = sparkHomeDir.resolve(Paths.get("examples", "jars")) + .toFile + .listFiles(new PatternFilenameFilter(Pattern.compile("^spark-examples_.*\\.jar$")))(0) +containerLocalSparkDistroExamplesJar = s"local:///opt/spark/examples/jars/" + + s"${sparkDistroExamplesJarFile.getName}" +testBackend = IntegrationTestBackendFactory.getTestBackend +testBackend.initialize() +kubernetesTestComponents = new KubernetesTestComponents(testBackend.getKubernetesClient) + } + + override def afterAll(): Unit = { +testBackend.cleanUp() + } + + before { +appLocator = UUID.randomUUID().toString.replaceAll("-", "") +driverPodName = "spark-test-app-" + UUID.randomUUID().toString.replaceAll("-", "") +sparkAppConf = kubernetesTestComponents.newSparkAppConf() + .set("spark.kubernetes.container.image", image) + .set("spark.kubernetes.driver.pod.name", driverPodName) + .set("spark.kubernetes.driver.label.spark-app-locator", appLocator) + .set("spark.kubernetes.executor.label.spark-app-locator", appLocator) +
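The null-property cleanup at the top of `beforeAll` can be exercised on its own. A self-contained version of the same filter-and-clear logic (the object name is invented for the sketch):

```scala
import scala.collection.JavaConverters._

// The scalatest-maven-plugin passes referenced-but-unset system properties
// through with the literal string value "null"; clear them so that code
// reading those properties sees real nulls and falls back to defaults.
object NullPropertyCleanup {
  def clearNullValuedProperties(): Seq[String] = {
    // Collect the keys first so we are not mutating while iterating.
    val nullValueKeys = System.getProperties.asScala
      .filter { case (_, value) => value == "null" }
      .keys.toSeq
    nullValueKeys.foreach(System.clearProperty)
    nullValueKeys
  }
}
```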
[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/20669 @vanzin @mccheah @foxish This is ready for your review as it passes unit tests and integration tests. At the moment, it is required that we modify the integration tests to call `SparkFiles.get()`.
[GitHub] spark issue #20669: [SPARK-22839][K8S][WIP] Remove the use of init-container...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/20669 @vanzin you are definitely right, and that is what needs to happen. However, one of the functions of the init container was resolving the file location, which at the moment isn't supported unless the Spark application exercising this service uses `SparkFiles.get()`. This is something I am looking into right now, either on the integration-testing side, or perhaps there is some way to bring back this resolution (as the file is currently being stored in /tmp).
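For context on the resolution being discussed: `SparkFiles.get()` resolves a bare file name against the per-application download directory. The lookup the init container used to perform can be imitated with a simplified stand-in like the one below (not Spark's implementation; in real Spark the download root is managed by `SparkEnv`, and the helper names here are invented):

```scala
import java.io.File
import java.net.URI

// Simplified stand-in for SparkFiles.get(): remote files listed in
// spark.files are downloaded into a root directory (e.g. under /tmp),
// and applications look them up there by bare file name.
object SparkFilesSketch {
  // Extract the file name from a URI such as https://... or local:///...
  def fileName(path: String): String = {
    val uriPath = Option(new URI(path).getPath).getOrElse(path)
    new File(uriPath).getName
  }

  // Resolve a downloaded file's absolute path under the download root.
  def get(downloadRoot: String, remotePath: String): String =
    new File(downloadRoot, fileName(remotePath)).getAbsolutePath
}
```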
[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...
GitHub user ifilonenko opened a pull request: https://github.com/apache/spark/pull/20669 [SPARK-22839][K8S] Remove the use of init-container for downloading remote dependencies

## What changes were proposed in this pull request?

Removal of the init-container for downloading remote dependencies, built off of the work done by @vanzin in an attempt to refactor driver/executor configuration, as elaborated in [this](https://issues.apache.org/jira/browse/SPARK-22839) ticket.

## How was this patch tested?

This patch was tested with unit and integration tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark remove-init-container

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20669.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20669

commit 2fefd0edf2f15ba66620fd507bd0cd7ce01bcd1e Author: Ilan Filonenko Date: 2018-02-24T23:25:45Z Removed the use of init-container for downloading remote dependencies