[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10553709

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+/**
+ * Parses and encapsulates arguments from the spark-submit script.
+ */
+private[spark] class SparkSubmitArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var totalExecutorCores: String = null
+  var driverMemory: String = null
+  var driverCores: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  // TODO: some deploy mode and master need to be specified by default. local seems like a good choice. or we should just error?
+  // TODO: throw an exception instead of exiting to make things easier on tests?
+  def parseArgs(args: List[String]) {
+    if (args.size == 0) {
+      printUsageAndExit(1)
+      System.exit(1)
+    }
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--total-executor-cores") :: value :: tail =>
+      totalExecutorCores = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--driver-cores") :: value :: tail =>
+      driverCores = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
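The `parseOpts` method above consumes the option list recursively, matching a flag and its value off the head of the list and recursing on the tail. A minimal standalone sketch of the same technique, with a reduced, hypothetical field set and an exception instead of `System.exit`:

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal sketch of spark-submit-style recursive option parsing.
// Field names loosely mirror the patch; this is illustrative, not the real class.
class ArgSketch(args: Array[String]) {
  var master: String = null
  var mainClass: String = null
  val childArgs = new ArrayBuffer[String]()

  parse(args.toList)

  private def parse(opts: List[String]): Unit = opts match {
    case "--master" :: value :: tail =>
      master = value
      parse(tail)
    case "--class" :: value :: tail =>
      mainClass = value
      parse(tail)
    case "--arg" :: value :: tail =>
      childArgs += value
      parse(tail)
    case Nil => // all options consumed
    case unknown :: _ =>
      throw new IllegalArgumentException("Unknown option: " + unknown)
  }
}
```

Each recognized case binds two list elements (flag and value), so an option missing its value simply falls through to the error case.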
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37502813 @mateiz maybe you could take a pass on this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37506962 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37507039 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13155/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37507037 Merged build finished.
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37601581 Yeah, working on it.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37611181 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37611180 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37611363 Merged build finished.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37611365 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13174/
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37381283 Updated patch takes review comments from @mridulm and @pwendell into account. spark.cores.max is now correctly handled. Jars passed in with --more-jars are not added to the classpath. And yarn-standalone|client are used to infer the deploy mode, and vice versa.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37382160 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37382161 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37382223 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13127/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-3738 Merged build finished.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10399046

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/* ... (ASF license header, as above) ... */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
--- End diff --

On the other hand, it might make more sense to move towards consistency between YARN and standalone/Mesos, for which MASTER specifies only the cluster manager, not the application's deploy mode. For that, we would allow giving --master to spark-submit as just "yarn", and yarn-client vs. yarn-standalone would be inferred from --deploy-mode.
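The inference sryza describes cuts both ways: a bare "yarn" master plus --deploy-mode picks the concrete YARN mode, while an explicit yarn-client or yarn-standalone master fixes the deploy mode. A hypothetical sketch of that reconciliation (function name and the exact mapping are illustrative, not the committed behavior):

```scala
// Sketch of mutual inference between --master and --deploy-mode for YARN.
// Returns a (master, deployMode) pair with both fields made concrete.
def reconcileYarn(master: String, deployMode: String): (String, String) =
  (master, deployMode) match {
    case ("yarn", "cluster")    => ("yarn-standalone", "cluster")
    case ("yarn", _)            => ("yarn-client", "client")      // default to client mode
    case ("yarn-standalone", _) => ("yarn-standalone", "cluster") // master implies mode
    case ("yarn-client", _)     => ("yarn-client", "client")
    case other                  => other // non-YARN masters pass through unchanged
  }
```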
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10405655

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/* ... (ASF license header, as above) ... */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    var childMainClass = ""
+
+    if (clusterManager == MESOS && deployOnCluster) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployOnCluster && clusterManager == STANDALONE) {
+      childMainClass = "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass)
+    } else if (deployOnCluster && clusterManager == YARN) {
+      childMainClass = "org.apache.spark.deploy.yarn.Client"
+      childArgs += ("--jar", appArgs.primaryResource)
+      childArgs += ("--class", appArgs.mainClass)
+    } else {
+      childMainClass = appArgs.mainClass
+      childClasspath += appArgs.primaryResource
+    }
+
+    val options = List[OptionAssigner](
+      new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"),
+      new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"),
+      new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"),
+      new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"),
+      new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"),
+      new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"),
+      new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"),
+      new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"),
+      new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"),
+      new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"),
+      new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"),
+      new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"),
+      new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars")
+    )
+
+    // more jars
+    if (appArgs.moreJars != null && !deployOnCluster) {
+      childClasspath += appArgs.moreJars
+    }
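Each OptionAssigner row in the diff above pairs a user argument with the cluster managers (as a bitmask) and deploy mode it applies to, then routes it either to a child command-line flag or a system property. A hypothetical condensed sketch of that dispatch, with illustrative names rather than the committed code:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of OptionAssigner-style dispatch: filter rules by cluster-manager
// bitmask and deploy mode, then emit child CLI flags and system properties.
case class OptionAssigner(
    value: String,
    clusterMask: Int,
    deployOnCluster: Boolean,
    clOption: String = null,
    sysProp: String = null)

object AssignSketch {
  val YARN = 1; val STANDALONE = 2; val MESOS = 4; val LOCAL = 8

  def dispatch(rules: Seq[OptionAssigner], mgr: Int, onCluster: Boolean)
      : (List[String], Map[String, String]) = {
    val childArgs = new ArrayBuffer[String]()
    var sysProps = Map.empty[String, String]
    for (r <- rules) {
      // A rule applies when its value is set, its mask covers this manager,
      // and its deploy-mode flag matches the current mode.
      val applies = r.value != null &&
        (r.clusterMask & mgr) != 0 &&
        r.deployOnCluster == onCluster
      if (applies) {
        if (r.clOption != null) { childArgs += r.clOption; childArgs += r.value }
        if (r.sysProp != null) sysProps += (r.sysProp -> r.value)
      }
    }
    (childArgs.toList, sysProps)
  }
}
```

Encoding managers as powers of two is what lets a single row like `STANDALONE | MESOS | YARN` cover several managers at once.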
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10406142

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -0,0 +1,153 @@
+/* ... (ASF license header, as above) ... */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkSubmitArguments(args: Array[String]) {
--- End diff --

Please add a class comment explaining why this class exists and how it is used by or relates to other classes. A few months from now it will make it easier to immediately understand how this class fits into the overall picture by just looking at the class summary, rather than having to search for usages with an IDE in the source repo =)
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37088913 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37088937 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13063/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37088936 Merged build finished.
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-37088962 Newest patch includes tests and doc. @pwendell, do you have a link to the addJar patch? If it's definitely going to happen, I'll take out the classloader stuff here.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10359483

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/* ... (ASF license header, as above) ... */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
--- End diff --

Definitely open to improvements over yarn-cluster, but yarn-batch doesn't sound right to me, because this mode can also be used to run long-running apps like streaming.

As this is orthogonal to spark-app, and there are a few related changes I'd like to make, I submitted a separate pull request for this. Let's continue the discussion on https://github.com/apache/spark/pull/95.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939213 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939354 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939470 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13027/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36939469 Merged build finished.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/86#issuecomment-36945861

I see, regarding the memory part, it sounds like we could do it in bash, but it might be kind of painful. We could do the following:
- Look for just the driver memory and cluster mode arguments using bash
- If we're not running in cluster mode, set the -Xmx and -Xms parameters when launching

I agree that we shouldn't use the full memory you required if you submitted to a cluster. I'm not sure how hard it is to parse these arguments in bash -- it shouldn't be *that* hard, but we'll also have to do it in .cmd scripts on Windows and such. Otherwise it would be good to test how slow this is with two JVM launches (maybe we can avoid a lot of the slowness).
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/86#discussion_r10372142

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/* ... (ASF license header, as above) ... */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36966736 I uploaded a new patch that doesn't start a new JVM and parses --driver-memory in bash. It wasn't as bad as I expected (thanks to some help from @umbrant and @atm). I've verified that it works with yarn with both deploy modes. I'm still planning to add some tests and doc, but I wanted to upload it with the new approach in case there are any comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36967209 Merged build triggered. ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36967210 Merged build started. ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10373180 --- Diff: bin/spark-submit --- @@ -0,0 +1,38 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +export SPARK_HOME=$(cd `dirname $0`/..; pwd) +ORIG_ARGS=$@ + +while (($#)); do + if [ "$1" = "--deploy-mode" ]; then +DEPLOY_MODE=$2 + elif [ "$1" = "--driver-memory" ]; then +DRIVER_MEMORY=$2 + fi + + shift +done + +if [ ! -z "$DRIVER_MEMORY" ] && [ ! -z "$DEPLOY_MODE" ] && [ "$DEPLOY_MODE" = "client" ]; then + export SPARK_MEM=$DRIVER_MEMORY +fi + +$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS + --- End diff -- Are we envisioning a corresponding .cmd file once the review of this is done? ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36967269 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13036/ ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36967267 Merged build finished. ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10373335 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy + +import java.io.File +import java.net.URL +import java.net.URLClassLoader + +import scala.collection.mutable.ArrayBuffer + +object SparkSubmit { + val YARN = 1 + val STANDALONE = 2 + val MESOS = 4 + val LOCAL = 8 + val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL + + var clusterManager: Int = LOCAL + + def main(args: Array[String]) { +val appArgs = new SparkSubmitArguments(args) + +if (appArgs.master != null) { + if (appArgs.master.startsWith("yarn")) { +clusterManager = YARN + } else if (appArgs.master.startsWith("spark")) { +clusterManager = STANDALONE + } else if (appArgs.master.startsWith("mesos")) { +clusterManager = MESOS + } else if (appArgs.master.startsWith("local")) { +clusterManager = LOCAL + } else { +System.err.println("master must start with yarn, mesos, spark, or local") +System.exit(1) + } +} + +val deployOnCluster = appArgs.deployMode == "cluster" +val childClasspath = new ArrayBuffer[String]() +val childArgs = new ArrayBuffer[String]() +var childMainClass = "" + +if (clusterManager == MESOS && deployOnCluster) { + System.err.println("Mesos does not support running the driver on the cluster") + System.exit(1) +} + +if (deployOnCluster && clusterManager == STANDALONE) { + childMainClass = "org.apache.spark.deploy.Client" + childArgs += "launch" + childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass) +} else if (deployOnCluster && clusterManager == YARN) { + childMainClass = "org.apache.spark.deploy.yarn.Client" + childArgs += ("--jar", appArgs.primaryResource) + childArgs += ("--class", appArgs.mainClass) +} else { + childMainClass = appArgs.mainClass + childClasspath += appArgs.primaryResource +} + +val options = List[OptionAssigner]( + new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"), + new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"), + new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"), + new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"), + new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"), + new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"), + new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"), + new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"), + new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"), + new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"), + new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"), + new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"), + new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"), + new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"), + new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"), + new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"), + new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars") +) + +// more jars +if (appArgs.moreJars != null && !deployOnCluster) { + childClasspath += appArgs.moreJars +}
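The OptionAssigner list above is essentially a dispatch table: each user-facing setting is routed to either a command-line flag for the child process or a system property, keyed on cluster manager and deploy mode. A self-contained sketch of that pattern follows; the class and method names here are illustrative stand-ins, not the PR's exact code:

```scala
import scala.collection.mutable.ArrayBuffer

object OptionAssignerSketch {
  val YARN = 1; val STANDALONE = 2; val MESOS = 4; val LOCAL = 8

  // value: the user-supplied setting; clusterManager: bitmask of managers it
  // applies to; deployOnCluster: whether it applies in cluster deploy mode;
  // clOption / sysProp: where the value should land if the row matches.
  case class OptionAssigner(value: String, clusterManager: Int, deployOnCluster: Boolean,
                            clOption: String = null, sysProp: String = null)

  def apply(options: Seq[OptionAssigner], clusterManager: Int, deployOnCluster: Boolean)
      : (Seq[String], Map[String, String]) = {
    val childArgs = new ArrayBuffer[String]
    val sysProps = scala.collection.mutable.Map[String, String]()
    // A row fires only when its value is set, its manager mask matches,
    // and its deploy-mode flag matches the current mode.
    for (opt <- options if opt.value != null
         && (opt.clusterManager & clusterManager) != 0
         && opt.deployOnCluster == deployOnCluster) {
      if (opt.clOption != null) childArgs += opt.clOption += opt.value
      if (opt.sysProp != null) sysProps(opt.sysProp) = opt.value
    }
    (childArgs.toSeq, sysProps.toMap)
  }
}
```

For example, applying a table with `OptionAssigner("2g", STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory")` under `(STANDALONE, deployOnCluster = false)` yields no child args and the single system property `spark.executor.memory -> 2g`.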
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10373471 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy + +import java.io.File +import java.net.URL +import java.net.URLClassLoader + +import scala.collection.mutable.ArrayBuffer + +object SparkSubmit { + val YARN = 1 + val STANDALONE = 2 + val MESOS = 4 + val LOCAL = 8 + val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL + + var clusterManager: Int = LOCAL + + def main(args: Array[String]) { +val appArgs = new SparkSubmitArguments(args) + +if (appArgs.master != null) { + if (appArgs.master.startsWith("yarn")) { +clusterManager = YARN + } else if (appArgs.master.startsWith("spark")) { +clusterManager = STANDALONE + } else if (appArgs.master.startsWith("mesos")) { +clusterManager = MESOS + } else if (appArgs.master.startsWith("local")) { +clusterManager = LOCAL + } else { +System.err.println("master must start with yarn, mesos, spark, or local") +System.exit(1) + } +} + +val deployOnCluster = appArgs.deployMode == "cluster" +val childClasspath = new ArrayBuffer[String]() +val childArgs = new ArrayBuffer[String]() +var childMainClass = "" + +if (clusterManager == MESOS && deployOnCluster) { + System.err.println("Mesos does not support running the driver on the cluster") + System.exit(1) +} + +if (deployOnCluster && clusterManager == STANDALONE) { + childMainClass = "org.apache.spark.deploy.Client" + childArgs += "launch" + childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass) +} else if (deployOnCluster && clusterManager == YARN) { + childMainClass = "org.apache.spark.deploy.yarn.Client" + childArgs += ("--jar", appArgs.primaryResource) + childArgs += ("--class", appArgs.mainClass) +} else { + childMainClass = appArgs.mainClass + childClasspath += appArgs.primaryResource +} + +val options = List[OptionAssigner]( + new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"), + new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"), + new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"), + new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"), + new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"), + new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"), + new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"), + new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"), + new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"), + new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"), + new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"), + new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"), + new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"), + new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"), + new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"), + new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"), + new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars") +) + +// more jars +if (appArgs.moreJars != null && !deployOnCluster) { + childClasspath += appArgs.moreJars +}
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10374754 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy + +import java.io.File +import java.net.URL +import java.net.URLClassLoader + +import scala.collection.mutable.ArrayBuffer + +object SparkSubmit { + val YARN = 1 + val STANDALONE = 2 + val MESOS = 4 + val LOCAL = 8 + val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL + + var clusterManager: Int = LOCAL + + def main(args: Array[String]) { +val appArgs = new SparkSubmitArguments(args) + +if (appArgs.master != null) { + if (appArgs.master.startsWith("yarn")) { +clusterManager = YARN + } else if (appArgs.master.startsWith("spark")) { +clusterManager = STANDALONE + } else if (appArgs.master.startsWith("mesos")) { +clusterManager = MESOS + } else if (appArgs.master.startsWith("local")) { +clusterManager = LOCAL + } else { +System.err.println("master must start with yarn, mesos, spark, or local") +System.exit(1) + } +} + +val deployOnCluster = appArgs.deployMode == "cluster" --- End diff -- unfortunately not, since YARN and the standalone cluster both support running the driver inside or outside of the cluster, so now there is a second dimension. ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10374821 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy + +import java.io.File +import java.net.URL +import java.net.URLClassLoader + +import scala.collection.mutable.ArrayBuffer + +object SparkSubmit { + val YARN = 1 + val STANDALONE = 2 + val MESOS = 4 + val LOCAL = 8 + val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL + + var clusterManager: Int = LOCAL + + def main(args: Array[String]) { +val appArgs = new SparkSubmitArguments(args) + +if (appArgs.master != null) { + if (appArgs.master.startsWith("yarn")) { +clusterManager = YARN + } else if (appArgs.master.startsWith("spark")) { +clusterManager = STANDALONE + } else if (appArgs.master.startsWith("mesos")) { +clusterManager = MESOS + } else if (appArgs.master.startsWith("local")) { +clusterManager = LOCAL + } else { +System.err.println("master must start with yarn, mesos, spark, or local") +System.exit(1) + } +} + +val deployOnCluster = appArgs.deployMode == "cluster" --- End diff -- But that is specified via the master already ... via yarn-standalone vs yarn-client. Not sure about standalone cluster, so probably useful. But if it can be inferred, it would be better to simplify the user interface. ---
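The two dimensions being debated here (cluster manager vs. deploy mode) can be made concrete with a small sketch. This is a hypothetical illustration of deriving both from the master string, hard-coding the yarn-standalone/yarn-client convention mridulm mentions; it is not the PR's actual logic:

```scala
object MasterInference {
  // Returns (clusterManager, deployMode) inferred from a master URL plus an
  // optional explicit --deploy-mode. For YARN the master string itself can
  // encode the mode (yarn-standalone vs yarn-client); everything else
  // defaults to client mode when no explicit mode is given.
  def infer(master: String, explicitMode: Option[String]): (String, String) = {
    val manager = master match {
      case m if m.startsWith("yarn")  => "yarn"
      case m if m.startsWith("spark") => "standalone"
      case m if m.startsWith("mesos") => "mesos"
      case m if m.startsWith("local") => "local"
      case _ => sys.error("master must start with yarn, mesos, spark, or local")
    }
    val mode = explicitMode.getOrElse(
      if (master == "yarn-standalone") "cluster" else "client")
    (manager, mode)
  }
}
```

Under these assumptions, `infer("yarn-standalone", None)` gives `("yarn", "cluster")` while `infer("spark://host:7077", Some("cluster"))` gives `("standalone", "cluster")`, which shows why standalone mode still needs the explicit flag.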
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/86 SPARK-1126. spark-app preliminary This is a starting version of the spark-app script for running compiled binaries against Spark. It still needs tests and some polish. The only testing I've done so far has been using it to launch jobs in yarn-standalone mode against a pseudo-distributed cluster. This leaves out the changes required for launching python scripts. I think it might be best to save those for another JIRA/PR (while keeping to the design so that they won't require backwards-incompatible changes). You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-1126 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/86.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #86 commit e33a2233963ea771002b683572bd822d63689d00 Author: Sandy Ryza sa...@cloudera.com Date: 2014-03-04T07:54:02Z SPARK-1126. spark-app preliminary ---
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10330438 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy + +import scala.collection.mutable.ArrayBuffer + +private[spark] class SparkAppArguments(args: Array[String]) { + var master: String = null + var deployMode: String = null + var executorMemory: String = null + var executorCores: String = null + var driverMemory: String = null + var supervise: Boolean = false + var queue: String = null + var numExecutors: String = null + var files: String = null + var archives: String = null + var mainClass: String = null + var primaryResource: String = null + var name: String = null + var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]() + var moreJars: String = null + var clientClasspath: String = null + + loadEnvVars() + parseArgs(args.toList) + + def loadEnvVars() { +master = System.getenv("MASTER") +deployMode = System.getenv("DEPLOY_MODE") + } + + def parseArgs(args: List[String]) { +primaryResource = args(0) +parseOpts(args.tail) + } + + def parseOpts(opts: List[String]): Unit = opts match { +case ("--name") :: value :: tail => + name = value + parseOpts(tail) + +case ("--master") :: value :: tail => + master = value + parseOpts(tail) + +case ("--class") :: value :: tail => + mainClass = value + parseOpts(tail) + +case ("--deploy-mode") :: value :: tail => + if (value != "client" && value != "cluster") { +System.err.println("--deploy-mode must be either \"client\" or \"cluster\"") +System.exit(1) + } + deployMode = value + parseOpts(tail) + +case ("--num-executors") :: value :: tail => + numExecutors = value + parseOpts(tail) + +case ("--executor-cores") :: value :: tail => + executorCores = value + parseOpts(tail) + +case ("--executor-memory") :: value :: tail => + executorMemory = value + parseOpts(tail) + +case ("--driver-memory") :: value :: tail => + driverMemory = value + parseOpts(tail) + +case ("--supervise") :: tail => + supervise = true + parseOpts(tail) + +case ("--queue") :: value :: tail => + queue = value + parseOpts(tail) + +case ("--files") :: value :: tail => + files = value + parseOpts(tail) + +case ("--archives") :: value :: tail => + archives = value + parseOpts(tail) + +case ("--arg") :: value :: tail => + childArgs += value + parseOpts(tail) + +case ("--more-jars") :: value :: tail => + moreJars = value + parseOpts(tail) + +case ("--client-classpath") :: value :: tail => + clientClasspath = value + parseOpts(tail) + +case ("--help" | "-h") :: tail => + printUsageAndExit(0) + +case Nil => + +case _ => + printUsageAndExit(1, opts) + } + + def printUsageAndExit(exitCode: Int, unknownParam: Any = null) { +if (unknownParam != null) { + System.err.println("Unknown/unsupported param " + unknownParam) +} +System.err.println( + "Usage: spark-app <primary binary> [options]\n" + +"Options:\n" + + "  --master MASTER_URL        spark://host:port, mesos://host:port, yarn, or local\n" + --- End diff -- I'd disable the scalastyle here so that it doesn't produce build errors. In this case I think it's fine to violate the line limit: http://www.scalastyle.org/configuration.html also there is a different way to do multiline strings in scala - but up to you...
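The "different way to do multiline strings" presumably refers to Scala's triple-quoted literals with stripMargin, which keep long usage text readable without concatenated `\n` literals. A sketch (usage content abridged from the diff above):

```scala
object UsageText {
  // Triple-quoted string with stripMargin: each '|' marks the logical start
  // of a line, so the text can be indented to match the surrounding code.
  val usage: String =
    """Usage: spark-app <primary binary> [options]
      |
      |Options:
      |  --master MASTER_URL        spark://host:port, mesos://host:port, yarn, or local
      |  --deploy-mode DEPLOY_MODE  Mode to deploy the app in, either "client" or "cluster"
      |  --class CLASS_NAME         Name of your application's main class (required for Java apps)
      |""".stripMargin
}
```

One advantage over `"..." + "\n" +` chains is that quotes inside the text need no escaping, and scalastyle's line-length rule is easier to satisfy because each option fits on its own source line.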
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10330672 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy + +import scala.collection.mutable.ArrayBuffer + +private[spark] class SparkAppArguments(args: Array[String]) { + var master: String = null + var deployMode: String = null + var executorMemory: String = null + var executorCores: String = null + var driverMemory: String = null + var supervise: Boolean = false + var queue: String = null + var numExecutors: String = null + var files: String = null + var archives: String = null + var mainClass: String = null + var primaryResource: String = null + var name: String = null + var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]() + var moreJars: String = null + var clientClasspath: String = null + + loadEnvVars() + parseArgs(args.toList) + + def loadEnvVars() { +master = System.getenv("MASTER") +deployMode = System.getenv("DEPLOY_MODE") + } + + def parseArgs(args: List[String]) { +primaryResource = args(0) +parseOpts(args.tail) + } + + def parseOpts(opts: List[String]): Unit = opts match { +case ("--name") :: value :: tail => + name = value + parseOpts(tail) + +case ("--master") :: value :: tail => + master = value + parseOpts(tail) + +case ("--class") :: value :: tail => + mainClass = value + parseOpts(tail) + +case ("--deploy-mode") :: value :: tail => + if (value != "client" && value != "cluster") { +System.err.println("--deploy-mode must be either \"client\" or \"cluster\"") +System.exit(1) + } + deployMode = value + parseOpts(tail) + +case ("--num-executors") :: value :: tail => + numExecutors = value + parseOpts(tail) + +case ("--executor-cores") :: value :: tail => + executorCores = value + parseOpts(tail) + +case ("--executor-memory") :: value :: tail => + executorMemory = value + parseOpts(tail) + +case ("--driver-memory") :: value :: tail => + driverMemory = value + parseOpts(tail) + +case ("--supervise") :: tail => + supervise = true + parseOpts(tail) + +case ("--queue") :: value :: tail => + queue = value + parseOpts(tail) + +case ("--files") :: value :: tail => + files = value + parseOpts(tail) + +case ("--archives") :: value :: tail => + archives = value + parseOpts(tail) + +case ("--arg") :: value :: tail => + childArgs += value + parseOpts(tail) + +case ("--more-jars") :: value :: tail => + moreJars = value + parseOpts(tail) + +case ("--client-classpath") :: value :: tail => + clientClasspath = value + parseOpts(tail) + +case ("--help" | "-h") :: tail => + printUsageAndExit(0) + +case Nil => + +case _ => + printUsageAndExit(1, opts) + } + + def printUsageAndExit(exitCode: Int, unknownParam: Any = null) { +if (unknownParam != null) { + System.err.println("Unknown/unsupported param " + unknownParam) +} +System.err.println( + "Usage: spark-app <primary binary> [options]\n" + +"Options:\n" + + "  --master MASTER_URL        spark://host:port, mesos://host:port, yarn, or local\n" + + "  --deploy-mode DEPLOY_MODE  Mode to deploy the app in, either \"client\" or \"cluster\"\n" + + "  --class CLASS_NAME         Name of your application's main class (required for Java apps)\n" + + "  --arg ARG                  Argument to be passed to your application's main class.\n" + +
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10330871 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.deploy + +import java.io.BufferedReader +import java.io.InputStream +import java.io.InputStreamReader +import java.io.PrintStream + +import scala.collection.mutable.HashMap +import scala.collection.mutable.Map +import scala.collection.mutable.ArrayBuffer +import scala.collection.JavaConverters._ + +object SparkApp { + val CLIENT = 1 + val CLUSTER = 2 + val YARN = 1 --- End diff -- These are created as flags, but since they are mutually exclusive would it make more sense for them to be an `Enumeration`? http://www.scala-lang.org/api/2.10.2/index.html#scala.Enumeration I.e. is it possible for something to be both YARN and LOCAL? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
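A minimal sketch of the `Enumeration` alternative suggested above (the object and member names here are hypothetical, not from the PR):

```scala
// Sketch only: modeling the mutually exclusive cluster managers and deploy
// modes as Enumerations instead of bit flags, per the review suggestion.
object ClusterManager extends Enumeration {
  val Yarn, Standalone, Mesos, Local = Value
}

object DeployMode extends Enumeration {
  val Client, Cluster = Value
}

// A value is exactly one member of its Enumeration, so the
// "both YARN and LOCAL" ambiguity cannot arise by construction.
def managerFor(master: String): ClusterManager.Value = {
  if (master.startsWith("yarn")) ClusterManager.Yarn
  else if (master.startsWith("spark")) ClusterManager.Standalone
  else if (master.startsWith("mesos")) ClusterManager.Mesos
  else ClusterManager.Local
}
```

The trade-off versus the `Int` flags is losing the ability to OR managers together (as `ALL_CLUSTER_MGRS` does); a `Set[ClusterManager.Value]` would play that role instead.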
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10331089 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {

--- End diff --

This has bothered me too. Would love to make that change if it's not too late.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10331562 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  def parseArgs(args: List[String]) {
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--client-classpath") :: value :: tail =>
+      clientClasspath = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
+      System.err.println("Unknown/unsupported param " + unknownParam)
+    }
+    System.err.println(
+      "Usage: spark-app <primary binary> [options] \n" +
+      "Options:\n" +
+      "  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local\n" +
+      "  --deploy-mode DEPLOY_MODE   Mode to deploy the app in, either \"client\" or \"cluster\"\n" +
+      "  --class CLASS_NAME          Name of your application's main class (required for Java apps)\n" +
+      "  --arg ARG                   Argument to be passed to your application's main class.\n" +
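The parser in the diff above recurses over the argument list with `::` (cons) pattern matching, peeling off one option (and its value, if any) per call. A stripped-down, standalone sketch of the same technique (simplified names, not the PR's actual class):

```scala
import scala.collection.mutable.ArrayBuffer

// Minimal recursive option parser in the style of parseOpts above.
var master: String = null
val childArgs = new ArrayBuffer[String]()

def parseOpts(opts: List[String]): Unit = opts match {
  case "--master" :: value :: tail =>   // option that consumes a value
    master = value
    parseOpts(tail)
  case "--arg" :: value :: tail =>      // repeatable option, accumulated
    childArgs += value
    parseOpts(tail)
  case Nil =>                           // all options consumed
  case unknown :: _ =>                  // anything else is an error
    throw new IllegalArgumentException("Unknown/unsupported param " + unknown)
}

parseOpts(List("--master", "local[4]", "--arg", "foo"))
```

Because each case pattern binds both the flag and its value in one match, a flag given without a value simply falls through to the error case.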
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332560 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"

--- End diff --

needs whitespace ?
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332582 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {

--- End diff --

This should be startsWith because you can do local[4] and stuff like that
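The point above is that "local" masters can carry a thread count, so an equality check misses most of them. A tiny illustration (helper name hypothetical):

```scala
// "local" masters come in several forms, e.g. local, local[4], local[*],
// so equality against "local" misses all but the bare form; startsWith
// catches them all, matching how yarn/spark/mesos masters are detected.
def isLocalMaster(master: String): Boolean = master.startsWith("local")
```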
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332644 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1

--- End diff --

At the very least though, separating these two sets of constants to make it clear that they cover different things would help. E.g. give them prefixes, like `MANAGER_LOCAL` and `MODE_CLIENT`, or maybe make the client vs cluster thing a boolean.
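One way the prefix-plus-boolean suggestion above could look (a sketch under the reviewer's proposed naming; none of this is in the PR):

```scala
// Prefixes make it obvious the constants cover two different axes, and the
// bitmask OR still works for the cluster-manager set.
val MANAGER_YARN = 1
val MANAGER_STANDALONE = 2
val MANAGER_MESOS = 4
val MANAGER_LOCAL = 8
val ALL_CLUSTER_MGRS = MANAGER_YARN | MANAGER_STANDALONE | MANAGER_MESOS | MANAGER_LOCAL

// The two-valued client/cluster flag collapses to a Boolean.
def deployOnCluster(deployMode: String): Boolean = deployMode == "cluster"
```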
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332654 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new
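The `Opt` table in the diff above maps each user-supplied value to the cluster-manager/deploy-mode combinations it applies to, plus the env var, CLI flag, or system property that carries it to the child process. The `Opt` signature below is inferred from the call sites only (the PR's actual definition is not quoted here), so treat this as a sketch of the dispatch idea:

```scala
// Inferred shape: value, cluster-manager bitmask, deploy-mode flag, and the
// three possible carriers (env var, CLI flag, system property).
case class Opt(value: String, clusterManager: Int, deployMode: Int,
               envVar: String, optionName: String, sysProp: String)

val YARN = 1; val STANDALONE = 2; val MESOS = 4; val LOCAL = 8
val CLIENT = 1; val CLUSTER = 2

// An option fires only when it was set and both bitmasks match the
// current launch configuration.
def applicable(opt: Opt, manager: Int, mode: Int): Boolean =
  opt.value != null &&
  (opt.clusterManager & manager) != 0 &&
  (opt.deployMode & mode) != 0

val execMem = Opt("2g", STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory")
```

The bitmask test is what lets one row like `STANDALONE | MESOS` cover two managers at once, which an `Enumeration` equality check could not.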
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332670 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332688 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null

--- End diff --

clientClasspath is not used ?
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36828689 Thanks for taking a look, Matei. If we use system properties instead of env variables, the remaining reason we'd want to start a second JVM is to be able to have a --driver-memory option. The only way around this I can think of would be to require users to set this with an environment variable instead of a command line option. One small weird thing about this is that the client would still be given the max heap specified in SPARK_DRIVER_MEMORY even when the driver is being run on the cluster.
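The env-var alternative sketched in this comment would read the driver heap at launch time instead of taking it as a flag. A minimal sketch, assuming the `SPARK_DRIVER_MEMORY` name from the comment and a hypothetical `512m` fallback default:

```scala
// Sketch: --driver-memory would require forking a second JVM with a new -Xmx,
// so instead the launcher reads the driver heap from an environment variable
// (name from the review comment; the default here is an assumption).
def driverMemory(env: Map[String, String]): String =
  env.getOrElse("SPARK_DRIVER_MEMORY", "512m")
```

This also makes the quirk sryza describes concrete: the same value sizes the client JVM's heap even when the driver actually runs on the cluster.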