[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10553709
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+/**
+ * Parses and encapsulates arguments from the spark-submit script.
+ */
+private[spark] class SparkSubmitArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var totalExecutorCores: String = null
+  var driverMemory: String = null
+  var driverCores: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  // TODO: some deploy mode and master need to be specified by default; "local" seems like a good choice, or should we just error?
+  // TODO: throw an exception instead of exiting to make things easier on tests?
+
+  def parseArgs(args: List[String]) {
+    if (args.size == 0) {
+      printUsageAndExit(1)
+      System.exit(1)
+    }
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--total-executor-cores") :: value :: tail =>
+      totalExecutorCores = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--driver-cores") :: value :: tail =>
+      driverCores = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
+      
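
For orientation, here is a minimal usage sketch; it is not part of the PR and only assumes the behavior visible in the diff above (args(0) becomes primaryResource; the flag names come from the parseOpts cases):

    // Hypothetical driver for the parser shown in the diff above.
    val parsed = new SparkSubmitArguments(Array(
      "my-app.jar",                   // first argument becomes primaryResource
      "--class", "com.example.Main",
      "--master", "spark://host:7077",
      "--deploy-mode", "client",
      "--arg", "input.txt"))
    assert(parsed.primaryResource == "my-app.jar")
    assert(parsed.mainClass == "com.example.Main")
    assert(parsed.childArgs.toList == List("input.txt"))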

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37502813
  
@mateiz maybe you could take a pass on this?




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37506962
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37507039
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13155/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37507037
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37601581
  
Yeah, working on it




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37611181
  
Merged build started.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37611180
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37611363
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37611365
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13174/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-12 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37381283
  
Updated patch takes review comments from @mridulm and @pwendell into 
account.

spark.max.cores is now correctly handled.  Jars passed in with --more-jars 
are now added to the classpath.  And yarn-standalone|client are used to infer 
the deploy mode and vice versa.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37382160
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37382161
  
Merged build started.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37382223
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13127/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-3738
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10399046
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
--- End diff --

On the other hand, it might make more sense to move towards consistency 
between YARN and standalone/Mesos, where MASTER specifies only the cluster 
manager and not the application's deploy mode.  For this, we would allow 
passing just "yarn" as --master to spark-submit, and yarn-client vs. 
yarn-standalone would be inferred from --deploy-mode.
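
To make the proposed inference concrete, here is a minimal sketch; this is an illustration of the idea, not code from the PR, and the function name is hypothetical:

    // Assumed behavior: a bare "yarn" master plus --deploy-mode picks between
    // yarn-client and yarn-standalone; other masters pass through unchanged.
    def effectiveMaster(master: String, deployMode: String): String =
      (master, deployMode) match {
        case ("yarn", "cluster") => "yarn-standalone"
        case ("yarn", _)         => "yarn-client"
        case (m, _)              => m
      }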




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10405655
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    var childMainClass = ""
+
+    if (clusterManager == MESOS && deployOnCluster) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployOnCluster && clusterManager == STANDALONE) {
+      childMainClass = "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass)
+    } else if (deployOnCluster && clusterManager == YARN) {
+      childMainClass = "org.apache.spark.deploy.yarn.Client"
+      childArgs += ("--jar", appArgs.primaryResource)
+      childArgs += ("--class", appArgs.mainClass)
+    } else {
+      childMainClass = appArgs.mainClass
+      childClasspath += appArgs.primaryResource
+    }
+
+    val options = List[OptionAssigner](
+      new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"),
+      new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"),
+      new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"),
+      new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"),
+      new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"),
+      new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"),
+      new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"),
+      new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"),
+      new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"),
+      new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"),
+      new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"),
+      new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"),
+      new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars")
+    )
+
+    // more jars
+    if (appArgs.moreJars != null && !deployOnCluster) {
+      childClasspath += appArgs.moreJars
+    }
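
The OptionAssigner class is outside this hunk; the following is a plausible reconstruction from the call sites above, where the field names and the apply loop are assumptions, not the PR's code. The bitmask constants (YARN = 1, STANDALONE = 2, MESOS = 4, LOCAL = 8) let one assigner target several cluster managers at once:

    // Hypothetical shape inferred from the List[OptionAssigner] above.
    class OptionAssigner(
        val value: String,
        val clusterManagers: Int,      // bitmask of YARN | STANDALONE | MESOS | LOCAL
        val deployOnCluster: Boolean,
        val clOption: String = null,
        val sysProp: String = null)

    // Sketch of applying the assigners: a bitwise AND tests whether the
    // selected cluster manager falls inside the assigner's mask.
    def applyOptions(
        options: Seq[OptionAssigner],
        clusterManager: Int,
        deployOnCluster: Boolean,
        childArgs: scala.collection.mutable.ArrayBuffer[String]): Unit = {
      for (opt <- options) {
        if (opt.value != null &&
            (opt.clusterManagers & clusterManager) != 0 &&
            opt.deployOnCluster == deployOnCluster) {
          if (opt.clOption != null) childArgs += (opt.clOption, opt.value)
          if (opt.sysProp != null) System.setProperty(opt.sysProp, opt.value)
        }
      }
    }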
  

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10406142
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkSubmitArguments(args: Array[String]) {
--- End diff --

Please add a class comment to explain why this class exists and how it is 
used or relates to other classes.
A few months from now, that would make it easier to immediately understand how 
this class fits into the overall picture just by looking at the class summary, 
rather than having to search for usages with an IDE in the source repo =)





[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37088913
  
Merged build started.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37088937
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13063/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37088936
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-07 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-37088962
  
Newest patch includes tests and doc.  @pwendell, do you have a link to the 
addJar patch?  If it's definitely going to happen, I'll take out the 
classloader stuff here.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10359483
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
--- End diff --

Definitely open to improvements over "yarn-cluster", but "yarn-batch" doesn't 
sound right to me, because this mode can be used to run long-running apps as 
well, like streaming.

As this is orthogonal to spark-app, and there are a few related changes I'd 
like to make, I submitted a separate pull request for this.  Let's continue the 
discussion on https://github.com/apache/spark/pull/95.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36939213
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36939354
  
Merged build started.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36939470
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13027/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36939469
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36945861
  
I see, regarding the memory part, it sounds like we could do it in bash, 
but it might be kind of painful. We could do the following:
- Look for just the driver memory and cluster mode arguments using bash
- If we're not running in cluster mode, set the -Xmx and -Xms parameters 
when launching

I agree that we shouldn't use the full memory you requested if you submitted 
to a cluster. I'm not sure how hard it is to parse these arguments in bash -- 
it shouldn't be *that* hard, but we'll also have to do it in .cmd scripts on 
Windows and such. Otherwise, it would be good to test how slow this is with two 
JVM launches (maybe we can avoid a lot of the slowness).
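
As a rough sketch of the second bullet above (illustrative only; the PR's actual bash approach appears in the bin/spark-submit diff later in this thread, and the function here is hypothetical):

    // Turn --driver-memory into JVM heap flags only when the driver runs
    // locally (client mode); in cluster mode the remote launcher sizes the JVM.
    def jvmMemoryFlags(driverMemory: Option[String], deployMode: String): Seq[String] =
      driverMemory match {
        case Some(mem) if deployMode != "cluster" => Seq(s"-Xms$mem", s"-Xmx$mem")
        case _ => Seq.empty
      }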




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10372142
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new 

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36966736
  
I uploaded a new patch that doesn't start a new JVM and parses 
--driver-memory in bash. It wasn't as bad as I expected (thanks to some help 
from @umbrant and @atm).

I've verified that it works on YARN with both deploy modes.  I'm still 
planning to add some tests and doc, but I wanted to upload it with the new 
approach in case there are any comments.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36967209
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36967210
  
Merged build started.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10373180
  
--- Diff: bin/spark-submit ---
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+export SPARK_HOME=$(cd `dirname $0`/..; pwd)
+ORIG_ARGS=$@
+
+while (($#)); do
+  if [ "$1" = "--deploy-mode" ]; then
+    DEPLOY_MODE=$2
+  elif [ "$1" = "--driver-memory" ]; then
+    DRIVER_MEMORY=$2
+  fi
+
+  shift
+done
+
+if [ ! -z "$DRIVER_MEMORY" ] && [ ! -z "$DEPLOY_MODE" ] && [ "$DEPLOY_MODE" = "client" ]; then
+  export SPARK_MEM=$DRIVER_MEMORY
+fi
+
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+
--- End diff --

Are we envisioning a corresponding .cmd file once the review of this is 
done?




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36967269
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13036/




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36967267
  
Merged build finished.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10373335
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    var childMainClass = ""
+
+    if (clusterManager == MESOS && deployOnCluster) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployOnCluster && clusterManager == STANDALONE) {
+      childMainClass = "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass)
+    } else if (deployOnCluster && clusterManager == YARN) {
+      childMainClass = "org.apache.spark.deploy.yarn.Client"
+      childArgs += ("--jar", appArgs.primaryResource)
+      childArgs += ("--class", appArgs.mainClass)
+    } else {
+      childMainClass = appArgs.mainClass
+      childClasspath += appArgs.primaryResource
+    }
+
+    val options = List[OptionAssigner](
+      new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"),
+      new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"),
+      new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"),
+      new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"),
+      new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"),
+      new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"),
+      new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"),
+      new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"),
+      new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"),
+      new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"),
+      new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"),
+      new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"),
+      new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars")
+    )
+
+    // more jars
+    if (appArgs.moreJars != null && !deployOnCluster) {
+      childClasspath += appArgs.moreJars
+    }
   

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10373471
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    var childMainClass = ""
+
+    if (clusterManager == MESOS && deployOnCluster) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployOnCluster && clusterManager == STANDALONE) {
+      childMainClass = "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += (appArgs.master, appArgs.primaryResource, appArgs.mainClass)
+    } else if (deployOnCluster && clusterManager == YARN) {
+      childMainClass = "org.apache.spark.deploy.yarn.Client"
+      childArgs += ("--jar", appArgs.primaryResource)
+      childArgs += ("--class", appArgs.mainClass)
+    } else {
+      childMainClass = appArgs.mainClass
+      childClasspath += appArgs.primaryResource
+    }
+
+    val options = List[OptionAssigner](
+      new OptionAssigner(appArgs.driverMemory, YARN, true, clOption = "--master-memory"),
+      new OptionAssigner(appArgs.name, YARN, true, clOption = "--name"),
+      new OptionAssigner(appArgs.queue, YARN, true, clOption = "--queue"),
+      new OptionAssigner(appArgs.queue, YARN, false, sysProp = "spark.yarn.queue"),
+      new OptionAssigner(appArgs.numExecutors, YARN, true, clOption = "--num-workers"),
+      new OptionAssigner(appArgs.numExecutors, YARN, false, sysProp = "spark.worker.instances"),
+      new OptionAssigner(appArgs.executorMemory, YARN, false, clOption = "--worker-memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE, true, clOption = "--memory"),
+      new OptionAssigner(appArgs.executorMemory, STANDALONE | MESOS | YARN, false, sysProp = "spark.executor.memory"),
+      new OptionAssigner(appArgs.executorCores, YARN, true, clOption = "--worker-cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE, true, clOption = "--cores"),
+      new OptionAssigner(appArgs.executorCores, STANDALONE | MESOS | YARN, false, sysProp = "spark.cores.max"),
+      new OptionAssigner(appArgs.files, YARN, false, sysProp = "spark.yarn.dist.files"),
+      new OptionAssigner(appArgs.files, YARN, true, clOption = "--files"),
+      new OptionAssigner(appArgs.archives, YARN, false, sysProp = "spark.yarn.dist.archives"),
+      new OptionAssigner(appArgs.archives, YARN, true, clOption = "--archives"),
+      new OptionAssigner(appArgs.moreJars, YARN, true, clOption = "--addJars")
+    )
+
+    // more jars
+    if (appArgs.moreJars != null && !deployOnCluster) {
+      childClasspath += appArgs.moreJars
+    }
   

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10374754
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    val appArgs = new SparkSubmitArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master.startsWith("local")) {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or local")
+        System.exit(1)
+      }
+    }
+
+    val deployOnCluster = appArgs.deployMode == "cluster"
--- End diff --

Unfortunately not, since YARN and the standalone cluster both support 
running the driver inside or outside of the cluster, so now there is a second 
dimension.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10374821
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.File
+import java.net.URL
+import java.net.URLClassLoader
+
+import scala.collection.mutable.ArrayBuffer
+
+object SparkSubmit {
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+val appArgs = new SparkSubmitArguments(args)
+
+if (appArgs.master != null) {
+  if (appArgs.master.startsWith(yarn)) {
+clusterManager = YARN
+  } else if (appArgs.master.startsWith(spark)) {
+clusterManager = STANDALONE
+  } else if (appArgs.master.startsWith(mesos)) {
+clusterManager = MESOS
+  } else if (appArgs.master.startsWith(local)) {
+clusterManager = LOCAL
+  } else {
+System.err.println(master must start with yarn, mesos, spark, or 
local)
+System.exit(1)
+  }
+}
+
+val deployOnCluster = appArgs.deployMode == cluster
--- End diff --

But that is specified via the master already ... via yarn-standalone vs.
yarn-client.
Not sure about the standalone cluster: so probably useful.

But if it can be inferred, it would be better to simplify the user interface.
On Mar 7, 2014 12:21 PM, Patrick Wendell notificati...@github.com wrote:

 In core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:

  +    if (appArgs.master != null) {
  +      if (appArgs.master.startsWith("yarn")) {
  +        clusterManager = YARN
  +      } else if (appArgs.master.startsWith("spark")) {
  +        clusterManager = STANDALONE
  +      } else if (appArgs.master.startsWith("mesos")) {
  +        clusterManager = MESOS
  +      } else if (appArgs.master.startsWith("local")) {
  +        clusterManager = LOCAL
  +      } else {
  +        System.err.println("master must start with yarn, mesos, spark, or local")
  +        System.exit(1)
  +      }
  +    }
  +
  +    val deployOnCluster = appArgs.deployMode == "cluster"

 Unfortunately not, since YARN and the standalone cluster both support
 running the driver inside or outside of the cluster, so now there is a
 second dimension.
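
To summarize the two dimensions being debated, here is a tiny illustrative check (not PR code) of which (cluster manager, deploy mode) pairs are valid, given the constraint stated elsewhere in this thread that Mesos and local cannot run the driver inside the cluster:

    // Every manager supports client deploy; only standalone and YARN also
    // support running the driver on the cluster.
    def validCombination(clusterManager: Int, deployOnCluster: Boolean): Boolean =
      !deployOnCluster || clusterManager == STANDALONE || clusterManager == YARN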






[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread sryza
GitHub user sryza opened a pull request:

https://github.com/apache/spark/pull/86

SPARK-1126.  spark-app preliminary

This is a starting version of the spark-app script for running compiled 
binaries against Spark.  It still needs tests and some polish.  The only 
testing I've done so far has been using it to launch jobs in yarn-standalone 
mode against a pseudo-distributed cluster.

This leaves out the changes required for launching python scripts.  I think 
it might be best to save those for another JIRA/PR (while keeping to the design 
so that they won't require backwards-incompatible changes).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sryza/spark sandy-spark-1126

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/86.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #86


commit e33a2233963ea771002b683572bd822d63689d00
Author: Sandy Ryza sa...@cloudera.com
Date:   2014-03-04T07:54:02Z

SPARK-1126.  spark-app preliminary






[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10330438
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  def parseArgs(args: List[String]) {
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--client-classpath") :: value :: tail =>
+      clientClasspath = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
+      System.err.println("Unknown/unsupported param " + unknownParam)
+    }
+    System.err.println(
+      "Usage: spark-app <primary binary> [options] \n" +
+        "Options:\n" +
+        "  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local\n" +
--- End diff --

I'd disable scalastyle here so that it doesn't produce build errors. In
this case I think it's fine to violate the line limit:
http://www.scalastyle.org/configuration.html

Also, there is a different way to do multiline strings in Scala - but up to
you...
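
For reference, a sketch combining both suggestions: scalastyle's magic
comments around the long lines, plus a stripMargin multiline string (the
exact rule id is an assumption, not from the patch):

    // scalastyle:off line.size.limit
    System.err.println(
      """Usage: spark-app <primary binary> [options]
        |Options:
        |  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local
        |""".stripMargin)
    // scalastyle:on line.size.limit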

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10330672
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  def parseArgs(args: List[String]) {
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--client-classpath") :: value :: tail =>
+      clientClasspath = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
+      System.err.println("Unknown/unsupported param " + unknownParam)
+    }
+    System.err.println(
+      "Usage: spark-app <primary binary> [options] \n" +
+        "Options:\n" +
+        "  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local\n" +
+        "  --deploy-mode DEPLOY_MODE   Mode to deploy the app in, either \"client\" or \"cluster\"\n" +
+        "  --class CLASS_NAME          Name of your application's main class (required for Java apps)\n" +
+        "  --arg ARG                   Argument to be passed to your application's main class.\n" +
+ 

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10330871
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
--- End diff --

These are created as flags, but since they are mutually exclusive, would it 
make more sense for them to be an `Enumeration`?
http://www.scala-lang.org/api/2.10.2/index.html#scala.Enumeration

I.e. is it possible for something to be both YARN and LOCAL?
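
For what it's worth, a non-authoritative sketch of the Enumeration version
(the bitmask membership tests elsewhere would become value-set membership):

    // Sketch only: mutually exclusive choices as Enumerations instead of bit flags.
    object ClusterManager extends Enumeration {
      val YARN, STANDALONE, MESOS, LOCAL = Value
    }
    object DeployMode extends Enumeration {
      val CLIENT, CLUSTER = Value
    }
    // ALL_CLUSTER_MGRS would become ClusterManager.values (a ValueSet).
    var clusterManager: ClusterManager.Value = ClusterManager.LOCAL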




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10331089
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
--- End diff --

This has bothered me too.  Would love to make that change if it's not too 
late.




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10331562
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null
+
+  loadEnvVars()
+  parseArgs(args.toList)
+
+  def loadEnvVars() {
+    master = System.getenv("MASTER")
+    deployMode = System.getenv("DEPLOY_MODE")
+  }
+
+  def parseArgs(args: List[String]) {
+    primaryResource = args(0)
+    parseOpts(args.tail)
+  }
+
+  def parseOpts(opts: List[String]): Unit = opts match {
+    case ("--name") :: value :: tail =>
+      name = value
+      parseOpts(tail)
+
+    case ("--master") :: value :: tail =>
+      master = value
+      parseOpts(tail)
+
+    case ("--class") :: value :: tail =>
+      mainClass = value
+      parseOpts(tail)
+
+    case ("--deploy-mode") :: value :: tail =>
+      if (value != "client" && value != "cluster") {
+        System.err.println("--deploy-mode must be either \"client\" or \"cluster\"")
+        System.exit(1)
+      }
+      deployMode = value
+      parseOpts(tail)
+
+    case ("--num-executors") :: value :: tail =>
+      numExecutors = value
+      parseOpts(tail)
+
+    case ("--executor-cores") :: value :: tail =>
+      executorCores = value
+      parseOpts(tail)
+
+    case ("--executor-memory") :: value :: tail =>
+      executorMemory = value
+      parseOpts(tail)
+
+    case ("--driver-memory") :: value :: tail =>
+      driverMemory = value
+      parseOpts(tail)
+
+    case ("--supervise") :: tail =>
+      supervise = true
+      parseOpts(tail)
+
+    case ("--queue") :: value :: tail =>
+      queue = value
+      parseOpts(tail)
+
+    case ("--files") :: value :: tail =>
+      files = value
+      parseOpts(tail)
+
+    case ("--archives") :: value :: tail =>
+      archives = value
+      parseOpts(tail)
+
+    case ("--arg") :: value :: tail =>
+      childArgs += value
+      parseOpts(tail)
+
+    case ("--more-jars") :: value :: tail =>
+      moreJars = value
+      parseOpts(tail)
+
+    case ("--client-classpath") :: value :: tail =>
+      clientClasspath = value
+      parseOpts(tail)
+
+    case ("--help" | "-h") :: tail =>
+      printUsageAndExit(0)
+
+    case Nil =>
+
+    case _ =>
+      printUsageAndExit(1, opts)
+  }
+
+  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
+    if (unknownParam != null) {
+      System.err.println("Unknown/unsupported param " + unknownParam)
+    }
+    System.err.println(
+      "Usage: spark-app <primary binary> [options] \n" +
+        "Options:\n" +
+        "  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local\n" +
+        "  --deploy-mode DEPLOY_MODE   Mode to deploy the app in, either \"client\" or \"cluster\"\n" +
+        "  --class CLASS_NAME          Name of your application's main class (required for Java apps)\n" +
+        "  --arg ARG                   Argument to be passed to your application's main class.\n" +
+ 

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332560
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
--- End diff --

Needs whitespace?




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332582
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
--- End diff --

This should be startsWith, because you can do local[4] and things like that.
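
A quick illustration of the difference (hypothetical snippet):

    // Local masters can carry a thread-count suffix, e.g. local[4] or local[*],
    // so an equality test misses them while a prefix test catches all of them.
    Seq("local", "local[4]", "local[*]").foreach { m =>
      assert(m.startsWith("local"))
    }
    assert("local[4]" != "local")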




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332644
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
--- End diff --

At the very least though, separating these two sets of constants to make it 
clear that they cover different things would help. E.g. give them prefixes, 
like `MANAGER_LOCAL` and `MODE_CLIENT`, or maybe make the client vs cluster 
thing a boolean.
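
A sketch of the prefixed naming suggested above (the values themselves are
carried over from the patch; only the names are illustrative):

    val MODE_CLIENT = 1
    val MODE_CLUSTER = 2
    val MANAGER_YARN = 1
    val MANAGER_STANDALONE = 2
    val MANAGER_MESOS = 4
    val MANAGER_LOCAL = 8
    val ALL_CLUSTER_MGRS = MANAGER_YARN | MANAGER_STANDALONE | MANAGER_MESOS | MANAGER_LOCAL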




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332654
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new 

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332670
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
+        clusterManager = YARN
+      } else if (appArgs.master.startsWith("spark")) {
+        clusterManager = STANDALONE
+      } else if (appArgs.master.startsWith("mesos")) {
+        clusterManager = MESOS
+      } else if (appArgs.master == "local") {
+        clusterManager = LOCAL
+      } else {
+        System.err.println("master must start with yarn, mesos, spark, or be local")
+        System.exit(1)
+      }
+    }
+
+    val deployMode = if (appArgs.deployMode == "client") CLIENT else CLUSTER
+    val childEnv = new HashMap[String, String]()
+    val childClasspath = new ArrayBuffer[String]()
+    val childArgs = new ArrayBuffer[String]()
+    childArgs += System.getenv("SPARK_HOME") + "/bin/spark-class"
+
+    if (clusterManager == MESOS && deployMode == CLUSTER) {
+      System.err.println("Mesos does not support running the driver on the cluster")
+      System.exit(1)
+    }
+
+    if (deployMode == CLUSTER && clusterManager == STANDALONE) {
+      childArgs += "org.apache.spark.deploy.Client"
+      childArgs += "launch"
+      childArgs += appArgs.master
+      childArgs += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    } else if (deployMode == CLUSTER && clusterManager == YARN) {
+      childArgs += "org.apache.spark.deploy.yarn.Client"
+      childArgs += "--jar"
+      childArgs += appArgs.primaryResource
+      childArgs += "--class"
+      childArgs += appArgs.mainClass
+    } else {
+      childClasspath += appArgs.primaryResource
+      childArgs += appArgs.mainClass
+    }
+
+    // TODO: num-executors when not using YARN
+    val options = List[Opt](
+      new Opt(appArgs.driverMemory, YARN, CLUSTER, null, "--master-memory", null),
+      new Opt(appArgs.name, YARN, CLUSTER, null, "--name", null),
+      new Opt(appArgs.queue, YARN, CLUSTER, null, "--queue", null),
+      new Opt(appArgs.queue, YARN, CLIENT, "SPARK_YARN_QUEUE", null, null),
+      new Opt(appArgs.numExecutors, YARN, CLUSTER, null, "--num-workers", null),
+      new Opt(appArgs.executorMemory, YARN, CLIENT, "SPARK_WORKER_MEMORY", null, null),
+      new Opt(appArgs.executorMemory, YARN, CLUSTER, null, "--worker-memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE, CLUSTER, null, "--memory", null),
+      new Opt(appArgs.executorMemory, STANDALONE | MESOS, CLIENT, null, null, "spark.executor.memory"),
+      new Opt(appArgs.executorCores, YARN, CLIENT, "SPARK_WORKER_CORES", null, null),
+      new Opt(appArgs.executorCores, YARN, CLUSTER, null, "--worker-cores", null),
+      new Opt(appArgs.executorCores, STANDALONE, CLUSTER, null, "--cores", null),
+      new Opt(appArgs.executorCores, STANDALONE | MESOS, CLIENT, null, null, "spark.cores.max"),
+      new Opt(appArgs.files, YARN, CLIENT, "SPARK_YARN_DIST_FILES", null, null),
+      new 

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10332688
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import scala.collection.mutable.ArrayBuffer
+
+private[spark] class SparkAppArguments(args: Array[String]) {
+  var master: String = null
+  var deployMode: String = null
+  var executorMemory: String = null
+  var executorCores: String = null
+  var driverMemory: String = null
+  var supervise: Boolean = false
+  var queue: String = null
+  var numExecutors: String = null
+  var files: String = null
+  var archives: String = null
+  var mainClass: String = null
+  var primaryResource: String = null
+  var name: String = null
+  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
+  var moreJars: String = null
+  var clientClasspath: String = null
--- End diff --

clientClasspath is not used?




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/86#issuecomment-36828689
  
Thanks for taking a look, Matei.  If we use system properties instead of env 
variables, the remaining reason we'd want to start a second JVM is to be 
able to have a --driver-memory property.  The only way around this I can think 
of would be to require users to set this with an environment variable instead 
of a command-line option.  One small weird thing about this is that the client 
would still be given the max heap specified in SPARK_DRIVER_MEMORY even 
when the driver is being run on the cluster.
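
To make the constraint concrete: a JVM cannot raise its own max heap after
startup, so honoring --driver-memory from a launcher means forking a child
JVM, roughly like the following (a hypothetical sketch, not code from this
patch):

    import scala.sys.process._

    // Fork a second JVM so the driver actually gets the requested max heap;
    // the launcher's own -Xmx is fixed once it has started.
    def runDriver(driverMemory: String, classpath: String,
        mainClass: String, args: Seq[String]): Int = {
      (Seq("java", s"-Xmx$driverMemory", "-cp", classpath, mainClass) ++ args).!
    }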

