[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/969

[WIP] In yarn.ClientBase spark.yarn.dist.* do not work



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark yarn_ClientBase

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #969


commit 836248956ff3ef17d44cea37b357e3616b054d64
Author: witgo 
Date:   2014-06-04T16:50:12Z

yarn.ClientBase spark.yarn.dist.* do not work




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45120602
  
Merged build started. 




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45120576
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45124529
  
Please provide a fuller description of the problem and how to reproduce it, 
and open a JIRA.




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45125353
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15449/




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45125352
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/969#discussion_r13417150
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -220,10 +220,21 @@ trait ClientBase extends Logging {
       }
     }
 
+    def getArg(arg: String, envVar: String, sysProp: String): String = {
+      if (arg != null && !arg.isEmpty) {
+        arg
+      } else if (System.getenv(envVar) != null && !System.getenv(envVar).isEmpty) {
+        System.getenv(envVar)
+      } else {
+        sparkConf.getOption(sysProp).orNull
+      }
+    }
     var cachedSecondaryJarLinks = ListBuffer.empty[String]
-    val fileLists = List( (args.addJars, LocalResourceType.FILE, true),
-      (args.files, LocalResourceType.FILE, false),
-      (args.archives, LocalResourceType.ARCHIVE, false) )
+    val fileLists = List((args.addJars, LocalResourceType.FILE, true),
+      (getArg(args.files, "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files"),
+        LocalResourceType.FILE, false),
+      (getArg(args.archives, "SPARK_YARN_DIST_ARCHIVES", "spark.yarn.dist.archives"),
--- End diff --

I don't think env variables and conf entries should be handled here like 
this.

YarnClientSchedulerBackend already deals with the env variable and command 
line option for client mode. It seems that SparkSubmit might be missing code to 
handle the env variable for cluster mode, though. Probably better to fix it 
there, and leave this code to deal only with the command line args (which are 
already correctly parsed).
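
For reference, the lookup order that `getArg` in the diff above implements (command-line argument, then environment variable, then configuration entry) can be sketched in isolation. This is only an illustrative sketch, not code from Spark: `ArgPrecedence`, `resolve`, and the `env` and `conf` maps are hypothetical names, with the maps standing in for `System.getenv` and `SparkConf` so the order can be checked without a cluster.

```scala
object ArgPrecedence {
  // Hypothetical stand-in for the diff's getArg. The environment and the
  // configuration are passed in as plain maps (an assumption made for
  // testability; the diff reads System.getenv and sparkConf directly).
  def resolve(
      arg: String,
      envVar: String,
      confKey: String,
      env: Map[String, String],
      conf: Map[String, String]): Option[String] = {
    Option(arg).filter(_.nonEmpty)                 // 1. non-empty command-line argument
      .orElse(env.get(envVar).filter(_.nonEmpty))  // 2. non-empty environment variable
      .orElse(conf.get(confKey))                   // 3. configuration entry, if present
  }

  def main(args: Array[String]): Unit = {
    val env = Map("SPARK_YARN_DIST_FILES" -> "env.txt")
    val conf = Map("spark.yarn.dist.files" -> "conf.txt")

    // The command-line value wins when it is present.
    println(resolve("cli.txt", "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files", env, conf))
    // Without it, the environment variable is used.
    println(resolve(null, "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files", env, conf))
    // With neither, the configuration entry is the last fallback.
    println(resolve("", "SPARK_YARN_DIST_FILES", "spark.yarn.dist.files", Map.empty, conf))
  }
}
```

Returning `Option[String]` instead of `null` is a choice made for this sketch; the diff returns `null` when nothing is set.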




[GitHub] spark pull request: [WIP] In yarn.ClientBase spark.yarn.dist.* do ...

2014-06-05 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/969#issuecomment-45224988
  
 Spark configuration
`conf/spark-defaults.conf` => 
```
spark.yarn.dist.archives /toona/conf
spark.executor.extraClassPath ./conf
spark.driver.extraClassPath  ./conf
```


HDFS directory
`hadoop dfs -cat /toona/conf/toona.conf` =>
```
 redis.num=4
```

The following command fails:
```shell
YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --num-executors 2 \
  --driver-memory 2g --executor-memory 2g --master yarn-cluster \
  --class toona.DeployTest toona-assembly.jar
```


The following is the test code:
```scala
package toona
import com.typesafe.config.Config
import com.typesafe.config.ConfigFactory

object DeployTest {
  def main(args: Array[String]): Unit = {
    val conf = ConfigFactory.load("toona.conf")
    // This call throws a `ConfigException`.
    val redisNum = conf.getInt("redis.num")
    assert(redisNum == 4)
  }
}
```

