[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-09 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-45523729
  
@tgravescs I fixed the test, squashed the commit and repointed the branch. 
The latest commit addresses the test failure you were seeing while running it 
for Hadoop 0.23.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/433


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-09 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-45545343
  
Looks good. Thanks @berngp 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-06 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r13499967
  
--- Diff: 
yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala 
---
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.MRJobConfig
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
+
+import org.scalatest.FunSuite
+import org.scalatest.matchers.ShouldMatchers._
+
+import scala.collection.JavaConversions._
+import scala.collection.mutable.{ HashMap = MutableHashMap }
+import scala.util.Try
+
+
+class ClientBaseSuite extends FunSuite {
+
+  test(default Yarn application classpath) {
+ClientBase.getDefaultYarnApplicationClasspath should 
be(Some(Fixtures.knownDefYarnAppCP))
+  }
+
+  test(default MR application classpath) {
+ClientBase.getDefaultMRApplicationClasspath should 
be(Some(Fixtures.knownDefMRAppCP))
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for YARN) {
+withAppConf(Fixtures.mapYARNAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(Fixtures.knownYARNAppCP, 
ClientBase.getDefaultMRApplicationClasspath))
+}
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for MR) {
+withAppConf(Fixtures.mapMRAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(ClientBase.getDefaultYarnApplicationClasspath, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  test(resultant classpath for an application that defines both 
classpaths, YARN and MR) {
+withAppConf(Fixtures.mapAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(flatten(Fixtures.knownYARNAppCP, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  object Fixtures {
+
+val knownDefYarnAppCP: Seq[String] =
+  getFieldValue[Array[String], Seq[String]](classOf[YarnConfiguration],
+
DEFAULT_YARN_APPLICATION_CLASSPATH,
+Seq[String]())(a = 
a.toSeq)
+
+
+val knownDefMRAppCP: Seq[String] =
+  getFieldValue[String, Seq[String]](classOf[MRJobConfig],
+ 
DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH,
+ Seq[String]())(a = a.split(,))
+
+val knownYARNAppCP = Some(Seq(/known/yarn/path))
+
+val knownMRAppCP = Some(Seq(/known/mr/path))
+
+val mapMRAppConf =
+  Map(mapreduce.application.classpath - 
knownMRAppCP.map(_.mkString(:)).get)
+
+val mapYARNAppConf =
+  Map(YarnConfiguration.YARN_APPLICATION_CLASSPATH - 
knownYARNAppCP.map(_.mkString(:)).get)
+
+val mapAppConf = mapYARNAppConf ++ mapMRAppConf
+  }
+
+  def withAppConf(m: Map[String, String] = Map())(testCode: 
(Configuration) = Any) {
+val conf = new Configuration
+m.foreach { case (k, v) = conf.set(k, v, ClientBaseSpec) }
+testCode(conf)
+  }
+
+  def newEnv = MutableHashMap[String, String]()
+
+  def classpath(env: MutableHashMap[String, String]) = 
env(Environment.CLASSPATH.name).split(:|;)
+
+  def flatten(a: Option[Seq[String]], b: Option[Seq[String]]) = (a ++ 
b).flatten.toArray
+
+  def getFieldValue[A, B](clazz: Class[_], field: String, defaults: = 
B)(mapTo: A = B): B =
--- End diff --

@tgravescs tested for Hadoop 0.23 and 2.4.0. A bit silly that I made such 
mistake when the main 

[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r13456889
  
--- Diff: 
yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala 
---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.MRJobConfig
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
+
+import org.scalatest.FunSuite
+import org.scalatest.matchers.ShouldMatchers._
+
+import scala.collection.JavaConversions._
+import scala.collection.mutable.{ HashMap = MutableHashMap }
+import scala.util.Try
+
+
+class ClientBaseSuite extends FunSuite {
+
+  test(default Yarn application classpath) {
+ClientBase.getDefaultYarnApplicationClasspath.get should 
be(Fixtures.knownDefYarnAppCP.get)
+  }
+
+  test(default MR application classpath) {
+ClientBase.getDefaultMRApplicationClasspath.get should 
be(Fixtures.knownDefMRAppCP.get)
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for YARN) {
+withAppConf(Fixtures.mapYARNAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(Fixtures.knownYARNAppCP, 
ClientBase.getDefaultMRApplicationClasspath))
+}
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for MR) {
+withAppConf(Fixtures.mapMRAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(ClientBase.getDefaultYarnApplicationClasspath, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  test(resultant classpath for an application that defines both 
classpaths, YARN and MR) {
+withAppConf(Fixtures.mapAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(flatten(Fixtures.knownYARNAppCP, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  object Fixtures {
+
+val knownDefYarnAppCP: Option[Seq[String]] =
+  Some(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

DEFAULT_YARN_APPLICATION_CLASSPATH doesn't exist in hadoop 0.23 so we can't 
use it directly. That is why we use the reflection in the ClientBase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r13456914
  
--- Diff: 
yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala 
---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.MRJobConfig
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
+
+import org.scalatest.FunSuite
+import org.scalatest.matchers.ShouldMatchers._
+
+import scala.collection.JavaConversions._
+import scala.collection.mutable.{ HashMap = MutableHashMap }
+import scala.util.Try
+
+
+class ClientBaseSuite extends FunSuite {
+
+  test(default Yarn application classpath) {
+ClientBase.getDefaultYarnApplicationClasspath.get should 
be(Fixtures.knownDefYarnAppCP.get)
+  }
+
+  test(default MR application classpath) {
+ClientBase.getDefaultMRApplicationClasspath.get should 
be(Fixtures.knownDefMRAppCP.get)
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for YARN) {
+withAppConf(Fixtures.mapYARNAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(Fixtures.knownYARNAppCP, 
ClientBase.getDefaultMRApplicationClasspath))
+}
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for MR) {
+withAppConf(Fixtures.mapMRAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(ClientBase.getDefaultYarnApplicationClasspath, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  test(resultant classpath for an application that defines both 
classpaths, YARN and MR) {
+withAppConf(Fixtures.mapAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(flatten(Fixtures.knownYARNAppCP, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  object Fixtures {
+
+val knownDefYarnAppCP: Option[Seq[String]] =
+  Some(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)
+
+val knownDefMRAppCP: Option[Seq[String]] =
+  Some(MRJobConfig.DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH.split(,))
--- End diff --

DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH doesn't exist in hadoop 0.23 either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-05 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r13458231
  
--- Diff: 
yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala 
---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.mapreduce.MRJobConfig
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
+
+import org.scalatest.FunSuite
+import org.scalatest.matchers.ShouldMatchers._
+
+import scala.collection.JavaConversions._
+import scala.collection.mutable.{ HashMap = MutableHashMap }
+import scala.util.Try
+
+
+class ClientBaseSuite extends FunSuite {
+
+  test(default Yarn application classpath) {
+ClientBase.getDefaultYarnApplicationClasspath.get should 
be(Fixtures.knownDefYarnAppCP.get)
+  }
+
+  test(default MR application classpath) {
+ClientBase.getDefaultMRApplicationClasspath.get should 
be(Fixtures.knownDefMRAppCP.get)
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for YARN) {
+withAppConf(Fixtures.mapYARNAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(Fixtures.knownYARNAppCP, 
ClientBase.getDefaultMRApplicationClasspath))
+}
+  }
+
+  test(resultant classpath for an application that defines a classpath 
for MR) {
+withAppConf(Fixtures.mapMRAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(
+flatten(ClientBase.getDefaultYarnApplicationClasspath, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  test(resultant classpath for an application that defines both 
classpaths, YARN and MR) {
+withAppConf(Fixtures.mapAppConf) { conf =
+  val env = newEnv
+  ClientBase.populateHadoopClasspath(conf, env)
+  classpath(env) should be(flatten(Fixtures.knownYARNAppCP, 
Fixtures.knownMRAppCP))
+}
+  }
+
+  object Fixtures {
+
+val knownDefYarnAppCP: Option[Seq[String]] =
+  Some(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

Thanks for caching this. 

I will wrap the call in a Try. e.g.
```
Try(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH).toOption
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-45264422
  
Mostly looks good. Fix those couple minor test issues for hadoop 0.23 and 
I'll commit it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-04 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-45146709
  
@berngp  can you upmerge to the latest master?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-06-04 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-45159895
  
@tgravescs done, thank you again for following this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-05-22 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-43892089
  
There is still a lot going on with spark 1.0. I'm going to wait for that to 
setting down and then review this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-05-17 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-43404871
  
@berngp sorry for the delay I think everyone has been busy with the spark 
1.0 release.  I think this should still be fixed in spark 1.1.  Lets leave this 
pr as is and I'll review it this week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-05-16 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-43383729
  
@tgravescs any thoughts around this, should I just close this pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-05-05 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-42254447
  
Hi all, just checking the state of affairs with this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-29 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-41725598
  
So @sryza, @tgravescs and @pwendell should we increase the scope of the 
changes or close this one, merge it hopefully, and create a broader ticket to 
address testing and additional refactoring?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11917001
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

Maybe I have misunderstood something but do we really need to use 
reflection to get DEFAULT_YARN_APPLICATION_CLASSPATH? Can't we simply get it as 
YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH (similar to 
YarnConfiguration.YARN_APPLICATION_CLASSPATH)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11917358
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

Excellent question @rahulsinghaliitd is all related with the 
_instability_ of the YARN API and the different versions people are using. 
See https://issues.apache.org/jira/browse/SPARK-1233


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11918623
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+  val value = field.get(null).asInstanceOf[Array[String]]
+  value.toSeq
+} recoverWith {
+  case e: NoSuchFieldException = Success(Seq.empty[String])
 }
+
+triedDefault match {
+  case f: Failure[_] =
+logError(Unable to obtain the default YARN Application 
classpath., f.exception)
+  case s: Success[_] =
+logDebug(sUsing the default YARN application classpath: 
${s.get.mkString(,)})
+}
+
+triedDefault.toOption
   }
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
--- End diff --

Any reason for removing this period?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11918859
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+  val value = field.get(null).asInstanceOf[Array[String]]
+  value.toSeq
+} recoverWith {
+  case e: NoSuchFieldException = Success(Seq.empty[String])
 }
+
+triedDefault match {
+  case f: Failure[_] =
+logError(Unable to obtain the default YARN Application 
classpath., f.exception)
+  case s: Success[_] =
+logDebug(sUsing the default YARN application classpath: 
${s.get.mkString(,)})
+}
+
+triedDefault.toOption
   }
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
+  def getDefaultMRApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
   val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
-  if (field.getType == classOf[String]) {
-StringUtils.getStrings(field.get(null).asInstanceOf[String])
+  val value = if (field.getType == classOf[String]) {
+
StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
   } else {
 field.get(null).asInstanceOf[Array[String]]
   }
-} catch {
-  case err: NoSuchFieldError = null
--- End diff --

Do we not need to handle NoSuchFieldError?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread rahulsinghaliitd
Github user rahulsinghaliitd commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11920472
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

Got it!

I now see that the DEFAULT version is not present in hadoop 0.23. Then I 
wondered that if we truly want to be YARN API agnostic then shouldn't we get  
YARN_APPLICATION_CLASSPATH also via reflection. But I guess it is safe to 
assume that YARN_APPLICATION_CLASSPATH is here to stay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11920584
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
--- End diff --

@rahulsinghaliitd that I hope.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11920723
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+  val value = field.get(null).asInstanceOf[Array[String]]
+  value.toSeq
+} recoverWith {
+  case e: NoSuchFieldException = Success(Seq.empty[String])
 }
+
+triedDefault match {
+  case f: Failure[_] =
+logError(Unable to obtain the default YARN Application 
classpath., f.exception)
+  case s: Success[_] =
+logDebug(sUsing the default YARN application classpath: 
${s.get.mkString(,)})
+}
+
+triedDefault.toOption
   }
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
+  def getDefaultMRApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
   val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
-  if (field.getType == classOf[String]) {
-StringUtils.getStrings(field.get(null).asInstanceOf[String])
+  val value = if (field.getType == classOf[String]) {
+
StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
   } else {
 field.get(null).asInstanceOf[Array[String]]
   }
-} catch {
-  case err: NoSuchFieldError = null
--- End diff --

@sryza I removed the `NoSuchFieldError` since in this context it can't 
happen, both classes and fields are accessed through reflection.

A linkage error will occur when lets say you have a dependency flagged as 
provided and your code points to one if its Class fields, for example 
`MRJobConfig.DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH`. Now since your version 
of such 

[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-23 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11927317
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Seq[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case Some(s) = Some(s.toSeq)
+  case None = getDefaultMRApplicationClasspath
+}
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
+  val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+  val value = field.get(null).asInstanceOf[Array[String]]
+  value.toSeq
+} recoverWith {
+  case e: NoSuchFieldException = Success(Seq.empty[String])
 }
+
+triedDefault match {
+  case f: Failure[_] =
+logError(Unable to obtain the default YARN Application 
classpath., f.exception)
+  case s: Success[_] =
+logDebug(sUsing the default YARN application classpath: 
${s.get.mkString(,)})
+}
+
+triedDefault.toOption
   }
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
+  def getDefaultMRApplicationClasspath: Option[Seq[String]] = {
+val triedDefault = Try[Seq[String]] {
   val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
-  if (field.getType == classOf[String]) {
-StringUtils.getStrings(field.get(null).asInstanceOf[String])
+  val value = if (field.getType == classOf[String]) {
+
StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
   } else {
 field.get(null).asInstanceOf[Array[String]]
   }
-} catch {
-  case err: NoSuchFieldError = null
--- End diff --

Makes sense, thanks for clarifying.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876372
  
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -95,6 +97,15 @@ trait Logging {
 if (log.isErrorEnabled) log.error(msg, throwable)
   }
 
+  protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) {
--- End diff --

We don't want to change this Logging interface at all. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876388
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = spark.jar
   val APP_JAR: String = app.jar
   val LOG4J_PROP: String = log4j.properties
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Array[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case s: Some[Array[String]] = s
+  case None = 
+val triedDefault:Try[Array[String]] = 
getDefaultYarnApplicationClasspath
+log(triedDefault){
+  case f: Failure[_] =
+Unable to obtain the default YARN Application classpath.
+  case s: Success[_] =
+sUsing the default YARN application classpath: 
${s.get.mkString(,)}
+}
+triedDefault.toOption
 }
-  }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): 
Option[Array[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case s: Some[Array[String]] = s
+  case None =
+val triedDefault:Try[Array[String]] = 
getDefaultMRApplicationClasspath
+log(triedDefault){
+  case f: Failure[_] =
+Unable to obtain the default MR Application classpath.
+  case s: Success[_] =
+sUsing the default MR application classpath: 
${s.get.mkString(,)}
+}
+triedDefault.toOption
 }
+
+  def getDefaultYarnApplicationClasspath: Try[Array[String]] = 
Try[Array[String]] {
+val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+field.get(null).asInstanceOf[Array[String]]
+  } recoverWith {
+case e: NoSuchFieldError = Success(Array.empty[String])
+case e: NoSuchFieldException = Success(Array.empty[String])
   }
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
-  if (field.getType == classOf[String]) {
-StringUtils.getStrings(field.get(null).asInstanceOf[String])
-  } else {
-field.get(null).asInstanceOf[Array[String]]
-  }
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  def getDefaultMRApplicationClasspath: Try[Array[String]] = 
Try[Array[String]] {
--- End diff --

Could this just return an `Option[Seq[String]]`



[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876408
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
+
+  def getDefaultYarnApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  } recoverWith {
+    case e: NoSuchFieldError => Success(Array.empty[String])
+    case e: NoSuchFieldException => Success(Array.empty[String])
   }

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
--- End diff ---

In general we don't pass Try's around in Spark... 
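
For illustration, a rough sketch of the shape that convention points at: keep the `Try` private to the reflective lookup and expose only an `Option` publicly. The object and method names here are hypothetical, not the patch's code, and it assumes the YARN jars are on the classpath:
```
import scala.util.Try

object ClasspathDefaults {
  // The reflective lookup may fail on older Hadoop versions, so the Try
  // stays internal; callers of the public method only ever see an Option.
  def defaultYarnClasspath: Option[Seq[String]] = Try {
    classOf[org.apache.hadoop.yarn.conf.YarnConfiguration]
      .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
      .get(null).asInstanceOf[Array[String]]
      .toSeq
  }.toOption
}
```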



[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876544
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
--- End diff ---

It would be nice for these to return sequences instead of Arrays. I think those array types were just there due to this code being ported from Java.
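
A minimal sketch of the `Seq`-returning shape being suggested; the signature is hypothetical, not the merged code, and it assumes the Hadoop `Configuration` API:
```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.conf.YarnConfiguration

// conf.getStrings returns a Java Array[String] (or null); converting with
// toSeq at the boundary keeps the public surface idiomatic Scala.
def yarnAppClasspath(conf: Configuration): Option[Seq[String]] =
  Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)).map(_.toSeq)
```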




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876568
  
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -95,6 +97,15 @@ trait Logging {
 if (log.isErrorEnabled) log.error(msg, throwable)
   }
 
+  protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) {
--- End diff ---

In general, when fixing a bug like this, it's good to think of the most non-invasive, surgical way to fix it first, and then to consider doing broader refactoring.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876560
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
+
+  def getDefaultYarnApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  } recoverWith {
+    case e: NoSuchFieldError => Success(Array.empty[String])
+    case e: NoSuchFieldException => Success(Array.empty[String])
   }

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
--- End diff ---

Will do.



[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876713
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
--- End diff ---

Will do. I kept the Array since the original implementation returned `Array[String]`, but I think it's a lot better to return `Seq` or `List`.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876756
  
--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -95,6 +97,15 @@ trait Logging {
 if (log.isErrorEnabled) log.error(msg, throwable)
   }
 
+  protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) {
--- End diff ---

Will remove this method from the trait.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-41097662
  
Hey just a heads up - I've submitted a smaller one-line fix for the 
original issue as part of #488 - that's just because I want to pull it into a 
release candidate.


https://github.com/apache/spark/pull/488/files#diff-50e237ea17ce94c3ccfc44143518a5f7R378

In general this part of the code definitely could use some broader 
clean-up, so let's keep going with this patch...

The change to the Logging trait would be good to remove; that's a widely used interface in Spark that's been stable for a long time, and it's been kept intentionally consistent with standard logger interfaces in case we want to change around the logging in Spark.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876879
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
--- End diff ---

I think it's fine to just call the normal logWarning function here.
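
For example, something along these lines, using the stock `Logging` helpers instead of the custom `log(Try)` method. This is a sketch only; the helper name is made up, and the messages follow the diff above:
```
import scala.util.{Failure, Success, Try}

// Inside ClientBase (which extends Logging), pattern match on the Try and
// route each case to the existing logWarning/logDebug methods.
def logClasspathDefault(tried: Try[Array[String]]): Unit = tried match {
  case Failure(e) =>
    logWarning("Unable to obtain the default MR Application classpath.", e)
  case Success(cp) =>
    logDebug(s"Using the default MR application classpath: ${cp.mkString(",")}")
}
```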




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11876895
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
--- End diff ---

Or maybe, this should be a warning and the above should log an error.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11877368
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
   }
 }

-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
--- End diff ---

Not sure about warning, but it is a good place to start, I think.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-22 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-41124726
  
@pwendell your one-line fix makes complete sense. Just pushed the changes 
you suggested.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40808057
  
Jenkins, test this please




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40808255
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40808261
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40810841
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40810842
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14236/




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40833064
  
Not convinced that the build should be failing due to _hive/test_ taking too long.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40839991
  
Don't worry about Travis for now




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11785222
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

No need to return this




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11785535
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

Since `populateHadoopClasspath` has a _side effect_ on `env`, by convention I am used to returning the actual _side effect_ that was applied. It aids testing and asserting the expected _side effect_ for consumers of the API, and since `populateHadoopClasspath` is part of the public API of the _ClientBase_ object, I think it is a good idea to provide it.
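
As a concrete illustration of that testing argument, a rough sketch of the kind of assertion the return value enables. The config value and assertions here are made up, and it assumes `populateHadoopClasspath` returns the elements it appended and that entries land under the `CLASSPATH` key:
```
import scala.collection.mutable.HashMap
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
conf.set("yarn.application.classpath", "/etc/hadoop/conf")
val env = new HashMap[String, String]()

// Because the applied elements come back, the test can assert the side
// effect both through the return value and through the mutated env.
val added = ClientBase.populateHadoopClasspath(conf, env)
assert(added.flatten.exists(_.contains("/etc/hadoop/conf")))
assert(env("CLASSPATH").contains("/etc/hadoop/conf"))
```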




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11785608
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

Better yet, @andrewor14, I could change `populateHadoopClasspath` to avoid a _side effect_ altogether and return the new `env`. I don't see any consumer of `populateHadoopClasspath` outside of `ClientBase`.
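
A sketch of that side-effect-free alternative, returning a new environment rather than mutating the one passed in. This is a hypothetical helper, not what the patch does:
```
// Build the new CLASSPATH value functionally; the caller's map is left
// untouched and the updated copy is returned instead.
def withHadoopClasspath(env: Map[String, String], entries: Seq[String]): Map[String, String] = {
  val joined = (env.get("CLASSPATH").toSeq ++ entries.map(_.trim))
    .mkString(java.io.File.pathSeparator)
  env + ("CLASSPATH" -> joined)
}
```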




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11785859
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

Oops, sorry, I accidentally deleted my comment: "No need to return this".




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11785982
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

Yeah, I also did a quick grep for it and didn't see any usages outside of `ClientBase`. I didn't realize it was a public API, which seems strange to me, since this is more like a setter than a getter. I think for now it's OK to leave the public API the same.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40842451
  
@berngp This is looking good. I will do a quick test of this on a YARN 
cluster, and provided that I don't run into anything I think this is good to go.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786166
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
--- End diff ---

It is strange, public APIs with _side effects_ make me anxious.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40842897
  
Thank you @andrewor14, and again I appreciate very much all the feedback.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786442
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
--- End diff ---

By the way, I just noticed that it looks like we're changing the public API here. It used to return `Array[String]`, but now it returns an `Option`. A second point is that using `Try` here changes the semantics a little bit: before, we propagated any exception that wasn't `NoSuchField*`, but now we swallow everything.

I think we should revert this method to what it was before, and have the caller deal with the fact that the result can potentially be null.
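
One possible middle ground, sketched loosely (not what was merged): recover only the `NoSuchField*` cases and let anything else propagate, preserving the old semantics while still avoiding a `null` return. It assumes the Hadoop YARN jars are on the classpath:
```
// Only the missing-field cases, expected on old Hadoop versions, map to
// None; any other exception still propagates to the caller.
def defaultYarnClasspath: Option[Array[String]] =
  try {
    Some(classOf[org.apache.hadoop.yarn.conf.YarnConfiguration]
      .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
      .get(null).asInstanceOf[Array[String]])
  } catch {
    case _: NoSuchFieldException => None
    case _: NoSuchFieldError => None
  }
```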




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786520
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
     }
-  }
+  }.toOption
--- End diff ---

Same here




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786824
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
    }
-  }
+  }.toOption
--- End diff ---

What if we keep the `Try`, so we return `Try[Array[String]]`, and in that case `populateHadoopClasspath` will deal with the potential _failure_? You can always get the `Throwable` out of a `Try` through its `failed` attribute.

e.g.
```
val t = Try(1 / 0)
t.failed.get.printStackTrace()
```




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786869
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
    }
-  }
+  }.toOption
--- End diff ---

I know I changed the public API of those two methods, but I do think that by returning a `Try[Array[String]]` you don't lose the option of handling the `Throwable`, and it is better than returning `null`.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786914
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
+    }
+  }.toOption
--- End diff --

Btw, I think we should document that we are using _reflection_ to extract the _default_ values here; that is the only reason we are wrapping this in a _Try_.
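
Something along these lines, as a sketch of the wording (not the PR's actual comment):

```scala
/**
 * The default is read via reflection because the field's presence and type
 * differ across Hadoop versions (absent in some, String vs Array[String] in
 * others), so it cannot be referenced directly at compile time. The `Try`
 * only guards that reflective lookup.
 */
```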




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11786936
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
+    }
+  }.toOption
--- End diff --

Returning `Try` still changes the public API. Also, the issue is that right now we only catch the particular `NoSuchField*` exceptions; any other exception is supposed to propagate upwards. I don't think there is a clean way to do that with `Try`, which isn't really designed for catching specific exceptions anyway.
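
For example (illustrative REPL sketch), `.toOption` discards any non-fatal failure, not just the `NoSuchField*` cases the original `try-catch` singled out:

```scala
import scala.util.Try

// An unexpected exception is silently turned into None as well:
Try[Array[String]] { throw new IllegalStateException("unexpected") }.toOption
// res0: Option[Array[String]] = None
```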




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11787690
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
+    }
+  }.toOption
--- End diff --

I will still Try to defend the argument that `Try` is a better option than returning `null` or throwing an exception whose intended handling we are only assuming, since it is not documented.

The `Try` monad itself defines that: "only non-fatal exceptions are caught by the combinators on Try (see scala.util.control.NonFatal). Serious system errors, on the other hand, will be thrown."
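
A quick REPL sketch of that behaviour:

```scala
import scala.util.Try

Try(throw new IllegalArgumentException)  // Failure(...): non-fatal, captured
Try(throw new OutOfMemoryError)          // throws: a VirtualMachineError is fatal and is not captured
```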

Assuming the intention of the original author was to handle the `null` case and _bubble_ up any other exception not caught by the `try-catch` control flow, he was actually failing on the first count. To be honest, I would bet that the `try-catch` was there specifically to handle the failure modes introduced by the reflective lookup (which is itself there to avoid compile-time errors across Hadoop versions). I know I am changing the _public_ API, but I do think it is for the best.

Ref:


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11788225
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
+    }
+  }.toOption
--- End diff --

There are a few issues here:

1. Can we change this API? Yes, we can change it - the whole deploy package 
is hidden from our docs and not supposed to be used by users. In a separate PR 
we should make this private.

2. Should we swallow all exceptions here? No, we should swallow only the ones we expect, which is the missing field; otherwise we should just throw it up the chain.

3. Should we use scala.util.Try instead of a try-catch block? In the Spark code we typically use try {} catch {} blocks, unless it is a very simple case where we want to swallow every possible exception (see the sketch below).
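
For illustration, a minimal sketch of points 2 and 3 combined (hypothetical helper, not the PR's code): swallow only the expected missing-field exceptions and let everything else propagate.

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration

def defaultYarnClasspath: Option[Array[String]] =
  try {
    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
    Some(field.get(null).asInstanceOf[Array[String]])
  } catch {
    case _: NoSuchFieldError     => None  // expected when the field is missing
    case _: NoSuchFieldException => None  // expected when the field is missing
    // anything else is deliberately not caught and propagates up the chain
  }
```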



[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11789723
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  }.toOption

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
+    val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+    if (field.getType == classOf[String]) {
+      StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+    } else {
+      field.get(null).asInstanceOf[Array[String]]
+    }
+  }.toOption
--- End diff --

Thanks @pwendell. The easiest thing would be to avoid a longer discussion and just use the `try-catch` block, but I do think the `Try` monad is a good option. It won't swallow _fatal_ errors, based on [NonFatal](https://github.com/scala/scala/blob/v2.10.3/src/library/scala/util/control/NonFatal.scala#L1), and it can handle specific error-recovery scenarios through `recoverWith` (see the sketch below). What it avoids is returning `null`, which is not a good practice overall, and the accidental _bubbling_ of errors that should not bring down a program/application, all while enabling _functional composition_.
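
A hedged sketch of that `recoverWith` point (hypothetical values):

```scala
import scala.util.Try

val classpath: Try[Array[String]] =
  Try[Array[String]] { throw new NoSuchFieldException("stand-in for the reflective lookup") }
    .recoverWith {
      // recover only the expected exception; anything else stays a Failure
      case _: NoSuchFieldException => Try(Array.empty[String])
    }
```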

In the end, I will align with the view of the **Apache Spark Project**. Let me know and I'll make the changes without further argument, but if we decide to use the `try-catch` block, please let me know how `populateHadoopClasspath` should handle the exception:
1. _bubbling_ it up in case it is not 

[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-18 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11789984
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF
   val LOCAL_SCHEME = local
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c - classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+for (c - classPathElementsToAdd.flatten) {
+  YarnSparkHadoopUtil.addToEnvironment(
+env,
+Environment.CLASSPATH.name,
+c.trim,
 File.pathSeparator)
 }
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  mapreduce.application.classpath)).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c - mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
+  private def getYarnAppClasspath(conf: Configuration): 
Option[Array[String]] =
+Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) 
match {
+  case s: Some[Array[String]] = s
+  case None = getDefaultYarnApplicationClasspath
 }
-  }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  private def getMRAppClasspath(conf: Configuration): 
Option[Array[String]] =
+Option(conf.getStrings(mapreduce.application.classpath)) match {
+  case s: Some[Array[String]] = s
+  case None = getDefaultMRApplicationClasspath
 }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = 
Try[Array[String]] {
+val field = 
classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH)
+field.get(null).asInstanceOf[Array[String]]
+  }.toOption
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
* So we need to use reflection to retrieve it.
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
-  if (field.getType == classOf[String]) {
-StringUtils.getStrings(field.get(null).asInstanceOf[String])
-  } else {
-field.get(null).asInstanceOf[Array[String]]
-  }
-} catch {
-  case err: NoSuchFieldError = null
-  case err: NoSuchFieldException = null
+  def getDefaultMRApplicationClasspath: Option[Array[String]] = 
Try[Array[String]] {
+val field = 
classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH)
+if (field.getType == classOf[String]) {
+  StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray
+} else {
+  field.get(null).asInstanceOf[Array[String]]
 }
-  }
+  }.toOption
--- End diff --

I'd bubble up the exception if it's unexpected, and return `None` if we encounter the expected one. I agree that returning an `Option` is better than a potentially `null` value. I'd like it if this never returned `null` in any case, including on exceptions.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
GitHub user berngp opened a pull request:

https://github.com/apache/spark/pull/433

[SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN applicat...

...ion specific CP

The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH`, when it should really be looking into `YarnConfiguration`. If the application configuration has no `yarn.application.classpath` defined, an NPE will be thrown.

Additional Changes include:
* ScalaBuild now points to Scalatest 2.1.3
* ScalaBuild project root renamed as "spark"
* Test Suite for ClientBase added
* Fixes for scalastyle in other yarn files.

[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522

Author  : bernardo.gomezpala...@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
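
In other words, a sketch of the failure mode: the reflective lookup on the wrong class falls into the `catch` arms and returns `null`, and iterating that `null` is what blows up.

```scala
val entries: Array[String] = null   // what getDefaultYarnApplicationClasspath() yields in that case
for (c <- entries) { println(c) }   // NullPointerException here
```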

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/berngp/spark feature/SPARK-1522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #433


commit b08beb96c4437f095c31f53a0877eb95aebab637
Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com
Date:   2014-04-17T09:35:52Z

[SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN application 
specific CP

The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH`, when it should really be looking into `YarnConfiguration`. If the application configuration has no `yarn.application.classpath` defined, an NPE will be thrown.

Additional Changes include:
* ScalaBuild now points to Scalatest 2.1.3
* ScalaBuild project root renamed as "spark"
* Test Suite for ClientBase added
* Fixes for scalastyle in other yarn files.

[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522

Author  : bernardo.gomezpala...@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test






[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40699148
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726091
  
--- Diff: project/SparkBuild.scala ---
@@ -52,7 +52,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"

-  lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
--- End diff --

When importing into IntelliJ IDEA as an SBT project it uses the names of the projects, and "root" lacks a bit of context. I presume the usage of the word "root" was based on the SBT multi-module example.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726156
  
--- Diff: project/SparkBuild.scala ---
@@ -263,7 +263,7 @@ object SparkBuild extends Build {
         "org.eclipse.jetty" % "jetty-util"     % jettyVersion,
         "org.eclipse.jetty" % "jetty-plus"     % jettyVersion,
         "org.eclipse.jetty" % "jetty-security" % jettyVersion,
-        "org.scalatest"    %% "scalatest"      % "1.9.1" % "test",
+        "org.scalatest"    %% "scalatest"      % "2.1.3" % "test",
--- End diff --

Most likely the upgrade to scalatest 2.1.3 is causing failures in the below test suites, which I will fix:

streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
spark/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726876
  
--- Diff: project/SparkBuild.scala ---
@@ -52,7 +52,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"

-  lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*)
--- End diff --

So better?
```scala
lazy val spark = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
```





[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40707049
  
Most of the changes in the diff look unrelated to what is mentioned in the summary. In addition, they introduce new bugs.

Please clean up the diff and include only what is required to fix the issue, without unrelated changes.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11741951
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) {
+    for ( c <- elements )
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
--- End diff --

Please follow the style guide for indentation: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11741991
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) {
+    for ( c <- elements )
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+                                           (f: => Array[String]) : Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
     }
-  }
+
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+      field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+      if( field.getType  == classOf[String] )
--- End diff --

Remove the extra space after `getType`, before `==`.





[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745730
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) {
+    for ( c <- elements )
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+                                           (f: => Array[String]) : Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
     }
-  }
+
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+      field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+      if( field.getType  == classOf[String] )
         StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
+      else
         field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
-    }
-  }
+  }.getOrElse(Array.empty[String])
--- End diff --

I prefer the old try-catch to `scala.util.Try`, which hides the specific exceptions you catch.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745870
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) {
+    for ( c <- elements )
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+                                           (f: => Array[String]) : Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
--- End diff --

`Option(...).getOrElse(f)`
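
i.e., something like this sketch, with a by-name default `f` as in the code above:

```scala
// Equivalent to the Some(s)/None match, in one expression:
def appClasspathForKey(key: String, conf: org.apache.hadoop.conf.Configuration)
                      (f: => Array[String]): Array[String] =
  Option(conf.getStrings(key)).getOrElse(f)
```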




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746226
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) {
+    for ( c <- elements )
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+                                           (f: => Array[String]) : Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
     }
-  }
+
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+      field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])

   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
    * So we need to use reflection to retrieve it.
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
+      if( field.getType  == classOf[String] )
         StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
+      else
         field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
-    }
-  }
+  }.getOrElse(Array.empty[String])
--- End diff --

Ah, I see what you're trying to do. I think a better way is to return `Option[Array[String]]`: if there's an exception, return `None`.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746514
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
--- End diff --

I don't think you need to return anything here, so you can use `{` instead of `= {`.
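
The distinction, as a sketch:

```scala
// Procedure syntax: result type is Unit, nothing is returned.
def populate() { println("side effects only") }

// With `= {`, the value of the last expression is returned.
def populateAndReturn() = { List("a", "b") }
```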




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746787
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
--- End diff --

I actually prefer that you move all of the logic of `getAppClasspathForKey` 
and `getDefaultMRApplicationClasspath` into this method. Right now you have to 
jump a few levels to understand what's going on, and I think it'll be clearer 
if it's in one isolated place. (Same in getYarnAppClasspath)
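
A sketch of how the inlined version might read, assuming `getDefaultMRApplicationClasspath` keeps its existing signature; this is an illustration of the suggestion, not code from the PR:

    import org.apache.hadoop.conf.Configuration

    // All the lookup logic in one place: read the key, fall back to the
    // default when it is unset.
    private def getMRAppClasspath(conf: Configuration): Array[String] =
      Option(conf.getStrings("mapreduce.application.classpath"))
        .getOrElse(getDefaultMRApplicationClasspath())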


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40747230
  
@berngp Thanks for doing this. I literally ran into this NPE yesterday on 
my own YARN cluster. It turns out I had forgotten to point YARN_CONF_DIR to the 
proper place, but the NPE gave no clue as to what the problem was (until I dug 
into the code, which is bad user experience). This PR is a much-needed fix.

I left a couple of comments. As @tgraves mentioned, the style of this PR is 
inconsistent with the Spark style guide. Further, it would be good if we could 
remove several levels of indirection to make the code clearer.
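
For readers who have not hit it, a contrived sketch of that failure mode, based on the old code quoted in the diffs in this thread (the empty Configuration is only there to force the unset case):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.yarn.conf.YarnConfiguration

    val conf = new Configuration(false)  // no resources loaded
    // getStrings returns null for an unset key; when the fallback also
    // produces null, the loop throws an NPE that says nothing about
    // the missing configuration.
    val entries = conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)
    for (c <- entries) println(c)  // NullPointerException here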


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11747100
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
--- End diff --

Also, in Spark we try not to use `protected[*]`, since it's not exactly 
clear what that does. I think these can just be `private`?
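
For anyone unfamiliar with qualified access modifiers, a small sketch of the difference (object and method names invented):

    package org.apache.spark.deploy.yarn

    object VisibilityDemo {
      // private: callable only inside VisibilityDemo (and its companion).
      private def helper(): String = "hidden"

      // protected[yarn]: callable from any code under the enclosing
      // `yarn` package, including a test suite in the same package.
      protected[yarn] def packageHelper(): String = "package-visible"
    }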


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11753045
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
--- End diff --

Oh, I see. Should we specify `: Unit = {` in the new style, then? Or is the 
`Unit` optional?
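
The three spellings in question, side by side (method names illustrative):

    import scala.collection.mutable.HashMap

    def populateA(env: HashMap[String, String]) { env("k") = "v" }          // procedure syntax
    def populateB(env: HashMap[String, String]): Unit = { env("k") = "v" }  // explicit Unit
    def populateC(env: HashMap[String, String]) = { env("k") = "v" }        // inferred result type (Unit here)

The compiler accepts all three; spelling out `: Unit =` just makes the intent explicit.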


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11763040
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
+    for (c <- elements)
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
+                                           (f: => Array[String]): Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
--- End diff --

I inlined the function but had to keep the `match`, since `getOrElse` was 
giving me some trouble with the types.
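
A guess at the kind of mismatch, for the curious: `Option.getOrElse[B >: A](default: => B): B` infers a common supertype of the element and the default, so the result only stays `Array[String]` when both sides already agree; the explicit match pins each branch. The key and helper below are illustrative:

    import org.apache.hadoop.conf.Configuration

    def defaults: Array[String] = Array("a.jar", "b.jar")

    // The shape that was kept: both branches are forced to Array[String].
    def classpath(conf: Configuration): Array[String] =
      Option(conf.getStrings("illustrative.key")) match {
        case Some(s) => s
        case None => defaults
      }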


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11763060
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
+    for (c <- elements)
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
--- End diff --

I kept the `for (c <- classPathElementsToAdd.flatten)`; to be honest, the 
`yield` was a remnant of a different approach.
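
For reference, the difference being cleaned up (a self-contained sketch; `addToEnv` stands in for the real helper):

    val elements = Seq(" a.jar ", "b.jar")
    def addToEnv(entry: String): Unit = println(s"adding ${entry.trim}")

    // Side effect only; nothing is built:
    for (c <- elements) addToEnv(c)

    // yield also builds and returns a collection (a Seq[Unit] here),
    // which is wasted work when nobody consumes it:
    val units = for (c <- elements) yield addToEnv(c)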


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11763070
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
+    for (c <- elements)
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
+                                           (f: => Array[String]): Array[String] =
--- End diff --

Done, good point.


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11763107
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-        File.pathSeparator)
-    }
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    addToAppClasspath(env, classPathElementsToAdd)
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) =
+    getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+    getAppClasspathForKey("mapreduce.application.classpath", conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
+    for (c <- elements)
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
--- End diff --

Changed to `private`.


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11763222
  
--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSpec.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.scalatest.FreeSpec
--- End diff --

Will do. That said, I do think the narrative in the FreeSpec is a bit 
better, and more `Fun` to write ;)
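
For contrast, the two styles side by side (suite and test names invented for illustration):

    import org.scalatest.{FreeSpec, FunSuite}

    class ExampleSuite extends FunSuite {
      test("falls back to the default classpath when none is configured") {
        assert(Array("default.jar").nonEmpty)
      }
    }

    class ExampleSpec extends FreeSpec {
      "the application classpath" - {
        "when none is configured" - {
          "falls back to the default" in {
            assert(Array("default.jar").nonEmpty)
          }
        }
      }
    }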


---