[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-45523729 @tgravescs I fixed the test, squashed the commit and repointed the branch. The latest commit addresses the test failure you were seeing while running it for Hadoop 0.23. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/433
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-45545343 Looks good. Thanks @berngp
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r13499967

--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala ---

@@ -0,0 +1,112 @@ (new file; all lines added)

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.deploy.yarn

import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.MRJobConfig
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

import org.scalatest.FunSuite
import org.scalatest.matchers.ShouldMatchers._

import scala.collection.JavaConversions._
import scala.collection.mutable.{ HashMap => MutableHashMap }
import scala.util.Try


class ClientBaseSuite extends FunSuite {

  test("default Yarn application classpath") {
    ClientBase.getDefaultYarnApplicationClasspath should be(Some(Fixtures.knownDefYarnAppCP))
  }

  test("default MR application classpath") {
    ClientBase.getDefaultMRApplicationClasspath should be(Some(Fixtures.knownDefMRAppCP))
  }

  test("resultant classpath for an application that defines a classpath for YARN") {
    withAppConf(Fixtures.mapYARNAppConf) { conf =>
      val env = newEnv
      ClientBase.populateHadoopClasspath(conf, env)
      classpath(env) should be(
        flatten(Fixtures.knownYARNAppCP, ClientBase.getDefaultMRApplicationClasspath))
    }
  }

  test("resultant classpath for an application that defines a classpath for MR") {
    withAppConf(Fixtures.mapMRAppConf) { conf =>
      val env = newEnv
      ClientBase.populateHadoopClasspath(conf, env)
      classpath(env) should be(
        flatten(ClientBase.getDefaultYarnApplicationClasspath, Fixtures.knownMRAppCP))
    }
  }

  test("resultant classpath for an application that defines both classpaths, YARN and MR") {
    withAppConf(Fixtures.mapAppConf) { conf =>
      val env = newEnv
      ClientBase.populateHadoopClasspath(conf, env)
      classpath(env) should be(flatten(Fixtures.knownYARNAppCP, Fixtures.knownMRAppCP))
    }
  }

  object Fixtures {

    val knownDefYarnAppCP: Seq[String] =
      getFieldValue[Array[String], Seq[String]](classOf[YarnConfiguration],
                                                "DEFAULT_YARN_APPLICATION_CLASSPATH",
                                                Seq[String]())(a => a.toSeq)

    val knownDefMRAppCP: Seq[String] =
      getFieldValue[String, Seq[String]](classOf[MRJobConfig],
                                         "DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH",
                                         Seq[String]())(a => a.split(","))

    val knownYARNAppCP = Some(Seq("/known/yarn/path"))

    val knownMRAppCP = Some(Seq("/known/mr/path"))

    val mapMRAppConf =
      Map("mapreduce.application.classpath" -> knownMRAppCP.map(_.mkString(":")).get)

    val mapYARNAppConf =
      Map(YarnConfiguration.YARN_APPLICATION_CLASSPATH -> knownYARNAppCP.map(_.mkString(":")).get)

    val mapAppConf = mapYARNAppConf ++ mapMRAppConf
  }

  def withAppConf(m: Map[String, String] = Map())(testCode: (Configuration) => Any) {
    val conf = new Configuration
    m.foreach { case (k, v) => conf.set(k, v, "ClientBaseSpec") }
    testCode(conf)
  }

  def newEnv = MutableHashMap[String, String]()

  def classpath(env: MutableHashMap[String, String]) = env(Environment.CLASSPATH.name).split(":|;")

  def flatten(a: Option[Seq[String]], b: Option[Seq[String]]) = (a ++ b).flatten.toArray

  def getFieldValue[A, B](clazz: Class[_], field: String, defaults: => B)(mapTo: A => B): B =
```

--- End diff --

@tgravescs tested for Hadoop 0.23 and 2.4.0. A bit silly that I made such a mistake when the main
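The merge behavior these tests pin down is: a configured classpath wins, otherwise the code falls back to the (possibly absent) version-dependent default, and the YARN entries are followed by the MR entries. A minimal sketch of that logic outside of Hadoop; the class name, keys, and default values below are illustrative stand-ins, not Spark's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ClasspathMerge {
    // Hypothetical defaults, standing in for the reflective lookups in ClientBase.
    static final Optional<List<String>> DEFAULT_YARN = Optional.of(List.of("/default/yarn"));
    static final Optional<List<String>> DEFAULT_MR = Optional.of(List.of("/default/mr"));

    // A configured value wins; otherwise fall back to the (possibly absent) default.
    public static Optional<List<String>> entries(Map<String, List<String>> conf, String key,
                                                 Optional<List<String>> dflt) {
        return Optional.ofNullable(conf.get(key)).or(() -> dflt);
    }

    // YARN entries first, then MR entries, mirroring the ordering the tests assert.
    public static List<String> populate(Map<String, List<String>> conf) {
        List<String> cp = new ArrayList<>();
        entries(conf, "yarn.application.classpath", DEFAULT_YARN).ifPresent(cp::addAll);
        entries(conf, "mapreduce.application.classpath", DEFAULT_MR).ifPresent(cp::addAll);
        return cp;
    }

    public static void main(String[] args) {
        System.out.println(populate(Map.of()));  // defaults for both
        System.out.println(populate(
            Map.of("yarn.application.classpath", List.of("/known/yarn"))));  // configured YARN + default MR
    }
}
```

Absent defaults simply contribute nothing, which is how the NPE in the original code is avoided.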
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r13456889

--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala ---

(The quoted context is an earlier 104-line revision of ClientBaseSuite.scala; the comment is anchored at these fixture lines:)

```scala
val knownDefYarnAppCP: Option[Seq[String]] =
  Some(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)
```

--- End diff --

DEFAULT_YARN_APPLICATION_CLASSPATH doesn't exist in hadoop 0.23, so we can't use it directly. That is why we use reflection in ClientBase.
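The point above is the crux of the fix: a constant that only exists in newer Hadoop versions must be reached through reflection so that older versions degrade gracefully instead of failing to link. A minimal sketch of that pattern, using a hypothetical config class rather than the real YarnConfiguration:

```java
import java.util.Optional;

public class ReflectiveDefault {
    // Hypothetical stand-in for a class whose field only exists in newer
    // releases (the way DEFAULT_YARN_APPLICATION_CLASSPATH only exists in
    // some Hadoop versions).
    public static class NewerConfig {
        public static final String[] DEFAULT_CLASSPATH = { "/etc/hadoop/conf", "/usr/lib/hadoop/*" };
    }

    // Look the field up by name so the code still loads against versions
    // where it is absent; a missing field becomes Optional.empty().
    public static Optional<String[]> defaultClasspath(Class<?> clazz, String fieldName) {
        try {
            return Optional.of((String[]) clazz.getField(fieldName).get(null));
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultClasspath(NewerConfig.class, "DEFAULT_CLASSPATH").isPresent()); // true
        System.out.println(defaultClasspath(NewerConfig.class, "MISSING").isPresent());           // false
    }
}
```

The test suite's `getFieldValue` helper follows the same idea: probe for the field by name and fall back to a supplied default when it is not there.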
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r13456914

--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala ---

(The quoted context is the same 104-line revision of ClientBaseSuite.scala; the comment is anchored at these fixture lines:)

```scala
val knownDefMRAppCP: Option[Seq[String]] =
  Some(MRJobConfig.DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH.split(","))
```

--- End diff --

DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH doesn't exist in hadoop 0.23 either.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r13458231

--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSuite.scala ---

(The quoted context is the same revision of ClientBaseSuite.scala, anchored at the `knownDefYarnAppCP` fixture that referenced `YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH` directly.)

--- End diff --

Thanks for catching this. I will wrap the call in a `Try`, e.g.

```scala
Try(YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH).toOption
```
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-45264422 Mostly looks good. Fix those couple of minor test issues for hadoop 0.23 and I'll commit it.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-45146709 @berngp can you upmerge to the latest master?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-45159895 @tgravescs done, thank you again for following this PR.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-43892089 There is still a lot going on with spark 1.0. I'm going to wait for that to settle down and then review this.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-43404871 @berngp sorry for the delay, I think everyone has been busy with the spark 1.0 release. I think this should still be fixed in spark 1.1. Let's leave this PR as is and I'll review it this week.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-43383729 @tgravescs any thoughts on this? Should I just close this pull request?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-42254447 Hi all, just checking the state of affairs with this PR.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-41725598 So @sryza, @tgravescs and @pwendell, should we increase the scope of the changes, or hopefully merge this one as is and create a broader ticket to address testing and additional refactoring?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11917001

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---

```diff
@@ -354,63 +354,85 @@ trait ClientBase extends Logging {
     }
   }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }
 
-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
-    }
+  private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultYarnApplicationClasspath
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case Some(s) => Some(s.toSeq)
+      case None => getDefaultMRApplicationClasspath
+    }
+
+  def getDefaultYarnApplicationClasspath: Option[Seq[String]] = {
+    val triedDefault = Try[Seq[String]] {
+      val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
```

--- End diff --

Maybe I have misunderstood something, but do we really need to use reflection to get DEFAULT_YARN_APPLICATION_CLASSPATH? Can't we simply get it as YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH (similar to YarnConfiguration.YARN_APPLICATION_CLASSPATH)?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11917358

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---

(The quoted context is the same ClientBase.scala hunk, anchored at the reflective lookup of DEFAULT_YARN_APPLICATION_CLASSPATH in getDefaultYarnApplicationClasspath.)

--- End diff --

Excellent question @rahulsinghaliitd; it's all related to the _instability_ of the YARN API and the different versions people are using. See https://issues.apache.org/jira/browse/SPARK-1233
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11918623

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---

(The quoted context is the same ClientBase.scala hunk; the comment is anchored at this change to the scaladoc of getDefaultMRApplicationClasspath:)

```diff
   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
```

--- End diff --

Any reason for removing this period?
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11918859 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,85 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultYarnApplicationClasspath } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - 
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + val value = field.get(null).asInstanceOf[Array[String]] + value.toSeq +} recoverWith { + case e: NoSuchFieldException = Success(Seq.empty[String]) } + +triedDefault match { + case f: Failure[_] = +logError(Unable to obtain the default YARN Application classpath., f.exception) + case s: Success[_] = +logDebug(sUsing the default YARN application classpath: ${s.get.mkString(,)}) +} + +triedDefault.toOption } /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. - * So we need to use reflection to retrieve it. + * So we need to use reflection to retrieve it */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { + def getDefaultMRApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) + val value = if (field.getType == classOf[String]) { + StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray } else { field.get(null).asInstanceOf[Array[String]] } -} catch { - case err: NoSuchFieldError = null --- End diff -- Do we not need to handle NoSuchFieldError? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user rahulsinghaliitd commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11920472 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,85 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultYarnApplicationClasspath } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - 
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) --- End diff -- Got it! I now see that the DEFAULT version is not present in hadoop 0.23. Then I wondered that if we truly want to be YARN API agnostic then shouldn't we get YARN_APPLICATION_CLASSPATH also via reflection. But I guess it is safe to assume that YARN_APPLICATION_CLASSPATH is here to stay. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11920584 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,85 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultYarnApplicationClasspath } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - 
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) --- End diff -- @rahulsinghaliitd that I hope. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11920723 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,85 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultYarnApplicationClasspath } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - 
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + val value = field.get(null).asInstanceOf[Array[String]] + value.toSeq +} recoverWith { + case e: NoSuchFieldException = Success(Seq.empty[String]) } + +triedDefault match { + case f: Failure[_] = +logError(Unable to obtain the default YARN Application classpath., f.exception) + case s: Success[_] = +logDebug(sUsing the default YARN application classpath: ${s.get.mkString(,)}) +} + +triedDefault.toOption } /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. - * So we need to use reflection to retrieve it. + * So we need to use reflection to retrieve it */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { + def getDefaultMRApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) + val value = if (field.getType == classOf[String]) { + StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray } else { field.get(null).asInstanceOf[Array[String]] } -} catch { - case err: NoSuchFieldError = null --- End diff -- @sryza I removed the `NoSuchFieldError` since in this context it can't happen, both classes and fields are accessed through reflection. 
A linkage error will occur when, say, you have a dependency flagged as provided and your code points directly to one of its class fields, for example `MRJobConfig.DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH`. Now since your version of such
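The distinction berngp draws can be demonstrated in isolation: `Class.getField` reports a missing field as the checked `NoSuchFieldException`, whereas `NoSuchFieldError` is a `LinkageError` the JVM raises only when a statically compiled field reference fails to resolve. A minimal sketch of the reflective side (the field name is made up):

```scala
import scala.util.{Try, Failure}

// Reflective lookup surfaces a missing field as NoSuchFieldException (an
// Exception), never as NoSuchFieldError (a LinkageError) -- the Error can
// only come from broken statically-linked references, which reflection avoids.
val result = Try(classOf[String].getField("DOES_NOT_EXIST"))
result match {
  case Failure(_: NoSuchFieldException) =>
    println("missing field surfaced as NoSuchFieldException")
  case other =>
    println(s"unexpected: $other")
}
```

This is why the patch only needs to recover from `NoSuchFieldException` once every field access goes through reflection.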
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11927317 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,85 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + private def getYarnAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultYarnApplicationClasspath } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - 
field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Seq[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case Some(s) = Some(s.toSeq) + case None = getDefaultMRApplicationClasspath +} + + def getDefaultYarnApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + val value = field.get(null).asInstanceOf[Array[String]] + value.toSeq +} recoverWith { + case e: NoSuchFieldException = Success(Seq.empty[String]) } + +triedDefault match { + case f: Failure[_] = +logError(Unable to obtain the default YARN Application classpath., f.exception) + case s: Success[_] = +logDebug(sUsing the default YARN application classpath: ${s.get.mkString(,)}) +} + +triedDefault.toOption } /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. - * So we need to use reflection to retrieve it. + * So we need to use reflection to retrieve it */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { + def getDefaultMRApplicationClasspath: Option[Seq[String]] = { +val triedDefault = Try[Seq[String]] { val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) + val value = if (field.getType == classOf[String]) { + StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray } else { field.get(null).asInstanceOf[Array[String]] } -} catch { - case err: NoSuchFieldError = null --- End diff -- Makes sense, thanks for clarifying. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
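The `Try`/`recoverWith` pattern the diff above settles on can be sketched on its own. The helper name here is hypothetical, and `classOf[String]` stands in for `YarnConfiguration`/`MRJobConfig`; the shape of the recovery is the point:

```scala
import scala.util.{Try, Success}

// Sketch of the patch's approach: read a public static Array[String] field
// reflectively, mapping the checked NoSuchFieldException (field absent in
// this Hadoop version) to an empty default rather than returning null.
def staticStringArrayField(clazz: Class[_], name: String): Option[Seq[String]] = {
  val tried = Try {
    clazz.getField(name).get(null).asInstanceOf[Array[String]].toSeq
  } recoverWith {
    case _: NoSuchFieldException => Success(Seq.empty[String])
  }
  tried.toOption  // any other failure (e.g. a cast error) collapses to None
}
```

A missing field thus yields `Some(Seq.empty)` rather than the `null` that caused the original NPE.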
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876372 --- Diff: core/src/main/scala/org/apache/spark/Logging.scala --- @@ -95,6 +97,15 @@ trait Logging { if (log.isErrorEnabled) log.error(msg, throwable) } + protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) { --- End diff -- We don't want to change this Logging interface at all. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876388 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,79 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } + private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case s: Some[Array[String]] = s + case None = +val triedDefault:Try[Array[String]] = getDefaultYarnApplicationClasspath +log(triedDefault){ + case f: Failure[_] = +Unable to obtain the default YARN Application classpath. 
+ case s: Success[_] = +sUsing the default YARN application classpath: ${s.get.mkString(,)} +} +triedDefault.toOption } - } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case s: Some[Array[String]] = s + case None = +val triedDefault:Try[Array[String]] = getDefaultMRApplicationClasspath +log(triedDefault){ + case f: Failure[_] = +Unable to obtain the default MR Application classpath. + case s: Success[_] = +sUsing the default MR application classpath: ${s.get.mkString(,)} +} +triedDefault.toOption } + + def getDefaultYarnApplicationClasspath: Try[Array[String]] = Try[Array[String]] { +val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) +field.get(null).asInstanceOf[Array[String]] + } recoverWith { +case e: NoSuchFieldError = Success(Array.empty[String]) +case e: NoSuchFieldException = Success(Array.empty[String]) } /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. - * So we need to use reflection to retrieve it. 
+ * So we need to use reflection to retrieve it */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { -field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + def getDefaultMRApplicationClasspath: Try[Array[String]] = Try[Array[String]] { --- End diff -- Could this just return an `Option[Seq[String]]` --- If your project is set up for it, you can reply to this
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876408 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,79 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } + private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case s: Some[Array[String]] = s + case None = +val triedDefault:Try[Array[String]] = getDefaultYarnApplicationClasspath +log(triedDefault){ + case f: Failure[_] = +Unable to obtain the default YARN Application classpath. 
+ case s: Success[_] = +sUsing the default YARN application classpath: ${s.get.mkString(,)} +} +triedDefault.toOption } - } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(mapreduce.application.classpath)) match { + case s: Some[Array[String]] = s + case None = +val triedDefault:Try[Array[String]] = getDefaultMRApplicationClasspath +log(triedDefault){ + case f: Failure[_] = +Unable to obtain the default MR Application classpath. + case s: Success[_] = +sUsing the default MR application classpath: ${s.get.mkString(,)} +} +triedDefault.toOption } + + def getDefaultYarnApplicationClasspath: Try[Array[String]] = Try[Array[String]] { +val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) +field.get(null).asInstanceOf[Array[String]] + } recoverWith { +case e: NoSuchFieldError = Success(Array.empty[String]) +case e: NoSuchFieldException = Success(Array.empty[String]) } /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. - * So we need to use reflection to retrieve it. 
+ * So we need to use reflection to retrieve it */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { -field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + def getDefaultMRApplicationClasspath: Try[Array[String]] = Try[Array[String]] { --- End diff -- In general we don't pass Try's around in Spark... --- If your project is set up for it, you can reply to this
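pwendell's point about not passing `Try`s around can be illustrated by keeping `Try` internal to a method and exposing only an `Option` at the boundary. This is a hedged sketch, not the final patch code; the helper name and logging are my own:

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical helper: Try is used internally for error capture and logging,
// but the public return type stays a plain Option so callers never see Try.
def lookupOrDefault(lookup: () => Array[String]): Option[Seq[String]] = {
  val tried = Try(lookup().toSeq)
  tried match {
    case Failure(e) => Console.err.println(s"lookup failed: ${e.getMessage}")
    case Success(_) => ()  // the real code logs the value at debug level
  }
  tried.toOption  // the Try never escapes the method
}
```

Callers then compose with `++`, `getOrElse`, and friends without ever handling a `Try`.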
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876544 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -354,63 +354,79 @@ trait ClientBase extends Logging { } } -object ClientBase { +object ClientBase extends Logging { val SPARK_JAR: String = spark.jar val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } + private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case s: Some[Array[String]] = s + case None = +val triedDefault:Try[Array[String]] = getDefaultYarnApplicationClasspath +log(triedDefault){ + case f: Failure[_] = +Unable to obtain the default YARN Application classpath. 
+ case s: Success[_] = +sUsing the default YARN application classpath: ${s.get.mkString(,)} +} +triedDefault.toOption } - } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Array[String]] = --- End diff -- It would be nice for these to return sequences instead of Arrays. I think those array types were just there due to this code being ported from java. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
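The preference for `Seq` over `Array` in these signatures is more than style. One common motivation (my gloss, not stated in the thread) is that Scala `Array`s keep Java reference equality while `Seq` compares structurally:

```scala
// Array == is reference equality (Java semantics)...
val a1 = Array("x", "y")
val a2 = Array("x", "y")
println(a1 == a2)              // reference comparison, not element-wise
println(a1.sameElements(a2))   // element-wise comparison must be explicit

// ...while Seq compares element by element.
val s1 = Seq("x", "y")
val s2 = Seq("x", "y")
println(s1 == s2)
```

Returning `Seq` therefore gives callers (and tests) the equality semantics they usually expect.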
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876568 --- Diff: core/src/main/scala/org/apache/spark/Logging.scala --- @@ -95,6 +97,15 @@ trait Logging { if (log.isErrorEnabled) log.error(msg, throwable) } + protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) { --- End diff -- In general, when fixing a bug like this it's good to think of the most non-invasive, surgical way to fix it, then to consider doing broader re-factoring. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876560

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
     }
   }
 
-object ClientBase {
+object ClientBase extends Logging {
   val SPARK_JAR: String = "spark.jar"
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }
 
-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultYarnApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default YARN Application classpath."
+          case s: Success[_] =>
+            s"Using the default YARN application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
-  }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None =>
+        val triedDefault: Try[Array[String]] = getDefaultMRApplicationClasspath
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
+        }
+        triedDefault.toOption
     }
+
+  def getDefaultYarnApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
+    val field = classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+    field.get(null).asInstanceOf[Array[String]]
+  } recoverWith {
+    case e: NoSuchFieldError => Success(Array.empty[String])
+    case e: NoSuchFieldException => Success(Array.empty[String])
   }
 
   /**
    * In Hadoop 0.23, the MR application classpath comes with the YARN application
    * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String.
-   * So we need to use reflection to retrieve it.
+   * So we need to use reflection to retrieve it
    */
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-      if (field.getType == classOf[String]) {
-        StringUtils.getStrings(field.get(null).asInstanceOf[String])
-      } else {
-        field.get(null).asInstanceOf[Array[String]]
-      }
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  def getDefaultMRApplicationClasspath: Try[Array[String]] = Try[Array[String]] {
--- End diff --

Will do.

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876713

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
[duplicate diff context elided; the line under discussion:]
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
--- End diff --

Will do. I kept `Array` since the original implementation returned `Array[String]`, but I think it's a lot better to return `Seq` or `List`.

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876756

--- Diff: core/src/main/scala/org/apache/spark/Logging.scala ---
@@ -95,6 +97,15 @@ trait Logging {
     if (log.isErrorEnabled) log.error(msg, throwable)
   }
 
+  protected def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]) {
--- End diff --

Will remove this method from the trait.

---
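For reference, the `log` helper being removed can be sketched in isolation. This is a sketch, not the Spark code: `println` stands in for the `Logging` trait's methods, and `TryLogDemo` is a hypothetical name.

```scala
import scala.util.{Success, Try}

object TryLogDemo {
  // Sketch of the helper under review: choose a message for a Try via a
  // partial function and hand it to a logger (println stands in for
  // logInfo/logWarning here).
  def log[A](t: Try[A])(pf: PartialFunction[Try[A], String]): Unit =
    if (pf.isDefinedAt(t)) println(pf(t))
}
```

Because the argument is a partial function, a caller can choose different messages for the `Success` and `Failure` cases, which is exactly how the `ClientBase` patch used it.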
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-41097662

Hey, just a heads up - I've submitted a smaller one-line fix for the original issue as part of #488 - that's just because I want to pull it into a release candidate.

https://github.com/apache/spark/pull/488/files#diff-50e237ea17ce94c3ccfc44143518a5f7R378

In general this part of the code could definitely use some broader clean-up, so let's keep going with this patch... The change to `Logging` would be good to remove, though; that's a widely used interface in Spark that's been stable for a long time, and it's been kept intentionally consistent with standard logger interfaces in case we want to change around the logging in Spark.

---
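The guard at the heart of the one-line fix (wrapping a possibly-null Hadoop lookup in `Option(...)` before applying a default, so the NPE cannot occur) can be sketched without Hadoop. The names below (`getStrings`, `classpathEntries`) are illustrative stand-ins, not the real Spark or Hadoop APIs:

```scala
object ClasspathDemo {
  // Stand-in for Hadoop's Configuration.getStrings(key), which returns null
  // when the key is unset (hypothetical helper, not the real API).
  def getStrings(key: String, settings: Map[String, String]): Array[String] =
    settings.get(key).map(_.split(",")).orNull

  // Wrapping the possibly-null result in Option(...) turns null into None,
  // so the fallback applies instead of a NullPointerException downstream.
  def classpathEntries(key: String,
                       settings: Map[String, String],
                       default: Array[String]): Array[String] =
    Option(getStrings(key, settings)).getOrElse(default)
}
```

The same one-liner shape, `Option(possiblyNull).getOrElse(default)`, is what makes the minimal fix so small.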
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876879

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
[duplicate diff context elided; the lines under discussion:]
+        log(triedDefault) {
+          case f: Failure[_] =>
+            "Unable to obtain the default MR Application classpath."
--- End diff --

I think it's fine to just call the normal logWarning function here.

---
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11876895

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
[duplicate diff context elided; the line under discussion:]
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
--- End diff --

Or maybe this should be a warning, and the above should log an error.

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11877368

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -354,63 +354,79 @@ trait ClientBase extends Logging {
[duplicate diff context elided; the line under discussion:]
+          case s: Success[_] =>
+            s"Using the default MR application classpath: ${s.get.mkString(",")}"
--- End diff --

Not sure about warning, but it is a good place to start, I think.

---
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-41124726

@pwendell your one-line fix makes complete sense. Just pushed the changes you suggested.

---
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40808057

Jenkins, test this please

---
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40808255

Merged build triggered.

---
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40808261

Merged build started.

---
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40810841

Merged build finished. All automated tests passed.

---
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40810842

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14236/

---
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40833064

Not convinced that the build should be failing due to _hive/test_ taking too long.

---
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40839991

Don't worry about Travis for now

---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11785222

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

No need to return this

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11785535

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

Since `populateHadoopClasspath` has a _side effect_ on `env`, by convention I'm used to returning the actual _side effect_ that was applied. It aids testing/asserting the expectation of that _side effect_ for consumers of the API, and since `populateHadoopClasspath` is part of the public API of the `ClientBase` object, I think it is a good idea to provide it.

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11785608

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

Better yet, @andrewor14, I could change `populateHadoopClasspath` to avoid a _side effect_ and return the new `env`. I don't see any consumer of `populateHadoopClasspath` outside `ClientBase`.

---
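A side-effect-free variant of the sort floated here could look like the following sketch. `addToClasspath` is a hypothetical name, and an immutable `Map` stands in for the mutable `HashMap` env that the real code mutates; this is not the actual Spark method:

```scala
object PureEnv {
  // Hypothetical pure variant of populateHadoopClasspath: instead of mutating
  // env in place, return an updated copy with the entries appended.
  def addToClasspath(env: Map[String, String],
                     elements: Seq[String],
                     separator: String): Map[String, String] = {
    // Prepend any existing CLASSPATH value, then join the trimmed entries.
    val joined = (env.get("CLASSPATH").toSeq ++ elements.map(_.trim)).mkString(separator)
    env + ("CLASSPATH" -> joined)
  }
}
```

With this shape there is nothing to assert about a mutation in tests: the return value *is* the effect.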
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11785859

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

Oops, sorry, I accidentally deleted my comment: No need to return this

---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11785982

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

Yeah, I also did a quick grep for it and didn't see any usages outside of `ClientBase`. I didn't realize it was a public API, which seems super strange to me, since this is more like a setter than a getter. I think for now it's OK to leave the public API the same.

---
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40842451

@berngp This is looking good. I will do a quick test of this on a YARN cluster, and provided that I don't run into anything I think this is good to go.

---
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786166

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
[duplicate diff context elided; the line under discussion:]
+    classPathElementsToAdd
--- End diff --

It is strange; public APIs with _side effects_ make me anxious.

---
Github user berngp commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40842897

Thank you @andrewor14, and again I appreciate very much all the feedback.

---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786442

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -361,55 +361,48 @@ object ClientBase {
   val LOG4J_CONF_ENV_KEY: String = "SPARK_LOG4J_CONF"
   val LOCAL_SCHEME = "local"

-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) {
-    val classpathEntries = Option(conf.getStrings(
-      YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-        getDefaultYarnApplicationClasspath())
-    for (c <- classpathEntries) {
-      YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
+    val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf)
+    for (c <- classPathElementsToAdd.flatten) {
+      YarnSparkHadoopUtil.addToEnvironment(
+        env,
+        Environment.CLASSPATH.name,
+        c.trim,
         File.pathSeparator)
     }
+    classPathElementsToAdd
+  }

-    val mrClasspathEntries = Option(conf.getStrings(
-      "mapreduce.application.classpath")).getOrElse(
-        getDefaultMRApplicationClasspath())
-    if (mrClasspathEntries != null) {
-      for (c <- mrClasspathEntries) {
-        YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
-          File.pathSeparator)
-      }
+  private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultYarnApplicationClasspath
     }
-  }

-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  private def getMRAppClasspath(conf: Configuration): Option[Array[String]] =
+    Option(conf.getStrings("mapreduce.application.classpath")) match {
+      case s: Some[Array[String]] => s
+      case None => getDefaultMRApplicationClasspath
     }
-  }
+
+  def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] {
--- End diff --

By the way, I just noticed that it looks like we're changing the public API here: this method used to return `Array[String]`, but now it returns an `Option`. A second point is that using `Try` here changes the semantics a little bit. Before, we propagated any exception that wasn't `NoSuchField*`; now we swallow everything. I think we should revert this method to what it was before and have the caller deal with the fact that the result can potentially be null.
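The semantic difference flagged above can be made concrete. Below is a minimal sketch (hypothetical helper names, using public fields of `java.lang.String` as a stand-in for the reflective lookup in `ClientBase`): `Try(...).toOption` turns *any* non-fatal exception into `None`, while the original `try`/`catch` handled only the two missing-field cases and let everything else propagate.

```scala
import scala.util.Try

object TrySemanticsSketch {
  // Mirrors the new code: every non-fatal exception becomes None.
  def viaTry(fieldName: String): Option[Array[String]] =
    Try {
      classOf[String].getField(fieldName).get(null).asInstanceOf[Array[String]]
    }.toOption

  // Mirrors the old code: only the expected missing-field cases are
  // swallowed; any other exception still reaches the caller.
  def viaTryCatch(fieldName: String): Array[String] =
    try {
      classOf[String].getField(fieldName).get(null).asInstanceOf[Array[String]]
    } catch {
      case _: NoSuchFieldError     => null
      case _: NoSuchFieldException => null
    }
}
```

For a missing field the two behave alike (`None` vs `null`), but for a `ClassCastException` on an existing field (`CASE_INSENSITIVE_ORDER` is a `Comparator`, not an `Array[String]`) `viaTry` silently returns `None` while `viaTryCatch` throws.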
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786520

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk, commented at the `.toOption` of `getDefaultMRApplicationClasspath`)
--- End diff --

Same here.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786824

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk, commented at the `.toOption` of `getDefaultMRApplicationClasspath`)
--- End diff --

What if we keep the `Try`, so we return `Try[Array[String]]`? In that case `populateHadoopClasspath` will deal with the potential _failure_. You can always get the `Throwable` out of a `Try` through its `failed` member, e.g.:

```
val t = Try(1 / 0)
t.failed.get.printStackTrace()
```
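A sketch of how a caller such as `populateHadoopClasspath` might consume a `Try[Array[String]]` under this proposal (a hypothetical handler for illustration, not part of the patch):

```scala
import scala.util.{Failure, Success, Try}

// Fall back to an empty classpath and report the cause on failure.
def entriesOrEmpty(result: Try[Array[String]]): Array[String] = result match {
  case Success(entries) => entries
  case Failure(cause) =>
    println(s"could not resolve default classpath: $cause")
    Array.empty[String]
}
```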
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786869

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk as above)
--- End diff --

I know I changed the public API of those two methods, but I do think that by returning a `Try[Array[String]]` you don't lose the ability to handle the `Throwable`, and it is better than returning `null`.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786914

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk as above)
--- End diff --

Btw, I think we should document that we are using _reflection_ to extract the value of the _defaults_ here; that is the only reason we are wrapping this in a `Try`.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11786936

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk as above)
--- End diff --

Returning `Try` still changes the public API. Also, the issue is that right now we only catch the particular `NoSuchField*` exceptions; any other exception is supposed to propagate upwards. I don't think there is a clean way to do that with `Try`, which isn't really designed for catching specific exceptions anyway.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11787690

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk as above)
--- End diff --

I will still Try to defend the argument that `Try` is a better option than just returning `null` or throwing an exception that we assume the author intended, which is not documented. The `Try` monad itself specifies that only non-fatal exceptions are caught by the combinators on `Try` (see `scala.util.control.NonFatal`); serious system errors, on the other hand, will be thrown. Assuming the original author's intention was to handle the `null` case and _bubble_ up any exception not caught by the `try-catch` control flow, the old code was actually failing on the first part. To be honest, I would bet that the `try-catch` was most likely used to handle the compilation errors that arise from the usage of reflection. I know I am changing the _public_ API, but I do think it is for the best. Ref:
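The `NonFatal` contract cited here can be checked directly. A small sketch (hypothetical helper, for illustration only): `Try.apply` catches a throwable exactly when `NonFatal` matches it, so truly fatal errors such as `InterruptedException` or a `VirtualMachineError` still escape the `Try`.

```scala
import scala.util.Try
import scala.util.control.NonFatal

// True if Try would turn `t` into a Failure; false if Try rethrows it.
// NonFatal(t) is the predicate Try uses to decide which case applies.
def swallowedByTry(t: Throwable): Boolean =
  try {
    Try(throw t).isFailure
  } catch {
    case caught: Throwable => assert(!NonFatal(caught)); false
  }
```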
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11788225

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- (same hunk as above)
--- End diff --

There are a few issues here:

1. Can we change this API? Yes, we can change it: the whole deploy package is hidden from our docs and not supposed to be used by users. In a separate PR we should make this private.
2. Should we swallow all exceptions here? No, we should swallow only the ones we expect, which is the missing field; otherwise we should just throw it up the chain.
3. Should we use `scala.util.Try` instead of a try-catch block? In the Spark code we typically use `try {} catch {}` blocks, unless it is a very simple case where we want to swallow every possible exception.
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11789723 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -361,55 +361,48 @@ object ClientBase { val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } + private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case s: Some[Array[String]] = s + case None = getDefaultYarnApplicationClasspath } - } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Array[String]] = 
+Option(conf.getStrings(mapreduce.application.classpath)) match { + case s: Some[Array[String]] = s + case None = getDefaultMRApplicationClasspath } - } + + def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] { +val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) +field.get(null).asInstanceOf[Array[String]] + }.toOption /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. * So we need to use reflection to retrieve it. */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { -field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] { +val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) +if (field.getType == classOf[String]) { + StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray +} else { + field.get(null).asInstanceOf[Array[String]] } - } + }.toOption --- End diff -- Thanks @pwendell, the easiest thing could be to avoid a longer discussion and just use the `try-catch` block but I do think that the `Try` monad is a good option. It won't swallow _fatal_ errors based on [NonFatal](https://github.com/scala/scala/blob/v2.10.3/src/library/scala/util/control/NonFatal.scala#L1) and can handle specific error recovery scenarios through `recoverWith`. What it avoids is cases where you return `null`, which overall is not a good practice, and the _bubbling_, mostly by mistake, of errors that should not disable a program/application while enabling _functional composition_. 
In the end, I will align with the view of the **Apache Spark Project**. Let me know and I'll make the changes without further argument, but if we decide to use the `try-catch` block, please let me know how `populateHadoopClasspath` should handle the exception: 1. _bubbling_ it up in case it is not
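For reference, the `Try`-based approach under discussion can be sketched as follows. This is an illustrative standalone sketch, not the PR's exact code (`clazz` and `fieldName` are parameters here for demonstration). Note one subtlety: `Try.apply` only catches `NonFatal` throwables, so the `NoSuchFieldException` thrown by `getField` becomes `None`, while a `NoSuchFieldError` (a `LinkageError`, hence fatal under `NonFatal`) would still propagate.

```scala
import scala.util.Try

// Reflectively read a public static Array[String] field, returning None when
// the field does not exist (e.g. on an older Hadoop version). Try.apply only
// catches NonFatal throwables, so fatal errors still bubble up to the caller.
def defaultClasspath(clazz: Class[_], fieldName: String): Option[Array[String]] =
  Try(clazz.getField(fieldName).get(null).asInstanceOf[Array[String]]).toOption
```

A `recoverWith` could be chained before `.toOption` to attempt a fallback lookup for specific exception types.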
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11789984 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -361,55 +361,48 @@ object ClientBase { val LOG4J_CONF_ENV_KEY: String = SPARK_LOG4J_CONF val LOCAL_SCHEME = local - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +for (c - classPathElementsToAdd.flatten) { + YarnSparkHadoopUtil.addToEnvironment( +env, +Environment.CLASSPATH.name, +c.trim, File.pathSeparator) } +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } + private def getYarnAppClasspath(conf: Configuration): Option[Array[String]] = +Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH)) match { + case s: Some[Array[String]] = s + case None = getDefaultYarnApplicationClasspath } - } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + private def getMRAppClasspath(conf: Configuration): Option[Array[String]] = 
+Option(conf.getStrings(mapreduce.application.classpath)) match { + case s: Some[Array[String]] = s + case None = getDefaultMRApplicationClasspath } - } + + def getDefaultYarnApplicationClasspath: Option[Array[String]] = Try[Array[String]] { +val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) +field.get(null).asInstanceOf[Array[String]] + }.toOption /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. * So we need to use reflection to retrieve it. */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { -StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { -field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + def getDefaultMRApplicationClasspath: Option[Array[String]] = Try[Array[String]] { +val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) +if (field.getType == classOf[String]) { + StringUtils.getStrings(field.get(null).asInstanceOf[String]).toArray +} else { + field.get(null).asInstanceOf[Array[String]] } - } + }.toOption --- End diff -- I'd bubble up the exception if it's unexpected. And return `None` if we encounter the expected exception. I agree returning an Option is good rather than potentially `null` value. I'd like it if this never returned `null` in any case, including exceptions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
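The behavior described above — returning `None` for the expected reflection failures while letting anything unexpected propagate — could be sketched like this (a standalone illustration with hypothetical parameters, not the PR's final code):

```scala
// Expected failures (the field is missing on this Hadoop version) become
// None; any other throwable is unexpected and propagates to the caller.
def getDefaultClasspath(clazz: Class[_], fieldName: String): Option[Array[String]] =
  try {
    Some(clazz.getField(fieldName).get(null).asInstanceOf[Array[String]])
  } catch {
    case _: NoSuchFieldException => None
    case _: NoSuchFieldError     => None
  }
```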
GitHub user berngp opened a pull request: https://github.com/apache/spark/pull/433 [SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN applicat... ...ion specific CP

The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH` when it should really be looking at `YarnConfiguration`. If the application configuration has no `yarn.application.classpath` defined, an NPE will be thrown.

Additional changes include: * ScalaBuild now points to Scalatest 2.1.3 * ScalaBuild project root renamed to spark * Test suite for ClientBase added * Fixes for scalastyle in other yarn files.

[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522 Author : bernardo.gomezpala...@gmail.com Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test

You can merge this pull request into a Git repository by running: $ git pull https://github.com/berngp/spark feature/SPARK-1522 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #433

commit b08beb96c4437f095c31f53a0877eb95aebab637 Author: Bernardo Gomez Palacio bernardo.gomezpala...@gmail.com Date: 2014-04-17T09:35:52Z [SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN application specific CP The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH` when it should really be looking at `YarnConfiguration`. If the application configuration has no `yarn.application.classpath` defined, an NPE will be thrown.
Additional changes include: * ScalaBuild now points to Scalatest 2.1.3 * ScalaBuild project root renamed to spark * Test suite for ClientBase added * Fixes for scalastyle in other yarn files. [ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522 Author : bernardo.gomezpala...@gmail.com Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40699148 Can one of the admins verify this patch?
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11726091 --- Diff: project/SparkBuild.scala --- @@ -52,7 +52,7 @@ object SparkBuild extends Build { val SCALAC_JVM_VERSION = "jvm-1.6" val JAVAC_JVM_VERSION = "1.6" - lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*) --- End diff -- When importing into IntelliJ IDEA as an SBT project, the IDE uses the project names, and `root` lacks a bit of context. I presume the use of the word "root" was based on the SBT multi-module example.
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11726156 --- Diff: project/SparkBuild.scala --- @@ -263,7 +263,7 @@ object SparkBuild extends Build { "org.eclipse.jetty" % "jetty-util" % jettyVersion, "org.eclipse.jetty" % "jetty-plus" % jettyVersion, "org.eclipse.jetty" % "jetty-security" % jettyVersion, - "org.scalatest" %% "scalatest" % "1.9.1" % "test", + "org.scalatest" %% "scalatest" % "2.1.3" % "test", --- End diff -- The upgrade to scalatest 2.1.3 is most likely causing the failures in the test suites below, which I will fix. streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala spark/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11726876 --- Diff: project/SparkBuild.scala --- @@ -52,7 +52,7 @@ object SparkBuild extends Build { val SCALAC_JVM_VERSION = "jvm-1.6" val JAVAC_JVM_VERSION = "1.6" - lazy val root = Project("root", file("."), settings = rootSettings) aggregate(allProjects: _*) --- End diff -- Would this be better? ```scala lazy val spark = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*) ```
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40707049 Most of the changes in the diff look unrelated to what is described in the summary, and some introduce additional bugs. Please clean up the diff and include only what is required to fix the issue, without unrelated changes.
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11741951 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) + + protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) { +for ( c - elements ) + yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator)) } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration) --- End diff -- please follow style guide for indentations: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11741991 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) + + protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) { +for ( c - elements ) + yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator)) } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration) + (f: = Array[String]) : Array[String] = +Option(conf.getStrings(key)) match { + case Some(s) = s + case None = f } - } + + def getDefaultYarnApplicationClasspath : Array[String] = Try { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + field.get(null).asInstanceOf[Array[String]] + }.getOrElse(Array.empty[String]) /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. * So we need to use reflection to retrieve it. */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { + def getDefaultMRApplicationClasspath : Array[String] = Try { + val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) + if( field.getType == classOf[String] ) --- End diff -- remove extra space after getType before == --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11745730 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) + + protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) { +for ( c - elements ) + yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator)) } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration) + (f: = Array[String]) : Array[String] = +Option(conf.getStrings(key)) match { + case Some(s) = s + case None = f } - } + + def getDefaultYarnApplicationClasspath : Array[String] = Try { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + field.get(null).asInstanceOf[Array[String]] + }.getOrElse(Array.empty[String]) /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. * So we need to use reflection to retrieve it. */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { + def getDefaultMRApplicationClasspath : Array[String] = Try { + val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) + if( field.getType == classOf[String] ) StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { + else field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null -} - } + }.getOrElse(Array.empty[String]) --- End diff -- I prefer the old try-catch to the scala.util.Try, which hides the specific exceptions you catch --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11745870 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) + + protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) { +for ( c - elements ) + yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator)) } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] - } catch { - case err: NoSuchFieldError => null - case err: NoSuchFieldException => null + protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration) + (f: => Array[String]): Array[String] = + Option(conf.getStrings(key)) match { + case Some(s) => s + case None => f --- End diff -- `Option(...).getOrElse(f)`
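The suggested simplification replaces the `Some`/`None` pattern match with `getOrElse`. A minimal standalone sketch, with `lookup` standing in for `conf.getStrings`, which may return `null`:

```scala
// Option(...) turns a possibly-null lookup result into an Option, and
// getOrElse lazily evaluates the by-name default only when the key is absent.
def classpathForKey(lookup: String => Array[String], key: String)
                   (default: => Array[String]): Array[String] =
  Option(lookup(key)).getOrElse(default)
```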
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11746226 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) + + protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements : Iterable[String]) { +for ( c - elements ) + yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator)) } - def getDefaultYarnApplicationClasspath(): Array[String] = { -try { - val field = 
classOf[MRJobConfig].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) - field.get(null).asInstanceOf[Array[String]] -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null + protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration) + (f: = Array[String]) : Array[String] = +Option(conf.getStrings(key)) match { + case Some(s) = s + case None = f } - } + + def getDefaultYarnApplicationClasspath : Array[String] = Try { + val field = classOf[YarnConfiguration].getField(DEFAULT_YARN_APPLICATION_CLASSPATH) + field.get(null).asInstanceOf[Array[String]] + }.getOrElse(Array.empty[String]) /** * In Hadoop 0.23, the MR application classpath comes with the YARN application * classpath. In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's a String. * So we need to use reflection to retrieve it. */ - def getDefaultMRApplicationClasspath(): Array[String] = { -try { - val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) - if (field.getType == classOf[String]) { + def getDefaultMRApplicationClasspath : Array[String] = Try { + val field = classOf[MRJobConfig].getField(DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH) + if( field.getType == classOf[String] ) StringUtils.getStrings(field.get(null).asInstanceOf[String]) - } else { + else field.get(null).asInstanceOf[Array[String]] - } -} catch { - case err: NoSuchFieldError = null - case err: NoSuchFieldException = null -} - } + }.getOrElse(Array.empty[String]) --- End diff -- Ah, I see what you're trying to do. I think a better way is to return `Option[Array[String]]`. If there's an exception, return None. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11746514 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { --- End diff -- I don't think you need to return anything here, so you can use `{` instead of `= {` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
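The distinction being pointed out: with `= {` a method returns the value of its last expression, while plain `{` (Scala's procedure syntax) always returns `Unit`. A minimal illustration with hypothetical method names (note that procedure syntax was later deprecated; `: Unit = {` is the preferred modern form):

```scala
import scala.collection.mutable.HashMap

// Expression body: the last expression is returned to the caller.
def populateAndReturn(env: HashMap[String, String]): Seq[String] = {
  val elems = Seq("/a", "/b")
  elems.foreach(e => env("CLASSPATH") = e)
  elems
}

// Procedure syntax (no "="): the method always returns Unit.
def populateOnly(env: HashMap[String, String]) {
  env("CLASSPATH") = "/a"
}
```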
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11746787 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -378,55 +378,48 @@ object ClientBase { val APP_JAR: String = app.jar val LOG4J_PROP: String = log4j.properties - // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps - def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) { -val classpathEntries = Option(conf.getStrings( - YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse( -getDefaultYarnApplicationClasspath()) -for (c - classpathEntries) { - YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, -File.pathSeparator) -} + def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = { +val classPathElementsToAdd = getYarnAppClasspath(conf) ++ getMRAppClasspath(conf) +addToAppClasspath(env, classPathElementsToAdd) +classPathElementsToAdd + } -val mrClasspathEntries = Option(conf.getStrings( - mapreduce.application.classpath)).getOrElse( -getDefaultMRApplicationClasspath()) -if (mrClasspathEntries != null) { - for (c - mrClasspathEntries) { -YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, - File.pathSeparator) - } -} + protected[yarn] def getYarnAppClasspath(conf: Configuration) = +getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, conf)(getDefaultYarnApplicationClasspath) + + protected[yarn] def getMRAppClasspath(conf: Configuration) = +getAppClasspathForKey(mapreduce.application.classpath, conf)(getDefaultMRApplicationClasspath) --- End diff -- I actually prefer that you move all of the logic of `getAppClasspathForKey` and `getDefaultMRApplicationClasspath` into this method. Right now you have to jump a few levels to understand what's going on, and I think it'll be clearer if it's in one isolated place. 
(Same in getYarnAppClasspath)
[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/433#issuecomment-40747230

@berngp Thanks for doing this. I literally ran into this NPE yesterday in my own YARN cluster. It turns out I forgot to point YARN_CONF_DIR to the proper place, but running into an NPE did not leave any clue as to what the problem was (until I dug into the code, which is bad user experience). This PR is a much-needed fix. I left a couple of comments. As @tgraves mentioned, the style of this PR is inconsistent with the Spark style guide. Further, it would be good if we could remove several levels of indirection to make the code clearer.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11747100

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
--- End diff --

Also, in Spark we try not to use `protected[*]`, since it's not exactly clear what that does. I think these can just be `private`?
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11753045

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, String]) = {
--- End diff --

Oh I see. Should we specify `: Unit = {` in the new style then? Or is the Unit optional?
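For the style question above, a minimal standalone illustration (the names here are invented): Scala's procedure syntax `def f(...) { }` omits both the `=` and the result type, while `def f(...): Unit = { }` states the result type explicitly. The explicit form is the safer one, since procedure syntax was later deprecated.

```scala
object UnitStyle {
  var log: List[String] = Nil

  // Explicit ": Unit =" makes the intended result type visible; with
  // procedure syntax the "=" and the type are omitted, which is easy
  // to misread as a value-returning method.
  def record(msg: String): Unit = {
    log = msg :: log
  }
}
```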
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11763040

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-    try {
-      val field = classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-      field.get(null).asInstanceOf[Array[String]]
-    } catch {
-      case err: NoSuchFieldError => null
-      case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
+      (f: => Array[String]): Array[String] =
+    Option(conf.getStrings(key)) match {
+      case Some(s) => s
+      case None => f
--- End diff --

I inlined the function, had to keep the match since the `getOrElse` was giving me some trouble with the types.
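The `match` kept in the PR and a `getOrElse` with a by-name default are equivalent for this pattern. A self-contained sketch (stub types, not the real ClientBase code) showing both forms, including that the default is only evaluated on the `None` path:

```scala
object LookupSketch {
  // Pattern match, as kept in the PR.
  def viaMatch(v: Option[Array[String]])(f: => Array[String]): Array[String] =
    v match {
      case Some(s) => s
      case None    => f
    }

  // getOrElse does the same thing: its parameter is by-name, so the
  // default expression f is only evaluated when v is None.
  def viaGetOrElse(v: Option[Array[String]])(f: => Array[String]): Array[String] =
    v.getOrElse(f)
}
```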
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11763060

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]) {
+    for (c <- elements)
+      yield (YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
--- End diff --

I kept the `for (c <- classPathElementsToAdd.flatten)`; to be honest, the `yield` was a remnant of a different approach.
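Since the `yield` was dropped in favor of a plain side-effecting loop, here is a tiny self-contained version of that pattern. `addToEnv` is a hypothetical stand-in for `YarnSparkHadoopUtil.addToEnvironment`, assumed here to append to a path-style entry:

```scala
import scala.collection.mutable.HashMap

object EnvSketch {
  // Hypothetical stand-in for YarnSparkHadoopUtil.addToEnvironment:
  // append value to env(key), joining entries with sep.
  private def addToEnv(env: HashMap[String, String], key: String,
                       value: String, sep: String): Unit =
    env(key) = env.get(key).map(_ + sep + value).getOrElse(value)

  // A plain for loop without yield: run purely for its side effect on
  // env, so no collection of Unit values is accumulated.
  def addAll(env: HashMap[String, String], key: String,
             elements: Iterable[String], sep: String): Unit =
    for (c <- elements) addToEnv(env, key, c.trim, sep)
}
```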
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11763070

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
+      (f: => Array[String]): Array[String] =
--- End diff --

Done, good point.
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11763107

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
 [...]
+  protected[yarn] def getAppClasspathForKey(key: String, conf: Configuration)
--- End diff --

Changed to `private`.
Github user berngp commented on a diff in the pull request: https://github.com/apache/spark/pull/433#discussion_r11763222

--- Diff: yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSpec.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.scalatest.FreeSpec
--- End diff --

Will do; that said, I do think the narrative in the FreeSpec|Suite is a bit better, and more `Fun` to write ;)