[ https://issues.apache.org/jira/browse/OOZIE-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373706#comment-15373706 ]
Satish Subhashrao Saley commented on OOZIE-2606:
------------------------------------------------

Currently, we pass the jars using the {{--files}} option and the archives using the {{--archives}} option, giving HDFS paths for both. [Here is the code for it|https://github.com/apache/oozie/blob/master/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java#L175-L183]. I have a few questions regarding {{spark.yarn.jars}}.

- Is it a replacement for {{--files}}? It does not look like it, based on this [part of the code|https://github.com/apache/spark/blob/bad0f7dbba2eda149ee4fc5810674d971d17874a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L495-L504]. The files in SPARK_JARS, i.e. {{spark.yarn.jars}}, get distributed only when SPARK_ARCHIVE is not defined:
{code}
val sparkArchive = sparkConf.get(SPARK_ARCHIVE)
if (sparkArchive.isDefined) {
  val archive = sparkArchive.get
  require(!isLocalUri(archive), s"${SPARK_ARCHIVE.key} cannot be a local URI.")
  distribute(Utils.resolveURI(archive).toString,
    resType = LocalResourceType.ARCHIVE,
    destName = Some(LOCALIZED_LIB_DIR))
} else {
  sparkConf.get(SPARK_JARS) match {
    case Some(jars) =>
{code}
- Is {{spark.yarn.jars}} a replacement for {{spark.yarn.jar}} with some additional functionality? Currently, we can set {{spark.yarn.jar}} to the spark-assembly.jar to override the default location:
{code}
http://spark.apache.org/docs/latest/running-on-yarn.html

The location of the Spark jar file, in case overriding the default location
is desired. By default, Spark on YARN will use a Spark jar installed locally,
but the Spark jar can also be in a world-readable location on HDFS. This
allows YARN to cache it on nodes so that it doesn't need to be distributed
each time an application runs. To point to a jar on HDFS, for example, set
this configuration to hdfs:///some/path.
{code}
I was about to file a jira for setting {{spark.yarn.jar}} to spark-assembly.jar, because currently spark-assembly.jar is getting distributed multiple times. Let me know, shall we add a fix for that here as well?

> Set spark.yarn.jars to fix Spark 2.0 with Oozie
> -----------------------------------------------
>
>                 Key: OOZIE-2606
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2606
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.0
>            Reporter: Jonathan Kelly
>              Labels: spark, spark2.0.0
>             Fix For: trunk
>
>         Attachments: OOZIE-2606.patch
>
>
> Oozie adds all of the jars in the Oozie Spark sharelib to the
> DistributedCache such that all jars will be present in the current working
> directory of the YARN container (as well as in the container classpath).
> However, this is not quite enough to make Spark 2.0 work, since Spark 2.0 by
> default looks for the jars in assembly/target/scala-2.11/jars [1] (as if it
> is a locally built distribution for development) and will not find them in
> the current working directory.
> To fix this, we can set spark.yarn.jars to *.jar so that it finds the jars in
> the current working directory rather than looking in the wrong place. [2]
> [1] https://github.com/apache/spark/blob/v2.0.0-rc2/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java#L357
> [2] https://github.com/apache/spark/blob/v2.0.0-rc2/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L476
> Note: This property will be ignored by Spark 1.x.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
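To make the proposed fix concrete: the issue suggests setting {{spark.yarn.jars}} to {{*.jar}} so Spark 2.0 resolves its jars from the container's working directory, and the Client.scala excerpt quoted above shows that {{spark.yarn.jars}} is ignored whenever {{spark.yarn.archive}} is defined. A minimal Java sketch of how the spark-submit argument list could be assembled with that precedence in mind follows. The class and method names here are illustrative assumptions, not Oozie's actual SparkMain API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only; mirrors the SPARK_ARCHIVE / SPARK_JARS
// precedence quoted from Client.scala, not Oozie's real SparkMain code.
public class SparkYarnJarsSketch {

    // Builds spark-submit arguments, adding spark.yarn.jars=*.jar only when
    // spark.yarn.archive is unset, since Spark ignores SPARK_JARS whenever
    // SPARK_ARCHIVE is defined.
    static List<String> buildSparkArgs(Properties sparkConf,
                                       String filesCsv, String archivesCsv) {
        List<String> args = new ArrayList<>();
        args.add("--files");
        args.add(filesCsv);          // comma-separated HDFS paths
        args.add("--archives");
        args.add(archivesCsv);
        if (sparkConf.getProperty("spark.yarn.archive") == null) {
            // *.jar resolves against the container's current working
            // directory, where Oozie already localizes the sharelib jars.
            // Spark 1.x simply ignores this property.
            args.add("--conf");
            args.add("spark.yarn.jars=*.jar");
        }
        return args;
    }
}
```

With an empty configuration this yields {{--files ... --archives ... --conf spark.yarn.jars=*.jar}}; if the user has already pointed {{spark.yarn.archive}} at a cached archive on HDFS, the extra conf is skipped so the two properties do not conflict.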