[ https://issues.apache.org/jira/browse/OOZIE-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373706#comment-15373706 ]

Satish Subhashrao Saley commented on OOZIE-2606:
------------------------------------------------

Currently, we pass the jars using the {{--files}} option and the archives using the {{--archives}} option, providing HDFS paths in both cases. [Here is the code for it|https://github.com/apache/oozie/blob/master/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java#L175-L183].
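
To make the shape of that concrete, here is a hypothetical sketch (not the actual SparkMain code) of the kind of spark-submit arguments that end up being built; the sharelib HDFS paths are made-up examples:
{code}
// Hypothetical sketch only: jars are passed through --files and archives
// through --archives, both as comma-separated HDFS paths.
object SparkArgsSketch {
  def main(args: Array[String]): Unit = {
    // Assumed example sharelib locations, not real paths.
    val sharelibJars = Seq(
      "hdfs:///user/oozie/share/lib/spark/spark-yarn.jar",
      "hdfs:///user/oozie/share/lib/spark/oozie-sharelib-spark.jar")
    val sharelibArchives = Seq(
      "hdfs:///user/oozie/share/lib/spark/py4j.zip")

    val sparkArgs = Seq(
      "--files", sharelibJars.mkString(","),
      "--archives", sharelibArchives.mkString(","))

    println(sparkArgs.mkString(" "))
  }
}
{code}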

I have a few questions regarding {{spark.yarn.jars}}.
- Is it a replacement for {{--files}}? It does not look like it, based on this [part of the code|https://github.com/apache/spark/blob/bad0f7dbba2eda149ee4fc5810674d971d17874a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L495-L504]. The files in SPARK_JARS, i.e. {{spark.yarn.jars}}, get distributed only when SPARK_ARCHIVE is not defined (see the simplified sketch after this list).
{code}
val sparkArchive = sparkConf.get(SPARK_ARCHIVE)
    if (sparkArchive.isDefined) {
      val archive = sparkArchive.get
      require(!isLocalUri(archive), s"${SPARK_ARCHIVE.key} cannot be a local URI.")
      distribute(Utils.resolveURI(archive).toString,
        resType = LocalResourceType.ARCHIVE,
        destName = Some(LOCALIZED_LIB_DIR))
    } else {
      sparkConf.get(SPARK_JARS) match {
        case Some(jars) =>
{code}
- Is {{spark.yarn.jars}} a replacement for {{spark.yarn.jar}} with some additional functionality?
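
As a rough illustration of that excerpt (a simplified sketch, not the real Client.scala logic), the precedence looks like this:
{code}
// Simplified sketch of the precedence shown in the excerpt above:
// spark.yarn.jars is only consulted when spark.yarn.archive is unset,
// so it is not a drop-in replacement for --files.
object LibPrecedenceSketch {
  def resolveSparkLibs(sparkArchive: Option[String],
                       sparkJars: Option[Seq[String]]): Seq[String] =
    sparkArchive match {
      case Some(archive) =>
        // Archive wins: the single archive is localized and the jars are ignored.
        Seq(archive)
      case None =>
        // No archive: fall back to distributing the individual jars, if any.
        sparkJars.getOrElse(Seq.empty)
    }
}
{code}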

Currently, we can set {{spark.yarn.jar}} to the spark-assembly.jar in case we want to override the default location.
{code}
http://spark.apache.org/docs/latest/running-on-yarn.html
The location of the Spark jar file, in case overriding the default location is 
desired. By default, Spark on YARN will use a Spark jar installed locally, but 
the Spark jar can also be in a world-readable location on HDFS. This allows 
YARN to cache it on nodes so that it doesn't need to be distributed each time 
an application runs. To point to a jar on HDFS, for example, set this 
configuration to hdfs:///some/path.
{code}
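
For reference, a minimal sketch of what that override could look like programmatically, assuming a Spark 1.x style setup; the HDFS path below is just the placeholder from the docs:
{code}
// Minimal sketch: point spark.yarn.jar at a world-readable assembly on HDFS
// so YARN can cache it instead of re-distributing it with every application.
import org.apache.spark.SparkConf

object SparkYarnJarSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("oozie-spark-action")
      // Placeholder location taken from the docs example.
      .set("spark.yarn.jar", "hdfs:///some/path/spark-assembly.jar")
    println(conf.get("spark.yarn.jar"))
  }
}
{code}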

I was about to file a JIRA for setting {{spark.yarn.jar}} to spark-assembly.jar, because currently spark-assembly.jar gets distributed multiple times. Let me know, shall we add the fix for this here?

> Set spark.yarn.jars to fix Spark 2.0 with Oozie
> -----------------------------------------------
>
>                 Key: OOZIE-2606
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2606
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.0
>            Reporter: Jonathan Kelly
>              Labels: spark, spark2.0.0
>             Fix For: trunk
>
>         Attachments: OOZIE-2606.patch
>
>
> Oozie adds all of the jars in the Oozie Spark sharelib to the 
> DistributedCache such that all jars will be present in the current working 
> directory of the YARN container (as well as in the container classpath). 
> However, this is not quite enough to make Spark 2.0 work, since Spark 2.0 by 
> default looks for the jars in assembly/target/scala-2.11/jars [1] (as if it 
> is a locally built distribution for development) and will not find them in 
> the current working directory.
> To fix this, we can set spark.yarn.jars to *.jar so that it finds the jars in 
> the current working directory rather than looking in the wrong place. [2]
> [1] 
> https://github.com/apache/spark/blob/v2.0.0-rc2/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java#L357
> [2] 
> https://github.com/apache/spark/blob/v2.0.0-rc2/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L476
> Note: This property will be ignored by Spark 1.x.


