[ https://issues.apache.org/jira/browse/OOZIE-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494685#comment-15494685 ]

Satish Subhashrao Saley edited comment on OOZIE-2606 at 9/15/16 10:20 PM:
--------------------------------------------------------------------------

- In case of spark 1.X, we *can* have both {{spark.yarn.jar}} and 
{{spark.yarn.jars}}; {{spark.yarn.jars}} will simply be ignored.
- By setting the {{spark.yarn.jar}} configuration for spark-1, we avoid 
distributing the spark-yarn/spark-assembly jar multiple times.
- In case of spark 2.X, we *cannot* have both {{spark.yarn.jar}} and 
{{spark.yarn.jars}}; setting both causes problems.

- The approach in the patch is to look at the spark version and populate the 
configuration accordingly. To check the spark version, I read the 
"Specification-Version" field in the jar manifest. (Any cleaner alternatives?)
- We are still keeping the {{--files}} option since it is required for 
spark-1 and causes no issues with spark-2, even when some of the URIs in 
{{spark.yarn.jars}} and {{--files}} are the same; files will get 
distributed only once.
- Also, for spark-2.X we need to bump up the versions of some libraries. I 
created profiles for spark-2 and spark-1 (spark-1 being the default). For me, 
spark-1.X did not work with the newer versions of those libraries.
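The manifest check described above could look roughly like this (a minimal sketch; the class and method names here are hypothetical illustrations, not the patch's actual code):

```java
import java.io.IOException;
import java.util.jar.JarFile;

class SparkVersionCheck {

    // Decide spark-1 vs spark-2 from a Specification-Version string.
    static boolean isSpark2(String specVersion) {
        return specVersion != null && specVersion.startsWith("2.");
    }

    // Read the Specification-Version attribute from a jar's main manifest.
    static String readSpecVersion(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getManifest().getMainAttributes()
                      .getValue("Specification-Version");
        }
    }
}
```

With a check like this, the launcher could set {{spark.yarn.jar}} when the sharelib jar reports a 1.X version and {{spark.yarn.jars}} when it reports 2.X, so the two properties are never set together.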



> Set spark.yarn.jars to fix Spark 2.0 with Oozie
> -----------------------------------------------
>
>                 Key: OOZIE-2606
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2606
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.2.0
>            Reporter: Jonathan Kelly
>            Assignee: Satish Subhashrao Saley
>              Labels: spark, spark2.0.0
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2606-2.patch, OOZIE-2606.patch
>
>
> Oozie adds all of the jars in the Oozie Spark sharelib to the 
> DistributedCache such that all jars will be present in the current working 
> directory of the YARN container (as well as in the container classpath). 
> However, this is not quite enough to make Spark 2.0 work, since Spark 2.0 by 
> default looks for the jars in assembly/target/scala-2.11/jars [1] (as if it 
> is a locally built distribution for development) and will not find them in 
> the current working directory.
> To fix this, we can set spark.yarn.jars to *.jar so that it finds the jars in 
> the current working directory rather than looking in the wrong place. [2]
> [1] 
> https://github.com/apache/spark/blob/v2.0.0-rc2/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java#L357
> [2] 
> https://github.com/apache/spark/blob/v2.0.0-rc2/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L476
> Note: This property will be ignored by Spark 1.x.
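If a user wanted to apply the workaround above by hand rather than waiting for the sharelib fix, it could be sketched in a workflow's Spark action like this (a hypothetical fragment; the action schema version, jar name, and property values are illustrative):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>spark2-example</name>
    <jar>example.jar</jar>
    <!-- Point Spark 2 at the sharelib jars already localized into the
         container's working directory; Spark 1.x ignores spark.yarn.jars,
         so this setting is harmless on spark-1 clusters. -->
    <spark-opts>--conf spark.yarn.jars=*.jar</spark-opts>
</spark>
```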



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
