[ 
https://issues.apache.org/jira/browse/OOZIE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820813#comment-16820813
 ] 

Andras Salamon commented on OOZIE-3472:
---------------------------------------

There are quite a few libraries in the Oozie sharelibs, and we have encountered 
similar problems. All Jiras that clean up the sharelibs are welcome (like 
OOZIE-3450).

Solving such problems is not easy. If the conflicting jar is required by 
Oozie, we might change the code and eliminate the usage. If the jar is required 
by Spark, we cannot really delete it from the sharelib.

Can you please list the conflicting jars?

> Improve Spark Action compatibility with Oozie launcher
> ------------------------------------------------------
>
>                 Key: OOZIE-3472
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3472
>             Project: Oozie
>          Issue Type: Improvement
>          Components: action
>    Affects Versions: 5.1.0
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>
> In our production environment, users of the Spark action often encounter 
> conflicts between the user jar and the launcher, which cause the launcher 
> to fail to start.
> To work around this, we have a Maven plugin that guides users in removing 
> Hadoop-related dependencies from the user jar. But user jars can be 
> complicated, and these dependencies are sometimes not easy to remove. 
> Therefore, it is appropriate to solve this problem on the Oozie side.
> After studying the code, we found that the Spark action inherits from the 
> Java action. The conflict arises because the Java action puts the user jar 
> into the cache before the MR job starts (related 
> [link|https://github.com/apache/oozie/blob/b91457edd2a76f94f41a89ec718eec574c200c71/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L722]).
>  If the user jar contains a Hadoop dependency with an incompatible version, 
> a conflict occurs.
> Our root cause analysis shows that the Spark action uses the map node in MR 
> only as a spark-submit client, so the user jar does not need to be added to 
> the MR distributed cache. We solved this conflict by using the spark-submit 
> SDK to load the user jar from HDFS directly. It currently works well in our 
> production environment. :)
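
For illustration only, the idea in the description (hand spark-submit an HDFS 
location for the user jar instead of shipping the jar through the launcher's 
distributed cache) could be sketched as below. This is not the reporter's 
patch and not Oozie code: the class SparkSubmitArgs, its build method, and the 
example paths are hypothetical. Only the spark-submit options --master, 
--deploy-mode and --class, and the positional application jar, are taken from 
Spark itself. The sketch assumes YARN cluster mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Oozie or Spark.
class SparkSubmitArgs {

    // Builds a spark-submit argument list in which the application jar is
    // referenced by its HDFS URI, so the launcher never has to put the jar
    // on its own classpath.
    static List<String> build(String mainClass, String userJarUri, List<String> appArgs) {
        // Assumption for this sketch: only hdfs:// locations are accepted.
        if (!userJarUri.startsWith("hdfs://")) {
            throw new IllegalArgumentException("user jar must be an hdfs:// URI: " + userJarUri);
        }
        List<String> args = new ArrayList<>();
        args.add("--master");
        args.add("yarn");
        args.add("--deploy-mode");
        args.add("cluster");
        args.add("--class");
        args.add(mainClass);
        // The application jar is a positional argument that follows the options.
        args.add(userJarUri);
        // Everything after the jar is passed through to the application.
        args.addAll(appArgs);
        return args;
    }
}
```

A caller would pass something like 
SparkSubmitArgs.build("com.example.Main", "hdfs:///apps/demo/user-app.jar", appArgs) 
and hand the resulting list to whatever submits the job. The launcher itself 
only sees a URI string, which is why a Hadoop version bundled inside the user 
jar would no longer clash with the launcher's own classes.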



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
