[ 
https://issues.apache.org/jira/browse/OOZIE-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294897#comment-16294897
 ] 

Attila Sasvari commented on OOZIE-2601:
---------------------------------------

[~rohini.u] thanks for your comments. 

Earlier, when using a mapping file, {{ShareLibService}} had to figure out the 
list of files in an HDFS directory (see {{getPathRecursively()}} via 
{{loadSharelib()}}). Then it added them to DistributedCache and set the 
classpath accordingly. When implementing local sharelib, I tried to reuse 
existing code in a lot of places.
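
To illustrate the "list recursively, then build a classpath" step described 
above, here is a minimal sketch. It is not the Oozie code: it uses 
{{java.nio.file}} on a local directory as a stand-in for the Hadoop 
{{FileSystem}} API, and the class and method names 
({{RecursiveListingSketch}}, {{listFilesRecursively}}, {{toClasspath}}) are 
hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the recursive listing done for a sharelib directory.
final class RecursiveListingSketch {

    // Collect every regular file below root, in a stable (sorted) order.
    static List<Path> listFilesRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Join the files into a single classpath string using the platform separator.
    static String toClasspath(List<Path> files) {
        return files.stream()
                    .map(Path::toString)
                    .collect(Collectors.joining(File.pathSeparator));
    }
}
```

The real service additionally registers the files with DistributedCache; the 
sketch only covers the listing and classpath assembly.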

Do you suggest one of the following?
- Setting the classpath in {{JavaActionExecutor}} prior to application 
submission in a different way (right now we add the entries via 
"yarn.application.classpath"), together with improving {{ShareLibService}}
or
- Setting the classpath via the "global" YARN classpath by changing Hadoop 
environment variables?

I believe using a "global approach" is less flexible and might cause problems 
for individual workflow actions if there are conflicting classes (for example, 
Spark's dependencies might not play nicely with those of, say, Hive or with 
custom classes provided by users). 

With a mapping file, a user can also specify the dependencies of workflow 
actions in a flexible way (e.g. some jars stored on HDFS and others present on 
the local filesystem; you can add whole directories or just individual jars).
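
As a rough illustration of such a mixed entry, the sketch below parses one 
mapping-file line and separates {{file:///}} paths from the rest. It is only a 
sketch under assumptions: the class {{MappingFileSketch}}, the key 
{{oozie.hive}} and the HDFS jar path in the usage note are made up for the 
example, and treating schemeless paths as "default filesystem" is my 
assumption, not necessarily what {{ShareLibService}} does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: splits one mapping entry into local and non-local paths.
final class MappingFileSketch {

    // Paths for one sharelib key, grouped by where they live.
    static final class Entry {
        final List<String> local = new ArrayList<>();
        final List<String> remote = new ArrayList<>();
    }

    static Entry parse(String mappingFileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(mappingFileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Entry entry = new Entry();
        // The value is a comma-separated list of directories and/or jars.
        for (String raw : props.getProperty(key, "").split(",")) {
            String path = raw.trim();
            if (path.isEmpty()) {
                continue;
            }
            String scheme = URI.create(path).getScheme();
            if ("file".equals(scheme)) {
                entry.local.add(path);   // already present on every node
            } else {
                entry.remote.add(path);  // hdfs:// or schemeless (assumed default FS)
            }
        }
        return entry;
    }
}
```

For example, parsing the line 
{{oozie.hive=file:///usr/lib/hive/lib,hdfs:///share/lib/hive/extra.jar}} with 
key {{oozie.hive}} puts the first path into {{local}} and the second into 
{{remote}}.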

Right now, {{ShareLibService}} checks the existence of files either on HDFS or 
on the local filesystem (symlinks are followed too). For example, see 
{{FileStatus[] files = fs.listStatus(actionLibsPath)}} at 
https://github.com/apache/oozie/blob/25a8b99d5197c4e18acf0fd332c4396450d3d551/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L668
This could help users, for example by warning them if specified dependencies 
are not present.
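
A minimal sketch of that kind of check for local paths, assuming plain 
{{java.nio.file}} rather than the Hadoop {{FileSystem}} API that Oozie uses; 
{{DependencyCheckSketch}} and {{findMissing}} are hypothetical names:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the given local paths do not exist.
final class DependencyCheckSketch {

    static List<String> findMissing(List<String> localPaths) {
        List<String> missing = new ArrayList<>();
        for (String p : localPaths) {
            // Files.exists follows symbolic links unless NOFOLLOW_LINKS is passed.
            if (!Files.exists(Paths.get(p))) {
                missing.add(p);
            }
        }
        return missing;
    }
}
```

A caller could log one warning per returned path before the action is 
submitted, instead of letting the action fail later with a missing class.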

> Ability to use local paths for the sharelib
> -------------------------------------------
>
>                 Key: OOZIE-2601
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2601
>             Project: Oozie
>          Issue Type: New Feature
>    Affects Versions: oya
>            Reporter: Robert Kanter
>            Assignee: Attila Sasvari
>             Fix For: oya, 5.0.0b1
>
>         Attachments: OOZIE-2601-01.patch, OOZIE-2601-03.patch, 
> OOZIE-2601-05.patch, OOZIE-2601-06.patch, OOZIE-2601-07.patch, 
> OOZIE-2601-09.patch, OOZIE-2601-10.patch, OOZIE-2601-11.patch
>
>
> With OOZIE-2590, as part of OOZIE-1770 Oozie on Yarn work, Oozie now has full 
> control over the classpath given to the Launcher AM.  In a cluster where all 
> nodes have everything installed locally (in the same paths), it should be 
> possible to have the Launcher AM reference the local jars instead of having 
> to localize them from HDFS.
> For example, if you have Hive installed on all nodes at {{/usr/lib/hive/}} 
> and all Hive jars under {{/usr/lib/hive/lib/}}, we could have the Launcher AM 
> add {{/usr/lib/hive/lib}} to its classpath.  This saves on the overhead of 
> localizing the same jars from the hive sharelib in HDFS.  
> I think the best way to implement this is to augment the [Sharelib Mapping 
> File|https://oozie.apache.org/docs/4.2.0/AG_Install.html#Oozie_Share_Lib] 
> feature to accept {{file:///}} paths.
> If we had this also work with the "oozie" sharelib and the Oozie jars in the 
> individual sharelibs (e.g. have the Mapping file take comma-separated 
> dirs/jars), then in a cluster with everything installed on all of the nodes, 
> you wouldn't need to bother with the sharelib at all!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
