[ https://issues.apache.org/jira/browse/OOZIE-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294897#comment-16294897 ]
Attila Sasvari commented on OOZIE-2601:
---------------------------------------

[~rohini.u] thanks for your comments.

Earlier, when using a mapping file, {{ShareLibService}} had to figure out the list of files in an HDFS directory (see {{getPathRecursively()}} via {{loadSharelib()}}). Then it added them to DistributedCache and set the classpath accordingly. When implementing local sharelib, I tried to reuse existing code in a lot of places.

Do you suggest:
- setting the classpath via {{JavaActionExecutor}} prior to application submission in a different way (now we are adding them via "yarn.application.classpath") and improving {{ShareLibService}}, or
- setting the classpath via the "global" YARN classpath by changing Hadoop environment variables?

I believe a "global approach" is less flexible and might cause problems for individual workflow actions if there are conflicting classes (for example, Spark's dependencies might not play nicely with those of, say, Hive, or with custom classes provided by users).

With a mapping file, a user can also specify dependencies of workflow actions in a flexible way (e.g. some jars are stored on HDFS, other jars are present on the local filesystem, and you can add directories or just individual jars).

Right now, {{ShareLibService}} checks the existence of files either on HDFS or on the local filesystem (symlinks are followed too). For example, see {{FileStatus[] files = fs.listStatus(actionLibsPath)}} in https://github.com/apache/oozie/blob/25a8b99d5197c4e18acf0fd332c4396450d3d551/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L668. This could help users, e.g. by warning them if specified dependencies are not present.
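
To make the mapping-file point concrete, here is a minimal sketch of the kind of existence check described above. It is not the actual {{ShareLibService}} code: the class {{MappingEntryCheck}} and its methods are hypothetical, the {{hdfs:///share/lib/hive/}} path is made up for illustration, and it uses plain {{java.nio}} instead of the Hadoop {{FileSystem}} API, so it only verifies {{file://}} entries on the machine where it runs.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Oozie: splits a mapping value and checks local entries. */
final class MappingEntryCheck {

    /** Splits a comma-separated mapping value into trimmed, non-empty entries. */
    public static List<String> splitEntries(String value) {
        List<String> entries = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /** Returns true if the entry uses the file scheme, i.e. points at a node-local path. */
    public static boolean isLocal(String entry) {
        return "file".equalsIgnoreCase(URI.create(entry).getScheme());
    }

    /** Returns the file:// entries whose paths do not exist here (Files.exists follows symlinks). */
    public static List<String> missingLocalEntries(String value) {
        List<String> missing = new ArrayList<>();
        for (String entry : splitEntries(value)) {
            if (isLocal(entry) && !Files.exists(Paths.get(URI.create(entry)))) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed example value: one HDFS directory plus one node-local directory.
        String value = "hdfs:///share/lib/hive/, file:///usr/lib/hive/lib";
        for (String entry : missingLocalEntries(value)) {
            System.err.println("WARN: sharelib entry not found locally: " + entry);
        }
    }
}
```

HDFS entries are skipped on purpose in this sketch; a real check would go through the Hadoop {{FileSystem}} for those, as the existing code does.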
> Ability to use local paths for the sharelib
> -------------------------------------------
>
>                 Key: OOZIE-2601
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2601
>             Project: Oozie
>          Issue Type: New Feature
>   Affects Versions: oya
>            Reporter: Robert Kanter
>            Assignee: Attila Sasvari
>             Fix For: oya, 5.0.0b1
>
>         Attachments: OOZIE-2601-01.patch, OOZIE-2601-03.patch,
> OOZIE-2601-05.patch, OOZIE-2601-06.patch, OOZIE-2601-07.patch,
> OOZIE-2601-09.patch, OOZIE-2601-10.patch, OOZIE-2601-11.patch
>
>
> With OOZIE-2590, as part of OOZIE-1770 Oozie on Yarn work, Oozie now has full
> control over the classpath given to the Launcher AM. In a cluster where all
> nodes have everything installed locally (in the same paths), it should be
> possible to have the Launcher AM reference the local jars instead of having
> to localize them from HDFS.
> For example, if you have Hive installed on all nodes at {{/usr/lib/hive/}}
> and all Hive jars under {{/usr/lib/hive/lib/}}, we could have the Launcher AM
> add {{/usr/lib/hive/lib}} to its classpath. This saves on the overhead of
> localizing the same jars from the hive sharelib in HDFS.
> I think the best way to implement this is to augment the [Sharelib Mapping
> File|https://oozie.apache.org/docs/4.2.0/AG_Install.html#Oozie_Share_Lib]
> feature to accept {{file:///}} paths.
> If we had this also work with the "oozie" sharelib and the Oozie jars in the
> individual sharelibs (e.g. have the Mapping file take comma-separated
> dirs/jars), then in a cluster with everything installed on all of the nodes,
> you wouldn't need to bother with the sharelib at all!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)