[jira] [Commented] (HADOOP-10115) Exclude duplicate jars in hadoop package under different component's lib

Allen Wittenauer (JIRA) Mon, 09 Mar 2015 11:19:51 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353321#comment-14353321 ]


Allen Wittenauer commented on HADOOP-10115:
-------------------------------------------

bq. One possible gap caused by just skipping the jars (rather than symlinking) 
is that if folks rely on the directory layout at deployment time to grab needed 
jars they might miss out. Presumably they're already grabbing the common share 
dir though?

If you symlink, is there actually any benefit? It shrinks the distribution 
size, sure, but I suspect the JVM won't resolve the link to a degree that it 
realizes it is the same jar.  Also, given that, e.g., HDFS requires common, if 
folks are only grabbing the HDFS deps and not the common deps, they are doing 
Bad Things (tm). But if we only commit this to trunk, it's even less of a 
concern. ;)

bq. One good reason to do it as a follow-on is that we could switch to using a 
maven assembly instead of a shell script.

I'm inclined to commit this now and fix this up either as a maven assembly or a 
separate script as a separate JIRA under the guiding principle of "don't let 
best stop better."  I don't think there is any real question of whether or not 
this is better than what is currently there.  Best might end up being more 
subjective and take longer.

bq. (the two code comments)

Yes, probably a good idea.

bq. Should the yarn get processed before the NFS projects?

I'm not sure if it matters much.


> Exclude duplicate jars in hadoop package under different component's lib
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-10115
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10115
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>              Labels: common, hdfs, mapreduce, nfs, yarn
>         Attachments: HADOOP-10115-004.patch, HADOOP-10115-005.patch, 
> HADOOP-10115-006.patch, HADOOP-10115.patch, HADOOP-10115.patch, 
> HADOOP-10115.patch
>
>
> In the hadoop package distribution, more than 90% of the jars are 
> duplicated in multiple places.
> For example, almost all jars in share/hadoop/hdfs/lib are already present in 
> share/hadoop/common/lib.
> The same is true for the other lib directories under share.
> All of these directories are added to the classpath for every daemon process 
> anyway.
> So, to reduce the package distribution size and the classpath overhead, remove 
> the duplicate jars from the distribution.
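
To make the duplication described above concrete, here is a minimal sketch that 
lists which jars in each component's lib directory also appear in common/lib. 
It is not taken from the attached patches: the function name, the default 
component list, and the choice to compare jars by file name only are all 
assumptions made for illustration.

```python
import os


def duplicate_jars(share_root, common="common",
                   components=("hdfs", "yarn", "mapreduce")):
    """Return {component: sorted jar names that also exist in common/lib}.

    Assumption: two jars with the same file name are treated as duplicates.
    A real check might also compare checksums.
    """
    def jars(component):
        # Layout assumed from the issue text: <share_root>/<component>/lib
        lib = os.path.join(share_root, component, "lib")
        if not os.path.isdir(lib):
            return set()
        return {name for name in os.listdir(lib) if name.endswith(".jar")}

    common_jars = jars(common)
    return {c: sorted(jars(c) & common_jars) for c in components}
```

Calling duplicate_jars("share/hadoop") from the root of an unpacked 
distribution would report the overlapping jars per component. Whether those 
jars are then skipped or symlinked is the packaging decision discussed in the 
comments above.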



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
