[jira] [Commented] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache

Jason Lowe (JIRA) Wed, 29 Oct 2014 14:47:56 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189066#comment-14189066
 ]


Jason Lowe commented on MAPREDUCE-6128:
---------------------------------------

Thanks for the patch, Gera.

I think it's an interesting idea, but I'm worried about it being enabled by 
default.  If the user is already manually adding jars to the classpath, and 
those jars have already been preloaded into HDFS and are referenced there for 
efficient localization across jobs, then enabling this seems to do actively 
bad things: it causes extra jars to be uploaded to HDFS and localized, or the 
job fails outright if the distributed cache names collide.  This either needs 
to be disabled by default or it needs to look for duplicate jar names already 
in the distributed cache (maybe both).
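
To make the duplicate-name check concrete, here is a minimal sketch using only 
the JDK.  It is not taken from the patch.  The class and method names 
(BundledJarFilter, cacheName, withoutDuplicates) are hypothetical, and it 
assumes that a cache entry's name is the URI fragment when one is given and the 
last path segment otherwise.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop or of the attached patch.
final class BundledJarFilter {

    // Name the entry would get in the task working directory.
    // Assumption: "#name" fragment wins, else the last path segment is used.
    static String cacheName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        if (path == null) {
            return uri.toString(); // opaque URI, nothing better to go on
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Keeps only the bundled jars whose names are not already taken, either
    // by an existing cache entry or by an earlier bundled jar.
    static List<URI> withoutDuplicates(List<URI> alreadyCached, List<URI> bundled) {
        Set<String> taken = new HashSet<>();
        for (URI u : alreadyCached) {
            taken.add(cacheName(u));
        }
        List<URI> result = new ArrayList<>();
        for (URI u : bundled) {
            if (taken.add(cacheName(u))) {
                result.add(u);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<URI> cached = Arrays.asList(URI.create("hdfs://nn/libs/guava.jar"));
        List<URI> bundled = Arrays.asList(
            URI.create("file:/tmp/job/lib/guava.jar"),
            URI.create("file:/tmp/job/lib/dep.jar"));
        System.out.println(withoutDuplicates(cached, bundled));
    }
}
```

Skipping on a name match keeps the jar the user preloaded into HDFS and avoids 
the extra upload.  Whether a collision should be skipped silently, logged, or 
treated as an error is a separate design choice.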

It would also be nice to have a unit test to verify this feature doesn't break 
at some point.  For example, we could build a small jar with an MR job that has 
a trivial dependency on another separate, small jar, then submit it to a 
minicluster with just the job jar to verify the automatic bundling is working.

> Automatic addition of bundled jars to distributed cache 
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6128
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.5.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-6128.v01.patch
>
>
> On the client side, the JDK adds Class-Path elements from the job jar 
> manifest to the classpath. In theory there could be many bundled jars in 
> many directories, such that adding them manually via libjars or similar 
> means to task classpaths is cumbersome. If this property is enabled, the 
> same jars are added to the task classpaths automatically.
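
For reference, the Class-Path manifest attribute mentioned in the description 
can be read with the standard java.util.jar API.  The following is a small 
illustrative sketch, not code from the patch, and the class name 
ManifestClassPath is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hadoop or of the attached patch.
final class ManifestClassPath {

    // Returns the whitespace-separated Class-Path entries of a manifest,
    // or an empty list if the attribute is absent.
    static List<String> entries(Manifest manifest) {
        List<String> result = new ArrayList<>();
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value == null) {
            return result;
        }
        for (String entry : value.trim().split("\\s+")) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // For a real job jar the manifest would come from
        // new java.util.jar.JarFile(path).getManifest(); it is built by hand
        // here so the example needs no file on disk.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/sub/b.jar");
        System.out.println(entries(manifest));
    }
}
```

The entries are relative URLs resolved against the location of the job jar, so 
a client would still have to resolve each one before adding it anywhere.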



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
