[ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189066#comment-14189066 ]
Jason Lowe commented on MAPREDUCE-6128:
---------------------------------------

Thanks for the patch, Gera. I think it's an interesting idea, but I'm worried about it being enabled by default. If the user is already manually adding jars to the classpath, and these jars have already been preloaded into HDFS and referenced there for efficient localization across jobs, then enabling this seems actively harmful: it causes extra jars to be uploaded to HDFS and localized, or fails outright if the distributed cache names collide. This either needs to be disabled by default or it needs to look for duplicate jar names already in the distributed cache (maybe both).

It would also be nice to have a unit test to verify this feature doesn't break at some point. For example, we could build a small jar with an MR job that has a trivial dependency on another separate, small jar, then try to submit it to a minicluster with just the job jar to verify the automatic bundling is working.

> Automatic addition of bundled jars to distributed cache
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6128
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.5.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-6128.v01.patch
>
>
> On the client side, the JDK adds Class-Path elements from the job jar manifest
> to the classpath. In theory there could be many bundled jars in many
> directories, such that adding them manually via libjars or similar means to
> task classpaths is cumbersome. If this property is enabled, the same jars are
> added to the task classpaths automatically.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
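
To make the duplicate-name check suggested in the comment concrete, here is a minimal sketch in plain Java. It is not taken from MAPREDUCE-6128.v01.patch: the class name BundledJarCheck and its helper methods are hypothetical, and the set of names already in the distributed cache is passed in as an ordinary Set<String> rather than read from a job configuration. It reads the Class-Path attribute from a manifest with the standard java.util.jar API and keeps only the entries whose file name is not already cached.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, for illustration only; not part of Hadoop or the patch.
public class BundledJarCheck {

  /** Returns the Class-Path entries of a manifest, or an empty list if absent. */
  static List<String> classPathEntries(Manifest manifest) {
    String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    List<String> entries = new ArrayList<>();
    if (value == null) {
      return entries;
    }
    // Class-Path entries are separated by whitespace.
    for (String entry : value.trim().split("\\s+")) {
      if (!entry.isEmpty()) {
        entries.add(entry);
      }
    }
    return entries;
  }

  /** Returns the last path component, e.g. "lib/a.jar" gives "a.jar". */
  static String baseName(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  /**
   * Keeps only entries whose file name is not in cachedNames and has not
   * been seen earlier in the list, so that no two kept entries collide.
   */
  static List<String> entriesToAdd(List<String> entries, Set<String> cachedNames) {
    Set<String> seen = new HashSet<>(cachedNames);
    List<String> result = new ArrayList<>();
    for (String entry : entries) {
      if (seen.add(baseName(entry))) {
        result.add(entry);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // An in-memory manifest stands in for the job jar's META-INF/MANIFEST.MF.
    String text = "Manifest-Version: 1.0\n"
        + "Class-Path: lib/a.jar lib/b.jar ext/c.jar\n\n";
    Manifest manifest =
        new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

    // Assumed: b.jar was already preloaded and referenced by the user.
    Set<String> cached = new HashSet<>(Arrays.asList("b.jar"));

    System.out.println(entriesToAdd(classPathEntries(manifest), cached));
    // prints [lib/a.jar, ext/c.jar]
  }
}
```

Skipping a colliding name silently is only one possible policy. Whether to skip, warn, or fail the submission is a design decision the issue would still need to settle, and comparing by file name alone does not tell whether the cached jar is actually the same artifact.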