[ https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Azuryy(Chijiong) reopened MAPREDUCE-3323:
-----------------------------------------

> Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache, tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Azuryy(Chijiong)
>             Fix For: 0.20.203.0
>
>         Attachments: DistributedCache.patch, GenericOptionsParser.patch, JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>
> We put some files into the Distributed Cache, but sometimes only the Map tasks or only the Reduce tasks use these cached files, not both. However, the TaskTracker always downloads the cached files from HDFS, and if the cache contains large files, this is time-consuming.
> This patch therefore adds some new APIs to DistributedCache.java, as follows:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new APIs do not affect the original interface.
> Users can use these features in either of the following two ways:
> 1)
> hadoop job **** -files file1 -mapfiles file2 -reducefiles file3 -archives arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(conf, file1);
> DistributedCache.addCacheFileForMap(conf, file2);
> DistributedCache.addCacheFileForReduce(conf, file3);
> DistributedCache.addCacheArchive(conf, arc1);
> DistributedCache.addCacheArchiveForMap(conf, arc2);
> DistributedCache.addCacheArchiveForReduce(conf, arc3);
> These two ways give the same result. That is, you put six files into the distributed cache (file1 to file3 and arc1 to arc3), where:
> file1 and arc1 are cached for both map and reduce;
> file2 and arc2 are cached only for map;
> file3 and arc3 are cached only for reduce.
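
The scoping described in the issue can be illustrated with a small, self-contained sketch. It is not the code from the attached patches and does not use any Hadoop classes: CacheScopeSketch, TaskType, addForBoth, addForMap, addForReduce and entriesFor are hypothetical names chosen only for illustration. It models one thing, namely which cache entries a task of a given type would need to have downloaded when each entry is registered for both task types, for map only, or for reduce only.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model (not Hadoop code): per-task-type scoping of cache entries. */
final class CacheScopeSketch {
    enum TaskType { MAP, REDUCE }

    // Insertion-ordered map from cache entry name to the task types that need it.
    private final Map<String, Set<TaskType>> entries = new LinkedHashMap<>();

    // Corresponds to the existing behaviour: the entry is needed by both task types.
    void addForBoth(String name) {
        entries.put(name, EnumSet.allOf(TaskType.class));
    }

    // Corresponds to the proposed ...ForMap methods: only map tasks need the entry.
    void addForMap(String name) {
        entries.put(name, EnumSet.of(TaskType.MAP));
    }

    // Corresponds to the proposed ...ForReduce methods: only reduce tasks need the entry.
    void addForReduce(String name) {
        entries.put(name, EnumSet.of(TaskType.REDUCE));
    }

    // Entries that a task of the given type would need downloaded before it starts.
    List<String> entriesFor(TaskType type) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<TaskType>> e : entries.entrySet()) {
            if (e.getValue().contains(type)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same six entries as in the issue description.
        CacheScopeSketch cache = new CacheScopeSketch();
        cache.addForBoth("file1");
        cache.addForMap("file2");
        cache.addForReduce("file3");
        cache.addForBoth("arc1");
        cache.addForMap("arc2");
        cache.addForReduce("arc3");

        System.out.println("map:    " + cache.entriesFor(TaskType.MAP));
        System.out.println("reduce: " + cache.entriesFor(TaskType.REDUCE));
    }
}
```

In this model a map task needs file1, file2, arc1 and arc2, while a reduce task needs file1, file3, arc1 and arc3, which matches the outcome described in the issue. How the patch itself records the scope in the job configuration is not stated in this message, so the sketch makes no claim about that.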