[ https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Azuryy(Chijiong) updated MAPREDUCE-3323:
----------------------------------------

    Attachment:     (was: GenericOptionsParser.patch)
    
> Add new interface for Distributed Cache, which is specific to Map or Reduce,
> but not both.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache, tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Azuryy(Chijiong)
>         Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>
> We put some files into the Distributed Cache, but sometimes only the map
> tasks or only the reduce tasks use these cached files, not both. However, the
> TaskTracker always downloads all cached files from HDFS, which is
> time-consuming when some of the cached files are fairly large.
> So this patch adds the following new APIs to DistributedCache.java:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new API does not affect the original interface. Users can use these
> features in either of the following two ways:
> 1) 
> hadoop job **** -files file1 -mapfiles file2 -reducefiles file3 -archives
> arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(file1, conf);
> DistributedCache.addCacheFileForMap(file2, conf);
> DistributedCache.addCacheFileForReduce(file3, conf);
> DistributedCache.addCacheArchive(arc1, conf);
> DistributedCache.addCacheArchiveForMap(arc2, conf);
> DistributedCache.addCacheArchiveForReduce(arc3, conf);
> Both methods have the same result: six entries are put into the distributed
> cache (file1 to file3 and arc1 to arc3), where:
> file1 and arc1 are cached for both map and reduce tasks;
> file2 and arc2 are cached only for map tasks;
> file3 and arc3 are cached only for reduce tasks.
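To make the scoping rule above concrete, here is a minimal, hypothetical Java sketch of how scoped entries could be recorded and then selected per task type. It is not the code from the attached patches. A plain `Map<String, String>` stands in for Hadoop's `Configuration`; the shared keys are assumed to be `mapred.cache.files` and `mapred.cache.archives`; and the `.map`/`.reduce` key suffixes, the `Scope` enum, and the `add`/`read`/`toLocalize` helpers are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the MAPREDUCE-3323 patch: a plain Map<String, String>
// stands in for Hadoop's Configuration.
class ScopedCacheSketch {
    // Assumed existing shared keys; entries under them go to every task.
    static final String FILES = "mapred.cache.files";
    static final String ARCHIVES = "mapred.cache.archives";

    // Which task type an entry is cached for.
    enum Scope { BOTH, MAP, REDUCE }

    // Hypothetical key layout: ".map"/".reduce" suffixes hold scoped entries.
    static String key(String baseKey, Scope scope) {
        switch (scope) {
            case MAP:
                return baseKey + ".map";
            case REDUCE:
                return baseKey + ".reduce";
            default:
                return baseKey;
        }
    }

    // Appends a URI to the comma-separated list for (baseKey, scope).
    // add(conf, FILES, Scope.MAP, uri) plays the role of addCacheFileForMap.
    static void add(Map<String, String> conf, String baseKey, Scope scope, String uri) {
        String k = key(baseKey, scope);
        String old = conf.get(k);
        conf.put(k, old == null ? uri : old + "," + uri);
    }

    // Reads a comma-separated list; a missing key yields an empty list.
    static List<String> read(Map<String, String> conf, String k) {
        String v = conf.get(k);
        if (v == null) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(Arrays.asList(v.split(",")));
    }

    // Entries a TaskTracker would localize for one task: the shared entries plus
    // those scoped to this task type. The other type's entries are never read,
    // so they are never downloaded for this task.
    static List<String> toLocalize(Map<String, String> conf, String baseKey, boolean isMap) {
        List<String> result = read(conf, key(baseKey, Scope.BOTH));
        result.addAll(read(conf, key(baseKey, isMap ? Scope.MAP : Scope.REDUCE)));
        return result;
    }
}
```

With the six entries from the description registered this way, `toLocalize(conf, FILES, true)` returns `[file1, file2]` and `toLocalize(conf, ARCHIVES, false)` returns `[arc1, arc3]`. Leaving the shared keys untouched is one way such an API could keep existing jobs that only call `addCacheFile`/`addCacheArchive` behaving as before.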

--
This message is automatically generated by JIRA.
