[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2015-03-09 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3323:

Fix Version/s: (was: 0.20.203.0)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
>
> We put files into the Distributed Cache, but sometimes only the Map tasks or
> only the Reduce tasks use them. The TaskTracker nevertheless always downloads
> the cached files from HDFS, which is time-consuming when the cache contains
> large files.
> This patch therefore adds the following new APIs to DistributedCache.java:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new APIs do not affect the original interface. Users can use these
> features in either of the following two ways:
> 1)
> hadoop job  -files file1 -mapfiles file2 -reducefiles file3 -archives
> arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(conf, file1);
> DistributedCache.addCacheFileForMap(conf, file2);
> DistributedCache.addCacheFileForReduce(conf, file3);
> DistributedCache.addCacheArchive(conf, arc1);
> DistributedCache.addCacheArchiveForMap(conf, arc2);
> DistributedCache.addCacheArchiveForReduce(conf, arc3);
> Both ways have the same result:
> six files are put into the distributed cache (file1 to file3 and arc1 to arc3),
> where file1 and arc1 are cached for both map and reduce,
> file2 and arc2 are cached only for map, and
> file3 and arc3 are cached only for reduce.
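
The scoping rule described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the intended behaviour, not the code from the attached patches: the class `PhaseScopedCache`, its enums and its methods are hypothetical names invented for this sketch, and a simple in-memory map stands in for the job configuration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed map-only / reduce-only cache entries.
final class PhaseScopedCache {
    // Which kind of task an entry is cached for (assumed names).
    public enum Scope { BOTH, MAP_ONLY, REDUCE_ONLY }
    public enum TaskType { MAP, REDUCE }

    private final Map<Scope, List<URI>> entries = new EnumMap<>(Scope.class);

    // Register a cache entry under the given scope, keeping insertion order.
    public void add(Scope scope, URI uri) {
        entries.computeIfAbsent(scope, s -> new ArrayList<>()).add(uri);
    }

    // Entries a task of the given type would have to download:
    // everything scoped BOTH, plus the entries scoped to its own phase.
    public List<URI> toLocalize(TaskType type) {
        List<URI> result = new ArrayList<>(
                entries.getOrDefault(Scope.BOTH, Collections.emptyList()));
        Scope own = (type == TaskType.MAP) ? Scope.MAP_ONLY : Scope.REDUCE_ONLY;
        result.addAll(entries.getOrDefault(own, Collections.emptyList()));
        return result;
    }

    public static void main(String[] args) {
        PhaseScopedCache cache = new PhaseScopedCache();
        cache.add(Scope.BOTH, URI.create("file1"));
        cache.add(Scope.BOTH, URI.create("arc1"));
        cache.add(Scope.MAP_ONLY, URI.create("file2"));
        cache.add(Scope.MAP_ONLY, URI.create("arc2"));
        cache.add(Scope.REDUCE_ONLY, URI.create("file3"));
        cache.add(Scope.REDUCE_ONLY, URI.create("arc3"));

        System.out.println("map tasks localize:    " + cache.toLocalize(TaskType.MAP));
        System.out.println("reduce tasks localize: " + cache.toLocalize(TaskType.REDUCE));
    }
}
```

In this sketch a map task never sees file3 or arc3, and a reduce task never sees file2 or arc2, which is the download saving the issue is asking for.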



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: DistributedCache.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: GenericOptionsParser.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: JobClient.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: TaskTracker.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: TaskDistributedCacheManager.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-07 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Status: Open  (was: Patch Available)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-04 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Status: Patch Available  (was: Reopened)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-04 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Target Version/s:   (was: 0.21.0)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-04 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: GenericOptionsParser.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-04 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: GenericOptionsParser.patch

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: TaskDistributedCacheManager.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: TaskDistributedCacheManager.patch

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>




[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: TaskTracker.patch
TaskDistributedCacheManager.patch
JobClient.patch
GenericOptionsParser.patch
DistributedCache.patch

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch





[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: TaskTracker.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)





[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Attachment: (was: DistributedCache.patch)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)





[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-03 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Description: 
We put some files into the Distributed Cache, but sometimes only the Map or only 
the Reduce tasks use these cached files, not both. The TaskTracker, however, 
always downloads the cached files from HDFS, so if the cache holds fairly large 
files, this wastes time.

So this patch adds some new API methods to DistributedCache.java, as follows:

addArchiveToClassPathForMap
addArchiveToClassPathForReduce

addFileToClassPathForMap
addFileToClassPathForReduce

addCacheFileForMap
addCacheFileForReduce

addCacheArchiveForMap
addCacheArchiveForReduce


The new API doesn't affect the original interface. Users can use these features 
in either of the following two ways:

1) 
hadoop job  -files file1 -mapfiles file2 -reducefiles file3 -archives arc1 
-maparchives arc2 -reducearchives arc3

2)
DistributedCache.addCacheFile(conf, file1);
DistributedCache.addCacheFileForMap(conf, file2);
DistributedCache.addCacheFileForReduce(conf, file3);

DistributedCache.addCacheArchive(conf, arc1);
DistributedCache.addCacheArchiveForMap(conf, arc2);
DistributedCache.addCacheArchiveForReduce(conf, arc3);


Both approaches give the same result, which means: 

You put six files into the distributed cache (file1 to file3 and arc1 to arc3), 
but file1 and arc1 are cached for both map and reduce;
file2 and arc2 are only cached for map;
file3 and arc3 are only cached for reduce;


  was:
We put some files into the Distributed Cache, but sometimes only the Map or only 
the Reduce tasks use these cached files, not both. The TaskTracker, however, 
always downloads the cached files from HDFS, so if the cache holds fairly large 
files, this wastes time.

So this patch adds some new API methods to DistributedCache.java, as follows:

addArchiveToClassPathForMap
addArchiveToClassPathForReduce

addFileToClassPathForMap
addFileToClassPathForReduce

addCacheFileForMap
addCacheFileForReduce

addCacheArchiveForMap
addCacheArchiveForReduce


The new API doesn't affect the original interface, but the new methods apply to 
only map or only reduce, not both.

If you do need a cached file during both map and reduce, use the original 
interface.



> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)





[jira] [Updated] (MAPREDUCE-3323) Add new interface for Distributed Cache, which special for Map or Reduce,but not Both.

2011-11-01 Thread Azuryy(Chijiong) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azuryy(Chijiong) updated MAPREDUCE-3323:


Summary: Add new interface for Distributed Cache, which special  for Map or 
Reduce,but not Both.  (was: Distributed Cache for Map or Reduce or Both)

> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distributed-cache, tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Azuryy(Chijiong)
> Attachments: DistributedCache.patch, TaskTracker.patch
>
>
> We put some files into the Distributed Cache, but sometimes only the Map or 
> only the Reduce tasks use these cached files, not both. The TaskTracker, 
> however, always downloads the cached files from HDFS, so if the cache holds 
> fairly large files, this wastes time.
> So this patch adds some new API methods to DistributedCache.java, as follows:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new API doesn't affect the original interface, but the new methods apply 
> to only map or only reduce, not both.
> If you do need a cached file during both map and reduce, use the original 
> interface.
