Map directly to HDFS or reduce()
--------------------------------

                 Key: HADOOP-946
                 URL: https://issues.apache.org/jira/browse/HADOOP-946
             Project: Hadoop
          Issue Type: New Feature
          Components: mapred
         Environment: all
            Reporter: Doug Judd


When the output of the Map phase is known to be already aggregated (e.g. the
input is the output of another Map-reduce job and map() preserves that
aggregation), there should be a way to tell the framework so that it can pipe
the map() output directly to the reduce() function, or directly to HDFS in the
case of IdentityReducer. This will probably require forcing the number of map
tasks to equal the number of reduce tasks. Doing so would avoid the disk I/O
required to generate intermediate files.
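Below is a minimal Python sketch, not Hadoop code, that contrasts the normal shuffle path with the proposed direct path. It assumes each input split is one partition of a previous job's output, sorted by key and with no key spanning two splits. The functions map_fn, reduce_fn, shuffle_path and direct_path are hypothetical names chosen for illustration. The in-memory grouping in shuffle_path stands in for the intermediate files that the proposal would avoid.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_fn(key, value):
    # Hypothetical map(): keeps the key unchanged and transforms the value,
    # so the key order and partitioning of the previous job's output survive.
    yield key, value * 2


def reduce_fn(key, values):
    # Hypothetical reduce(): sums all values seen for a key.
    return key, sum(values)


def shuffle_path(splits):
    """Conventional path: collect every map output, group it by key
    (standing in for the shuffle/sort through intermediate files), then reduce."""
    grouped = defaultdict(list)
    for split in splits:
        for k, v in split:
            for mk, mv in map_fn(k, v):
                grouped[mk].append(mv)
    return [reduce_fn(k, vs) for k, vs in sorted(grouped.items())]


def direct_path(splits):
    """Proposed path: each map task feeds its output straight into a reducer
    with no intermediate grouping step. Only valid if each split is sorted by
    key and no key appears in more than one split (one map task per partition)."""
    results = []
    for split in splits:
        mapped = (kv for k, v in split for kv in map_fn(k, v))
        # Adjacent records with the same key form one reduce group.
        for k, group in groupby(mapped, key=itemgetter(0)):
            results.append(reduce_fn(k, (v for _, v in group)))
    return sorted(results)


if __name__ == "__main__":
    splits = [[("a", 1), ("a", 2), ("b", 3)], [("c", 4), ("d", 5), ("d", 1)]]
    print(shuffle_path(splits))
    print(direct_path(splits))
```

If a key appears in more than one split, e.g. [[("a", 1)], [("a", 2)]], direct_path returns two partial results for "a" ([("a", 2), ("a", 4)]) instead of the single [("a", 6)] from shuffle_path. This is why the map tasks would need to line up one-to-one with the reduce partitions.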


-- 
This message is automatically generated by JIRA.