[ https://issues.apache.org/jira/browse/HADOOP-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468405 ]

Doug Judd commented on HADOOP-946:
----------------------------------

Ok, I see your reasoning.  Go ahead and downgrade this one or remove it 
entirely if you think it's not worth doing.


> Map directly to HDFS or reduce()
> --------------------------------
>
>                 Key: HADOOP-946
>                 URL: https://issues.apache.org/jira/browse/HADOOP-946
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Doug Judd
>
> When you know that the output of the Map phase is already aggregated
> (e.g. the input is the output of another MapReduce job and map()
> preserves the aggregation), there should be a way to tell the framework
> so that it can pipe the map() output directly to the reduce() function,
> or to HDFS in the case of IdentityReducer. This will probably require
> forcing the number of map tasks to equal the number of reduce tasks. It
> would save the disk I/O required to generate intermediate files.

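The proposed shortcut can be illustrated with a small, hypothetical simulation. It is a Python sketch, not Hadoop's API. `map_then_reduce_in_place` and `run_job` are illustrative names, and the model of a "task" as a list of (key, value) pairs is an assumption.

The sketch assumes the "already aggregated" precondition from the description: each key emitted by a map task appears only in that task's output, with equal keys adjacent. Under that assumption, map() output can be streamed straight into reduce() with one reduce per map task, and no shuffle, sort, or intermediate files are needed.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def map_then_reduce_in_place(
    records: Iterable[Tuple[K, V]],
    map_fn: Callable[[K, V], Iterable[Tuple[K, V]]],
    reduce_fn: Callable[[K, List[V]], R],
) -> Iterator[Tuple[K, R]]:
    """Pipe one (hypothetical) map task's output directly into reduce()."""
    # Lazily chain map() output; nothing is materialized to intermediate files.
    mapped = (pair for k, v in records for pair in map_fn(k, v))
    seen = set()
    # groupby merges only *adjacent* equal keys, which is exactly the
    # "already aggregated" precondition this shortcut relies on.
    for key, group in groupby(mapped, key=itemgetter(0)):
        if key in seen:
            # A key that reappears means the precondition was violated and
            # a real shuffle/sort would have been required.
            raise ValueError(f"map output not aggregated: key {key!r} repeats")
        seen.add(key)
        yield key, reduce_fn(key, [v for _, v in group])


def run_job(partitions, map_fn, reduce_fn):
    # One "reduce task" per map task: number of map tasks == number of reduces.
    # Note: this only checks aggregation within a partition, not across them.
    return [list(map_then_reduce_in_place(p, map_fn, reduce_fn)) for p in partitions]


# Input modeled as the output of a previous job: (word, count) pairs, grouped by key.
partitions = [[("apple", 3), ("banana", 1)], [("cherry", 5)]]
print(run_job(partitions, lambda k, v: [(k, v * 2)], lambda k, vs: sum(vs)))
```

The design leans on the same trade-off the issue describes. Skipping the shuffle is only safe when the caller guarantees that map() preserves the existing aggregation. The sketch therefore fails loudly rather than silently producing split groups when that guarantee is broken.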
-- 
This message is automatically generated by JIRA.