[ https://issues.apache.org/jira/browse/HADOOP-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468405 ]
Doug Judd commented on HADOOP-946:
----------------------------------
Ok, I see your reasoning. Go ahead and downgrade this one or remove it
entirely if you think it's not worth doing.
> Map directly to HDFS or reduce()
> --------------------------------
>
> Key: HADOOP-946
> URL: https://issues.apache.org/jira/browse/HADOOP-946
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Environment: all
> Reporter: Doug Judd
>
> When you know that the output of the map phase is already aggregated (e.g.
> the input is the output of another MapReduce job and map() preserves the
> aggregation), there should be a way to tell the framework so that it can
> pipe the map() output directly to the reduce() function, or to HDFS in the
> case of IdentityReducer. This will probably require forcing the number of
> map tasks to equal the number of reduce tasks. Piping the output directly
> would save the disk I/O otherwise required to generate intermediate files.
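
The idea in the description can be illustrated with a small standalone
sketch. This is not Hadoop code and does not use any Hadoop API; it only
simulates, in plain Python, what "piping map() output directly to reduce()"
might look like for a single input split. The names run_direct, map_fn and
reduce_fn are hypothetical, and the sketch assumes that records sharing a key
are already adjacent in the split and that map_fn leaves keys unchanged.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(key, value):
    # Hypothetical map(): transforms the value but keeps the key,
    # so the existing grouping of the input is preserved.
    yield key, value * 2


def reduce_fn(key, values):
    # Hypothetical reduce(): sums all values seen for one key.
    return key, sum(values)


def run_direct(split, map_fn, reduce_fn):
    """Feed map() output straight into reduce() for one split.

    Assumes records with equal keys are adjacent in `split` and that
    map_fn preserves this, so no sort, shuffle, or intermediate file
    is needed. One such pipeline per split corresponds to one map task
    paired with one reduce task.
    """
    # Lazily apply map_fn to every record in the split.
    mapped = (pair for k, v in split for pair in map_fn(k, v))
    # Group consecutive records by key and reduce each group in place.
    for key, group in groupby(mapped, key=itemgetter(0)):
        yield reduce_fn(key, (v for _, v in group))


# Input that is already aggregated by key, e.g. the output of a prior job.
split = [("a", 1), ("a", 2), ("b", 3)]
result = list(run_direct(split, map_fn, reduce_fn))
print(result)
```

If the adjacency assumption does not hold (for example, the same key appears
in two different splits), this shortcut would emit more than one result per
key, which is why the proposal ties each map task to exactly one reduce task.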