[
https://issues.apache.org/jira/browse/HADOOP-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468389
]
Doug Judd commented on HADOOP-946:
----------------------------------
> Actually, the ability to keep a large number of maps (one per block) and a
> much smaller number of outputs would be the primary reason I can see for
> adding this feature [...]
Seems like you'd need an atomic record append API to handle this.
> If we place reduces to nodes or racks where their input dominates (as
> discussed in HADOOP-939 comments), then this could be implemented by simply
> specifying a partition method that returns the hash of the map node name.
I guess the primary reason for this enhancement would be to avoid writing and
subsequently reading intermediate files thereby reducing disk load on the
system as a whole. The map-to-hdfs workaround sounds reasonable. If you want
to just run a reduce without generating intermediate files, then the reduce
task needs to be able to pull from HDFS. Unless I didn't follow your logic
correctly on HADOOP-939, it seems like this optimization is orthogonal.
> Map directly to HDFS or reduce()
> --------------------------------
>
> Key: HADOOP-946
> URL: https://issues.apache.org/jira/browse/HADOOP-946
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Environment: all
> Reporter: Doug Judd
>
> For situations where you know that the output of the Map phase is already
> aggregated (e.g. the input is the output of another Map-reduce job and map()
> preserves the aggregation), then there should be a way to tell the framework
> that this is the case so that it can pipe the map() output directly to the
> reduce() function, or HDFS in the case of IdentityReducer. This will
> probably require forcing the number of map tasks to equal the number of
> reduce tasks. This will save the disk I/O required to generate intermediate
> files.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.