Run partial reduce instead of combiner at reduce node
-----------------------------------------------------
Key: MAPREDUCE-2083
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2083
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Faraz Ahmad
Fix For: 0.20.2
Shuffle delays can be large for mapreductions with lots of intermediate data.
Some of this shuffle delay can be overlapped with reduce if some of the reduce
computation is started on partial intermediate data received by a reduce.
Along these lines, the patch HADOOP-3226 runs the combiner on the reduce
side to prune the data that goes to reduce. However, HADOOP-3226 does not
achieve our goal of overlap with the shuffle because: (1) In its original use
of reducing intermediate data volume,
the combiner falls in the critical path at the map side. Therefore, the
combiner is usually a simple function which is too lightweight in its new use
to achieve sufficient overlap with the shuffle.
(2) Running the combiner at the reduce side is helpful in overlapping with
the shuffle only if the combiner's functionality is a major portion of the
reduce functionality -- otherwise running the combiner at the reduce side
achieves only modest overlap with the shuffle.
In many mapreductions, the combiner computation is either not part of the
reduce computation or only a small part of it.
To address both points, note that complex reduces often contain computation
that is heavier-weight than simple combining and that can be overlapped with
the shuffle. This heavy-weight computation is specified by a user-supplied
"partial reduce", which performs the commutative/associative parts of reduce.
The idea is to run partial reduce on subsets of intermediate data as they
arrive at a reduce, overlapping with the shuffle, and then run the full-blown
final reduce, which re-reduces the partially-reduced data. Because the
shuffle delay is large for shuffle-heavy mapreductions, a partial reduce that
is heavier-weight than a simple combiner can be hidden under the shuffle
delay without extending the critical path of execution.
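To make the partial-reduce / final-reduce split concrete, here is a minimal
sketch in Python rather than in Hadoop's own API. It computes a per-key mean:
the partial reduce folds each arriving batch into a (sum, count) pair, which
is the commutative/associative part, and the final reduce re-reduces those
pairs and performs the division. All names (partial_reduce, final_reduce,
reduce_with_overlap) are hypothetical and are not taken from the patch.

```python
from collections import defaultdict


def partial_reduce(pairs):
    """Fold (sum, count) pairs into one pair; commutative and associative."""
    total, count = 0, 0
    for s, c in pairs:
        total += s
        count += c
    return (total, count)


def final_reduce(partials):
    """Re-reduce the partially-reduced pairs, then do the non-associative step."""
    total, count = partial_reduce(partials)
    return total / count


def reduce_with_overlap(batches):
    """Simulate one reduce task receiving shuffle data batch by batch.

    `batches` is an iterable of lists of (key, value) records, standing in
    for map outputs as they arrive (an assumption made for illustration).
    """
    partials = defaultdict(list)
    for batch in batches:
        # Group the records of this batch by key.
        grouped = defaultdict(list)
        for key, value in batch:
            grouped[key].append((value, 1))
        # Partial reduce runs per batch, while later batches are still
        # in flight in a real shuffle.
        for key, pairs in grouped.items():
            partials[key].append(partial_reduce(pairs))
    # Final reduce runs once, after all intermediate data has arrived.
    return {key: final_reduce(parts) for key, parts in partials.items()}
```

A plain combiner could not compute the mean directly, because a mean of means
is not the overall mean; carrying (sum, count) through the partial reduce and
dividing only in the final reduce keeps the result exact.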
Finally, to further ensure that the partial reduce does not extend the
critical path, we include two easily-tunable thresholds. The first starts
partial reduce only after enough intermediate data has been received (e.g.
mapred.inmem.merge.threshold or a separately defined parameter), so that we
do not incur the overhead of invoking partial reduce on small data. The
second stops partial reduce after most of the intermediate data has been
received, so that running partial reduce on the small remaining data does
not delay the start of the final reduce.
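
The two thresholds can be sketched as a single gating check. This is only an
illustration under stated assumptions: the parameter names (start_threshold,
stop_fraction), their default values, and the choice to measure progress as
the fraction of map outputs already fetched are all hypothetical and are not
defined by the issue.

```python
def should_run_partial_reduce(buffered_records, maps_fetched, maps_total,
                              start_threshold=1000, stop_fraction=0.9):
    """Decide whether to invoke partial reduce on the currently buffered data.

    start_threshold: hypothetical minimum number of buffered records, so
        that partial reduce is not invoked on small data.
    stop_fraction: hypothetical fraction of map outputs fetched after which
        partial reduce stops, leaving the remainder to the final reduce.
    """
    if maps_total <= 0:
        return False
    # Stop threshold: most of the intermediate data has already arrived.
    if maps_fetched / maps_total >= stop_fraction:
        return False
    # Start threshold: wait until enough data has accumulated.
    return buffered_records >= start_threshold
```

With these defaults, a reduce holding 1500 buffered records after fetching 10
of 100 map outputs would run partial reduce, while the same buffer at 95 of
100 would be left for the final reduce.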