[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faraz Ahmad updated MAPREDUCE-2083:
-----------------------------------

    Description: 
Shuffle delays can be large for mapreductions with lots of intermediate data. 
Some of this shuffle delay can be overlapped with reduce if some of the reduce 
computation is started on partial intermediate data received by a reduce. 
Along these lines, the patch ??HADOOP-3226?? runs the combiner on the reduce 
side to prune the data that goes to reduce. However, ??HADOOP-3226?? does not 
achieve our goal of overlap with the shuffle because: 
(1) In its original use of reducing intermediate data volume, the combiner 
falls in the critical path at the map side. Therefore, the combiner is usually 
a simple function which is too  lightweight in its new use to achieve 
sufficient overlap with the shuffle. 
(2) Running the combiner  at the reduce side is helpful in overlapping with the 
shuffle only if  the combiner's functionality is a major portion of the reduce 
functionality --  otherwise running the combiner at the reduce side 
achieves only modest overlap with the shuffle. In many mapreductions, the 
combiner computation is often not part or only a small part of reduce 
computation. Addressing both these points, reduces that are complex often
have heavier-weight computation than simple combining that can be overlapped 
with the shuffle. This heavy-weight computation is specified by a user-supplied 
"partial reduce" which performs the commutative/associative parts of reduce. 
The idea is to run partial reduce on subsets of intermediate data as they 
arrive at a reduce to  overlap with the shuffle, and then run the full-blown 
final reduce which re-reduces the partially-reduced data. Because the shuffle 
delay is large  for shuffle-heavy mapreductions, partial reduce that are 
heavier-weight than simple combiner can be hidden 
under the shuffle delay without extending the critical path of execution. 
Finally, to further ensure that the partial reduce does not extend the critical 
path, include two easily-tunable thresholds: One to start partial reduce only 
after enough intermediate data has been received (e.g. 
mapred.inmem.merge.threshold or a separately defined parameter) so that we do 
not incur the overhead of invoking partial reduce on small data. Another 
threshold to stop partial reduce after most of the intermediate data has been 
received so that running partial reduce on the small remainder data 
does not  delay starting final reduce.

  was:
Shuffle delays can be large for mapreductions with lots of intermediate data. 
Some of this shuffle delay can be overlapped with reduce if some of the reduce 
computation is started on partial intermediate data received by a reduce. 
Along these lines, the patch ??HADOOP-3226?? runs the combiner
on the reduce side to prune the data that goes to reduce. However, 
??HADOOP-3226?? does not 
achieve our goal of overlap with the shuffle because: (1) In its original use 
of reducing intermediate data volume,
the combiner falls in the critical path at the map side. Therefore, the 
combiner is usually a simple function which is too  lightweight in its new use 
 to achieve sufficient overlap with the shuffle.
(2) Running the combiner  at the reduce side is helpful in overlapping with 
the shuffle only if  the combiner's functionality is a major portion of the 
reduce functionality --  otherwise running the combiner at the reduce side 
achieves only modest overlap with the shuffle.
In many mapreductions, the  combiner computation is often not part or only a 
small part of reduce computation.
Addressing both these points, reduces that are complex often
have heavier-weight computation than simple combining that can be overlapped 
with the shuffle.   This heavy-weight computation is specified by a 
user-supplied "partial reduce" which performs the commutative/associative 
parts of reduce. The idea is to run partial reduce on subsets of intermediate 
data as they arrive at a reduce to  overlap with the shuffle, and then run the 
full-blown final reduce which re-reduces the partially-reduced data. Because
the shuffle delay is large  for shuffle-heavy mapreductions,  
partial reduce that are heavier-weight than simple combiner can be hidden 
under the shuffle delay without extending the critical path of execution.
Finally, to further ensure that the partial reduce does not extend the 
critical path, include two easily-tunable thresholds: One to start partial 
reduce only after enough intermediate data has been received (e.g. 
mapred.inmem.merge.threshold 
or a separately defined parameter) so that we do not incur the overhead of 
invoking partial reduce on small data.
Another threshold to stop partial reduce after most of the intermediate data 
has been received so that running partial reduce on the small remainder data 
does not  delay starting final reduce.


> Run partial reduce instead of combiner at reduce node
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2083
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2083
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Faraz Ahmad
>             Fix For: 0.20.2
>
>
> Shuffle delays can be large for mapreductions with lots of intermediate data. 
> Some of this shuffle delay can be overlapped with reduce if some of the 
> reduce computation is started on partial intermediate data received by a 
> reduce. 
> Along these lines, the patch ??HADOOP-3226?? runs the combiner on the reduce 
> side to prune the data that goes to reduce. However, ??HADOOP-3226?? does not 
> achieve our goal of overlap with the shuffle because: 
> (1) In its original use of reducing intermediate data volume, the combiner 
> falls in the critical path at the map side. Therefore, the combiner is 
> usually a simple function which is too  lightweight in its new use to achieve 
> sufficient overlap with the shuffle. 
> (2) Running the combiner  at the reduce side is helpful in overlapping with 
> the shuffle only if  the combiner's functionality is a major portion of the 
> reduce functionality --  otherwise running the combiner at the reduce side 
> achieves only modest overlap with the shuffle. In many mapreductions, the 
> combiner computation is often not part or only a small part of reduce 
> computation. Addressing both these points, reduces that are complex often
> have heavier-weight computation than simple combining that can be overlapped 
> with the shuffle. This heavy-weight computation is specified by a 
> user-supplied "partial reduce" which performs the commutative/associative 
> parts of reduce. The idea is to run partial reduce on subsets of intermediate 
> data as they arrive at a reduce to  overlap with the shuffle, and then run 
> the full-blown final reduce which re-reduces the partially-reduced data. 
> Because the shuffle delay is large  for shuffle-heavy mapreductions, partial 
> reduce that are heavier-weight than simple combiner can be hidden 
> under the shuffle delay without extending the critical path of execution. 
> Finally, to further ensure that the partial reduce does not extend the 
> critical path, include two easily-tunable thresholds: One to start partial 
> reduce only after enough intermediate data has been received (e.g. 
> mapred.inmem.merge.threshold or a separately defined parameter) so that we do 
> not incur the overhead of invoking partial reduce on small data. Another 
> threshold to stop partial reduce after most of the intermediate data has been 
> received so that running partial reduce on the small remainder data 
> does not  delay starting final reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to