Implement memory-to-memory merge in the reduce
----------------------------------------------
Key: HADOOP-5831
URL: https://issues.apache.org/jira/browse/HADOOP-5831
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 0.21.0
HADOOP-3446 fixed the reduce to not flush the in-memory shuffled map-outputs
before feeding them to the reduce. However, for latency-sensitive applications
with lots of memory, like the terasort, this hurts performance since the fan-in
for the final in-memory merge is too large (all 8000 map-outputs were in-memory).
When I put in an intermediate memory-to-memory merge for the terasort's reduce
(thereby avoiding disk i/o) to cut the fan-in from 8000 to <100, the 'reduce'
phase (including the local datanode-write) sped up 2.5x (from 10s to 4s).
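To illustrate the idea, here is a minimal sketch of an intermediate memory-to-memory merge that cuts the fan-in before the final merge. It is not the Hadoop implementation: the names `merge_segments`, `reduce_fan_in` and `max_fan_in` are hypothetical, records are assumed to be plain (key, value) tuples, and each map-output is assumed to be an already-sorted in-memory list.

```python
import heapq
from typing import List, Sequence, Tuple

# Assumption: a record is a (key, value) pair and every segment is sorted by key.
Record = Tuple[str, str]


def merge_segments(segments: Sequence[List[Record]]) -> List[Record]:
    """K-way merge of sorted in-memory segments into one sorted segment."""
    # heapq.merge keeps one heap entry per input, so the heap size
    # equals the fan-in of the merge.
    return list(heapq.merge(*segments, key=lambda rec: rec[0]))


def reduce_fan_in(segments: List[List[Record]],
                  max_fan_in: int) -> List[List[Record]]:
    """Merge groups of segments in memory until at most max_fan_in remain.

    Nothing is written to disk; each pass replaces up to max_fan_in
    consecutive segments with a single merged segment.
    """
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    while len(segments) > max_fan_in:
        segments = [
            merge_segments(segments[i:i + max_fan_in])
            for i in range(0, len(segments), max_fan_in)
        ]
    return segments
```

With 8000 sorted map-outputs and `max_fan_in=100`, a single pass leaves 80 segments, so the final merge that feeds the reduce, `merge_segments(reduce_fan_in(segments, 100))`, runs with a fan-in of 80 instead of 8000.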