[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsuyoshi OZAWA updated MAPREDUCE-4502:
--------------------------------------
    Attachment: design_v2.pdf

I redesigned the architecture using ideas inspired by the discussion among Chris and Karthik. The main differences are as follows:

1. All aggregations are performed in containers (at the end of MapTasks). Because of this change, the umbilical protocol needs to change.
3. No new task types are added.
4. Rack-aware aggregation is taken into account.

Any feedback is welcome.

Thanks,
- Tsuyoshi

> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: design_v2.pdf, speculative_draft.pdf
>
>
> The shuffle cost is high in Hadoop in spite of the existence of the
> combiner, because the scope of combining is limited to a single MapTask.
> To solve this problem, it is a good idea to aggregate the results of maps
> per node/rack by launching a combiner.
> This JIRA is to implement the multi-level aggregation infrastructure,
> including combining per container (MAPREDUCE-3902 is related) and
> coordinating containers by the application master without breaking the
> fault tolerance of jobs.
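
As a rough illustration of the idea in the description (not taken from design_v2.pdf), the sketch below applies the same combiner at three scopes: per map task, per node, and per rack. It then counts how many records would be handed to the shuffle at each level. The class name, the method names, and the word-count style data are all hypothetical, it does not use any Hadoop API, and it assumes Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: these names do not exist in Hadoop
// and are not taken from the attached design document.
public class MultiLevelAggregationSketch {

    // A sum combiner: merges partial outputs by adding values that share a key.
    static Map<String, Integer> combine(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Number of key/value records that would be passed on to the shuffle.
    static int recordCount(List<Map<String, Integer>> outputs) {
        int n = 0;
        for (Map<String, Integer> m : outputs) {
            n += m.size();
        }
        return n;
    }

    public static void main(String[] args) {
        // Outputs of four map tasks, each already combined within its own task.
        Map<String, Integer> map1 = Map.of("a", 2, "b", 1); // rack 1, node A
        Map<String, Integer> map2 = Map.of("a", 1, "c", 4); // rack 1, node A
        Map<String, Integer> map3 = Map.of("a", 5, "b", 2); // rack 1, node B
        Map<String, Integer> map4 = Map.of("b", 3, "c", 1); // rack 2, node C

        // Level 1: combine the outputs of all map tasks on the same node.
        Map<String, Integer> nodeA = combine(List.of(map1, map2));
        Map<String, Integer> nodeB = combine(List.of(map3));
        Map<String, Integer> nodeC = combine(List.of(map4));

        // Level 2: combine the per-node results within each rack.
        Map<String, Integer> rack1 = combine(List.of(nodeA, nodeB));
        Map<String, Integer> rack2 = combine(List.of(nodeC));

        System.out.println("records after per-task combining: "
                + recordCount(List.of(map1, map2, map3, map4)));
        System.out.println("records after per-node combining: "
                + recordCount(List.of(nodeA, nodeB, nodeC)));
        System.out.println("records after per-rack combining: "
                + recordCount(List.of(rack1, rack2)));

        // The reduce side sees the same totals whichever level fed the shuffle.
        System.out.println("final result: " + combine(List.of(rack1, rack2)));
    }
}
```

The sketch relies on the combiner being associative and commutative, which is what lets it be reapplied at a wider scope without changing the final result. It leaves out the parts the issue is actually about, such as coordinating containers through the application master and preserving fault tolerance.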