[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455909#comment-13455909 ]
Chris Douglas commented on MAPREDUCE-4502:
------------------------------------------

bq. This seems to be a good approach to deal with rack-level aggregation. Do you have any benchmark results?

For reducing on key ranges, there's a paper in [SOCC|http://www.socc2012.org/papers] on Sailfish. I don't have a link to that paper, though there's a [tech report|http://research.yahoo.com/files/yl-2012-003.pdf].

For the benchmark, we were mostly handling cases without combiners; in our data, each combiner was too effective to benefit from an intermediate level.

> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: speculative_draft.pdf
>
>
> The shuffle is expensive in Hadoop despite the existence of combiners,
> because the scope of combining is limited to a single MapTask.
> To solve this problem, the results of maps can be aggregated per
> node/rack by launching a combiner.
> This JIRA is to implement the multi-level aggregation infrastructure,
> including combining per container (MAPREDUCE-3902 is related) and
> coordinating containers by the application master without breaking the
> fault tolerance of jobs.
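
To make the idea in the quoted issue description concrete, here is a minimal sketch in plain Python, not Hadoop code. It assumes a summing combiner and a hypothetical topology of four map tasks on two nodes in one rack; all names (`combine`, `regroup`, `aggregate`, `shuffled`, the task/node/rack ids) are illustrative and do not come from the patch. It counts how many records would reach the shuffle after combining per task, per node, and per rack.

```python
from collections import defaultdict


def combine(records):
    """Sum values per key; stands in for a user-supplied combiner."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def regroup(outputs, parent_of):
    """Collect each producer's output under its parent (node or rack)."""
    grouped = defaultdict(list)
    for child, out in outputs.items():
        grouped[parent_of[child]].append(out)
    return grouped


def aggregate(outputs_by_group):
    """Merge the outputs of every member of a group, then combine them."""
    return {
        group: combine([rec for out in outs for rec in out])
        for group, outs in outputs_by_group.items()
    }


def shuffled(outputs):
    """Number of records that would be sent on to the reducers."""
    return sum(len(out) for out in outputs.values())


# Hypothetical topology (an assumption for illustration only).
task_node = {"m1": "n1", "m2": "n1", "m3": "n2", "m4": "n2"}
node_rack = {"n1": "r1", "n2": "r1"}

# Raw map output per task, word-count style.
raw = {
    "m1": [("a", 1), ("b", 1), ("a", 1)],
    "m2": [("a", 1), ("c", 1)],
    "m3": [("b", 1), ("b", 1)],
    "m4": [("a", 1), ("b", 1)],
}

per_task = {task: combine(out) for task, out in raw.items()}  # today's scope
per_node = aggregate(regroup(per_task, task_node))            # node level
per_rack = aggregate(regroup(per_node, node_rack))            # rack level

print(shuffled(per_task), shuffled(per_node), shuffled(per_rack))
```

In this toy input each extra level removes a few more records. When the per-task combiner already collapses most of the data, as described in the comment above, the later levels have little left to remove.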