[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455840#comment-13455840 ]
Tsuyoshi OZAWA commented on MAPREDUCE-4502:
-------------------------------------------

Chris and Karthik, thank you for sharing your experience and thoughts. These are very useful for me.

bq. ShuffleHandler is an auxiliary service loaded in the NodeManager. It's shared across all containers.

I see. I'll have to redesign it to run the combiner in a container.

bq. Carlo Curino and I experimented with this, but (a) saw only slight improvements in job performance and (b) the changes to the AM to accommodate a new task type were extensive.

This is very interesting. In fact, I prototyped running the combiner at the end of MapTask as a first version, and its performance was good. In that case, I found it necessary to add a new state to MapTask to preserve fault tolerance. Is that acceptable for Hadoop?

bq. With logic to manage skew, we're hoping that scheduling an aggressive range can have a similar effect to combiner tasks, without introducing the new task type.

This seems to be a good approach for dealing with rack-level aggregation. Do you have any results to share?

bq. 1. Perform node-level aggregation (reduce) at the end of maps in co-ordination with AM.
bq. 2. Perform rack-level aggregation at the end of node-level aggregation, again in co-ordination with AM. The aggregation could be performed in parallel across the involved nodes such that each node has aggregated values of different keys.
bq. 3. Schedule reducers taking the key-distribution into account across racks.

Nice wrap-up :-)

bq. The con will be that the shuffle won't be asynchronous to map computation, but hopefully this wouldn't offset the gains of decreased network and disk I/O.

There is a balance between the gains from asynchronous processing and those from decreased network and disk I/O. In my previous experiments, it depended heavily on the number of reducers. I think these gains are a trade-off, so tunable parameters are necessary to handle various workloads.

bq. PS.
http://dl.acm.org/citation.cfm?id=1901088 documents the advantages of multi-level aggregation in the context of graph algorithms modeled as iterative MR jobs.

I'm going to read it :) It's time for me to create a new revision of the design note reflecting your opinions.

Thanks,
-- Tsuyoshi

> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: speculative_draft.pdf
>
>
> The shuffle cost in Hadoop is expensive despite the existence of the
> combiner, because the scope of combining is limited to a single MapTask.
> To solve this problem, a good approach is to aggregate the results of maps
> per node/rack by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure,
> including combining per container (MAPREDUCE-3902 is related) and
> coordinating containers via the application master without breaking the
> fault tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
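(For readers of this archived thread: the node-/rack-level aggregation steps discussed above can be illustrated with a minimal sketch in plain Java. The class and method names are hypothetical and this uses no Hadoop APIs; a word-count-style combine function is assumed, and the data-size numbers in the comments refer only to this toy input.)

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Hadoop code: shows how node-level and then
// rack-level combining shrink the records that reach the reducers.
public class MultiLevelAggregationSketch {

    // A word-count-style combine: merge partial counts key by key.
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Outputs of three MapTasks: map1 and map2 run on node A,
        // map3 runs on node B; both nodes sit in the same rack.
        Map<String, Long> map1 = Map.of("a", 2L, "b", 1L);
        Map<String, Long> map2 = Map.of("a", 1L, "c", 4L);
        Map<String, Long> map3 = Map.of("b", 3L, "c", 1L);

        // Step 1: node-level aggregation at the end of the maps.
        Map<String, Long> nodeA = combine(List.of(map1, map2));
        Map<String, Long> nodeB = combine(List.of(map3));

        // Step 2: rack-level aggregation over the node results.
        Map<String, Long> rack = combine(List.of(nodeA, nodeB));

        // Step 3: reducers for this rack now see one record per key
        // (3 records) instead of one per (map, key) pair (6 records).
        System.out.println(rack); // {a=3, b=4, c=5}
    }
}
```

The fault-tolerance question raised in the comment shows up even in this toy form: if node A fails after Step 1, the AM must know whether to re-run map1 and map2 or only the node-level combine, which is why a new MapTask state was needed in the prototype.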