[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455840#comment-13455840
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4502:
-------------------------------------------

Chris and Karthik,

Thank you for sharing your experience and thoughts. These are very useful 
to me.

bq. ShuffleHandler is an auxiliary service loaded in the NodeManager. It's 
shared across all containers. 

I see. I'll have to redesign it so that the combiner runs in a container.

bq. Carlo Curino and I experimented with this, but (a) saw only slight 
improvements in job performance and (b) the changes to the AM to accommodate a 
new task type were extensive.

This is very interesting. In fact, my first prototype ran the combiner at the 
end of each MapTask, and its performance was good. In that case, I found that a 
new state needs to be added to MapTask to assure fault tolerance. Is that 
acceptable for Hadoop?
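
For context, the first-version prototype I mentioned did, conceptually, 
something like the following at the end of each MapTask (a simplified sketch in 
plain Java, not the actual patch; the real code would hook into the spill/merge 
path rather than work on an in-memory list):

```java
import java.util.*;

public class EndOfMapCombine {
    // A single map task's raw output as (word, 1) records. Combining at the
    // end of the map collapses duplicate keys before anything is written
    // out for the shuffle.
    public static Map<String, Integer> combineMapOutput(List<String> words) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String w : words) {
            // Same fold a Reducer-used-as-Combiner would perform per key.
            combined.merge(w, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> mapOutput = List.of("hadoop", "shuffle", "hadoop", "hadoop");
        Map<String, Integer> combined = combineMapOutput(mapOutput);
        System.out.println("records before combining: " + mapOutput.size()); // 4
        System.out.println("records after combining:  " + combined.size());  // 2
        System.out.println(combined); // {hadoop=3, shuffle=1}
    }
}
```

The fault-tolerance issue arises exactly here: if the task fails after the 
combine step but before the output is committed, the framework needs to know 
which state to restart from, hence the extra MapTask state.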

bq. With logic to manage skew, we're hoping that scheduling an aggressive range 
can have a similar effect to combiner tasks, without introducing the new task 
type.

This seems to be a good approach for dealing with rack-level aggregation. Do 
you have any results to share?

bq. 1. Perform node-level aggregation (reduce) at the end of maps in 
co-ordination with AM.
bq. 2. Perform rack-level aggregation at the end of node-level aggregation 
again in co-ordination with AM. The aggregation could be performed in parallel 
across the involved nodes such that each node has aggregated values of 
different keys.
bq. 3. Schedule reducers taking the key-distribution into account across racks.
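
To make sure I understand steps 1-3, here is a minimal, self-contained 
simulation of the two aggregation levels on word counts (plain Java, no Hadoop 
APIs; the method names and node/rack layout are illustrative only):

```java
import java.util.*;

public class MultiLevelAggregation {
    // Combine (key, count) records by summing counts per key, as the
    // combiner would at either the node or the rack level.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return out;
    }

    static List<Map.Entry<String, Integer>> concat(List<Map.Entry<String, Integer>> a,
                                                   List<Map.Entry<String, Integer>> b) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Two map outputs on node A, one on node B (same rack).
        List<Map.Entry<String, Integer>> map1 = List.of(
            Map.entry("hadoop", 1), Map.entry("shuffle", 1), Map.entry("hadoop", 1));
        List<Map.Entry<String, Integer>> map2 = List.of(
            Map.entry("shuffle", 1), Map.entry("hadoop", 1));
        List<Map.Entry<String, Integer>> map3 = List.of(
            Map.entry("hadoop", 1), Map.entry("combiner", 1));

        // Step 1: node-level aggregation at the end of maps.
        Map<String, Integer> nodeA = combine(concat(map1, map2));
        Map<String, Integer> nodeB = combine(map3);

        // Step 2: rack-level aggregation over the node-level results.
        Map<String, Integer> rack =
            combine(concat(new ArrayList<>(nodeA.entrySet()),
                           new ArrayList<>(nodeB.entrySet())));

        // Step 3 would schedule reducers using this key distribution;
        // here we just show that fewer records cross the rack boundary.
        int rawRecords = map1.size() + map2.size() + map3.size();
        System.out.println("records without combining: " + rawRecords); // 7
        System.out.println("records after rack-level combine: " + rack.size()); // 3
        System.out.println(rack); // {combiner=1, hadoop=4, shuffle=2}
    }
}
```

The coordination with the AM is the hard part, of course; this only shows that 
the per-level fold itself composes cleanly when the combiner is associative and 
commutative.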

Nice wrap-up :-)

bq. The con will be that the shuffle won't be asynchronous to map computation, 
but hopefully this wouldn't offset the gains of decreased network and disk I/O.

The key question is the balance between the gains from asynchronous processing 
and those from decreased network and disk I/O. In my previous experiments, it 
depends heavily on the number of reducers. I think these gains are a trade-off, 
so configuration parameters are necessary to handle various workloads.

bq. PS. http://dl.acm.org/citation.cfm?id=1901088 documents the advantages of 
multi-level aggregation in the context of graph algorithms modeled as iterative 
MR jobs.

I'm going to read it :)

It's time for me to create a new revision of the design note reflecting your 
opinions.

Thanks,
-- Tsuyoshi
                
> Multi-level aggregation with combining the result of maps per node/rack
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster, mrv2
>            Reporter: Tsuyoshi OZAWA
>            Assignee: Tsuyoshi OZAWA
>         Attachments: speculative_draft.pdf
>
>
> The shuffle cost in Hadoop is expensive in spite of the existence of the 
> combiner, because the scope of combining is limited to a single MapTask. To 
> solve this problem, a good approach is to aggregate the results of maps per 
> node/rack by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordinating containers by the application master without breaking the fault 
> tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
