[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453747#comment-13453747 ]

Sayali Dehedkar commented on MAPREDUCE-2038:
--------------------------------------------

There is a paper from Cloud Computing Technology and Science (CloudCom) 2011:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6133196&contentType=Conference+Publications&queryText%3DHadoop-0.20.2.
Can we incorporate the idea from this paper?

                
> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into
> consideration when it decides where to launch reduce tasks. There are several
> cases where this can be sub-optimal:
> - The map output data for a particular reduce task are not distributed evenly 
> across different racks. This could happen when the job does not have many 
> maps, or when there is heavy skew in map output data.
> - A reduce task may need to access a side file (e.g. a Pig fragmented join,
> or an incremental merge of a smaller unsorted dataset with an already sorted
> large dataset). It would be useful to place reduce tasks based on the location
> of the side files they need to access.
> This JIRA was created to solicit ideas on how we can improve this.
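To make the two cases above concrete, here is a minimal sketch of how a scheduler might pick a preferred rack for one reduce partition. It is not taken from Hadoop's scheduler or from the CloudCom paper linked above. It assumes the scheduler already knows, per node, how many bytes of this partition's map output (and, optionally, of the side-file blocks the reducer will read) are stored there, plus a node-to-rack mapping. The function names, the `min_share` threshold and the rack names are hypothetical. A rack is preferred only when it holds more than `min_share` of the reducer's input; with an even spread, locality buys little, so no preference is returned.

```python
from collections import defaultdict


def preferred_racks(map_output_bytes, node_to_rack, side_file_bytes=None):
    """Rank racks by how many bytes of one reducer's input they already hold.

    Hypothetical inputs:
      map_output_bytes: node -> bytes of this partition's map output on that node
      node_to_rack:     node -> rack id
      side_file_bytes:  node -> bytes of side-file blocks on that node (optional)
    Returns (rack, bytes) pairs, largest first; ties broken by rack name.
    """
    totals = defaultdict(int)
    for node, nbytes in map_output_bytes.items():
        totals[node_to_rack[node]] += nbytes
    # Side files count the same as map output: both must be read by the reducer.
    for node, nbytes in (side_file_bytes or {}).items():
        totals[node_to_rack[node]] += nbytes
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def choose_rack(map_output_bytes, node_to_rack, side_file_bytes=None, min_share=0.5):
    """Return the top rack only if it holds strictly more than min_share of the input."""
    ranked = preferred_racks(map_output_bytes, node_to_rack, side_file_bytes)
    total = sum(nbytes for _, nbytes in ranked)
    if total == 0:
        return None  # nothing known about input location: no preference
    rack, nbytes = ranked[0]
    return rack if nbytes / total > min_share else None


topology = {"n1": "/rack1", "n2": "/rack1", "n3": "/rack2"}
skewed = {"n1": 700, "n2": 200, "n3": 100}
print(choose_rack(skewed, topology))                # /rack1
print(choose_rack(skewed, topology, {"n3": 2000}))  # /rack2
```

In the first call, /rack1 holds 900 of 1000 map-output bytes, so it is chosen. In the second, a large side file on n3 shifts the balance to /rack2 (2100 of 3000 bytes). A real scheduler would still need to weigh this preference against slot availability, much as it already does for map tasks.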

--
This message is automatically generated by JIRA.
