[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453747#comment-13453747 ]
Sayali Dehedkar commented on MAPREDUCE-2038:
--------------------------------------------

There is a relevant paper in Cloud Computing Technology and Science (CloudCom), 2011. The link is
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6133196&contentType=Conference+Publications&queryText%3DHadoop-0.20.2

Can we incorporate this idea?

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into
> consideration when it decides to launch reduce tasks. There are several cases
> where this could become sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly
> across different racks. This could happen when the job does not have many
> maps, or when there is heavy skew in map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join,
> or incremental merge of an unsorted smaller dataset with an already sorted
> large dataset). It'd be useful to place reduce tasks based on the location of
> the side files they need to access.
> This jira is created for the purpose of soliciting ideas on how we can make
> it better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira