[ https://issues.apache.org/jira/browse/MAPREDUCE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated MAPREDUCE-936:
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Zheng.

> Allow a load difference in fairshare scheduler
> ----------------------------------------------
>
>                 Key: MAPREDUCE-936
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-936
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-936.1.patch, MAPREDUCE-936.2.patch
>
>
> The problem we are facing: It takes a long time for all tasks of a job to get
> scheduled on the cluster, even if the cluster is almost empty.
> There are two reasons that together lead to this situation:
> 1. The load factor makes sure each TT runs the same number of tasks. (This is
> the part that this patch tries to change).
> 2. The scheduler tries to schedule map tasks locally (first node-local, then
> rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and
> mapred.fairscheduler.localitywait.rack, both are around 10 sec in our conf),
> and accumulated wait time (JobInfo.localityWait). The accumulated wait time
> is reset to 0 whenever a non-local map task is scheduled. That means it takes
> N * wait_time to schedule N non-local map tasks.
> Because of 1, a lot of TT will not be able to take more tasks, even if they
> have free slots. As a result, a lot of the map tasks cannot be scheduled
> locally.
> Because of 2, it's really hard to schedule a non-local task.
> As a result, sometimes we are seeing that it takes more than 2 minutes to
> schedule all the mappers of a job.
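
To make the "N * wait_time" arithmetic in point 2 concrete, here is a toy Python model. It is only a sketch, not the fair scheduler's code: the function name, the 1-second heartbeat, and the 12-task example are assumptions for illustration, and the node and rack waits are collapsed into a single wait. It assumes the worst case from the description, where no local slot is ever offered and the accumulated wait is reset after every non-local launch.

```python
def seconds_to_schedule_nonlocal(num_tasks, locality_wait_sec, heartbeat_sec=1):
    """Toy model: time to launch num_tasks non-local maps, one wait period each.

    accumulated_wait stands in for JobInfo.localityWait. heartbeat_sec is a
    hypothetical scheduling interval, not a real configuration key.
    """
    clock = 0
    accumulated_wait = 0
    launched = 0
    while launched < num_tasks:
        # One scheduling opportunity passes without a local slot being offered.
        clock += heartbeat_sec
        accumulated_wait += heartbeat_sec
        if accumulated_wait >= locality_wait_sec:
            # The job has waited long enough to accept a non-local slot.
            launched += 1
            # The reset described in the issue: the next task waits all over again.
            accumulated_wait = 0
    return clock


if __name__ == "__main__":
    # 12 non-local maps with a 10 sec locality wait.
    print(seconds_to_schedule_nonlocal(12, 10))  # 120
```

Under these assumptions, 12 non-local maps with a 10 sec wait already take 120 sec, which is in line with the "more than 2 minutes" reported above.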