[ http://issues.apache.org/jira/browse/HADOOP-465?page=all ]

Doug Cutting resolved HADOOP-465.
---------------------------------

    Fix Version/s: 0.6.0
       Resolution: Duplicate

This was fixed in HADOOP-400.

> Jobtracker doesn't always spread reduce tasks evenly if (mapred.tasktracker.tasks.maximum > 1)
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-465
>                 URL: http://issues.apache.org/jira/browse/HADOOP-465
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Chris Schneider
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> I note that (at least for Nutch 0.8 Generator.Selector.reduce) if
> mapred.reduce.tasks is the same as the number of tasktrackers, and
> mapred.tasktracker.tasks.maximum is left at the default of 2, I typically
> have no reduce tasks running on a few of my tasktrackers, and two reduce
> tasks running on the same number of other tasktrackers.
>
> It seems like the jobtracker should assign reduce tasks to tasktrackers in a
> round robin fashion, so that the distribution will be spread as evenly as
> possible. The current implementation would seem to waste at least some time
> if one or more slave machines have to execute two reduce tasks simultaneously
> while other tasktrackers sit idle, with the amount of wasted time depending
> on how dependent the reduce tasks were on the slave machine's resources.
>
> I first thought that perhaps the jobtracker was "overloading" the
> tasktrackers that had already finished their map tasks (and avoiding those
> that were still mapping). However, as I understand it, the reduce tasks are
> all launched at the beginning of the job so that they are all ready and
> waiting for map output data when it first appears.
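
The round-robin assignment suggested in the report can be sketched as follows. This is a minimal illustration in Python, not Hadoop's scheduler code and not the change made in HADOOP-400. The function name `assign_round_robin`, the tracker names, and the `max_per_tracker` parameter (standing in for mapred.tasktracker.tasks.maximum) are assumptions made for the example.

```python
from itertools import cycle


def assign_round_robin(num_reduces, trackers, max_per_tracker=2):
    """Hand out reduce task ids one at a time, cycling through the trackers.

    Hypothetical helper: max_per_tracker plays the role of
    mapred.tasktracker.tasks.maximum in the report.
    """
    capacity = len(trackers) * max_per_tracker
    if num_reduces > capacity:
        raise ValueError("more reduce tasks than available task slots")

    assignment = {tracker: [] for tracker in trackers}
    # Each tracker receives one task per pass, so no tracker gets a
    # second task until every tracker already holds one.
    for task_id, tracker in zip(range(num_reduces), cycle(trackers)):
        assignment[tracker].append(task_id)
    return assignment


if __name__ == "__main__":
    # The case from the report: as many reduce tasks as tasktrackers.
    trackers = ["tt1", "tt2", "tt3", "tt4"]
    for tracker, tasks in assign_round_robin(len(trackers), trackers).items():
        print(tracker, tasks)
```

With as many reduce tasks as tasktrackers, every tracker ends up with exactly one task, which is the even spread the report asks for. A tracker takes a second task only when there are more reduce tasks than trackers.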