[ http://issues.apache.org/jira/browse/HADOOP-465?page=all ]
Doug Cutting resolved HADOOP-465.
---------------------------------
Fix Version/s: 0.6.0
Resolution: Duplicate
This was fixed in HADOOP-400.
> Jobtracker doesn't always spread reduce tasks evenly if
> (mapred.tasktracker.tasks.maximum > 1)
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-465
> URL: http://issues.apache.org/jira/browse/HADOOP-465
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Chris Schneider
> Priority: Minor
> Fix For: 0.6.0
>
>
> I note that (at least for Nutch 0.8 Generator.Selector.reduce) if
> mapred.reduce.tasks is the same as the number of tasktrackers, and
> mapred.tasktracker.tasks.maximum is left at the default of 2, I typically
> have no reduce tasks running on a few of my tasktrackers, and two reduce
> tasks running on the same number of other tasktrackers.
> It seems like the jobtracker should assign reduce tasks to tasktrackers in a
> round robin fashion, so that the distribution is spread as evenly as
> possible. The current implementation seems to waste at least some time
> whenever one or more slave machines have to execute two reduce tasks
> simultaneously while other tasktrackers sit idle. How much time is wasted
> depends on how heavily the reduce tasks rely on the slave machine's resources.
> I first thought that perhaps the jobtracker was "overloading" the
> tasktrackers that had already finished their map tasks (and avoiding those
> that were still mapping). However, as I understand it, the reduce tasks are
> all launched at the beginning of the job so that they are all ready and
> waiting for map output data when it first appears.
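
To make the round-robin suggestion in the report concrete, here is a minimal
sketch. It is not the jobtracker's code: the function names, the tracker names,
and the "fill-first" policy are all hypothetical, and fill-first is shown only
because it produces the same kind of skew the report describes. It is not a
claim about what the scheduler actually did before HADOOP-400.

```python
def assign_fill_first(trackers, num_tasks, max_per_tracker):
    """Hypothetical greedy policy: fill each tracker to capacity in turn."""
    counts = {t: 0 for t in trackers}
    remaining = num_tasks
    for t in trackers:
        take = min(max_per_tracker, remaining)
        counts[t] = take
        remaining -= take
    return counts


def assign_round_robin(trackers, num_tasks, max_per_tracker):
    """Hand out one task per tracker per pass, never exceeding the cap."""
    counts = {t: 0 for t in trackers}
    # Tasks beyond the cluster's total slot capacity are left unassigned.
    capacity = len(trackers) * max_per_tracker
    for i in range(min(num_tasks, capacity)):
        counts[trackers[i % len(trackers)]] += 1
    return counts


if __name__ == "__main__":
    # Same shape as the report: reduce tasks == tasktrackers, cap of 2.
    trackers = ["tt1", "tt2", "tt3", "tt4"]
    print("fill-first: ", assign_fill_first(trackers, 4, 2))
    print("round-robin:", assign_round_robin(trackers, 4, 2))
```

With four tasktrackers, four reduce tasks, and a cap of 2, the fill-first
sketch puts two tasks on each of two trackers and leaves the other two idle,
while the round-robin sketch gives every tracker exactly one task.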