[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533553 ]
Yuri Pradkin commented on HADOOP-1245:
--------------------------------------

The patch is cool, I tested it on our hadoop cluster and it seems to be working. +1.

> value for mapred.tasktracker.tasks.maximum taken from two different sources
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-1245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1245
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Michael Bieniosek
>         Attachments: tasktracker-max-tasks-1245.patch
>
>
> I want to create a cluster with machines with different numbers of CPUs.
> Consequently, each machine should have a different value for
> mapred.tasktracker.tasks.maximum, since my map tasks are CPU bound.
> However, hadoop uses BOTH the values for mapred.tasktracker.tasks.maximum on
> the jobtracker and the tasktracker.
> When a new job starts up, the jobtracker uses its (single) value for
> mapred.tasktracker.tasks.maximum to assign tasks. This means that each
> tasktracker gets the same number of tasks, regardless of how I configured
> that particular machine.
> After the first task finishes on each tasktracker, the tasktracker will
> request new tasks from the jobtracker according to the tasktracker's value
> for mapred.tasktracker.tasks.maximum. So after the first round of map tasks
> is done, the cluster reverts to a mode that works well for heterogeneous
> clusters.
> The jobtracker should not consult its config for the value of
> mapred.tasktracker.tasks.maximum. It should assign tasks (or allow
> tasktrackers to request tasks) according to each tasktracker's value of
> mapred.tasktracker.tasks.maximum.
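
To make the quoted report concrete, here is a toy model of the first assignment round on a heterogeneous cluster. It is only a sketch of the behavior described above, not Hadoop code and not the attached patch: the function first_round, its parameters, and the node names are all hypothetical, and the real scheduler is more involved.

```python
def first_round(tracker_limits, jobtracker_limit, pending, per_tracker):
    """Toy model: how many tasks each tasktracker gets when a job starts.

    tracker_limits   -- hypothetical map of tracker name to that machine's
                        own mapred.tasktracker.tasks.maximum
    jobtracker_limit -- the single value configured on the jobtracker
    pending          -- number of tasks waiting to be assigned
    per_tracker      -- False models the reported behavior, True models the
                        behavior the report asks for
    """
    assigned = {}
    for name, own_limit in tracker_limits.items():
        # Reported: the jobtracker applies its own value to every tracker.
        # Requested: the jobtracker honors each tracker's own value.
        limit = own_limit if per_tracker else jobtracker_limit
        count = min(limit, pending)
        assigned[name] = count
        pending -= count
    return assigned


# Hypothetical two-node cluster: one 2-CPU machine, one 8-CPU machine.
trackers = {"node-2cpu": 2, "node-8cpu": 8}
reported = first_round(trackers, jobtracker_limit=4, pending=100, per_tracker=False)
requested = first_round(trackers, jobtracker_limit=4, pending=100, per_tracker=True)
print(reported)
print(requested)
```

In this model the reported behavior gives both machines the jobtracker's count, so the 2-CPU node is oversubscribed and the 8-CPU node is underused until the first round of tasks finishes.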