[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1245:
--------------------------------------

    Description: 
I want to create a cluster of machines with different numbers of CPUs. Since 
my map tasks are CPU-bound, each machine should have its own value for 
mapred.tasktracker.tasks.maximum.

However, Hadoop uses BOTH values of mapred.tasktracker.tasks.maximum: the one 
configured on the jobtracker and the one configured on each tasktracker.

When a new job starts up, the jobtracker uses its (single) value for 
mapred.tasktracker.tasks.maximum to assign tasks.  This means that each 
tasktracker gets the same number of tasks, regardless of how I configured that 
particular machine.
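To make the first-round behavior concrete, here is a small Java model of it. 
This is a hypothetical sketch, not the actual JobTracker code: the class 
InitialAssignment and its method firstRound are invented for illustration. 
The jobtracker's maximum is applied to every tracker, while each tracker's own 
configured value is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the reported first-round behavior: the jobtracker
// gives every tasktracker up to the jobtracker's own configured maximum,
// regardless of what each tasktracker was configured with.
public class InitialAssignment {

    /**
     * @param jobTrackerMax the jobtracker's mapred.tasktracker.tasks.maximum
     * @param trackerMaxima each tasktracker's own configured maximum; only the
     *                      keys (tracker names) are used, the values are ignored,
     *                      which is exactly the problem being reported
     * @param pendingTasks  number of tasks waiting to be scheduled
     * @return tasks handed to each tracker in the first round
     */
    public static Map<String, Integer> firstRound(int jobTrackerMax,
                                                  Map<String, Integer> trackerMaxima,
                                                  int pendingTasks) {
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (String tracker : trackerMaxima.keySet()) {
            int n = Math.min(jobTrackerMax, pendingTasks);
            assigned.put(tracker, n);
            pendingTasks -= n;
        }
        return assigned;
    }

    public static void main(String[] args) {
        Map<String, Integer> maxima = new LinkedHashMap<>();
        maxima.put("dual-cpu-host", 2);
        maxima.put("quad-cpu-host", 4);
        // Jobtracker configured with 2: the quad-CPU host also gets only 2.
        System.out.println(firstRound(2, maxima, 100));
        // prints {dual-cpu-host=2, quad-cpu-host=2}
    }
}
```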

After the first task finishes on each tasktracker, the tasktracker will request 
new tasks from the jobtracker according to the tasktracker's value for 
mapred.tasktracker.tasks.maximum.  So after the first round of map tasks is 
done, the cluster settles into a mode that works well for heterogeneous clusters.

The jobtracker should not consult its config for the value of 
mapred.tasktracker.tasks.maximum.  It should assign tasks (or allow 
tasktrackers to request tasks) according to each tasktracker's value of 
mapred.tasktracker.tasks.maximum.
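The proposed behavior could look roughly like the following Java sketch. It 
assumes, hypothetically, that each tasktracker's heartbeat carries its own 
configured maximum and its count of running tasks. The names 
PerTrackerScheduling, Heartbeat and tasksToAssign are illustrative only and do 
not correspond to Hadoop's real classes or heartbeat protocol.

```java
// Hypothetical sketch of the proposed fix: the jobtracker never reads its own
// mapred.tasktracker.tasks.maximum. Each heartbeat carries the tasktracker's
// configured maximum and its number of running tasks, and the jobtracker
// fills only the slots that tracker actually has free.
public class PerTrackerScheduling {

    /** What a tasktracker reports about itself (hypothetical message shape). */
    public static final class Heartbeat {
        final String tracker;
        final int maxTasks;      // this tracker's mapred.tasktracker.tasks.maximum
        final int runningTasks;  // tasks currently running on this tracker

        public Heartbeat(String tracker, int maxTasks, int runningTasks) {
            this.tracker = tracker;
            this.maxTasks = maxTasks;
            this.runningTasks = runningTasks;
        }
    }

    /** Number of new tasks the jobtracker should hand to this tracker. */
    public static int tasksToAssign(Heartbeat hb, int pendingTasks) {
        int freeSlots = Math.max(0, hb.maxTasks - hb.runningTasks);
        return Math.min(freeSlots, pendingTasks);
    }

    public static void main(String[] args) {
        Heartbeat[] beats = {
            new Heartbeat("dual-cpu-host", 2, 0),
            new Heartbeat("quad-cpu-host", 4, 0),
        };
        int pending = 100;
        for (Heartbeat hb : beats) {
            int n = tasksToAssign(hb, pending);
            pending -= n;
            System.out.println(hb.tracker + " -> " + n);
        }
    }
}
```

With this model a quad-CPU host configured with 4 receives 4 tasks in the very 
first round, so the initial assignment already matches the per-machine 
configuration instead of the jobtracker's single value.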

  was:
When I start a job, Hadoop uses the jobtracker's value of 
mapred.tasktracker.tasks.maximum.  Once these tasks finish, it is the 
tasktracker's value of mapred.tasktracker.tasks.maximum that decides how many 
new tasks are assigned to each host. 

This would probably be fixed if HADOOP-785 were implemented.



> value for mapred.tasktracker.tasks.maximum taken from two different sources
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-1245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1245
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Michael Bieniosek
>

-- 
This message is automatically generated by JIRA.
