[ https://issues.apache.org/jira/browse/HADOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Bieniosek updated HADOOP-1245:
--------------------------------------
Description:
I want to create a cluster of machines with different numbers of CPUs.
Because my map tasks are CPU bound, each machine should have its own value for
mapred.tasktracker.tasks.maximum.
However, Hadoop reads mapred.tasktracker.tasks.maximum from BOTH the
jobtracker's configuration and the tasktracker's configuration.
When a new job starts up, the jobtracker uses its (single) value for
mapred.tasktracker.tasks.maximum to assign tasks. This means that each
tasktracker gets the same number of tasks, regardless of how I configured that
particular machine.
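To make the job-start behavior concrete, here is a simplified, hypothetical model in Java. The class name, method name, and host names are invented for illustration and do not come from Hadoop's code. It assumes the job has enough pending map tasks to fill every slot, and shows every tracker receiving the jobtracker's single value regardless of its own setting:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported job-start behavior.
// This is NOT Hadoop's JobTracker code; all names here are illustrative.
public class InitialAssignmentModel {

    // At job start, every tasktracker is filled up to the jobtracker's single
    // configured value of mapred.tasktracker.tasks.maximum. The per-tracker
    // values in trackerMaximums are ignored (only the tracker names are used),
    // which is the problem described in the report.
    // Assumes the job has more pending map tasks than total slots.
    static Map<String, Integer> assignAtJobStart(Map<String, Integer> trackerMaximums,
                                                 int jobTrackerMaximum) {
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (String tracker : trackerMaximums.keySet()) {
            assigned.put(tracker, jobTrackerMaximum);
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Two hypothetical hosts: one with 8 CPUs and one with 2 CPUs,
        // each configured with a matching tasktracker maximum.
        Map<String, Integer> trackerMaximums = new LinkedHashMap<>();
        trackerMaximums.put("big-host", 8);
        trackerMaximums.put("small-host", 2);

        // The jobtracker's own configuration says 4, so both hosts get 4.
        System.out.println(assignAtJobStart(trackerMaximums, 4));
        // prints {big-host=4, small-host=4}
    }
}
```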
After the first task finishes on each tasktracker, the tasktracker will request
new tasks from the jobtracker according to the tasktracker's value for
mapred.tasktracker.tasks.maximum. So once the first round of map tasks is
done, the cluster switches to a mode that works well for heterogeneous clusters.
The jobtracker should not consult its config for the value of
mapred.tasktracker.tasks.maximum. It should assign tasks (or allow
tasktrackers to request tasks) according to each tasktracker's value of
mapred.tasktracker.tasks.maximum.
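A minimal sketch of the requested behavior, again a hypothetical model rather than Hadoop's real heartbeat protocol: assume each tasktracker reports its own maximum and its running-task count (the Heartbeat class and tasksToAssign method are invented for illustration), and the jobtracker hands out only that tracker's free slots:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the requested behavior: the jobtracker sizes each
// assignment from the limit the tasktracker itself reports, never from its
// own configuration. Not Hadoop's actual heartbeat protocol; names are illustrative.
public class PerTrackerAssignmentModel {

    // Hypothetical heartbeat payload: a tracker reports its own
    // mapred.tasktracker.tasks.maximum and how many tasks it is running.
    static final class Heartbeat {
        final String tracker;
        final int maxTasks;
        final int runningTasks;

        Heartbeat(String tracker, int maxTasks, int runningTasks) {
            this.tracker = tracker;
            this.maxTasks = maxTasks;
            this.runningTasks = runningTasks;
        }
    }

    // New tasks for this tracker: its free slots, capped by the pending work.
    static int tasksToAssign(Heartbeat hb, int pendingTasks) {
        int freeSlots = Math.max(0, hb.maxTasks - hb.runningTasks);
        return Math.min(freeSlots, pendingTasks);
    }

    public static void main(String[] args) {
        int pending = 20; // hypothetical number of map tasks waiting to run
        Heartbeat[] beats = {
            new Heartbeat("big-host", 8, 0),   // 8-CPU machine, idle at job start
            new Heartbeat("small-host", 2, 0)  // 2-CPU machine, idle at job start
        };

        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (Heartbeat hb : beats) {
            int n = tasksToAssign(hb, pending);
            pending -= n;
            assigned.put(hb.tracker, n);
        }
        System.out.println(assigned + ", still pending: " + pending);
        // prints {big-host=8, small-host=2}, still pending: 10
    }
}
```

With this rule the 8-CPU host gets 8 tasks and the 2-CPU host gets 2 from the very first round, which is the behavior the cluster otherwise only reaches after the first round of map tasks finishes.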
was:
When I start a job, Hadoop uses the jobtracker's value of
mapred.tasktracker.tasks.maximum. Once these tasks finish, it is the
tasktracker's value of mapred.tasktracker.tasks.maximum that decides how many
new tasks are created for each host.
This would probably be fixed if HADOOP-785 were implemented.
> value for mapred.tasktracker.tasks.maximum taken from two different sources
> ---------------------------------------------------------------------------
>
> Key: HADOOP-1245
> URL: https://issues.apache.org/jira/browse/HADOOP-1245
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.3
> Reporter: Michael Bieniosek