[ https://issues.apache.org/jira/browse/HADOOP-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648541#action_12648541 ]
Vivek Ratan commented on HADOOP-4035:
-------------------------------------
As Vinod has brought up, there are some edge cases and details missing in [the
summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267]
that we need to cover.
We want monitoring to work independently of scheduler support, i.e., even if
the scheduler you're using does not support memory-based scheduling, you may
still want the TTs to monitor memory usage on their machines and kill tasks
when too much memory is used. Based on what we've described in the summary, the
following three configuration settings are required for the TT to do
monitoring: {{mapred.tasktracker.virtualmemory.reserved}} (the offset
subtracted from the total VM on the machine), {{mapred.task.default.maxvm}}
(the default maximum VM per task), and {{mapred.task.limit.maxvm}} (the upper
limit on the maximum VM per task). It is proposed that:
* If one or more of these three values are missing from the configuration, the
TT disables monitoring and logs an appropriate message.
* At startup, the TT should also make sure that {{mapred.task.default.maxvm}}
is not greater than {{mapred.task.limit.maxvm}}. If it is, the TT logs a
message and disables monitoring.
* If all three are present, the TT has enough information to compute the
_max-VM-per-task_ limit for each task it runs and can successfully monitor
memory usage (a sketch of these startup checks follows this list).
* Without scheduler support, the TT can get a task whose _max-VM-per-task_
limit is higher than {{mapred.task.limit.maxvm}} (i.e., the user-set value for
a job's {{mapred.task.maxvm}} can be higher than {{mapred.task.limit.maxvm}}).
In such a case, the TT can choose to fail the task, or it can run the task
anyway while logging the problem. IMO, the former is too harsh, and not a
decision the TT should make based solely on its monitoring settings. In the
latter case, the TT can still continue monitoring, but it may end up killing
the wrong task if the sum of VM used is over the _max-VM-per-node_ limit. I
propose we do the latter.
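To make the startup checks above concrete, here's a rough sketch. The class and
method names are mine (hypothetical), not the actual TaskTracker code, and I'm
assuming an unset value reads back as -1:
{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the proposed TT startup checks; not actual TT code.
public class MemoryMonitoringCheck {
  private static final Log LOG = LogFactory.getLog(MemoryMonitoringCheck.class);

  static boolean canEnableMonitoring(Configuration conf) {
    // -1 as the default signals "not set in the configuration"
    long reserved = conf.getLong("mapred.tasktracker.virtualmemory.reserved", -1);
    long defaultMaxVm = conf.getLong("mapred.task.default.maxvm", -1);
    long limitMaxVm = conf.getLong("mapred.task.limit.maxvm", -1);

    if (reserved == -1 || defaultMaxVm == -1 || limitMaxVm == -1) {
      LOG.warn("Memory monitoring disabled: one or more VM settings missing");
      return false;
    }
    if (defaultMaxVm > limitMaxVm) {
      LOG.warn("Memory monitoring disabled: mapred.task.default.maxvm ("
          + defaultMaxVm + ") exceeds mapred.task.limit.maxvm ("
          + limitMaxVm + ")");
      return false;
    }
    return true;
  }
}
{code}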
The TT also needs to report memory information to the schedulers. As per
HADOOP-3759, TTs currently report, in each heartbeat, how much free VM they
have (which is equal to _max-VM-per-node_ minus the sum of _max-VM-per-task_
for each running task). This makes sense if monitoring is on and the three
necessary VM config values are defined. If they're not defined, the TT cannot
determine its free VM; what should it report?
* It can report -1, or some such value, indicating that it cannot compute free
VM.
* If we let schedulers decide how to behave in the absence of monitoring, or
rather in the absence of the necessary VM config values, then a TT should
always report its total VM (as well as RAM), along with its value for
{{mapred.tasktracker.virtualmemory.reserved}}.
I propose the latter: TTs always report how much VM and RAM their machines
have, and what their offset settings are. They are the only ones with this
information, and this approach gives the schedulers a lot of flexibility in how
they use it.
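As a concrete sketch, a heartbeat could carry something like the following.
The class and field names here are hypothetical, not the actual
{{TaskTrackerStatus}} fields from HADOOP-3759:
{code:java}
// Hypothetical sketch of the memory information a TT would always report
// in its heartbeat under this proposal; names are mine.
public class TTMemoryReport {
  long totalVirtualMemory;    // total VM on the machine, always reported
  long totalPhysicalMemory;   // total RAM on the machine, always reported
  long reservedVirtualMemory; // mapred.tasktracker.virtualmemory.reserved

  // A scheduler that wants free VM can derive it itself:
  //   maxVmPerNode = totalVirtualMemory - reservedVirtualMemory
  //   freeVm       = maxVmPerNode - sum(maxVmPerTask over running tasks)
}
{code}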
What about schedulers? The Capacity Scheduler should do the following (sketched
in code after this list):
* If any of the three mandatory VM settings is not set, it should not schedule
based on VM or RAM. The value of {{mapred.tasktracker.virtualmemory.reserved}}
comes from the TT, while the other two can be read by the scheduler from its
own config file.
* If the mandatory VM values are set, as well as the mandatory RAM values
({{mapred.capacity-scheduler.default.ramlimit}}, {{mapred.task.limit.maxram}}),
the scheduler uses both VM and RAM settings to schedule, as defined in the
earlier
[summary|https://issues.apache.org/jira/browse/HADOOP-4035?focusedCommentId=12644267#action_12644267].
* If the mandatory VM values are set, but one or more of the mandatory RAM
values are not, the scheduler only uses VM values for scheduling.
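A sketch of that three-way decision (the enum and method names are hypothetical,
not actual Capacity Scheduler code):
{code:java}
// Hypothetical sketch of the proposed scheduling-mode decision.
public class MemorySchedulingPolicy {
  enum Mode { NONE, VM_ONLY, VM_AND_RAM }

  static Mode pick(boolean allVmValuesSet, boolean allRamValuesSet) {
    if (!allVmValuesSet) {
      return Mode.NONE;       // don't schedule based on VM or RAM at all
    }
    if (!allRamValuesSet) {
      return Mode.VM_ONLY;    // RAM values missing: use only VM values
    }
    return Mode.VM_AND_RAM;   // both sets present: schedule on VM and RAM
  }
}
{code}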
It's possible that other schedulers may choose a different algorithm. What's
important is that they have all the available information, which they do under
this proposal.
> Modify the capacity scheduler (HADOOP-3445) to schedule tasks based on memory
> requirements and task trackers free memory
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4035
> URL: https://issues.apache.org/jira/browse/HADOOP-4035
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/capacity-sched
> Affects Versions: 0.19.0
> Reporter: Hemanth Yamijala
> Assignee: Vinod K V
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: 4035.1.patch, HADOOP-4035-20080918.1.txt,
> HADOOP-4035-20081006.1.txt, HADOOP-4035-20081006.txt, HADOOP-4035-20081008.txt
>
>
> HADOOP-3759 introduced configuration variables that can be used to specify
> memory requirements for jobs, and also modified the tasktrackers to report
> their free memory. The capacity scheduler in HADOOP-3445 should schedule
> tasks based on these parameters. A task that is scheduled on a TT and that uses
> more than the default amount of memory per slot can be viewed as effectively
> using more than one slot, as it would decrease the amount of free memory on
> the TT by more than the default amount while it runs. The scheduler should
> make the used capacity account for this additional usage while enforcing
> limits, etc.