[ https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646855#action_12646855 ]
Hemanth Yamijala commented on HADOOP-4523:
------------------------------------------
- Consider a case where a task has just started, and so is sitting in the
tasksToBeAdded list but doesn't yet have a ProcessTreeInfo. If by some chance
this task is the reduce task with the least progress, it would be returned by
the findTaskToKill method. However, since it is not found in the
processTreeInfoMap, it is neither added to the tasksToKill list nor is its
memory usage subtracted from the running total. Hence subsequent calls to
findTaskToKill will keep returning this same task, and the code would be
stuck in an infinite loop.
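A minimal sketch of how the kill loop could guard against this; tasksToKill,
processTreeInfoMap, findTaskToKill and ProcessTreeInfo are names from the
patch, while the other fields and helpers here are only illustrative:

{code}
// Illustrative only: memoryStillInExcess() and killTask() stand in for
// the patch's actual helpers.
TaskAttemptID tid = findTaskToKill(tasksToKill);
while (tid != null && memoryStillInExcess()) {
  // Record the candidate unconditionally, so that findTaskToKill never
  // returns the same task twice, not even a task that is still sitting
  // in tasksToBeAdded without a ProcessTreeInfo.
  tasksToKill.add(tid);
  ProcessTreeInfo ptInfo = processTreeInfoMap.get(tid);
  if (ptInfo != null) {
    // kill the task, log the diagnostic, and subtract its usage from
    // the running total (see the next point about sharing this code)
    killTask(tid, ptInfo);
  }
  tid = findTaskToKill(tasksToKill);
}
{code}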
- Code related to killing a task is repeated in two places: when killing
tasks that are individually over their limit, and when killing tasks because
the cumulative usage is still in excess. Can we refactor this into common
code?
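For instance, something along these lines (the name, signature and body are
only a suggestion):

{code}
/**
 * Kills the given task, logs a diagnostic and updates the memory
 * accounting, so that both call sites share one implementation.
 */
private void killTask(TaskAttemptID tid, ProcessTreeInfo ptInfo) {
  String msg = buildDiagnosticMessage(tid); // hypothetical helper
  LOG.warn(msg);
  // the actual kill call that both sites currently duplicate goes here
  memoryInUse -= ptInfo.getMemoryUsed(); // stand-in for the tree's usage
}
{code}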
- Can we improve the diagnostic message being logged when we kill a task?
Something like: "Killing task '<tid>' as it has the least progress, and the
cumulative memory usage of tasks on this TaskTracker exceeds the virtual
memory limit '<limit>'."
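At the kill site that could look something like this (tid and limit are
assumed to be in scope; the wording is just a suggestion):

{code}
String msg = "Killing task '" + tid + "' as it has the least progress,"
    + " and the cumulative memory usage of tasks on this TaskTracker"
    + " exceeds the virtual memory limit '" + limit + "'.";
LOG.warn(msg);
{code}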
- Add javadoc for the tasksToExclude parameter of findTaskToKill, mentioning
that passing null includes all tasks.
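Something like this (the wording is only a suggestion):

{code}
/**
 * Finds a running task to kill, based on which task has made the
 * least progress.
 *
 * @param tasksToExclude tasks that should not be considered for
 *                       killing; passing null includes all tasks.
 * @return the chosen task, or null if no task qualifies.
 */
{code}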
- I think we need the following tests:
-- When memory management is disabled and tasks run over their limits, but
nothing is killed (backwards compatibility; a sketch follows below this
list).
-- When there are tasks that are individually over limit as well as
cumulatively over limit (where some tasks haven't specified memory limits).
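For the first of these, a rough sketch; MiniMRCluster is the real test
utility, while runMemoryHoggingJob is a hypothetical helper that submits a
job whose tasks would exceed the limits, had any limits been set:

{code}
public void testTasksNotKilledWhenMemoryManagementDisabled()
    throws Exception {
  MiniMRCluster cluster = new MiniMRCluster(1, "file:///", 1);
  try {
    // No memory-monitoring configuration is set anywhere, so the TT
    // must not kill anything even though the tasks are memory-hungry.
    JobConf jobConf = cluster.createJobConf();
    RunningJob job = runMemoryHoggingJob(jobConf); // hypothetical helper
    assertTrue("Job should succeed when monitoring is disabled",
               job.isSuccessful());
  } finally {
    cluster.shutdown();
  }
}
{code}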
- To simulate a task with the least progress, can we have some tasks with a
very large sleep interval and some with a very small one, or something along
those lines?
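For instance, a mapper along these lines ("mapred.task.partition" is a real
job property; conf is the JobConf saved in configure(), and the rest is
illustrative):

{code}
// Even-numbered map tasks sleep for a long time and so report the
// least progress; odd-numbered ones finish almost immediately.
public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> out, Reporter reporter)
    throws IOException {
  int partition = conf.getInt("mapred.task.partition", 0);
  long sleepMs = (partition % 2 == 0) ? 60000L : 10L;
  try {
    Thread.sleep(sleepMs);
  } catch (InterruptedException ie) {
    throw new IOException("interrupted while sleeping");
  }
}
{code}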
- I'd suggest a few shorter names for the tests. Essentially we test with
jobs that exceed limits individually, cumulatively, and a mix of both. So
something like:
-- testJobWithinLimits
-- testJobExceedingLimits
-- testJobsCumulativelyExceedingLimits
-- testMixedSetOfJobsExceedingLimits
- There are a few references to WordCount in the comments. Are they still
valid?
- The testTasksWithinIndividualLimitsButTotalUsageBeyondTTLimits (or
testJobsCumulativelyExceedingLimits) test does not seem deterministic;
indeed, it failed on my machine. How can we be sure that at least one task
overflows? One way could be to have a TT with 2 map slots and 2 reduce
slots. Submit a job with 2 map tasks and 2 reduces, give the TT a very low
memory limit, and let the tasks ask for enough memory that the sum of any 2
tasks exceeds the TT limit. Then we can get the task reports and verify that
a couple of tasks were killed. A sketch of this setup follows.
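Something along these lines; the slot-count properties are standard, while
the memory-limit property name and the killed-task check are only
placeholders for whatever this patch actually provides (jobClient and jobId
are assumed to be in scope):

{code}
JobConf ttConf = new JobConf();
// one TT with exactly 2 map slots and 2 reduce slots
ttConf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
ttConf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
// placeholder property name: a TT-wide limit low enough that the sum
// of any two running tasks is guaranteed to exceed it
ttConf.setLong("mapred.tasktracker.tasks.maxmemory", LOW_TT_LIMIT);

// ... start the cluster with ttConf, submit a job with 2 maps and
// 2 reduces that ask for high memory, and wait for it; then:
TaskReport[] mapReports = jobClient.getMapTaskReports(jobId);
int killed = 0;
for (TaskReport report : mapReports) {
  // killed attempts carry the kill diagnostic in their reports
  if (report.getDiagnostics().length > 0) {
    killed++;
  }
}
assertTrue("Expected at least one task to be killed", killed > 0);
{code}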
> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
> Key: HADOOP-4523
> URL: https://issues.apache.org/jira/browse/HADOOP-4523
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Vivek Ratan
> Assignee: Vinod K V
> Attachments: HADOOP-4523-200811-05.txt, HADOOP-4523-200811-06.txt,
> HADOOP-4523-20081110.txt
>
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which
> includes usage of any tasks spawned by it and so on) is within a per-task
> limit. If the task's memory usage goes over its limit, the task is killed.
> This, by itself, is not enough to prevent badly behaving jobs from bringing
> down nodes. What is also needed is the ability to make sure that the sum
> total of VM usage of all Hadoop tasks does not exceed a certain limit.