[ https://issues.apache.org/jira/browse/HADOOP-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646855#action_12646855 ]
Hemanth Yamijala commented on HADOOP-4523:
------------------------------------------
- Consider a case where a task has just started, and so is sitting in the
tasksToBeAdded list but doesn't yet have a ProcessTreeInfo. If by some chance
this task is the reduce task with the least progress, it would be returned by
the findTaskToKill method. However, since it is not found in the
processTreeInfoMap, it is neither added to the tasksToKill list nor is its
memory usage subtracted from the running total. Hence subsequent calls to
findTaskToKill will keep returning this same task, and the code would be
stuck in an infinite loop.
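A minimal sketch of how the kill loop could guard against this; tasksToKill,
processTreeInfoMap, findTaskToKill and ProcessTreeInfo are names from the
patch, while the other fields and helpers here are only illustrative:

{code}
// Illustrative only: memoryStillInExcess() and killTask() stand in for
// the patch's actual helpers.
TaskAttemptID tid = findTaskToKill(tasksToKill);
while (tid != null && memoryStillInExcess()) {
  // Record the candidate unconditionally, so that findTaskToKill never
  // returns the same task twice, not even a task that is still sitting
  // in tasksToBeAdded without a ProcessTreeInfo.
  tasksToKill.add(tid);
  ProcessTreeInfo ptInfo = processTreeInfoMap.get(tid);
  if (ptInfo != null) {
    // kill the task, log the diagnostic, and subtract its usage from
    // the running total (see the next point about sharing this code)
    killTask(tid, ptInfo);
  }
  tid = findTaskToKill(tasksToKill);
}
{code}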
- Code related to killing a task is repeated in two places: when killing
tasks that are individually over their limit, and when killing tasks because
the cumulative usage is still in excess. Can we refactor this into common
code?
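For instance, something along these lines (the name, signature and body are
only a suggestion):

{code}
/**
 * Kills the given task, logs a diagnostic and updates the memory
 * accounting, so that both call sites share one implementation.
 */
private void killTask(TaskAttemptID tid, ProcessTreeInfo ptInfo) {
  String msg = buildDiagnosticMessage(tid); // hypothetical helper
  LOG.warn(msg);
  // the actual kill call that both sites currently duplicate goes here
  memoryInUse -= ptInfo.getMemoryUsed(); // stand-in for the tree's usage
}
{code}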
- Can we improve the diagnostic message being logged when we kill a task?
Something like: "Killing task '<tid>' as it has the least progress, and the
cumulative memory usage of tasks on this TaskTracker exceeds the virtual
memory limit '<limit>'."
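At the kill site that could look something like this (tid and limit are
assumed to be in scope; the wording is just a suggestion):

{code}
String msg = "Killing task '" + tid + "' as it has the least progress,"
    + " and the cumulative memory usage of tasks on this TaskTracker"
    + " exceeds the virtual memory limit '" + limit + "'.";
LOG.warn(msg);
{code}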
- Add javadoc for the tasksToExclude parameter of findTaskToKill, mentioning
that passing null includes all tasks.
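Something like this (the wording is only a suggestion):

{code}
/**
 * Finds a running task to kill, based on which task has made the
 * least progress.
 *
 * @param tasksToExclude tasks that should not be considered for
 *                       killing; passing null includes all tasks.
 * @return the chosen task, or null if no task qualifies.
 */
{code}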
- I think we need the following tests:
-- When memory management is disabled and tasks run over their limits, but
nothing is killed (backwards compatibility; a sketch follows below this
list).
-- When there are tasks that are individually over limit as well as
cumulatively over limit (where some tasks haven't specified memory limits).
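For the first of these, a rough sketch; MiniMRCluster is the real test
utility, while runMemoryHoggingJob is a hypothetical helper that submits a
job whose tasks would exceed the limits, had any limits been set:

{code}
public void testTasksNotKilledWhenMemoryManagementDisabled()
    throws Exception {
  MiniMRCluster cluster = new MiniMRCluster(1, "file:///", 1);
  try {
    // No memory-monitoring configuration is set anywhere, so the TT
    // must not kill anything even though the tasks are memory-hungry.
    JobConf jobConf = cluster.createJobConf();
    RunningJob job = runMemoryHoggingJob(jobConf); // hypothetical helper
    assertTrue("Job should succeed when monitoring is disabled",
               job.isSuccessful());
  } finally {
    cluster.shutdown();
  }
}
{code}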
- To simulate a task with the least progress, can we have some tasks with a
very large sleep interval and some with a very small one, or something along
those lines?
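For instance, a mapper along these lines ("mapred.task.partition" is a real
job property; conf is the JobConf saved in configure(), and the rest is
illustrative):

{code}
// Even-numbered map tasks sleep for a long time and so report the
// least progress; odd-numbered ones finish almost immediately.
public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> out, Reporter reporter)
    throws IOException {
  int partition = conf.getInt("mapred.task.partition", 0);
  long sleepMs = (partition % 2 == 0) ? 60000L : 10L;
  try {
    Thread.sleep(sleepMs);
  } catch (InterruptedException ie) {
    throw new IOException("interrupted while sleeping");
  }
}
{code}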
- I'd suggest a few shorter names for the tests. Essentially we test with
jobs that exceed limits individually, cumulatively, and a mix of both. So
something like:
-- testJobWithinLimits
-- testJobExceedingLimits
-- testJobsCumulativelyExceedingLimits
-- testMixedSetOfJobsExceedingLimits
- There are a few references to WordCount in the comments. Are they still
valid?
- The testTasksWithinIndividualLimitsButTotalUsageBeyondTTLimits (or
testJobsCumulativelyExceedingLimits) test does not seem deterministic;
indeed, it failed on my machine. How can we be sure that at least one task
overflows? One way could be to have a TT with 2 map slots and 2 reduce
slots. Submit a job with 2 map tasks and 2 reduces, give the TT a very low
memory limit, and let the tasks ask for enough memory that the sum of any 2
tasks exceeds the TT limit. Then we can get the task reports and verify that
a couple of tasks were killed. A sketch of this setup follows.
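Something along these lines; the slot-count properties are standard, while
the memory-limit property name and the killed-task check are only
placeholders for whatever this patch actually provides (jobClient and jobId
are assumed to be in scope):

{code}
JobConf ttConf = new JobConf();
// one TT with exactly 2 map slots and 2 reduce slots
ttConf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
ttConf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
// placeholder property name: a TT-wide limit low enough that the sum
// of any two running tasks is guaranteed to exceed it
ttConf.setLong("mapred.tasktracker.tasks.maxmemory", LOW_TT_LIMIT);

// ... start the cluster with ttConf, submit a job with 2 maps and
// 2 reduces that ask for high memory, and wait for it; then:
TaskReport[] mapReports = jobClient.getMapTaskReports(jobId);
int killed = 0;
for (TaskReport report : mapReports) {
  // killed attempts carry the kill diagnostic in their reports
  if (report.getDiagnostics().length > 0) {
    killed++;
  }
}
assertTrue("Expected at least one task to be killed", killed > 0);
{code}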
> Enhance how memory-intensive user tasks are handled
> ---------------------------------------------------
>
> Key: HADOOP-4523
> URL: https://issues.apache.org/jira/browse/HADOOP-4523
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Vivek Ratan
> Assignee: Vinod K V
> Attachments: HADOOP-4523-200811-05.txt, HADOOP-4523-200811-06.txt,
> HADOOP-4523-20081110.txt
>
>
> HADOOP-3581 monitors each Hadoop task to see if its memory usage (which
> includes usage of any tasks spawned by it and so on) is within a per-task
> limit. If the task's memory usage goes over its limit, the task is killed.
> This, by itself, is not enough to prevent badly behaving jobs from bringing
> down nodes. What is also needed is the ability to make sure that the sum
> total of VM usage of all Hadoop tasks does not exceed a certain limit.