[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Scott Chen (JIRA) Thu, 04 Mar 2010 18:07:52 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841653#action_12841653
 ]


Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

@Arun: Sorry for the very late reply. Dhruba and I have been trying to call you 
but it seems you are busy as well.

I think I got your point. The problem is that the bad job will never fail and 
its task gets killed and rescheduled again and again which keeps hurting the 
cluster. So we should add per task RSS limit in this patch so that we can fail 
the bad job. This is just like what we currently do in the trunk for virtual 
memory. But we here offer the RSS memory limiting as an option (a trade-off 
between memory utilization and stability).

I will make the change and resubmit the patch soon. Thanks again for the help.


> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, 
> MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were 
> designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Reply via email to