[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838610#action_12838610 ]

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Please allow me to present my use case.

I have users submitting their own jobs to the cluster. These jobs are neither 
audited nor vetted by any authority before being deployed on the cluster. The 
mappers for most of these jobs are written in Python or PHP. In these 
languages, it is easy for code writers to mistakenly use excessive amounts of 
memory (via a Python dictionary or some such thing). We have seen about one 
such case per month in our cluster. The thing to note is that in every one of 
these cases, the user had a coding error that kept erroneously inserting 
elements into his/her dictionary. These are not "valid" jobs, and they are 
usually killed by the user once he/she realises the coding mistake.
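For illustration, the failure mode typically looks like the streaming mapper 
below (a minimal Python sketch of the bug pattern, not code from any actual 
job): a "cache" keyed on effectively-unique input grows by one entry per 
record and never stops.

    #!/usr/bin/env python
    # Buggy streaming mapper: the cache is keyed on the full input
    # line, so it almost never hits and grows without bound as input
    # streams in -- the runaway-dictionary error described above.
    import sys

    seen = {}
    for line in sys.stdin:
        key = line.rstrip("\n")
        if key not in seen:
            seen[key] = len(seen)   # intended as a small lookup table
        print("%s\t%d" % (key, seen[key]))

Each such task's resident memory grows linearly with its input split, which 
is what eventually drives the node into swap.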

The problem we are encountering is that when such a job is let loose on our 
cluster, many tasks start eating lots of memory, causing excessive swapping 
and finally making the OS on those nodes hang. This JIRA attempts to prevent 
that scenario. Once properly configured, it makes it very hard for a user job 
to bring down nodes in the Hadoop cluster, and it increases the stability and 
uptime of our cluster to a great extent. I would request all concerned 
reviewers to look at this JIRA from that perspective.
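
For concreteness, the node-level check in the title can be pictured as the 
sketch below: poll free physical memory and act when it drops under a 
threshold. The threshold value and the reporting are illustrative only; the 
real change lives inside the TaskTracker and is written in Java, so this is 
just the shape of the check, not the patch.

    import time

    THRESHOLD_KB = 512 * 1024   # hypothetical threshold: 512 MB free

    def free_physical_kb():
        """Parse MemFree out of /proc/meminfo (Linux-only)."""
        with open("/proc/meminfo") as f:
            for line in f:
                # line looks like: "MemFree:   123456 kB"
                if line.startswith("MemFree:"):
                    return int(line.split()[1])
        raise RuntimeError("MemFree not found in /proc/meminfo")

    if __name__ == "__main__":
        while True:
            free_kb = free_physical_kb()
            if free_kb < THRESHOLD_KB:
                # A real TaskTracker would choose a task to kill here;
                # this sketch only reports the condition.
                print("low memory: %d kB free" % free_kb)
            time.sleep(5)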





> Kill tasks on a node if the free physical memory on that machine falls below 
> a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, 
> MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a 
> task exceeds a set of configured thresholds. I would like to extend this 
> feature to enable killing tasks if the physical memory used by that task 
> exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using 
> lots of memory, the machine hangs and dies quickly. This means that we would 
> like to prevent map-reduce jobs from triggering this condition. From my 
> understanding, the killing-based-on-virtual-memory-limits feature 
> (HADOOP-5883) was designed to address this problem. This works well when most map-reduce jobs 
> are Java jobs and have well-defined -Xmx parameters that specify the max 
> virtual memory for each task. On the other hand, if each task forks off 
> mappers/reducers written in other languages (python/php, etc), the total 
> virtual memory usage of the process-subtree varies greatly. In these cases, 
> it is better to use kill-tasks-using-physical-memory-limits.
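
To make the physical-memory side of the description concrete: on Linux, the 
resident set of a task's whole process subtree can be sampled from /proc by 
summing VmRSS over the task and its descendants, which is roughly the 
quantity a physical-memory limit has to watch. Hadoop does this in Java via 
its procfs-based process-tree monitoring; the sketch below is an illustrative 
Python rendering, not the actual implementation.

    import os

    def rss_kb(pid):
        """VmRSS in kB for one pid, or 0 if the process vanished."""
        try:
            with open("/proc/%d/status" % pid) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1])
        except (OSError, IOError):
            pass
        return 0

    def subtree_pids(root_pid):
        """root_pid plus all transitive children, via /proc/<pid>/stat."""
        children = {}
        for entry in os.listdir("/proc"):
            if not entry.isdigit():
                continue
            try:
                with open("/proc/%s/stat" % entry) as f:
                    data = f.read()
            except (OSError, IOError):
                continue
            # comm (field 2) may contain spaces; parse after the last
            # ')' -- the remaining fields are "state ppid pgrp ...".
            ppid = int(data[data.rindex(")") + 2:].split()[1])
            children.setdefault(ppid, []).append(int(entry))
        todo, tree = [root_pid], []
        while todo:
            pid = todo.pop()
            tree.append(pid)
            todo.extend(children.get(pid, []))
        return tree

    def subtree_rss_kb(root_pid):
        return sum(rss_kb(p) for p in subtree_pids(root_pid))

    print(subtree_rss_kb(os.getpid()))

Unlike a JVM task capped by -Xmx, a python/php subtree has no single knob 
that bounds this sum, which is why the physical-memory limit is the more 
meaningful one there.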

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
