[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838610#action_12838610 ]
dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Please allow me to present my use case. I have users submitting their own jobs to the cluster. These jobs are neither audited nor vetted by any authority before being deployed on the cluster. The mappers for most of these jobs are written in Python or PHP. In these languages, it is easy for code writers to mistakenly use excessive amounts of memory (via a Python dictionary or some such thing). We have seen about one such case per month in our cluster. The thing to note is that in 100% of these jobs, the user had a coding error that erroneously kept inserting elements into his/her dictionary. These are not "valid" jobs, and they are usually killed by the user when he/she realises the coding mistake.

The problem we are encountering is that when such a job is let loose in our cluster, many tasks start eating lots of memory, causing excessive swapping and finally making the OS on those machines hang. This JIRA attempts to prevent that scenario. Once properly configured, it will make it really hard for a user job to bring down nodes in the Hadoop cluster, and it increases the stability and uptime of our cluster to a great extent. I would request all concerned authorities to review this JIRA from this perspective. (Two illustrative sketches of the failure mode and the proposed check follow the quoted issue text below.)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by a task exceeds a certain threshold.
> On a certain operating system (guess?), if user-space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, killing based on virtual-memory limits (HADOOP-5883) was designed to address this problem. That works well when most map-reduce jobs are Java jobs with well-defined -Xmx parameters that specify the maximum virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (Python/PHP, etc.), the total virtual memory usage of the process subtree varies greatly. In those cases, it is better to kill tasks based on physical-memory limits.
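For concreteness, here is a minimal sketch of the buggy-mapper pattern described in the comment above, assuming a Hadoop Streaming word-count-style job. The script and its bug are hypothetical, not taken from any real job on the cluster:

{code}
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper illustrating the failure pattern:
# a coding error makes an in-memory dict grow with the input size instead
# of with the number of distinct keys, so memory use is unbounded.
import sys

counts = {}
for line in sys.stdin:
    for word in line.split():
        # Bug: keying on (word, line) instead of word means nearly every
        # input line inserts fresh entries into the dict.
        key = (word, line)
        counts[key] = counts.get(key, 0) + 1

for (word, line), n in counts.items():
    sys.stdout.write("%s\t%d\n" % (word, n))
{code}

As the issue description notes, a -Xmx setting only bounds the parent JVM's heap; the memory of a forked Python child like this one is outside it, which is why a virtual-memory limit tuned for Java tasks does not reliably catch this case.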
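To make the proposed mechanism concrete, here is a minimal sketch of sampling the physical memory (resident set size) of a task's process subtree from Linux procfs and killing the subtree when it exceeds a limit. This is not the actual MAPREDUCE-1221 patch; the function names, the limit parameter, and the process-group assumption are illustrative:

{code}
import os
import signal

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

def rss_bytes(pid):
    """Physical memory of one process: field 24 (rss, in pages) of /proc/<pid>/stat."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # Split after the ')' closing the comm field, which may itself contain spaces.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[21]) * PAGE_SIZE  # field 24 overall = index 21 after pid/comm

def child_pids(pid):
    """Direct children of pid, found by scanning the PPid: line of /proc/<p>/status."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/status" % entry) as f:
                for line in f:
                    if line.startswith("PPid:"):
                        if int(line.split()[1]) == pid:
                            kids.append(int(entry))
                        break
        except IOError:
            pass  # process exited between listdir() and open()
    return kids

def subtree_rss(root_pid):
    """Total resident set size of root_pid and all of its descendants."""
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        try:
            total += rss_bytes(pid)
        except (IOError, ValueError):
            continue  # process vanished mid-scan; skip it
        stack.extend(child_pids(pid))
    return total

def kill_task_if_over_limit(task_pid, limit_bytes):
    """Kill the whole task subtree once its physical memory exceeds limit_bytes."""
    if subtree_rss(task_pid) > limit_bytes:
        # Assumes the task was launched as its own process group leader,
        # so signalling -task_pid reaches every process in the subtree.
        os.kill(-task_pid, signal.SIGKILL)
        return True
    return False
{code}

A monitoring thread in the TaskTracker would call something like kill_task_if_over_limit on each task's root pid every few seconds. Per the issue title, the real change also keys off the node's free physical memory falling below a configured threshold and must choose which task to kill; this sketch omits those details.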