[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838546#action_12838546 ]
Allen Wittenauer commented on MAPREDUCE-1221:
---------------------------------------------

I've read through this jira a few times, and looked at some of the previously mentioned jiras around memory limits. I think I see where the issue actually is. Arun started to fill in the historical background, but I think he may have missed a significant point. Let's retell the story, so that we can get to the crux of the ops requirement here....

Under HOD w/torque, we configured torque such that it would limit the virtual memory size to total vm - 4GB. [This left plenty of RAM for Linux, our monitoring software, etc., etc. So on a machine with 4x4GB swap partitions and 16GB RAM, the vm limit would be set to 28GB.]

Now the thing about HOD is that it allocates the -entire- node to an -entire- job... which means there is a subtle point here, easily missed: the vm limit under torque was the aggregate for *all* of the tasks on the node, not just a single task. So if you had a badly behaving task/job, it would kill all the tasks running on that node.

To simulate this ops requirement, hadoop should be taking the memory used by *all* the tasks and then performing some action. While I realize there is a desire to only punish 'bad tasks', I'm not sure there is an easy way to do that. Putting my jack boots on, my answer is Kill Them All and Let The Users Sort Themselves Out. If I have to pick between killing the system (we're talking *hard hang* here, not a happy little panic, in my experience) and punishing potentially innocent users, the answer is easy.

Now here is where things get more complex, and there is a very good chance I've gotten this wrong. [Hopefully I have, because it sounds to me like a feature was added in the wrong spot.] It sounds like the capacity scheduler has the ability to kill tasks based upon vm per node. It has this idea of a max vm size and how much memory each task is asking for. It then schedules based upon a weird slot+mem ratio. While this is a fine and dandy feature that would likely fix the requester's problem, I think it is a bit short-sighted not to have the kill feature at the task tracker level. The task tracker, regardless of scheduler, should still be able to keep track of all the memory used on the box and kill as necessary. If a scheduler wants to provide alternative logic, more power to it. But tying this to a scheduler just seems a bit ridiculous.
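To make that concrete: a minimal sketch of a scheduler-independent aggregate check of the kind described above. This is illustrative only, not a patch; every class and method name below is made up, not actual TaskTracker code.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a TaskTracker-style daemon tracks memory for every
// task on the node and enforces a single node-wide limit, regardless of
// which scheduler handed it the tasks.
class AggregateMemoryWatcher {
    private final long nodeLimitBytes;  // e.g. total vm minus headroom for the OS
    private final Map<String, Long> taskUsage =
        new LinkedHashMap<String, Long>();  // taskId -> last reported usage in bytes

    AggregateMemoryWatcher(long nodeLimitBytes) {
        this.nodeLimitBytes = nodeLimitBytes;
    }

    // Called periodically by whatever samples per-task memory.
    void report(String taskId, long bytes) {
        taskUsage.put(taskId, bytes);
    }

    // "Kill them all": if the aggregate over *all* tasks exceeds the node
    // limit, kill every task rather than trying to pick the guilty one,
    // mirroring what the torque vm limit did under HOD.
    void check() {
        long total = 0;
        for (long b : taskUsage.values()) {
            total += b;
        }
        if (total > nodeLimitBytes) {
            for (String taskId : taskUsage.keySet()) {
                kill(taskId);
            }
            taskUsage.clear();
        }
    }

    private void kill(String taskId) {
        // Real code would signal the task's JVM / process tree here.
        System.err.println("aggregate memory over node limit, killing " + taskId);
    }
}
{code}

A scheduler that wants smarter victim selection could replace check() with its own logic; the point is that the safety net lives in the task tracker either way.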
> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits feature (HADOOP-5883) was designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc.), the total virtual memory usage of the process subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.
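To make the physical-memory alternative in the description concrete: a minimal sketch, assuming Linux procfs, of summing resident set size (RSS) over a task's whole process subtree. Hadoop's ProcfsBasedProcessTree does roughly this for real; none of the names below are its actual API.

{code:java}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: walk /proc once, record each pid's parent and RSS,
// then attribute RSS to the task whose root JVM pid we were given.
class SubtreeRss {
    static final long PAGE_SIZE = 4096;  // assuming 4 KB pages; real code would query the kernel

    // Sum RSS (in bytes) of rootPid and all of its descendants.
    static long subtreeRssBytes(int rootPid) {
        Map<Integer, Integer> parentOf = new HashMap<Integer, Integer>();
        Map<Integer, Long> rssPages = new HashMap<Integer, Long>();
        File[] entries = new File("/proc").listFiles();
        if (entries == null) return 0;  // no procfs here
        for (File d : entries) {
            if (!d.getName().matches("\\d+")) continue;
            int pid = Integer.parseInt(d.getName());
            try {
                BufferedReader r = new BufferedReader(new FileReader(new File(d, "stat")));
                try {
                    String line = r.readLine();
                    // /proc/<pid>/stat after "pid (comm) ": state is field 3 overall,
                    // ppid is field 4, rss (in pages) is field 24.
                    String[] f = line.substring(line.lastIndexOf(')') + 2).split(" ");
                    parentOf.put(pid, Integer.parseInt(f[1]));  // ppid
                    rssPages.put(pid, Long.parseLong(f[21]));   // rss in pages
                } finally {
                    r.close();
                }
            } catch (IOException ignored) {
                // process exited while we were scanning; skip it
            }
        }
        long totalPages = 0;
        for (Map.Entry<Integer, Long> e : rssPages.entrySet()) {
            // Walk ancestors; count this pid if rootPid appears in its chain.
            for (Integer p = e.getKey(); p != null; p = parentOf.get(p)) {
                if (p == rootPid) { totalPages += e.getValue(); break; }
            }
        }
        return totalPages * PAGE_SIZE;
    }
}
{code}

A monitor loop would compare subtreeRssBytes(taskJvmPid) against the configured threshold. Because this measures actual resident memory rather than address space, it is not fooled by forked python/php children whose virtual size balloons while their working set stays modest.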