[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185356#comment-13185356 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3596:
----------------------------------------------------

Thanks, Robert. And a very good catch, Sid.

Doing it on the NM is more complicated because the heartbeat thread is different from the AM-NM RPC. I am leaning toward doing it on the RM itself, inside the scheduler.

> Sort benchmark got hang after completion of 99% map phase
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3596
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3596-20120111.1.txt, MAPREDUCE-3596-20120111.txt, MAPREDUCE-3596-20120112.txt, logs.tar.bz2, logs.tar.bz2
>
>
> Courtesy [~vinaythota]
> {quote}
> Ran sort benchmark couple of times and every time the job got hang after completion 99% map phase. There are some map tasks failed. Also it's not scheduled some of the pending map tasks.
> Cluster size is 350 nodes.
> Build Details:
> ==============
> Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from branches/branch-0.23/hadoop-common-project/hadoop-common
> ResourceManager version: revision 1212681 by someone source checksum on Fri Dec 9 16:52:07 PST 2011
> Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 2011
> {quote}