[ https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated MAPREDUCE-3596:
--------------------------------------

    Attachment: logs.tar.bz2

Attached some parts of the AM and RM logs.
am1/rm1 - first 2 map failures
am2/rm2 - 3rd map failure
am3/rm3 - last bit before the job was killed.

The first failed map was retried successfully. The remaining 2 never got containers allocated. Looks like this may be an issue on the RM (RM logs aren't very useful though - since DEBUG logging wasn't enabled). The AM side table looks ok.

After the second failed map - 1 container requested with priority=5 (never allocated)
{noformat}
2011-12-16 07:09:15,871 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: addResourceRequest: applicationId=2 priority=5 resourceName=* numContainers=1 #asks=1
{noformat}
After the third failed map - 2 container requests with priority=5 (never allocated)
{noformat}
2011-12-16 07:26:07,641 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: addResourceRequest: applicationId=2 priority=5 resourceName=* numContainers=2 #asks=1
{noformat}

Towards the end, all reduce tasks are around 0.3328 complete, pendingMaps stays at 2.

> Sort benchmark got hang after completion of 99% map phase
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3596
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Priority: Critical
>         Attachments: logs.tar.bz2
>
>
> Courtesy [~vinaythota]
> {quote}
> Ran sort benchmark couple of times and every time the job got hang after
> completion 99% map phase. There are some map tasks failed. Also it's not
> scheduled some of the pending map tasks.
> Cluster size is 350 nodes.
> Build Details:
> ==============
> Compiled: Fri Dec 9 16:25:27 PST 2011 by someone from
> branches/branch-0.23/hadoop-common-project/hadoop-common
> ResourceManager version: revision 1212681 by someone source checksum
> on Fri Dec 9 16:52:07 PST 2011
> Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST
> 2011
> {quote}
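
For anyone going through the attached AM logs, the addResourceRequest entries quoted in the comment can be pulled out with a small script. The following is only a rough sketch: it assumes every relevant line has the same layout as the two entries quoted above, and the names ASK_RE and parse_asks are made up for illustration and are not part of Hadoop.

```python
import re

# Pattern modelled on the two RMContainerRequestor lines quoted in the comment.
# Assumption: other addResourceRequest lines in the AM log share this layout.
ASK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"addResourceRequest: applicationId=(?P<app>\d+) "
    r"priority=(?P<priority>\d+) resourceName=(?P<resource>\S+) "
    r"numContainers=(?P<containers>\d+) #asks=(?P<asks>\d+)"
)


def parse_asks(lines):
    """Return one dict per addResourceRequest line; all other lines are skipped."""
    asks = []
    for line in lines:
        match = ASK_RE.search(line)
        if match is None:
            continue
        asks.append({
            "ts": match.group("ts"),
            "app": int(match.group("app")),
            "priority": int(match.group("priority")),
            "resource": match.group("resource"),
            "containers": int(match.group("containers")),
            "asks": int(match.group("asks")),
        })
    return asks


# The two log lines from the comment, used here as sample input.
SAMPLE = [
    "2011-12-16 07:09:15,871 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: "
    "addResourceRequest: applicationId=2 priority=5 resourceName=* "
    "numContainers=1 #asks=1",
    "2011-12-16 07:26:07,641 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: "
    "addResourceRequest: applicationId=2 priority=5 resourceName=* "
    "numContainers=2 #asks=1",
]

for ask in parse_asks(SAMPLE):
    print(ask["ts"], "priority", ask["priority"], "containers", ask["containers"])
```

Running the same function over a full AM log and comparing the requested counts against the containers actually assigned would show how long the priority=5 asks stay outstanding, though the RM side would still need DEBUG logging to explain why.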