[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3596:
--------------------------------------

    Attachment: logs.tar.bz2

Attached some parts of the AM and RM logs.
am1/rm1 - first 2 map failures
am2/rm2 - 3rd map failure
am3/rm3 - last bit before the job was killed.

The first failed map was retried successfully. The remaining 2 never got 
containers allocated.

This looks like it may be an issue on the RM, though the RM logs aren't very 
useful since DEBUG logging wasn't enabled. The AM-side table looks OK. After the 
second failed map, 1 container was requested with priority=5 (never allocated):
{noformat}
2011-12-16 07:09:15,871 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: addResourceRequest: applicationId=2 priority=5 resourceName=* numContainers=1 #asks=1
{noformat}

After the third failed map, 2 containers were requested with priority=5 (never 
allocated):
{noformat}
2011-12-16 07:26:07,641 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: addResourceRequest: applicationId=2 priority=5 resourceName=* numContainers=2 #asks=1
{noformat}

Towards the end, all reduce tasks are around 0.3328 complete and pendingMaps 
stays at 2.
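
For anyone going through the attached AM logs, a rough sketch of how the addResourceRequest entries could be pulled out and summarized per priority. This is not part of Hadoop: the helper names (parse_asks, latest_ask_per_key) are made up for illustration, the field layout is assumed from the two entries quoted in this comment, each entry is assumed to sit on a single line in the raw log file, and numContainers is assumed to be the running total for that priority/resourceName.

```python
import re

# Pattern for RMContainerRequestor addResourceRequest entries. The field
# order is an assumption taken from the two entries quoted in this comment.
ASK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"RMContainerRequestor: addResourceRequest: "
    r"applicationId=(?P<app>\d+) priority=(?P<priority>\d+) "
    r"resourceName=(?P<resource>\S+) numContainers=(?P<num>\d+) "
    r"#asks=(?P<asks>\d+)"
)

# The two entries from this comment, one entry per line.
SAMPLE_LINES = [
    "2011-12-16 07:09:15,871 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: "
    "addResourceRequest: applicationId=2 priority=5 resourceName=* "
    "numContainers=1 #asks=1",
    "2011-12-16 07:26:07,641 INFO [AsyncDispatcher event handler] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: "
    "addResourceRequest: applicationId=2 priority=5 resourceName=* "
    "numContainers=2 #asks=1",
]


def parse_asks(lines):
    """Return one dict per addResourceRequest entry; other lines are skipped."""
    asks = []
    for line in lines:
        m = ASK_RE.search(line)
        if m is None:
            continue
        asks.append({
            "ts": m.group("ts"),
            "app": int(m.group("app")),
            "priority": int(m.group("priority")),
            "resource": m.group("resource"),
            "num": int(m.group("num")),
            "asks": int(m.group("asks")),
        })
    return asks


def latest_ask_per_key(asks):
    """Map (priority, resourceName) to the last numContainers value logged."""
    latest = {}
    for ask in asks:
        latest[(ask["priority"], ask["resource"])] = ask["num"]
    return latest


if __name__ == "__main__":
    for key, num in latest_ask_per_key(parse_asks(SAMPLE_LINES)).items():
        print(key, num)
```

Running the same functions over a whole AM log (for example `parse_asks(open(path))`, with `path` pointing at one of the extracted files) would show whether the priority=5 count ever drops back down, which would be one way to cross-check against pendingMaps.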
                
> Sort benchmark got hang after completion of 99% map phase
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3596
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Priority: Critical
>         Attachments: logs.tar.bz2
>
>
> Courtesy [~vinaythota]
> {quote}
> Ran the sort benchmark a couple of times, and every time the job hung after 
> completing 99% of the map phase. Some map tasks failed, and some of the 
> pending map tasks were never scheduled.
> Cluster size is 350 nodes.
> Build Details:
> ==============
> Compiled:       Fri Dec 9 16:25:27 PST 2011 by someone from 
> branches/branch-0.23/hadoop-common-project/hadoop-common 
> ResourceManager version:        revision 1212681 by someone source checksum 
> on Fri Dec 9 16:52:07 PST 2011
> Hadoop version:         revision 1212592 by someone Fri Dec 9 16:25:27 PST 
> 2011
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
