[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185808#comment-13185808 ]

Hadoop QA commented on MAPREDUCE-3656:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12510512/MR3656.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1610//console

This message is automatically generated.

> Sort job on 350 scale is consistently failing with latest MRV2 code
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3656
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2, resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Karam Singh
>            Assignee: Siddharth Seth
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MR3656.txt, MR3656.txt, MR3656.txt
>
>
> With code checked out over the last two days, a sort job at 350-node scale
> with 16800 maps and 680 reduces has failed consistently for roughly the
> last 6 runs.
> When around 50% of the maps have completed, the job suddenly jumps to the
> failed state.
> The NM log shows that the RM sent a Stop Container Request to the NM for the
> AM container.
> However, at INFO level the RM log does not show why the RM is killing the AM
> when the job was not killed manually.
> One thing the failed AM logs have in common is
> org.apache.hadoop.yarn.state.InvalidStateTransitonException, although the
> invalid event and the state differ from log to log.
> For example, one log says:
> {code}
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> TA_UPDATE at ASSIGNED
> {code}
> whereas another log says:
> {code}
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event:
> JOB_COUNTER_UPDATE at ERROR
> {code}