[ https://issues.apache.org/jira/browse/MAPREDUCE-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185887#comment-13185887 ]
Hudson commented on MAPREDUCE-3656:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1613 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1613/])
    MAPREDUCE-3656. Fixed a race condition in MR AM which is failing the sort benchmark consistently. Contributed by Siddarth Seth.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1231314
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskAttemptListener.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapred/TestTaskAttemptListenerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


> Sort job on 350 scale is consistently failing with latest MRV2 code
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3656
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3656
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2, resourcemanager
>    Affects Versions: 0.23.1
>            Reporter: Karam Singh
>            Assignee: Siddharth Seth
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MR3656.txt, MR3656.txt, MR3656.txt
>
>
> With the code checked out over the last two days, the Sort job at 350-node scale (16800 maps and 680 reduces) has failed consistently for roughly the last 6 runs.
> When around 50% of the maps have completed, the job suddenly jumps to the failed state.
> The NM log shows that the RM sent a Stop Container request to the NM for the AM container.
> However, the RM log at INFO level does not show why the RM is killing the AM, even though the job was not killed manually.
> One thing common to the failed AM logs is an org.apache.hadoop.yarn.state.InvalidStateTransitonException, though the event and state differ from log to log.
> For example, one log says:
> {code}
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED
> {code}
> whereas other logs say:
> {code}
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
> {code}