Haibo Chen created MAPREDUCE-6675:
-------------------------------------

             Summary: TestJobImpl.testUnusableNode failed 
                 Key: MAPREDUCE-6675
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.7.3
            Reporter: Haibo Chen
            Assignee: Haibo Chen


TestJobImpl#testUnusableNodeTransition is flaky.

2016-02-13 09:16:42 Running 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 8.324 sec <<< FAILURE! - in 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 
testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)
  Time elapsed: 5.165 sec  <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but 
was:<ERROR>
2016-02-13 09:16:50     at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50     at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50     at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50     at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50     at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50     at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50 
2016-02-13 09:16:50 
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50 
2016-02-13 09:16:50 Failed tests: 
2016-02-13 09:16:50   
TestJobImpl.testUnusableNodeTransition:627->assertJobState:977 
expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50 
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.


Looking at the code, an JobUpdatedNodesEvent is handled by putting an 
TaskAttemptKill event on the async dispatcher queue and return immediately, but 
the event might not have been processed by the time  all JobTaskEvents events 
are seen by the job (the jobTaskSucceeded events are handed to Job immediately 
without going through the dispatcher). Therefore, there is a slight chance that 
the job will see all three succeeded attempts and  transition to Committing 
state before the taskAttemptKill event is handled by the dispatcher. Committing 
jobs will reject later JobTaskEvents received and causing the failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to