
[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494612#comment-13494612
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-Yarn-trunk #32 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/32/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java


 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
 fails in the mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
 or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test checks the job status notifications received through an HTTP 
 servlet. It runs 3 jobs: one successful, one killed, and one failed. 
 The test expects the servlet to receive specific notifications in a specific 
 order. It also exercises the retry-on-failure notification functionality: 
 on each 1st notification attempt the servlet answers 400 to force an error, 
 and on each 2nd attempt it answers OK. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from what is expected.
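
The retry behaviour described above can be sketched in plain Java. This is only an illustrative model, not the servlet used by the test: the class NotificationRetrySketch, its Endpoint, the method notifyWithRetry, and the status strings are hypothetical names chosen for the sketch. Only the "400 on the 1st attempt, OK on the 2nd" pattern comes from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the notification receiver; not Hadoop code.
public class NotificationRetrySketch {

    static class Endpoint {
        private int requests = 0;
        // Notifications that were answered with 200, in arrival order.
        final List<String> accepted = new ArrayList<String>();

        // Odd-numbered requests are rejected with 400, even-numbered ones accepted.
        int receive(String notification) {
            requests++;
            if (requests % 2 == 1) {
                return 400;
            }
            accepted.add(notification);
            return 200;
        }
    }

    // Returns the attempt number that succeeded, or -1 if all attempts failed.
    static int notifyWithRetry(Endpoint endpoint, String notification, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (endpoint.receive(notification) == 200) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Endpoint endpoint = new Endpoint();
        // Assumed status names for the three jobs run by the test.
        for (String status : new String[] {"SUCCEEDED", "KILLED", "FAILED"}) {
            int attempts = notifyWithRetry(endpoint, status, 2);
            System.out.println(status + " delivered after " + attempts + " attempt(s)");
        }
        System.out.println("accepted in order: " + endpoint.accepted);
    }
}
```

Because the receiver alternates strictly between rejecting and accepting, any unexpected extra notification shifts the pattern for everything that follows, which is one way the count and order checks in such a test can break.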
 Investigation shows that the actual root cause of the problem is an incorrect 
 job state transition: the mapred task of the 3rd job fails (by an 
 intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and 
 the state of the task changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted 
 (in method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, 
 TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but the transition is not allowed by the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown when the event is processed:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662)
 As a result the job gets into state INTERNAL_ERROR, and a job end 
 notification like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see the ERROR status instead of FAILED)
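
To make the difference between the expected and the observed notification concrete, the query parameters of such a URL can be pulled apart as in the sketch below. It assumes the two parameters are separated by `&`; the class NotificationUrlSketch and the helper queryParams are hypothetical names and are not part of Hadoop or of the test.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for inspecting a job end notification URL.
public class NotificationUrlSketch {

    // Splits the query string of the URL into name/value pairs, keeping their order.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        String query = URI.create(url).getQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://localhost:48656/notification/mapred"
                + "?jobId=job_1352199715842_0002&jobStatus=ERROR";
        Map<String, String> params = queryParams(url);
        System.out.println(params.get("jobId") + " reported " + params.get("jobStatus"));
    }
}
```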
 After that the notification servlet receives either only an ERROR 
 notification, or one more ERROR notification after FAILED, which finally 
 causes the test to fail. (Some variation in the test behavior caused by 
 racing conditions

[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494647#comment-13494647
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-Hdfs-0.23-Build #431 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/431/])
svn merge -c 1407679 FIXES: MAPREDUCE-4774. JobImpl does not handle 
asynchronous task events in FAILED state (jlowe via bobby) (Revision 1407689)

 Result = UNSTABLE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407689
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java



[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494662#comment-13494662
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-Hdfs-trunk #1222 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1222/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java



[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494674#comment-13494674
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-Mapreduce-trunk #1253 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1253/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java



[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494377#comment-13494377
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


The change looks simple enough and does fix the failing test.  I am +1 pending 
Jenkins approval.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
 fails in the mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
 or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test checks the job status notifications received through an HTTP 
 servlet. It runs 3 jobs: one successful, one killed, and one failed. 
 The test expects the servlet to receive specific notifications in a specific 
 order. It also exercises the retry-on-failure notification functionality: 
 on each 1st notification attempt the servlet answers 400 to force an error, 
 and on each 2nd attempt it answers OK. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from what is expected.
 Investigation shows that the actual root cause of the problem is an incorrect 
 job state transition: the mapred task of the 3rd job fails (by an 
 intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and 
 the state of the task changes from RUNNING to FAILED.
 At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted 
 (in method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, 
 TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but the transition is not allowed by the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown when the event is processed:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
     at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
     at java.lang.Thread.run(Thread.java:662)
 As a result the job gets into state INTERNAL_ERROR, and a job end 
 notification like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see the ERROR status instead of FAILED)
 After that the notification servlet receives either only an ERROR 
 notification, or one more ERROR notification after FAILED, which finally 
 causes the test to fail. (The test behavior varies somewhat because of race 
 conditions: much of the processing is asynchronous, and the test is in fact 
 flaky.)
 In any case, it looks like the root cause of the problem is that the 
 forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED" 
 can occur. 
 Expert advice is needed on how this should be fixed.
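
The failure mode described above can be illustrated with a small table-driven state machine. This is a deliberately simplified sketch and not the Hadoop StateMachineFactory or the attached patch: the class JobStateSketch, the TASK_FAILED event, and the ignoreLateEventsInFailed switch are hypothetical. It assumes that an event with no registered transition moves the job to ERROR, and it shows one plausible remedy, registering self-transitions so that late task events leave FAILED unchanged.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of a job state machine; not Hadoop code.
public class JobStateSketch {

    enum State { RUNNING, FAILED, ERROR }

    // JOB_TASK_ATTEMPT_COMPLETED is named in the report; TASK_FAILED is invented here.
    enum Event { JOB_TASK_ATTEMPT_COMPLETED, TASK_FAILED }

    // For each state: the events it accepts and the state each one leads to.
    private final Map<State, Map<Event, State>> table =
            new EnumMap<State, Map<Event, State>>(State.class);
    private State current = State.RUNNING;

    JobStateSketch(boolean ignoreLateEventsInFailed) {
        for (State s : State.values()) {
            table.put(s, new EnumMap<Event, State>(Event.class));
        }
        table.get(State.RUNNING).put(Event.JOB_TASK_ATTEMPT_COMPLETED, State.RUNNING);
        table.get(State.RUNNING).put(Event.TASK_FAILED, State.FAILED);
        if (ignoreLateEventsInFailed) {
            // Self-transitions: events that arrive late leave the job in FAILED.
            for (Event e : Event.values()) {
                table.get(State.FAILED).put(e, State.FAILED);
            }
        }
    }

    // Applies the event; an unregistered (state, event) pair ends in ERROR.
    State handle(Event event) {
        State next = table.get(current).get(event);
        current = (next == null) ? State.ERROR : next;
        return current;
    }

    public static void main(String[] args) {
        for (boolean ignore : new boolean[] {false, true}) {
            JobStateSketch job = new JobStateSketch(ignore);
            job.handle(Event.TASK_FAILED);
            State end = job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED);
            System.out.println("ignoreLateEventsInFailed=" + ignore + " -> " + end);
        }
    }
}
```

With asynchronous dispatch, an event queued before the job failed can still be delivered afterwards, so a terminal state generally needs an explicit entry for every event that may arrive late.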

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494389#comment-13494389
 ] 

Hadoop QA commented on MAPREDUCE-4774:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12552903/MAPREDUCE-4774.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

  org.apache.hadoop.mapreduce.v2.app.TestRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//console

This message is automatically generated.


[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494392#comment-13494392
 ] 

Robert Joseph Evans commented on MAPREDUCE-4774:


I ran TestRecovery manually and it looks like it is a spurious failure.  We 
should file a JIRA to fix it.  Checking in the patch now.

 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Attachments: MAPREDUCE-4774.patch


 The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently 
  fails in mapred build (e.g. see 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/
  , or 
 https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
 The test aims to check Job status notifications received through HTTP 
 Servlet. It runs 3 jobs: successfull, killed, and failed. 
 The test expects the servlet to receive some expected notifications in some 
 expected order. It also tries to test the retry-on-failure notification 
 functionality, so on each 1st notification the servlet answers 400 forcing 
 error, and on each 2nd notification attempt it answers ok. 
 In general, the test fails because the actual number and/or type of the 
 notifications differs from the expected.
 Investigation shows that actual root cause of the problem is an incorrect job 
 state transition: the 3rd job mapred task fails (by intentionally thrown  
 RuntimeException, see UtilsForTests#runJobFail()), and the state of the task 
 changes from RUNNING to FAILED.
 At this point JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in  
 method 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId,
  TaskAttemptCompletionEventStatus)), and this event gets processed in 
 AsyncDispatcher, but this transition is impossible according to the event 
 transition map (see 
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). 
 This causes the following exception to be thrown upon the event processing:
 2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
   at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
   at java.lang.Thread.run(Thread.java:662)
 As a result, the job goes into the INTERNAL_ERROR state, and a job end 
 notification like this is sent:
 http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
 (here we can see the ERROR status instead of FAILED)
 After that the notification servlet receives either only an ERROR 
 notification, or one more ERROR notification after FAILED, which finally 
 causes the test to fail. (The test behavior varies because of race 
 conditions among the many asynchronous operations involved; the test is, in 
 fact, flaky.)
 In any case, it looks like the root cause of the problem is that the 
 forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED" 
 can occur. 
 Expert advice is needed on how this should be fixed.
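
 To make the failure mode concrete, here is a minimal sketch of a job state 
 machine in which a late task event arriving in the FAILED state either 
 drives the job to an error state or is tolerated. Everything in it 
 (JobStateSketch, its reduced State and Event enums, the constructor flag) is 
 a hypothetical simplification; it does not use the Hadoop 
 StateMachineFactory API, and it is not necessarily what the attached patch 
 does.

```java
import java.util.EnumSet;

// Hypothetical, simplified model of the job states named in this report.
// It is NOT the Hadoop StateMachineFactory API and not the attached patch.
class JobStateSketch {
    // ERROR stands in for INTERNAL_ERROR, which is reported as jobStatus=ERROR.
    enum State { RUNNING, FAILED, ERROR }
    enum Event { JOB_TASK_FAILED, JOB_TASK_ATTEMPT_COMPLETED }

    // Events that the FAILED state accepts without changing state.
    private final EnumSet<Event> ignoredWhenFailed;
    private State state = State.RUNNING;

    JobStateSketch(boolean failedIgnoresLateEvents) {
        this.ignoredWhenFailed = failedIgnoresLateEvents
            ? EnumSet.allOf(Event.class)
            : EnumSet.noneOf(Event.class);
    }

    State handle(Event event) {
        switch (state) {
            case RUNNING:
                // Simplification: a single task failure fails the whole job.
                if (event == Event.JOB_TASK_FAILED) {
                    state = State.FAILED;
                }
                break;
            case FAILED:
                // An event with no registered transition is treated as an
                // internal error, mirroring the exception in the report.
                if (!ignoredWhenFailed.contains(event)) {
                    state = State.ERROR;
                }
                break;
            default:
                break; // ERROR is terminal in this sketch
        }
        return state;
    }

    public static void main(String[] args) {
        for (boolean tolerant : new boolean[] {false, true}) {
            JobStateSketch job = new JobStateSketch(tolerant);
            job.handle(Event.JOB_TASK_FAILED);
            State end = job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED);
            System.out.println("tolerant=" + tolerant + " -> " + end);
        }
    }
}
```

 In the strict variant the job ends in ERROR, which matches the unexpected 
 jobStatus=ERROR notification; in the tolerant variant it stays FAILED, which 
 is the status the test expects.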

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state

2012-11-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494459#comment-13494459
 ] 

Hudson commented on MAPREDUCE-4774:
---

Integrated in Hadoop-trunk-Commit #2997 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/2997/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED 
state (jlowe via bobby) (Revision 1407679)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java


 JobImpl does not handle asynchronous task events in FAILED state
 

 Key: MAPREDUCE-4774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.5

 Attachments: MAPREDUCE-4774.patch


