[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538801#comment-13538801 ]

Hudson commented on MAPREDUCE-4890:
---

Integrated in Hadoop-Mapreduce-trunk #1292 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1292/])
MAPREDUCE-4890. Invalid TaskImpl state transitions when task fails while speculating. Contributed by Jason Lowe (Revision 1425223)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425223
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java

> Invalid TaskImpl state transitions when task fails while speculating
>
>
> Key: MAPREDUCE-4890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4890
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.2-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Fix For: 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4890.patch
>
>
> There are a couple of issues when a task fails while speculating (i.e.:
> multiple attempts are active):
> # The other active attempts are not killed.
> # TaskImpl's FAILED state does not handle the T_ATTEMPT_* set of events which
> can be sent from the other active attempts. These all need to be handled
> since they can be sent asynchronously from the other active task attempts.
> Failure to handle this properly means jobs that are configured to normally
> tolerate failures via mapreduce.map.failures.maxpercent or
> mapreduce.reduce.failures.maxpercent and also speculate can easily end up
> failing due to invalid state transitions rather than complete successfully
> with a few explicitly allowed task failures.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
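The two problems listed in the issue description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual TaskImpl code or the contents of MAPREDUCE-4890.patch: the class `SpeculatingTaskSketch`, its fields, and the `failedIgnoresAttemptEvents` switch are hypothetical names, and the model assumes that a single failed attempt fails the whole task (the real retry limits are left out). It shows a strict state machine rejecting a late T_ATTEMPT_* event in FAILED, and the same machine accepting and ignoring it once those events are registered for that state and the remaining attempts are asked to stop.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model for illustration; not Hadoop's TaskImpl.
class SpeculatingTaskSketch {
    enum State { RUNNING, FAILED }
    enum Event { T_ATTEMPT_SUCCEEDED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED }

    // Events each state accepts; any other (state, event) pair is invalid.
    private final Map<State, Set<Event>> accepted = new EnumMap<>(State.class);
    private final Set<Integer> activeAttempts;
    // Attempt ids that were asked to stop when the task failed.
    final List<Integer> killRequests = new ArrayList<>();
    State state = State.RUNNING;

    SpeculatingTaskSketch(Collection<Integer> attempts, boolean failedIgnoresAttemptEvents) {
        this.activeAttempts = new TreeSet<>(attempts);
        accepted.put(State.RUNNING, EnumSet.allOf(Event.class));
        // Assumption: 'false' models the reported behaviour, 'true' the desired one.
        accepted.put(State.FAILED, failedIgnoresAttemptEvents
                ? EnumSet.allOf(Event.class)
                : EnumSet.noneOf(Event.class));
    }

    void handle(Event event, int attemptId) {
        if (!accepted.get(state).contains(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        activeAttempts.remove(attemptId);
        if (state == State.FAILED) {
            return; // late event from a leftover attempt: ignore it and stay FAILED
        }
        if (event == Event.T_ATTEMPT_FAILED) {
            // Simplification: one failed attempt fails the task.
            state = State.FAILED;
            // Problem 1 in the description: the other active attempts must be killed.
            killRequests.addAll(activeAttempts);
        }
        // Other RUNNING transitions are out of scope for this sketch.
    }

    public static void main(String[] args) {
        SpeculatingTaskSketch task =
                new SpeculatingTaskSketch(java.util.Arrays.asList(0, 1), true);
        task.handle(Event.T_ATTEMPT_FAILED, 0); // task fails while attempt 1 is still active
        task.handle(Event.T_ATTEMPT_KILLED, 1); // late event is accepted and ignored
        System.out.println(task.state + " killRequests=" + task.killRequests);
    }
}
```

Constructing the sketch with `false` as the second argument and sending a second T_ATTEMPT_* event after the task has failed throws `IllegalStateException`, which stands in for the invalid state transition that the description says can fail an otherwise tolerable job.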
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538787#comment-13538787 ]

Hudson commented on MAPREDUCE-4890:
---

Integrated in Hadoop-Hdfs-trunk #1262 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1262/])
MAPREDUCE-4890. Invalid TaskImpl state transitions when task fails while speculating. Contributed by Jason Lowe (Revision 1425223)

Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425223
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538773#comment-13538773 ]

Hudson commented on MAPREDUCE-4890:
---

Integrated in Hadoop-Hdfs-0.23-Build #471 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/471/])
svn merge -c 1425223 FIXES: MAPREDUCE-4890. Invalid TaskImpl state transitions when task fails while speculating. Contributed by Jason Lowe (Revision 1425227)

Result = UNSTABLE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425227
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538743#comment-13538743 ]

Hudson commented on MAPREDUCE-4890:
---

Integrated in Hadoop-Yarn-trunk #73 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/73/])
MAPREDUCE-4890. Invalid TaskImpl state transitions when task fails while speculating. Contributed by Jason Lowe (Revision 1425223)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425223
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538636#comment-13538636 ]

Hudson commented on MAPREDUCE-4890:
---

Integrated in Hadoop-trunk-Commit #3154 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3154/])
MAPREDUCE-4890. Invalid TaskImpl state transitions when task fails while speculating. Contributed by Jason Lowe (Revision 1425223)

Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425223
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538451#comment-13538451 ]

Thomas Graves commented on MAPREDUCE-4890:
--

+1 looks good. Thanks Jason. Go ahead and commit.
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536248#comment-13536248 ]

Hadoop QA commented on MAPREDUCE-4890:
--

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561746/MAPREDUCE-4890.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3140//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3140//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536204#comment-13536204 ]

Jason Lowe commented on MAPREDUCE-4890:
---

I was wrong about the KILLED state. KILL_WAIT should handle cleaning up any lingering attempts, and by the time the task transitions from KILL_WAIT to KILLED there should be no active task attempts and therefore no chance of receiving T_ATTEMPT_* events.
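The reasoning in the comment above about KILL_WAIT can be sketched as a small model. This is an illustration under assumptions, not the TaskImpl implementation: `KillWaitSketch`, `kill()`, and `attemptFinished()` are hypothetical names, and every terminal attempt event is collapsed into a single "attempt finished" call. The point it tries to show is that the task only leaves KILL_WAIT once its set of active attempts is empty, so nothing is left that could send a T_ATTEMPT_* event to KILLED.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the KILL_WAIT reasoning; not Hadoop's TaskImpl.
class KillWaitSketch {
    enum State { RUNNING, KILL_WAIT, KILLED }

    private final Set<Integer> activeAttempts;
    State state = State.RUNNING;

    KillWaitSketch(Collection<Integer> attempts) {
        this.activeAttempts = new HashSet<>(attempts);
    }

    // Kill request for the whole task: wait for the attempts unless none are active.
    void kill() {
        if (state != State.RUNNING) {
            return;
        }
        state = activeAttempts.isEmpty() ? State.KILLED : State.KILL_WAIT;
    }

    // Stands in for any terminal T_ATTEMPT_* event from one attempt.
    void attemptFinished(int attemptId) {
        if (state == State.KILLED) {
            // Unreachable in this model as long as events come from tracked attempts.
            throw new IllegalStateException("attempt event received in KILLED");
        }
        activeAttempts.remove(attemptId);
        if (state == State.KILL_WAIT && activeAttempts.isEmpty()) {
            state = State.KILLED; // last lingering attempt is gone
        }
    }

    public static void main(String[] args) {
        KillWaitSketch task = new KillWaitSketch(java.util.Arrays.asList(0, 1));
        task.kill();
        task.attemptFinished(0);
        System.out.println(task.state); // still waiting on attempt 1
        task.attemptFinished(1);
        System.out.println(task.state); // all attempts accounted for
    }
}
```

Under this model the FAILED state differs from KILLED because, per the issue description, a task could enter FAILED while other attempts were still active.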
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535605#comment-13535605 ]

Jason Lowe commented on MAPREDUCE-4890:
---

Note that it appears the task KILLED state also needs to handle the various T_ATTEMPT_* events since they could arrive asynchronously and legitimately be received in that state.
[jira] [Commented] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating
[ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535603#comment-13535603 ]

Jason Lowe commented on MAPREDUCE-4890:
---

Example exception trace when a speculative attempt fails after the task already failed:

{noformat}
2012-12-18 01:06:35,885 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1354689281155_256490_m_00_4 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
2012-12-18 01:06:35,887 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1354689281155_256490_m_00
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:642)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:95)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:984)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:978)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:128)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
    at java.lang.Thread.run(Thread.java:619)
2012-12-18 01:06:35,888 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Invalid event T_ATTEMPT_FAILED on Task task_1354689281155_256490_m_00
2012-12-18 01:06:35,909 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1354689281155_256490Job Transitioned from RUNNING to ERROR
{noformat}