[ https://issues.apache.org/jira/browse/TEZ-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600667#comment-14600667 ]
Jeff Zhang edited comment on TEZ-2576 at 6/25/15 4:27 AM:
----------------------------------------------------------

This might cause a state machine error when a node failure happens while the AM is IDLE:

{code}
2015-06-25 12:13:02,419 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1090)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1)
    at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1924)
    at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
    at java.lang.Thread.run(Thread.java:745)
{code}

> It is not necessary to send NodeFailureEvent to task attempt of completed DAG
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-2576
>                 URL: https://issues.apache.org/jira/browse/TEZ-2576
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>
> When a node fails, a NodeFailureEvent is sent to all the task attempts on
> that node. It is not necessary to send this event to task attempts that
> belong to completed DAGs.
> {code}
> for (TezTaskAttemptID taId : container.failedAssignments) {
>   container.sendNodeFailureToTA(taId, errorMessage,
>       TaskAttemptTerminationCause.NODE_FAILED);
> }
> for (TezTaskAttemptID taId : container.completedAttempts) {
>   container.sendNodeFailureToTA(taId, errorMessage,
>       TaskAttemptTerminationCause.NODE_FAILED);
> }
> {code}
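
One possible shape of the filtering described in the issue, as a minimal sketch: only attempts that belong to the currently running DAG are selected for notification, and an idle AM (no running DAG) selects none. The types below are stand-ins, not the real Tez classes; `NodeFailureFilter`, `AttemptId`, `dagId`, and `attemptsToNotify` are hypothetical names introduced for illustration, and how the actual patch obtains the current DAG is not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: stand-in types, not the actual Tez classes. */
class NodeFailureFilter {

    /** Hypothetical stand-in for a task attempt ID that knows its DAG. */
    static final class AttemptId {
        final int dagId;
        final String name;

        AttemptId(int dagId, String name) {
            this.dagId = dagId;
            this.name = name;
        }
    }

    /**
     * Returns the attempts that belong to the currently running DAG.
     * A null currentDagId models an idle AM with no running DAG, in
     * which case no attempt should receive a node failure event.
     */
    static List<AttemptId> attemptsToNotify(List<AttemptId> attempts,
                                            Integer currentDagId) {
        List<AttemptId> result = new ArrayList<>();
        if (currentDagId == null) {
            return result; // AM is idle: every attempt belongs to a completed DAG
        }
        for (AttemptId id : attempts) {
            if (id.dagId == currentDagId.intValue()) {
                result.add(id); // attempt of the running DAG: still relevant
            }
        }
        return result;
    }
}
```

With such a filter applied to both `failedAssignments` and `completedAttempts` before calling `sendNodeFailureToTA`, attempts of a DAG that already reached SUCCEEDED would no longer be asked to rerun, which is the transition the stack trace above rejects.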