[ https://issues.apache.org/jira/browse/TEZ-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600667#comment-14600667 ]

Jeff Zhang edited comment on TEZ-2576 at 6/25/15 4:27 AM:
----------------------------------------------------------

This might cause a state machine error when a node failure happens while the AM is IDLE:

{code}
2015-06-25 12:13:02,419 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1090)
        at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1924)
        at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
{code}
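To illustrate the failure mode, here is a hypothetical toy model of a table-driven state machine. It is not Tez's DAGImpl or YARN's StateMachineFactory, and the class and enum names are made up for the example. Once the DAG reaches SUCCEEDED, no transition is registered for DAG_VERTEX_RERUNNING, so delivering that event throws, much like the InvalidStateTransitonException above. The assumption is that a node-failure event reaching attempts of an already completed DAG eventually surfaces as DAG_VERTEX_RERUNNING at the DAG level.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical toy model, NOT Tez's DAGImpl or YARN's StateMachineFactory.
// It only shows why an event with no registered transition for the current
// state is rejected.
enum DagState { RUNNING, SUCCEEDED }

enum DagEventType { DAG_VERTEX_RERUNNING, DAG_COMPLETED }

final class ToyDagStateMachine {
    // (state, event) -> next state; a missing entry means the event is invalid.
    private static final Map<DagState, Map<DagEventType, DagState>> TRANSITIONS =
            new EnumMap<>(DagState.class);

    static {
        Map<DagEventType, DagState> fromRunning = new EnumMap<>(DagEventType.class);
        fromRunning.put(DagEventType.DAG_VERTEX_RERUNNING, DagState.RUNNING);
        fromRunning.put(DagEventType.DAG_COMPLETED, DagState.SUCCEEDED);
        TRANSITIONS.put(DagState.RUNNING, fromRunning);
        // SUCCEEDED is terminal here: no transitions are registered for it.
        TRANSITIONS.put(DagState.SUCCEEDED, new EnumMap<>(DagEventType.class));
    }

    private DagState current = DagState.RUNNING;

    DagState getState() {
        return current;
    }

    // Applies the event, or throws if (current, event) has no transition.
    // The state is left unchanged when the event is rejected.
    DagState handle(DagEventType event) {
        DagState next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            // Plays the role of InvalidStateTransitonException in the log above.
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

A rerun while RUNNING is accepted; the same event after DAG_COMPLETED is rejected with "Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED".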


> It is not necessary to send NodeFailureEvent to task attempt of completed DAG
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-2576
>                 URL: https://issues.apache.org/jira/browse/TEZ-2576
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>
> When a node fails, a NodeFailureEvent is sent to every task attempt that ran
> on that node. It is not necessary to send it to task attempts that belong to
> DAGs that have already completed.
> {code}
> for (TezTaskAttemptID taId : container.failedAssignments) {
>   container.sendNodeFailureToTA(taId, errorMessage,
>       TaskAttemptTerminationCause.NODE_FAILED);
> }
> for (TezTaskAttemptID taId : container.completedAttempts) {
>   container.sendNodeFailureToTA(taId, errorMessage,
>       TaskAttemptTerminationCause.NODE_FAILED);
> }
> {code}
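One possible way to act on this, sketched under assumptions: before notifying an attempt, skip it if its DAG is no longer running. The AttemptId class and the isDagRunning predicate below are hypothetical stand-ins (real Tez code would work with TezTaskAttemptID and whatever DAG-state lookup the AM exposes), and this is not necessarily how the issue was actually fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for TezTaskAttemptID; only the owning DAG id matters here.
final class AttemptId {
    final int dagId;
    final int attemptNumber;

    AttemptId(int dagId, int attemptNumber) {
        this.dagId = dagId;
        this.attemptNumber = attemptNumber;
    }
}

final class NodeFailureFilter {
    // Collects the attempts that should still receive a node-failure
    // notification: those from either list whose DAG is still running.
    // isDagRunning is an assumed hook, not an existing Tez API.
    static List<AttemptId> attemptsToNotify(List<AttemptId> failedAssignments,
                                            List<AttemptId> completedAttempts,
                                            IntPredicate isDagRunning) {
        List<AttemptId> result = new ArrayList<>();
        for (AttemptId id : failedAssignments) {
            if (isDagRunning.test(id.dagId)) {
                result.add(id);
            }
        }
        for (AttemptId id : completedAttempts) {
            if (isDagRunning.test(id.dagId)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Both loops apply the same check, so attempts of completed DAGs would never be handed to sendNodeFailureToTA and should no longer be able to trigger the DAG_VERTEX_RERUNNING error reported in the comment.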



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
