[ https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533776#comment-14533776 ]

Jeff Zhang edited comment on TEZ-2429 at 5/8/15 2:47 AM:
---------------------------------------------------------

I can reproduce the invalid state transition in TestFaultTolerance and am looking into the cause:
{code}
2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
    at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
    at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
    at java.lang.Thread.run(Thread.java:662)
{code}


was (Author: zjffdu):
Can produce the InvalidTransition in TestFaultTolerance, looking at the cause

> Tez AM does not die after hitting internal error 
> -------------------------------------------------
>
>                 Key: TEZ-2429
>                 URL: https://issues.apache.org/jira/browse/TEZ-2429
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Priority: Blocker
>         Attachments: syslog_dag_1430956448478_0001_16_post, syslog_dag_1430956448478_0001_17
>
>
> From https://builds.apache.org/job/Tez-Build/1055/: 
> 2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
>       at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
>       at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
>       at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>       at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>       at java.lang.Thread.run(Thread.java:662)
> 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: Cleaning up DAG: name=testRandomFailingInputs, with id=dag_1430956448478_0001_16
> 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: Completed cleanup for DAG: name=testRandomFailingInputs, with id=dag_1430956448478_0001_16
> 2015-05-06 23:55:54,424 INFO [Dispatcher thread: Central] impl.DAGImpl: dag_1430956448478_0001_16 terminating due to internal error
> 2015-05-06 23:55:54,433 INFO [IPC Server handler 0 on 47432] app.DAGAppMaster: Starting DAG submitted via RPC: testBasicInputFailureWithExit
> 2015-05-06 23:55:54,455 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>       at org.apache.tez.dag.history.recovery.RecoveryService.doFlush(RecoveryService.java:458)
>       at org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:289)
>       at org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:102)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryUnsuccesfulEvent(DAGImpl.java:1161)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1275)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2600(DAGImpl.java:144)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2151)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2140)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
>       at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
>       at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
>       at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
>       at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>       at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>       at java.lang.Thread.run(Thread.java:662)
> 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: Killing tasks in vertex: vertex_1430956448478_0001_16_10 [l4v1] due to trigger: INTERNAL_ERROR
> 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: vertex_1430956448478_0001_16_10 [l4v1] transitioned from RUNNING to TERMINATING due to event V_TERMINATE
> 2015-05-06 23:55:54,456 INFO [AsyncDispatcher ShutDown handler] common.AsyncDispatcher: Exiting, bbye..



