[ https://issues.apache.org/jira/browse/TEZ-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533815#comment-14533815 ]
Bikas Saha commented on TEZ-2429:
---------------------------------

The main issue, though, is whether the AM fails to shut down after the InternalError. If it does shut down, then this should not be a blocker for 0.7.0.

> Tez AM does not die after hitting internal error
> -------------------------------------------------
>
> Key: TEZ-2429
> URL: https://issues.apache.org/jira/browse/TEZ-2429
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Priority: Blocker
> Attachments: syslog_dag_1430956448478_0001_16_post, syslog_dag_1430956448478_0001_17
>
>
> From https://builds.apache.org/job/Tez-Build/1055/:
> 2015-05-06 23:55:54,421 ERROR [Dispatcher thread: Central] impl.DAGImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: DAG_VERTEX_RERUNNING at SUCCEEDED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
>     at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
>     at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
>     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>     at java.lang.Thread.run(Thread.java:662)
> 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: Cleaning up DAG: name=testRandomFailingInputs, with id=dag_1430956448478_0001_16
> 2015-05-06 23:55:54,423 INFO [Dispatcher thread: Central] app.DAGAppMaster: Completed cleanup for DAG: name=testRandomFailingInputs, with id=dag_1430956448478_0001_16
> 2015-05-06 23:55:54,424 INFO [Dispatcher thread: Central] impl.DAGImpl: dag_1430956448478_0001_16 terminating due to internal error
> 2015-05-06 23:55:54,433 INFO [IPC Server handler 0 on 47432] app.DAGAppMaster: Starting DAG submitted via RPC: testBasicInputFailureWithExit
> 2015-05-06 23:55:54,455 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>     at org.apache.tez.dag.history.recovery.RecoveryService.doFlush(RecoveryService.java:458)
>     at org.apache.tez.dag.history.recovery.RecoveryService.handle(RecoveryService.java:289)
>     at org.apache.tez.dag.history.HistoryEventHandler.handleCriticalEvent(HistoryEventHandler.java:102)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.logJobHistoryUnsuccesfulEvent(DAGImpl.java:1161)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.finished(DAGImpl.java:1275)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2600(DAGImpl.java:144)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2151)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl$InternalErrorTransition.transition(DAGImpl.java:2140)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:1079)
>     at org.apache.tez.dag.app.dag.impl.DAGImpl.handle(DAGImpl.java:143)
>     at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1871)
>     at org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher.handle(DAGAppMaster.java:1862)
>     at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>     at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>     at java.lang.Thread.run(Thread.java:662)
> 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: Killing tasks in vertex: vertex_1430956448478_0001_16_10 [l4v1] due to trigger: INTERNAL_ERROR
> 2015-05-06 23:55:54,456 INFO [Dispatcher thread: Central] impl.VertexImpl: vertex_1430956448478_0001_16_10 [l4v1] transitioned from RUNNING to TERMINATING due to event V_TERMINATE
> 2015-05-06 23:55:54,456 INFO [AsyncDispatcher ShutDown handler] common.AsyncDispatcher: Exiting, bbye..

--
This message was sent by Atlassian JIRA (v6.3.4#6332)