[jira] [Commented] (TEZ-1560) Invalid state machine transition in recovery
[ https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395421#comment-14395421 ] Carter Shanklin commented on TEZ-1560: -- Here's what I did: * Using the Hortonworks Sandbox based on HDP 2.2.3. * Kicked off a Hive query and waited for some mappers to start running * Ran tc qdisc add dev lo root netem loss 66% This causes 66% packet loss on loopback so we can expect a lot of strange failures to start happening. * Waited about 5 minutes * Ran tc qdisc delete dev lo root netem loss 66% So now there is no packet loss * After about a minute or so the job failed with below error: {code} Status: Failed Invalid event V_INTERNAL_ERROR on Vertex vertex_1427920581283_0018_12_01 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask {code} Invalid state machine transition in recovery Key: TEZ-1560 URL: https://issues.apache.org/jira/browse/TEZ-1560 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Critical Attachments: failed_tez_job.txt.gz {code} 2014-09-04 16:08:25,504 INFO [main] org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409818083015_0001_1 transitioned from NEW to RUNNING 2014-09-04 16:08:25,504 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_00 [v1], state=NEW, numInitedSourceVertices=0, numStartedSourceVertices=0, numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Root Inputs exist for Vertex: v1 : {Input={InputName=Input}, {Descriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$NoOpInput, hasPayload=false}, {ControllerDescriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer, hasPayload=false}} 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializer for input: Input, with class: [org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer] 2014-09-04 16:08:25,506 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Setting user vertex manager plugin: org.apache.tez.test.dag.MultiAttemptDAG$FailOnAttemptVertexManagerPlugin on vertex: v1 2014-09-04 16:08:25,508 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Creating 2 for vertex: vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,518 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializers: 1 2014-09-04 16:08:25,520 INFO [InputInitializer [v1] #0] org.apache.tez.dag.app.dag.RootInputInitializerManager: Starting InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,522 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.RootInputInitializerManager: Succeeded InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: vertex_1409818083015_0001_1_00 [v1] transitioned from NEW to INITIALIZING due to event V_INIT 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_01 [v2], state=NEW, numInitedSourceVertices0, numStartedSourceVertices=0, numRecoveredSourceVertices=1, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,523 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_SOURCE_VERTEX_RECOVERED on vertex v2 with vertexId vertex_1409818083015_0001_1_01 at current state NEW org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_SOURCE_VERTEX_RECOVERED at NEW at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1344) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1) at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641) at
[jira] [Commented] (TEZ-1560) Invalid state machine transition in recovery
[ https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395398#comment-14395398 ] Carter Shanklin commented on TEZ-1560: -- I hit this too while simulating a network failure, Tez 0.5.2. Ping me offline for details if you want more. Invalid state machine transition in recovery Key: TEZ-1560 URL: https://issues.apache.org/jira/browse/TEZ-1560 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Critical {code} 2014-09-04 16:08:25,504 INFO [main] org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409818083015_0001_1 transitioned from NEW to RUNNING 2014-09-04 16:08:25,504 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_00 [v1], state=NEW, numInitedSourceVertices=0, numStartedSourceVertices=0, numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Root Inputs exist for Vertex: v1 : {Input={InputName=Input}, {Descriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$NoOpInput, hasPayload=false}, {ControllerDescriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer, hasPayload=false}} 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializer for input: Input, with class: [org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer] 2014-09-04 16:08:25,506 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Setting user vertex manager plugin: org.apache.tez.test.dag.MultiAttemptDAG$FailOnAttemptVertexManagerPlugin on vertex: v1 2014-09-04 16:08:25,508 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Creating 2 for vertex: vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,518 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializers: 1 2014-09-04 16:08:25,520 INFO [InputInitializer [v1] #0] org.apache.tez.dag.app.dag.RootInputInitializerManager: Starting InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,522 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.RootInputInitializerManager: Succeeded InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: vertex_1409818083015_0001_1_00 [v1] transitioned from NEW to INITIALIZING due to event V_INIT 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_01 [v2], state=NEW, numInitedSourceVertices0, numStartedSourceVertices=0, numRecoveredSourceVertices=1, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,523 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_SOURCE_VERTEX_RECOVERED on vertex v2 with vertexId vertex_1409818083015_0001_1_01 at current state NEW org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_SOURCE_VERTEX_RECOVERED at NEW at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1344) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1) at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641) at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-09-04 16:08:25,524 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-1560) Invalid state machine transition in recovery
[ https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carter Shanklin updated TEZ-1560: - Attachment: failed_tez_job.txt.gz Logs of the failed job. Invalid state machine transition in recovery Key: TEZ-1560 URL: https://issues.apache.org/jira/browse/TEZ-1560 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Critical Attachments: failed_tez_job.txt.gz {code} 2014-09-04 16:08:25,504 INFO [main] org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409818083015_0001_1 transitioned from NEW to RUNNING 2014-09-04 16:08:25,504 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_00 [v1], state=NEW, numInitedSourceVertices=0, numStartedSourceVertices=0, numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Root Inputs exist for Vertex: v1 : {Input={InputName=Input}, {Descriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$NoOpInput, hasPayload=false}, {ControllerDescriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer, hasPayload=false}} 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializer for input: Input, with class: [org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer] 2014-09-04 16:08:25,506 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Setting user vertex manager plugin: org.apache.tez.test.dag.MultiAttemptDAG$FailOnAttemptVertexManagerPlugin on vertex: v1 2014-09-04 16:08:25,508 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Creating 2 for vertex: vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,518 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializers: 1 2014-09-04 16:08:25,520 INFO [InputInitializer [v1] #0] org.apache.tez.dag.app.dag.RootInputInitializerManager: Starting InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,522 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.RootInputInitializerManager: Succeeded InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 [v1] 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: vertex_1409818083015_0001_1_00 [v1] transitioned from NEW to INITIALIZING due to event V_INIT 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1409818083015_0001_1_01 [v2], state=NEW, numInitedSourceVertices0, numStartedSourceVertices=0, numRecoveredSourceVertices=1, tasksIsNull=false, numTasks=0 2014-09-04 16:08:25,523 ERROR [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event V_SOURCE_VERTEX_RECOVERED on vertex v2 with vertexId vertex_1409818083015_0001_1_01 at current state NEW org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_SOURCE_VERTEX_RECOVERED at NEW at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1344) at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1) at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641) at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-09-04 16:08:25,524 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)