[jira] [Commented] (TEZ-1560) Invalid state machine transition in recovery

2015-04-03 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395421#comment-14395421
 ] 

Carter Shanklin commented on TEZ-1560:
--

Here's what I did:

* Used the Hortonworks Sandbox based on HDP 2.2.3.
* Kicked off a Hive query and waited for some mappers to start running.
* Ran {{tc qdisc add dev lo root netem loss 66%}}. This causes 66% packet loss on 
loopback, so we can expect a lot of strange failures to start happening.
* Waited about 5 minutes.
* Ran {{tc qdisc delete dev lo root netem loss 66%}}, so now there is no packet 
loss.
* After about a minute or so the job failed with the error below:

{code}
Status: Failed
Invalid event V_INTERNAL_ERROR on Vertex vertex_1427920581283_0018_12_01
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask
{code}
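
For anyone who wants to retry this, the packet-loss steps above could be scripted roughly as follows. This is only a sketch: the DRY_RUN, IFACE, LOSS and WAIT_SECS variables and the function names are hypothetical knobs introduced for illustration, the two tc commands are the ones listed above, and the Hive query still has to be started by hand before running it. By default the script only prints the commands; actually executing them needs root.

```shell
#!/bin/sh
# Sketch of the packet-loss injection steps described in the comment above.
# All variable and function names here are hypothetical, chosen for illustration.
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands (default), 0 = execute them
IFACE="${IFACE:-lo}"           # interface to degrade; the comment used loopback
LOSS="${LOSS:-66%}"            # packet loss ratio used in the comment
WAIT_SECS="${WAIT_SECS:-300}"  # "about 5 minutes"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start dropping packets on the chosen interface.
inject_loss() {
    run tc qdisc add dev "$IFACE" root netem loss "$LOSS"
}

# Remove the netem qdisc again so traffic flows normally.
clear_loss() {
    run tc qdisc delete dev "$IFACE" root netem loss "$LOSS"
}

# Full sequence: inject loss, wait, then restore the network.
repro() {
    inject_loss
    run sleep "$WAIT_SECS"
    clear_loss
}

repro
```

Running it as-is prints the three commands without touching the network. Setting DRY_RUN=0 as root, while the Hive query's mappers are running, performs the actual injection.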


[jira] [Commented] (TEZ-1560) Invalid state machine transition in recovery

2015-04-03 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395398#comment-14395398
 ] 

Carter Shanklin commented on TEZ-1560:
--

I hit this too while simulating a network failure on Tez 0.5.2. Ping me offline 
if you want more details.

 Invalid state machine transition in recovery
 

 Key: TEZ-1560
 URL: https://issues.apache.org/jira/browse/TEZ-1560
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Critical

 {code}
 2014-09-04 16:08:25,504 INFO [main] org.apache.tez.dag.app.dag.impl.DAGImpl: 
 dag_1409818083015_0001_1 transitioned from NEW to RUNNING
 2014-09-04 16:08:25,504 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, 
 vertexId=vertex_1409818083015_0001_1_00 [v1], state=NEW, 
 numInitedSourceVertices=0, numStartedSourceVertices=0, 
 numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0
 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Root Inputs exist for Vertex: v1 
 : {Input={InputName=Input}, 
 {Descriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$NoOpInput, 
 hasPayload=false}, 
 {ControllerDescriptor=ClassName=org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer,
  hasPayload=false}}
 2014-09-04 16:08:25,505 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializer 
 for input: Input, with class: 
 [org.apache.tez.test.dag.MultiAttemptDAG$TestRootInputInitializer]
 2014-09-04 16:08:25,506 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Setting user vertex manager 
 plugin: 
 org.apache.tez.test.dag.MultiAttemptDAG$FailOnAttemptVertexManagerPlugin on 
 vertex: v1
 2014-09-04 16:08:25,508 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Creating 2 for vertex: 
 vertex_1409818083015_0001_1_00 [v1]
 2014-09-04 16:08:25,518 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Starting root input initializers: 
 1
 2014-09-04 16:08:25,520 INFO [InputInitializer [v1] #0] 
 org.apache.tez.dag.app.dag.RootInputInitializerManager: Starting 
 InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 
 [v1]
 2014-09-04 16:08:25,522 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.RootInputInitializerManager: Succeeded 
 InputInitializer for Input: Input on vertex vertex_1409818083015_0001_1_00 
 [v1]
 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: vertex_1409818083015_0001_1_00 
 [v1] transitioned from NEW to INITIALIZING due to event V_INIT
 2014-09-04 16:08:25,523 INFO [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Recovered Vertex State, 
 vertexId=vertex_1409818083015_0001_1_01 [v2], state=NEW, 
 numInitedSourceVertices0, numStartedSourceVertices=0, 
 numRecoveredSourceVertices=1, tasksIsNull=false, numTasks=0
 2014-09-04 16:08:25,523 ERROR [AsyncDispatcher event handler] 
 org.apache.tez.dag.app.dag.impl.VertexImpl: Can't handle Invalid event 
 V_SOURCE_VERTEX_RECOVERED on vertex v2 with vertexId 
 vertex_1409818083015_0001_1_01 at current state NEW
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 V_SOURCE_VERTEX_RECOVERED at NEW
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1344)
   at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1)
   at 
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1641)
   at 
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 2014-09-04 16:08:25,524 FATAL [AsyncDispatcher event handler] 
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1560) Invalid state machine transition in recovery

2015-04-03 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated TEZ-1560:
-
Attachment: failed_tez_job.txt.gz

Logs of the failed job.
