[jira] [Commented] (TEZ-2421) Deadlock in AM because attempt and vertex locking each other out
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537617#comment-14537617 ]
Jeff Zhang commented on TEZ-2421:
Thanks [~bikassaha]. Committed to master and branch-0.7.

Deadlock in AM because attempt and vertex locking each other out
Key: TEZ-2421
URL: https://issues.apache.org/jira/browse/TEZ-2421
Project: Apache Tez
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Priority: Blocker
Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch, TEZ-2421.4.patch

Ideally locks should be taken one way: either going down or up. Preferably not going up, because most such data can be passed in during object construction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
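The rule in the issue description (take locks in one direction only, preferably going down) can be checked at runtime, for example in tests. The following is a minimal sketch with a hypothetical LeveledLock wrapper (not a Tez or JDK class): each lock carries an assumed depth in a DAG, vertex, task, attempt hierarchy, and a thread that already holds another lock at the same or a deeper level gets an IllegalStateException instead of a potential deadlock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical lock wrapper that only allows top-down acquisition. */
public final class LeveledLock {
    // Illustrative hierarchy levels (assumption); a larger number is deeper in the DAG.
    public static final int DAG = 0, VERTEX = 1, TASK = 2, ATTEMPT = 3;

    // Locks currently held by the calling thread, most recently acquired first.
    private static final ThreadLocal<Deque<LeveledLock>> HELD =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final String name;
    private final int level;
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    public LeveledLock(String name, int level) {
        this.name = name;
        this.level = level;
    }

    public void readLock() {
        checkOrder();
        rw.readLock().lock();
        HELD.get().push(this);
    }

    public void readUnlock() {
        HELD.get().remove(this);
        rw.readLock().unlock();
    }

    public void writeLock() {
        checkOrder();
        rw.writeLock().lock();
        HELD.get().push(this);
    }

    public void writeUnlock() {
        HELD.get().remove(this);
        rw.writeLock().unlock();
    }

    // Rejects the acquisition if any other held lock is at the same or a deeper level.
    // Re-entering the same lock is allowed (a read-to-write upgrade would still block,
    // as with any ReentrantReadWriteLock).
    private void checkOrder() {
        for (LeveledLock held : HELD.get()) {
            if (held != this && held.level >= level) {
                throw new IllegalStateException("lock-order violation: acquiring " + name
                        + " (level " + level + ") while holding " + held.name
                        + " (level " + held.level + ")");
            }
        }
    }

    public static void main(String[] args) {
        LeveledLock vertex = new LeveledLock("vertex", VERTEX);
        LeveledLock attempt = new LeveledLock("attempt", ATTEMPT);

        // Going down (vertex, then attempt) is allowed.
        vertex.readLock();
        attempt.readLock();
        attempt.readUnlock();
        vertex.readUnlock();

        // Going up (attempt, then vertex) is rejected before the thread can block.
        attempt.writeLock();
        try {
            vertex.readLock();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        } finally {
            attempt.writeUnlock();
        }
    }
}
```

Running main prints "lock-order violation: acquiring vertex (level 1) while holding attempt (level 3)", which is the upward attempt-to-vertex acquisition discussed in this issue. Because the order is checked before blocking, a violation is reported even in runs whose timing would not actually deadlock.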
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537609#comment-14537609 ]
Jeff Zhang commented on TEZ-2421:
Although I couldn't reproduce the deadlock in TestAMRecovery, the approach of passing taskSpec and taskLocation through TaskEventScheduleTask LGTM. +1, committing soon.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537054#comment-14537054 ]
Bikas Saha commented on TEZ-2421:
I think T3 cannot proceed because T1's pending writelock request prevents other readlocks from being acquired (otherwise the writelock would starve in the presence of a continuous stream of overlapping readlocks).
I think recovery will be fine, since during recovery everything runs on the central dispatcher and vertex managers are not running (we don't support vertex manager recovery).
I have run TestDAGRecovery and TestAMRecovery many times and there were no further issues. Before the workaround, there were issues with them all the time.
Yes, TEZ-1019 would provide a better fix.
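The point about the pending writelock (the answer to why T3 cannot continue) can be checked in isolation. This is a minimal sketch, not Tez code: the calling thread holds a read lock (T2's role in Jeff's table), a second thread queues for the write lock (T1), and a third thread that holds no read lock yet (T3) tries to read with a timeout. The behaviour relies on OpenJDK's non-fair ReentrantReadWriteLock, which makes a new reader wait when the first queued thread is a writer; the class Javadoc leaves non-fair ordering unspecified, so treat this as implementation behaviour rather than a guarantee.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PendingWriterBlocksReaders {

    /**
     * Returns true if a brand-new reader obtains the read lock while another thread
     * holds a read lock and a writer is already queued.
     */
    public static boolean newReaderAcquired() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        lock.readLock().lock(); // the calling thread plays T2: it holds a read lock

        // T1: requests the write lock and parks, because a read lock is held.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) {
            Thread.yield(); // wait until T1 is actually waiting in the lock's queue
        }

        // T3: holds no read lock yet. Unlike the untimed tryLock(), the timed variant
        // goes through the same queue-aware path as lock().
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        join(reader);

        lock.readLock().unlock(); // T2 releases; T1 can now take the write lock
        join(writer);
        return acquired.get();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("T3 acquired the read lock: " + newReaderAcquired());
    }
}
```

On OpenJDK, newReaderAcquired() is expected to return false: T3 times out even though no thread holds the write lock yet, which is exactly the situation questioned in Jeff's table. The untimed readLock().tryLock() barges ahead of queued threads, so it would succeed here; that is why the sketch uses the timed form.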
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536935#comment-14536935 ]
TezQA commented on TEZ-2421:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12731754/TEZ-2421.3.patch
against master revision ce69aa1.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/658//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/658//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536940#comment-14536940 ]
TezQA commented on TEZ-2421:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12731755/TEZ-2421.4.patch
against master revision ce69aa1.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/659//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/659//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536915#comment-14536915 ]
Bikas Saha commented on TEZ-2421:
bq. I look at the jstack trace, not sure where's the deadlock. App Shared Pool - #1 try to acquire VertexImpl's writelock and no other thread has the writeblock except some thread also try to acquire the readlock

Thread 1 has acquired the readlock on V1 and tries to acquire the readlock on V2.
Thread 2 wants to acquire the writelock on V1 and is blocked because Thread 1 holds the readlock.
Thread 3 holds the writelock on V2 and is trying to acquire the readlock on V1, which is blocked by Thread 2's pending writelock request.
Thus the 3 threads have locked each other out.
This reproduces when TestAMRecovery is run in a loop, or when a large job (especially one with 1-1 edges) is run repeatedly on a cluster.

Attaching a patch that fixes the locking issues. Verified by running TestAMRecovery etc. in a loop and by running a large job on the cluster in a loop.
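The three-thread cycle described in this comment can be reproduced outside Tez. This is a minimal sketch, assuming two plain ReentrantReadWriteLocks named after the V1 and V2 of the comment (they are not the actual VertexImpl or TaskAttemptImpl fields). Timed tryLock calls replace lock() so the program reports the cycle and exits instead of hanging, and latches force the interleaving in which Thread 2's write request is queued before Thread 3 asks for its readlock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCycle {

    private static final long WAIT_MS = 300;

    /**
     * Builds the cycle with timed waits and returns
     * {Thread 1 got the V2 readlock, Thread 3 got the V1 readlock}.
     * Both false means each thread was stuck behind the others.
     */
    public static boolean[] run() {
        ReentrantReadWriteLock v1 = new ReentrantReadWriteLock();
        ReentrantReadWriteLock v2 = new ReentrantReadWriteLock();
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch go = new CountDownLatch(1);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean thread1Got = new AtomicBoolean(false);
        AtomicBoolean thread3Got = new AtomicBoolean(false);

        // Thread 1: holds the V1 readlock, then asks for the V2 readlock.
        Thread thread1 = new Thread(() -> holdThenTry(v1.readLock(), v2.readLock(),
                firstLocksHeld, go, bothTried, thread1Got));
        // Thread 3: holds the V2 writelock, then asks for the V1 readlock.
        Thread thread3 = new Thread(() -> holdThenTry(v2.writeLock(), v1.readLock(),
                firstLocksHeld, go, bothTried, thread3Got));
        // Thread 2: asks for the V1 writelock and parks behind Thread 1's readlock.
        Thread thread2 = new Thread(() -> {
            v1.writeLock().lock();
            v1.writeLock().unlock();
        });

        thread1.start();
        thread3.start();
        await(firstLocksHeld);
        thread2.start();
        while (!v1.hasQueuedThread(thread2)) {
            Thread.yield(); // Thread 2's write request must be queued before Thread 3 asks
        }
        go.countDown();

        join(thread1);
        join(thread3);
        join(thread2); // completes only because Thread 1 gave up and released V1
        return new boolean[] {thread1Got.get(), thread3Got.get()};
    }

    // Takes `held`, waits for the go signal, tries `wanted` with a timeout, and keeps
    // `held` until the other thread has also finished trying.
    private static void holdThenTry(Lock held, Lock wanted, CountDownLatch ready,
            CountDownLatch go, CountDownLatch tried, AtomicBoolean got) {
        held.lock();
        try {
            ready.countDown();
            await(go);
            try {
                if (wanted.tryLock(WAIT_MS, TimeUnit.MILLISECONDS)) {
                    got.set(true);
                    wanted.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tried.countDown();
            await(tried);
        } finally {
            held.unlock();
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("Thread 1 got V2 readlock: " + r[0]);
        System.out.println("Thread 3 got V1 readlock: " + r[1]);
    }
}
```

With untimed lock() calls in place of tryLock, the same interleaving would never complete: Thread 1 waits for Thread 3, Thread 3 waits behind Thread 2, and Thread 2 waits for Thread 1.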
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537013#comment-14537013 ]
Jeff Zhang commented on TEZ-2421:
[~bikassaha] I guess you mean the following scenario:
bq. Thread 1 has V1 readlock acquired and tries to acquire readlock on V2. Thread 2 wants to acquire writelock on V1 and is blocked because thread 1 has the readlock. Thread 3 has writelock on V2 and is trying to acquire readlock on V1 which is blocked due to the pending writelock on Thread 2.

|| Thread || Owned || Try to acquire ||
| App Shared Pool - #1 (T1) | | Writelock of Vertex |
| TaskSchedulerAppCaller - #0 (T2) | Readlock of Vertex/Task | Readlock of TaskAttempt |
| Dispatcher thread:Central (T3) | Writelock of TaskAttempt | Readlock of Vertex |

I'm still not sure why T3 can't continue: T1 hasn't acquired the writelock of the Vertex yet, so it should not block T3, right?

BTW, the patch may still cause an issue in recovery. During recovery, the following code in TaskAttempt will still try to acquire the readlock of the Vertex and can produce the above scenario. But this should be fixable after TEZ-1019.
{code}
TaskSpec createRemoteTaskSpec() throws AMUserCodeException {
  TaskSpec baseTaskSpec = task.getBaseTaskSpec();
  if (baseTaskSpec == null) {
    // since recovery does not follow normal transitions, TaskEventScheduleTask
    // is not being honored by the recovery code path. Using this to workaround
    // until recovery is fixed. Calling the non-locking internal method of the vertex
    // to get the taskSpec directly. Since everything happens on the central dispatcher
    // during recovery this is deadlock free for now. TEZ-1019 should remove the need for this.
    baseTaskSpec = ((VertexImpl) vertex).createRemoteTaskSpec(getID().getTaskID().getId());
  }
  return new TaskSpec(getID(), baseTaskSpec.getDAGName(), baseTaskSpec.getVertexName(),
      baseTaskSpec.getVertexParallelism(), baseTaskSpec.getProcessorDescriptor(),
      baseTaskSpec.getInputs(), baseTaskSpec.getOutputs(), baseTaskSpec.getGroupInputs());
}
{code}
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535192#comment-14535192 ]
Bikas Saha commented on TEZ-2421:
This is happening because the recovery code path directly manipulates the state of the vertex/task/attempt instead of following the normal state transitions. TEZ-1019 is tracking this but has not been committed yet. I will try to fix this.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535229#comment-14535229 ]
Bikas Saha commented on TEZ-2421:
At this point, I need to investigate the recovery logic further for a workaround/fix. Since this issue does not always happen, I suggest removing it as a blocker for 0.7.0 so that the new APIs can be consumed by other projects. We can follow up immediately with 0.7.1 with a specific fix for this issue.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533734#comment-14533734 ]
Bikas Saha commented on TEZ-2421:
The main issue is that the attempt takes locks upwards into the vertex while the vertex takes locks downwards into the attempt. One of the two directions has to be broken to prevent deadlock. The key culprits are getting the remoteTaskSpec and getting the taskLocation. Instead of the attempt up-calling into the vertex to get these after it is scheduled, the vertex now sends them to the task when it schedules the task.
[~zjffdu] [~sseth] [~hitesh] Please review.
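The fix described in this comment can be sketched as follows. Vertex, Attempt and ScheduleEvent are hypothetical stand-ins, not Tez classes; ScheduleEvent only loosely mirrors the role TaskEventScheduleTask plays after the patch, and plain strings stand in for the real task spec and location types. The vertex computes what each attempt needs while holding its own lock and passes it down, so the only lock nesting is vertex, then attempt. For brevity the event is handed to the attempt with a direct method call rather than through an event dispatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TopDownScheduling {

    /** Immutable payload handed down from vertex to attempt (hypothetical). */
    static final class ScheduleEvent {
        final String taskSpec;
        final String taskLocation;

        ScheduleEvent(String taskSpec, String taskLocation) {
            this.taskSpec = taskSpec;
            this.taskLocation = taskLocation;
        }
    }

    static final class Attempt {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private String taskSpec;
        private String taskLocation;

        void handle(ScheduleEvent event) {
            lock.writeLock().lock();
            try {
                // Everything needed comes from the event: no up-call into the vertex,
                // so this thread never requests a vertex lock while holding this one.
                taskSpec = event.taskSpec;
                taskLocation = event.taskLocation;
            } finally {
                lock.writeLock().unlock();
            }
        }

        String describe() {
            lock.readLock().lock();
            try {
                return taskSpec + "@" + taskLocation;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    static final class Vertex {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final String name;
        private final List<Attempt> attempts = new ArrayList<>();

        Vertex(String name, int parallelism) {
            this.name = name;
            for (int i = 0; i < parallelism; i++) {
                attempts.add(new Attempt());
            }
        }

        Attempt attempt(int i) {
            return attempts.get(i);
        }

        void scheduleTasks() {
            lock.writeLock().lock();
            try {
                for (int i = 0; i < attempts.size(); i++) {
                    // Compute spec and location under the vertex lock, then pass them down.
                    // Vertex lock, then attempt lock, is the only nesting order in this sketch.
                    ScheduleEvent event = new ScheduleEvent(name + "-task-" + i, "host-" + (i % 2));
                    attempts.get(i).handle(event);
                }
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Vertex v = new Vertex("v1", 3);
        v.scheduleTasks();
        for (int i = 0; i < 3; i++) {
            System.out.println(v.attempt(i).describe());
        }
    }
}
```

main prints v1-task-0@host-0, v1-task-1@host-1 and v1-task-2@host-0, one per line. Because Attempt never calls back into Vertex, a thread holding an attempt lock is never waiting for a vertex lock, which is the direction this comment proposes to break.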
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533857#comment-14533857 ]
TezQA commented on TEZ-2421:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12731352/TEZ-2421.3.patch
against master revision 05f77fe.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
org.apache.tez.test.TestAMRecovery
org.apache.tez.test.TestDAGRecovery

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/655//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/655//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533771#comment-14533771 ]
TezQA commented on TEZ-2421:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12731341/TEZ-2421.2.patch
against master revision 05f77fe.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/653//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/653//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533904#comment-14533904 ]
Jeff Zhang commented on TEZ-2421:
It causes TestAMRecovery to fail.
{code}
2015-05-08 13:35:25,672 INFO [Dispatcher thread: Central] impl.VertexImpl: Source task attempt completed for vertex: vertex_1431063298340_0001_1_01 [v2] attempt: attempt_1431063298340_0001_1_00_00_0 with state: SUCCEEDED vertexState: RUNNING
2015-05-08 13:35:25,672 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.createRemoteTaskSpec(TaskAttemptImpl.java:461)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1012)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:673)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1)
    at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1920)
    at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
    at java.lang.Thread.run(Thread.java:745)
{code}
[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529679#comment-14529679 ]
Bikas Saha commented on TEZ-2421:
App Shared Pool - #1 #102 daemon prio=5 os_prio=0 tid=0x02426000 nid=0x8bd waiting on condition [0x7fa2a841d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0006f58b09c0 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1389)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleVertexTasks(VertexManager.java:206)
    - locked 0x0006f58d2c08 (a org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl)
    at org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:277)
    at org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:198)
    - locked 0x0006f58d2d90 (a org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:601)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:656)
    - locked 0x0006f58d2d30 (a org.apache.tez.dag.app.dag.impl.VertexManager)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:651)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:651)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:640)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

TaskSchedulerAppCaller #0 #92 daemon prio=5 os_prio=0 tid=0x01884800 nid=0x8af waiting on condition [0x7fa2a9127000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007aa165038 (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.getState(TaskAttemptImpl.java:547)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.selectBestAttempt(TaskImpl.java:715)
    at org.apache.tez.dag.app.dag.impl.TaskImpl.getProgress(TaskImpl.java:473)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.computeProgress(VertexImpl.java:1179)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.getProgress(VertexImpl.java:1117)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.getProgress(DAGImpl.java:767)
    at org.apache.tez.dag.app.DAGAppMaster.getProgress(DAGAppMaster.java:1134)
    at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.getProgress(TaskSchedulerEventHandler.java:556)
    at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$GetProgressCallable.call(TaskSchedulerAppCallbackWrapper.java:291)
    at org.apache.tez.dag.app.rm.TaskSchedulerAppCallbackWrapper$GetProgressCallable.call(TaskSchedulerAppCallbackWrapper.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at