[ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495538#comment-14495538 ]
TezQA commented on TEZ-2310:
----------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12725444/TEZ-2310.1.patch
  against master revision 11b5843.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/467//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/467//console

This message is automatically generated.

> AM Deadlock in VertexImpl
> -------------------------
>
>                 Key: TEZ-2310
>                 URL: https://issues.apache.org/jira/browse/TEZ-2310
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>             Fix For: 0.7.0
>
>         Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch
>
>
> See the following deadlock in testing:
> Thread #1:
> {code}
> Daemon Thread [App Shared Pool - #3] (Suspended)
>     owns: VertexManager$VertexManagerPluginContextImpl  (id=327)
>     owns: ShuffleVertexManager  (id=328)
>     owns: VertexManager  (id=329)
>     waiting for: VertexManager$VertexManagerPluginContextImpl  (id=326)
>     VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344
>     StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138
>     StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122
>     StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116
>     StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106
>     VertexImpl.maybeSendConfiguredEvent() line: 3385
>     VertexImpl.doneReconfiguringVertex() line: 1634
>     VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339
>     ShuffleVertexManager.schedulePendingTasks(int) line: 561
>     ShuffleVertexManager.schedulePendingTasks() line: 620
>     ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731
>     ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744
>     VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527
>     VertexManager$VertexManagerEvent$1.run() line: 612
>     VertexManager$VertexManagerEvent$1.run() line: 607
>     AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
>     Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
>     UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
>     VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607
>     VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596
>     ListenableFutureTask<V>(FutureTask<V>).run() line: 262
>     ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
>     ThreadPoolExecutor$Worker.run() line: 615
>     Thread.run() line: 745
> {code}
> Thread #2:
> {code}
> Daemon Thread [App Shared Pool - #2] (Suspended)
>     owns: VertexManager$VertexManagerPluginContextImpl  (id=326)
>     owns: PigGraceShuffleVertexManager  (id=344)
>     owns: VertexManager  (id=345)
>     Unsafe.park(boolean, long) line: not available [native method]
>     LockSupport.park(Object) line: 186
>     ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834
>     ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964
>     ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282
>     ReentrantReadWriteLock$ReadLock.lock() line: 731
>     VertexImpl.getTotalTasks() line: 952
>     VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162
>     PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435
>     PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String,List<Integer>>) line: 353
>     VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541
>     VertexManager$VertexManagerEvent$1.run() line: 612
>     VertexManager$VertexManagerEvent$1.run() line: 607
>     AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
>     Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
>     UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
>     VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 607
>     VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 596
>     ListenableFutureTask<V>(FutureTask<V>).run() line: 262
>     ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
>     ThreadPoolExecutor$Worker.run() line: 615
>     Thread.run() line: 745
> {code}
> What happens is that thread #1 holds a writeLock (VertexImpl:1628) and then
> enters a synchronized block (ShuffleVertexManager.onVertexStateUpdated). In the
> meantime, thread #2 is already in the synchronized block
> (ShuffleVertexManager.onVertexStarted) and tries to acquire a readLock
> (VertexImpl:952). Holding a lock and then entering a synchronized block can be
> dangerous.
> I have attached a patch that avoids this, and with it the deadlock goes away.
> I am not sure whether that is the right fix, or whether there are other
> patterns like this.
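
One common way to avoid the lock-then-synchronized pattern described in the report is to change state under the write lock, release it, and only then call into synchronized listeners. The sketch below is a minimal illustration of that idea. It is not the attached patch and not Tez code: `NotifyOutsideLock`, `Vertex`, `Listener`, and every method on them are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class NotifyOutsideLock {

    /** Hypothetical listener whose callback is synchronized, like a plugin context. */
    static class Listener {
        private final Vertex source;
        private int lastSeen = -1;
        private boolean calledUnderWriteLock = false;

        Listener(Vertex source) {
            this.source = source;
        }

        synchronized void onStateUpdated() {
            // Record whether the notifying thread still held the vertex write lock.
            calledUnderWriteLock = source.isWriteLockedByCurrentThread();
            // Reading back from the vertex takes its read lock.
            lastSeen = source.getTotalTasks();
        }

        synchronized int lastSeen() {
            return lastSeen;
        }

        synchronized boolean calledUnderWriteLock() {
            return calledUnderWriteLock;
        }
    }

    /** Hypothetical vertex guarded by a read/write lock. */
    static class Vertex {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final List<Listener> listeners = new ArrayList<>();
        private int totalTasks;

        void register(Listener listener) {
            lock.writeLock().lock();
            try {
                listeners.add(listener);
            } finally {
                lock.writeLock().unlock();
            }
        }

        int getTotalTasks() {
            lock.readLock().lock();
            try {
                return totalTasks;
            } finally {
                lock.readLock().unlock();
            }
        }

        boolean isWriteLockedByCurrentThread() {
            return lock.isWriteLockedByCurrentThread();
        }

        void reconfigure(int newTotal) {
            List<Listener> toNotify;
            lock.writeLock().lock();
            try {
                totalTasks = newTotal;
                // Snapshot the listeners while the state is still protected.
                toNotify = new ArrayList<>(listeners);
            } finally {
                lock.writeLock().unlock();
            }
            // Callbacks run with no vertex lock held, so a listener's monitor
            // is never acquired while this thread owns the write lock.
            for (Listener listener : toNotify) {
                listener.onStateUpdated();
            }
        }
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex();
        Listener listener = new Listener(vertex);
        vertex.register(listener);
        vertex.reconfigure(7);
        System.out.println("listener saw totalTasks = " + listener.lastSeen());
        System.out.println("called under write lock = " + listener.calledUnderWriteLock());
    }
}
```

The trade-off in this sketch is that listeners may observe state that has changed again by the time the callback runs, so callbacks should re-read what they need rather than rely on the lock still being held. Whether that is acceptable for VertexImpl depends on ordering guarantees that the report does not describe.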