[ 
https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495538#comment-14495538
 ] 

TezQA commented on TEZ-2310:
----------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12725444/TEZ-2310.1.patch
  against master revision 11b5843.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/467//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/467//console

This message is automatically generated.

> AM Deadlock in VertexImpl
> -------------------------
>
>                 Key: TEZ-2310
>                 URL: https://issues.apache.org/jira/browse/TEZ-2310
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Bikas Saha
>             Fix For: 0.7.0
>
>         Attachments: TEZ-2310-0.patch, TEZ-2310.1.patch
>
>
> See the following deadlock in testing:
> Thread #1:
> {code}
> Daemon Thread [App Shared Pool - #3] (Suspended)
>       owns: VertexManager$VertexManagerPluginContextImpl  (id=327)
>       owns: ShuffleVertexManager  (id=328)
>       owns: VertexManager  (id=329)
>       waiting for: VertexManager$VertexManagerPluginContextImpl  (id=326)
>       VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344
>       StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138
>       StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122
>       StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116
>       StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106
>       VertexImpl.maybeSendConfiguredEvent() line: 3385
>       VertexImpl.doneReconfiguringVertex() line: 1634
>       VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339
>       ShuffleVertexManager.schedulePendingTasks(int) line: 561
>       ShuffleVertexManager.schedulePendingTasks() line: 620
>       ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731
>       ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744
>       VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527
>       VertexManager$VertexManagerEvent$1.run() line: 612
>       VertexManager$VertexManagerEvent$1.run() line: 607
>       AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
>       Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
>       UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
>       VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607
>       VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596
>       ListenableFutureTask<V>(FutureTask<V>).run() line: 262
>       ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
>       ThreadPoolExecutor$Worker.run() line: 615
>       Thread.run() line: 745
> {code}
> Thread #2:
> {code}
> Daemon Thread [App Shared Pool - #2] (Suspended)
>       owns: VertexManager$VertexManagerPluginContextImpl  (id=326)
>       owns: PigGraceShuffleVertexManager  (id=344)
>       owns: VertexManager  (id=345)
>       Unsafe.park(boolean, long) line: not available [native method]
>       LockSupport.park(Object) line: 186
>       ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834
>       ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964
>       ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282
>       ReentrantReadWriteLock$ReadLock.lock() line: 731
>       VertexImpl.getTotalTasks() line: 952
>       VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162
>       PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435
>       PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String,List<Integer>>) line: 353
>       VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541
>       VertexManager$VertexManagerEvent$1.run() line: 612
>       VertexManager$VertexManagerEvent$1.run() line: 607
>       AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
>       Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
>       UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
>       VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 607
>       VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 596
>       ListenableFutureTask<V>(FutureTask<V>).run() line: 262
>       ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
>       ThreadPoolExecutor$Worker.run() line: 615
>       Thread.run() line: 745
> {code}
> What happens is that thread #1 holds a writeLock (VertexImpl:1628) and then 
> tries to enter a synchronized block (ShuffleVertexManager.onVertexStateUpdated). 
> Meanwhile, thread #2 is already inside a synchronized block 
> (ShuffleVertexManager.onVertexStarted) and tries to acquire a 
> readLock (VertexImpl:952). Holding a lock and then entering a synchronized 
> block can be dangerous. 
> I am attaching a patch that avoids this, and with it the deadlock goes away. 
> I am not sure whether that is the right fix, or whether there are other 
> patterns like this.
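
The hazard described above (calling into synchronized code while a write lock is held) can be illustrated with a minimal, self-contained sketch. It does not reproduce the Tez code and is not necessarily what TEZ-2310.1.patch does. SketchVertex, SketchListener and LockOrderSketch are hypothetical names invented for this illustration. The sketch shows one common way to avoid the inversion: change state under the write lock, snapshot the listeners, release the lock, and only then invoke the synchronized callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a vertex whose state is guarded by a read/write lock.
class SketchVertex {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<SketchListener> listeners = new ArrayList<>();
    private int totalTasks = 1;

    void addListener(SketchListener listener) {
        lock.writeLock().lock();
        try {
            listeners.add(listener);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int getTotalTasks() {
        lock.readLock().lock();
        try {
            return totalTasks;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Mutates state under the write lock, but delivers callbacks only after
    // the lock is released, so no listener monitor is acquired while the
    // write lock is held.
    void reconfigure(int newTotalTasks) {
        List<SketchListener> toNotify;
        lock.writeLock().lock();
        try {
            totalTasks = newTotalTasks;
            toNotify = new ArrayList<>(listeners); // snapshot under the lock
        } finally {
            lock.writeLock().unlock();
        }
        for (SketchListener listener : toNotify) {
            listener.onStateUpdated(this);
        }
    }

    boolean writeLockHeldByCurrentThread() {
        return lock.isWriteLockedByCurrentThread();
    }
}

// Hypothetical stand-in for a plugin whose callbacks are synchronized.
class SketchListener {
    int lastSeenTasks = -1;
    boolean calledUnderWriteLock = false;

    synchronized void onStateUpdated(SketchVertex vertex) {
        // Record whether the caller still held the vertex write lock.
        calledUnderWriteLock = vertex.writeLockHeldByCurrentThread();
        // Takes the vertex read lock while holding this object's monitor.
        lastSeenTasks = vertex.getTotalTasks();
    }
}

class LockOrderSketch {
    public static void main(String[] args) {
        SketchVertex vertex = new SketchVertex();
        SketchListener listener = new SketchListener();
        vertex.addListener(listener);
        vertex.reconfigure(4);
        System.out.println("lastSeenTasks=" + listener.lastSeenTasks
            + " calledUnderWriteLock=" + listener.calledUnderWriteLock);
    }
}
```

With this ordering, every thread takes the listener monitor before any vertex lock and never the reverse, so the cycle seen in the two thread dumps cannot form in the sketch. The trade-off is that listeners may observe state newer than the change that triggered the notification.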



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
