[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536915#comment-14536915 ]

Bikas Saha commented on TEZ-2421:
---------------------------------

bq. I looked at the jstack trace and am not sure where the deadlock is. App Shared Pool - #1
tries to acquire VertexImpl's writelock, and no other thread holds the writelock;
the other threads are only trying to acquire the readlock.
Thread 1 holds the readlock on V1 and tries to acquire the readlock on V2.
Thread 2 wants the writelock on V1 and is blocked because Thread 1 holds the V1
readlock. Thread 3 holds the writelock on V2 and tries to acquire the readlock
on V1, which is blocked behind Thread 2's pending writelock request. Thread 1,
in turn, cannot get the V2 readlock because Thread 3 holds the V2 writelock.
Thus the 3 threads have locked each other out. This reproduces when
TestAMRecovery is run in a loop, or when a large job (especially one with 1-1
edges) is run in a loop on a cluster.
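
The cycle can be reproduced outside Tez with plain
java.util.concurrent.locks.ReentrantReadWriteLock instances. This is a minimal
sketch that assumes the vertex locks behave like a default (non-fair)
ReentrantReadWriteLock, where, in the OpenJDK implementation, a new reader waits
behind a writer that is already first in the queue. The names v1, v2 and
reproduce are hypothetical, not Tez code. Latches force the interleaving
described above, and lockInterruptibly lets the harness break the cycle
afterwards instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockCycle {

    /** Returns how many of the three blocked lock requests completed within settleMillis (0 = deadlock). */
    static int reproduce(long settleMillis) {
        // Hypothetical stand-ins for the read/write locks of vertices V1 and V2.
        ReentrantReadWriteLock v1 = new ReentrantReadWriteLock();
        ReentrantReadWriteLock v2 = new ReentrantReadWriteLock();
        CountDownLatch t1HoldsV1Read = new CountDownLatch(1);
        CountDownLatch t3HoldsV2Write = new CountDownLatch(1);
        CountDownLatch t3MayRequestV1 = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();

        // Thread 3: holds V2's writelock, then requests V1's readlock.
        Thread t3 = new Thread(() -> {
            v2.writeLock().lock();
            try {
                t3HoldsV2Write.countDown();
                t3MayRequestV1.await();
                v1.readLock().lockInterruptibly(); // waits behind Thread 2's queued writer
                completed.incrementAndGet();
                v1.readLock().unlock();
            } catch (InterruptedException e) {
                // Interrupted by reproduce() to break the cycle.
            } finally {
                v2.writeLock().unlock();
            }
        });
        // Thread 1: holds V1's readlock, then requests V2's readlock.
        Thread t1 = new Thread(() -> {
            v1.readLock().lock();
            try {
                t1HoldsV1Read.countDown();
                t3HoldsV2Write.await();
                v2.readLock().lockInterruptibly(); // waits: Thread 3 holds V2's writelock
                completed.incrementAndGet();
                v2.readLock().unlock();
            } catch (InterruptedException e) {
                // Interrupted by reproduce() to break the cycle.
            } finally {
                v1.readLock().unlock();
            }
        });
        // Thread 2: requests V1's writelock while Thread 1 holds V1's readlock.
        Thread t2 = new Thread(() -> {
            try {
                t1HoldsV1Read.await();
                v1.writeLock().lockInterruptibly(); // waits: Thread 1 holds V1's readlock
                completed.incrementAndGet();
                v1.writeLock().unlock();
            } catch (InterruptedException e) {
                // Interrupted by reproduce() to break the cycle.
            }
        });
        Thread[] threads = {t1, t2, t3};
        for (Thread t : threads) {
            t.setDaemon(true);
            t.start();
        }
        try {
            // Release Thread 3 only once Thread 2 is parked in V1's wait queue.
            while (!(v1.hasQueuedThread(t2) && t2.getState() == Thread.State.WAITING)) {
                Thread.sleep(1);
            }
            t3MayRequestV1.countDown();
            Thread.sleep(settleMillis);
            int result = completed.get();
            for (Thread t : threads) t.interrupt(); // break the cycle so the threads can exit
            for (Thread t : threads) t.join(1000);
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int completed = reproduce(200);
        System.out.println(completed == 0
                ? "deadlock: all three lock requests are still blocked"
                : completed + " lock request(s) completed");
    }
}
```

With the latches in place, main should report that none of the three requests
completed. Note that the untimed readLock().tryLock() would not show the
problem, because it barges past queued writers.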

Attaching a patch that fixes the locking issues. Verified by running
TestAMRecovery etc. in a loop, and a large job on the cluster in a loop.

> Deadlock in AM because attempt and vertex locking each other out
> ----------------------------------------------------------------
>
>                 Key: TEZ-2421
>                 URL: https://issues.apache.org/jira/browse/TEZ-2421
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch
>
>
> Ideally locks should be taken in one direction only: either always going down
> or always going up. Preferably not going up, because most of the data needed
> from a parent can be passed in during object construction.
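
One way to read this rule, as a minimal sketch: a two-level hierarchy in which
a vertex lock may be held while taking an attempt lock, but never the reverse.
Vertex, Attempt and their fields are hypothetical stand-ins, not Tez's
VertexImpl or TaskAttemptImpl. The attempt receives the parent data it needs
(here a name and a parallelism value) in its constructor, so it never has to
call back up into the vertex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DownwardLocking {

    // Hypothetical child object. Everything it needs from its parent is copied in
    // at construction, so its methods never have to take the parent's lock.
    static final class Attempt {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private final String vertexName;    // immutable copy of parent data
        private final int vertexParallelism;
        private String state = "NEW";       // guarded by lock

        Attempt(String vertexName, int vertexParallelism) {
            this.vertexName = vertexName;
            this.vertexParallelism = vertexParallelism;
        }

        void setState(String newState) {
            lock.writeLock().lock();
            try {
                state = newState;
            } finally {
                lock.writeLock().unlock();
            }
        }

        String describe() {
            lock.readLock().lock();
            try {
                // Reads only its own fields: no upward call into the vertex.
                return vertexName + "/" + vertexParallelism + ":" + state;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    // Hypothetical parent object. It may call into attempts while holding its own
    // lock (vertex -> attempt), which is the only lock order in this sketch.
    static final class Vertex {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private final String name;
        private final int parallelism;
        private final List<Attempt> attempts = new ArrayList<>(); // guarded by lock

        Vertex(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }

        Attempt createAttempt() {
            lock.writeLock().lock();
            try {
                Attempt attempt = new Attempt(name, parallelism); // pass data down
                attempts.add(attempt);
                return attempt;
            } finally {
                lock.writeLock().unlock();
            }
        }

        List<String> describeAttempts() {
            lock.readLock().lock();
            try {
                List<String> out = new ArrayList<>();
                for (Attempt attempt : attempts) {
                    out.add(attempt.describe()); // attempt lock taken under vertex lock
                }
                return out;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex("v1", 2);
        Attempt attempt = vertex.createAttempt();
        attempt.setState("RUNNING");
        System.out.println(vertex.describeAttempts()); // [v1/2:RUNNING]
    }
}
```

Because every path acquires locks in the same vertex-then-attempt order, no
thread can hold an attempt lock while waiting for a vertex lock, so a
vertex/attempt lock cycle cannot form in this sketch.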



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)