[ https://issues.apache.org/jira/browse/TEZ-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536915#comment-14536915 ]
Bikas Saha commented on TEZ-2421:
---------------------------------

bq. I look at the jstack trace, not sure where's the deadlock. App Shared Pool - #1 try to acquire VertexImpl's writelock and no other thread has the writelock except some thread also try to acquire the readlock

Thread 1 has acquired the readlock on V1 and is trying to acquire the readlock on V2. Thread 2 wants to acquire the writelock on V1 and is blocked because Thread 1 holds the readlock. Thread 3 holds the writelock on V2 and is trying to acquire the readlock on V1, which is blocked behind Thread 2's pending writelock request. Thus the 3 threads have locked each other out.

This will repro when TestAMRecovery is run in a loop, or by running a large job (especially one with 1-1 edges) in a cluster in a loop.

Attaching a patch that fixes the locking issues. Verified by running TestAMRecovery etc. in a loop and a large job in the cluster in a loop.

> Deadlock in AM because attempt and vertex locking each other out
> ----------------------------------------------------------------
>
>                 Key: TEZ-2421
>                 URL: https://issues.apache.org/jira/browse/TEZ-2421
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-2421.1.patch, TEZ-2421.2.patch, TEZ-2421.3.patch
>
>
> Ideally locks should be taken one way - either going down or up. Preferably
> not going up because most such data can be passed in during object
> construction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
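
For readers who want to see the three-thread cycle from the comment in isolation, here is a minimal sketch. It is not Tez code: LockCycleDemo, wouldBlock, spawn and reproduce are hypothetical names, and v1/v2 are plain java.util.concurrent ReentrantReadWriteLock instances standing in for the two vertex locks. It assumes the default non-fair lock policy, and it uses timed tryLock calls so that each thread reports that it would have blocked instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the scenario in the comment; this is not Tez code.
class LockCycleDemo {

    interface Step { void run() throws InterruptedException; }

    // True if the lock could not be taken within the timeout,
    // i.e. a plain lock() call would still be blocked at that point.
    static boolean wouldBlock(Lock lock, long millis) throws InterruptedException {
        if (lock.tryLock(millis, TimeUnit.MILLISECONDS)) {
            lock.unlock();
            return false;
        }
        return true;
    }

    static Thread spawn(Step step) {
        Thread t = new Thread(() -> {
            try {
                step.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Returns {thread1Blocked, thread2Blocked, thread3Blocked}.
    static boolean[] reproduce() throws InterruptedException {
        ReentrantReadWriteLock v1 = new ReentrantReadWriteLock(); // stands in for V1
        ReentrantReadWriteLock v2 = new ReentrantReadWriteLock(); // stands in for V2
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch writerQueued = new CountDownLatch(1);
        CountDownLatch writerDone = new CountDownLatch(1);
        boolean[] blocked = new boolean[3];

        // Thread 1: holds V1's readlock, then wants V2's readlock.
        Thread t1 = spawn(() -> {
            v1.readLock().lock();
            try {
                firstLocksHeld.countDown();
                writerQueued.await();
                blocked[0] = wouldBlock(v2.readLock(), 200);
                writerDone.await(); // keep V1's readlock until thread 2 gives up
            } finally {
                v1.readLock().unlock();
            }
        });

        // Thread 3: holds V2's writelock, then wants V1's readlock.
        Thread t3 = spawn(() -> {
            v2.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                writerQueued.await();
                blocked[2] = wouldBlock(v1.readLock(), 200);
                writerDone.await(); // keep V2's writelock until thread 2 gives up
            } finally {
                v2.writeLock().unlock();
            }
        });

        firstLocksHeld.await();

        // Thread 2: wants V1's writelock while thread 1 still holds the readlock.
        Thread t2 = spawn(() -> {
            try {
                blocked[1] = wouldBlock(v1.writeLock(), 1000);
            } finally {
                writerDone.countDown();
            }
        });

        // Wait until thread 2 is parked in the lock's wait queue.
        while (t2.isAlive() && t2.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(5);
        }
        writerQueued.countDown();

        t1.join();
        t2.join();
        t3.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] b = reproduce();
        System.out.println("T1 blocked=" + b[0] + ", T2 blocked=" + b[1] + ", T3 blocked=" + b[2]);
    }
}
```

The non-obvious step is Thread 3: V1 is only read-locked, yet on current OpenJDK implementations a new reader waits when a writer is at the head of the queue, so Thread 3 should report that it would block. With plain lock() calls in place of the timed attempts, none of the three threads could make progress.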