[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324981#comment-15324981 ]
Bikas Saha commented on TEZ-3297:
---------------------------------

I am not sure we can simply remove the lock, since doing so may affect visibility. Also, the assumption that the task count won't change may be inaccurate in the future: with progressive creation of splits, the task count may change over time. Similarly, input/output specs are theoretically pluggable and may differ per task. Let's be cautious with respect to these future features when fixing this issue, or else we may forget about them later on. A deadlock could sometimes be better than wrong results :)

> Deadlock scenario in AM during ShuffleVertexManager auto reduce
> ---------------------------------------------------------------
>
> Key: TEZ-3297
> URL: https://issues.apache.org/jira/browse/TEZ-3297
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Priority: Critical
> Attachments: TEZ-3297.1.patch, TEZ-3297.2.patch, am_log, thread_dump
>
>
> Here is what's happening in the attached thread dump.
> App Pool thread #9 does the auto reduce on V2 and initializes the new edge
> manager; it holds the V2 write lock and wants the read lock of source vertex V1.
> At the same time, another App Pool thread #2 schedules a task of V1 and gets
> the output spec, so it holds the V1 read lock and wants the V2 read lock.
> Also, the dispatcher thread wants the V1 write lock to begin the state machine
> transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock
> queue, thread #9 cannot get the V1 read lock even though thread #2 is holding
> the V1 read lock.
> This is a circular lock scenario: #2 blocks the dispatcher, the dispatcher
> blocks #9, and #9 blocks #2.
> There is no problem with ReadWriteLock behavior in this case. Please see this
> Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
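
The lock behavior that the issue description relies on (a new reader waits behind a queued writer even while another thread already holds the read lock) can be reproduced in isolation. The following is only a minimal sketch, not Tez code: the class name `QueuedWriterBlocksReader`, the variable `v1Lock`, and the three thread roles are hypothetical stand-ins for thread #2, the dispatcher, and thread #9. It assumes a default (non-fair) `java.util.concurrent.locks.ReentrantReadWriteLock` and uses a timed `tryLock` so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone demo; none of these names come from the Tez code base.
final class QueuedWriterBlocksReader {

    /** Returns true if a second reader got the read lock while a writer was queued. */
    static boolean secondReaderGetsLock() {
        // Default constructor gives the non-fair mode.
        final ReentrantReadWriteLock v1Lock = new ReentrantReadWriteLock();
        final CountDownLatch readHeld = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        // Plays the role of thread #2: holds the V1 read lock until released.
        Thread firstReader = new Thread(() -> {
            v1Lock.readLock().lock();
            try {
                readHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                v1Lock.readLock().unlock();
            }
        });

        // Plays the role of the dispatcher: queues up for the V1 write lock.
        Thread writer = new Thread(() -> {
            v1Lock.writeLock().lock();
            v1Lock.writeLock().unlock();
        });

        try {
            firstReader.start();
            readHeld.await();
            writer.start();
            // Wait until the writer is parked in the lock's wait queue.
            while (!v1Lock.hasQueuedThread(writer)
                    || writer.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            // Plays the role of thread #9: a bounded attempt instead of lock().
            boolean acquired = v1Lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                v1Lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Let the first reader go so the queued writer can finish.
            release.countDown();
            try {
                firstReader.join();
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderGetsLock());
    }
}
```

In the real scenario thread #9 calls a blocking `lock()` while still holding the V2 write lock, which is what closes the cycle; the timed attempt here only makes the wait observable. On JDKs where a writer at the head of the queue makes new readers wait, the second reader's attempt is expected to time out.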