[ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780507
 ]

ASF GitHub Bot logged work on HDFS-16600:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jun/22 12:48
            Start Date: 11/Jun/22 12:48
    Worklog Time Spent: 10m 
      Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152922067

   > Please refer to the stack:
   
   Thank you very much. I understand now that this is indeed a deadlock, 
because the same thread needs both a read lock and a write lock.
   
   > evictBlocks could not successfully acquire the write lock, since 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 holds the read lock of this block pool. And 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 is waiting for evictBlocks to finish. So it's a deadlock.
   
   Very good explanation.
   
   > I'm interested in this deadlock, can you provide a reproduction process? 
thanks~
   
   Thanks for your patience in explaining. This was my guess; it now looks like 
this deadlock won't happen, because createRbw and addVolume are not executed 
in the same thread, and createRbw and LazyWriter won't deadlock because they 
are not executed in the same thread either.
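
   The reasoning above can be sketched with a plain JDK lock. This is only an illustration, assuming the block pool lock behaves like a `java.util.concurrent.locks.ReentrantReadWriteLock`; the class and method names are hypothetical and are not part of Hadoop. When the reader and the writer are different threads, and the reader does not wait for the writer, the writer only has to wait until the read lock is released:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not Hadoop code.
public class CrossThreadLockDemo {

  /**
   * One thread holds the read lock for readerHoldMs, while the calling
   * thread asks for the write lock with a timeout.
   * Returns true if the write lock was acquired before the timeout.
   */
  static boolean writerEventuallyAcquires(long readerHoldMs, long writerTimeoutMs)
      throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch readerHoldsLock = new CountDownLatch(1);

    Thread reader = new Thread(() -> {
      lock.readLock().lock();
      try {
        readerHoldsLock.countDown();   // signal that the read lock is held
        Thread.sleep(readerHoldMs);    // simulate work done under the read lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.readLock().unlock();      // the reader never waits for the writer
      }
    });
    reader.start();
    readerHoldsLock.await();           // make sure the reader goes first

    // The writer blocks while the read lock is held, then proceeds.
    boolean acquired = lock.writeLock().tryLock(writerTimeoutMs, TimeUnit.MILLISECONDS);
    if (acquired) {
      lock.writeLock().unlock();
    }
    reader.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("writer acquired: " + writerEventuallyAcquires(100, 5000));
  }
}
```

   The sketch does not cover the case where the read-lock holder itself waits for the writer to finish; that is the situation described in the stack quoted earlier.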
   
   LGTM +1
   
   > The last question is:
   why do we first take the block pool read lock and then the volume write 
lock? How is this lock order derived?




Issue Time Tracking
-------------------

    Worklog Id:     (was: 780507)
    Time Spent: 4h  (was: 3h 50m)

> Deadlock on DataNode
> --------------------
>
>                 Key: HDFS-16600
>                 URL: https://issues.apache.org/jira/browse/HDFS-16600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed because of a deadlock, which was introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588
> // needs a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>         b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526
> // needs a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>         bpid))
> {code}
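
A minimal sketch of the pattern in the snippet above, using only the JDK. It assumes the block pool lock behaves like a `java.util.concurrent.locks.ReentrantReadWriteLock`, which does not support upgrading a read lock to a write lock; the class and method names are hypothetical and are not Hadoop code. A timed `tryLock` is used so the demo returns instead of hanging the way a plain `lock()` call would:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not Hadoop code.
public class ReadToWriteUpgradeDemo {

  /**
   * Takes the read lock, then tries to take the write lock on the same
   * lock from the same thread. Returns true if the write lock was acquired.
   */
  static boolean tryUpgrade(ReentrantReadWriteLock lock, long timeoutMs)
      throws InterruptedException {
    lock.readLock().lock();            // stands in for the read lock in createRbw
    try {
      // Stands in for the write lock in evictBlocks, requested by the same thread.
      boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
      if (acquired) {
        lock.writeLock().unlock();
      }
      return acquired;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Tries to take the write lock without holding the read lock first. */
  static boolean tryWriteOnly(ReentrantReadWriteLock lock, long timeoutMs)
      throws InterruptedException {
    boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    if (acquired) {
      lock.writeLock().unlock();
    }
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    System.out.println("write lock while holding read lock: " + tryUpgrade(lock, 200));
    System.out.println("write lock alone: " + tryWriteOnly(lock, 200));
  }
}
```

With an untimed `lock()` in place of `tryLock`, the first call would block forever, because the thread would be waiting for a read lock that only it can release.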



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
