[jira] [Commented] (HDFS-5832) Deadlock found in NN between SafeMode#canLeave and DatanodeManager#handleHeartbeat

Vinay (JIRA) Sat, 25 Jan 2014 21:53:02 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882189#comment-13882189
 ]


Vinay commented on HDFS-5832:
-----------------------------

As mentioned in HDFS-5132, moving the SafemodeMonitor#run() checks under the
fsn write lock will solve the issue.

1. handleHeartbeat() is always called under the fsn read lock.
2. incrementSafeBlockCount() and getNumLivedatanodes() are always called
under writeLock().

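Judging by the synchronization order alone, this looks like a deadlock, but it
is avoided by the fsn lock: a thread holding the read lock and a thread holding
the write lock can never run at the same time.
I think jcarder does not recognize the read-write lock mechanism.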
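
To illustrate the point above, here is a minimal sketch, not taken from the
HDFS source. The class, field, and method names are all hypothetical stand-ins.
It assumes one path takes two monitors in one order under a read lock, and the
other path takes them in the opposite order under the write lock. Because the
read lock and the write lock exclude each other, the two inverted orders never
overlap, even though a tool that only looks at monitor order would report a
cycle.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the NN locking pattern; names are not from HDFS.
final class FsnLockOrderSketch {
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    private final Object datanodeMonitor = new Object();
    private final Object safeModeMonitor = new Object();
    private int updates = 0; // only modified while holding safeModeMonitor

    // Heartbeat-like path: fsn read lock, then datanode -> safe mode.
    void heartbeatPath() {
        fsnLock.readLock().lock();
        try {
            synchronized (datanodeMonitor) {
                synchronized (safeModeMonitor) {
                    updates++;
                }
            }
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // Safe-block-count-like path: fsn write lock, then safe mode -> datanode.
    void safeBlockPath() {
        fsnLock.writeLock().lock();
        try {
            synchronized (safeModeMonitor) {
                synchronized (datanodeMonitor) {
                    updates++;
                }
            }
        } finally {
            fsnLock.writeLock().unlock();
        }
    }

    // Runs both paths concurrently and returns the number of updates made.
    static int runBoth(int iterations) {
        FsnLockOrderSketch s = new FsnLockOrderSketch();
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.heartbeatPath();
        });
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.safeBlockPath();
        });
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return s.updates;
    }
}
```

If a third path took the two monitors without holding fsnLock at all (which is
how I read the SafemodeMonitor#run() case), the protection would be lost. That
is why the checks need to move under the fsn write lock.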

For this reason alone, I marked HDFS-5368 as a duplicate of HDFS-5132.

> Deadlock found in NN between SafeMode#canLeave and 
> DatanodeManager#handleHeartbeat
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-5832
>                 URL: https://issues.apache.org/jira/browse/HDFS-5832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Blocker
>         Attachments: HDFS-5832.patch, jcarder_nn_deadlock.gif
>
>
> Found a deadlock during Namenode startup. The attached jcarder report shows
> the lock cycles involved in the deadlock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
