[ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884219#action_12884219 ]
dhruba borthakur commented on HDFS-1056:
----------------------------------------

This fix could impact other code paths too, especially since the DN comparison is used by many code paths. Maybe a unit test would be good. Also, does this problem exist in trunk?

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20
> cluster. I have many concurrent writes on the cluster, and when I kill a DN,
> some percentage of the time I get one of these cross-node deadlocks among 3
> of the nodes (replication 3). All of the DN RPC server threads are tied up
> waiting on RPC clients to other datanodes.