[ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Spiegelberg updated HDFS-1056:
--------------------------------------

    Attachment: 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch

added Todd's fix for 0.20-append. no unit test yet

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20
> cluster. I have many concurrent writes on the cluster, and when I kill a DN,
> some percentage of the time I get one of these cross-node deadlocks among 3
> of the nodes (replication 3). All of the DN RPC server threads are tied up
> waiting on RPC clients to other datanodes.
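
The failure mode described in the issue (every RPC handler thread on each node blocked on an outgoing call to another node) can be illustrated with a toy model. The sketch below is not Hadoop code and does not reflect the attached patch. `simulate`, `outer`, `inner`, and the ring of three thread pools are hypothetical names chosen for illustration, and the timeout exists only so the demonstration terminates instead of hanging as a real deadlock would.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def simulate(handlers_per_node, timeout=0.5, nodes=3):
    """Return True if the ring of toy nodes deadlocks (a nested call times out)."""
    # One bounded pool per node stands in for that node's RPC handler threads.
    pools = [ThreadPoolExecutor(max_workers=handlers_per_node) for _ in range(nodes)]
    # Ensure every node has one handler busy before any nested call is issued.
    all_busy = threading.Barrier(nodes)

    def inner(i):
        # Leaf RPC: needs a free handler on node i, then returns immediately.
        return i

    def outer(i):
        # Handler on node i: keeps its thread while blocking on a call to the next node.
        all_busy.wait()
        peer = (i + 1) % nodes
        return pools[peer].submit(inner, peer).result(timeout=timeout)

    requests = [pools[i].submit(outer, i) for i in range(nodes)]
    deadlocked = False
    for request in requests:
        try:
            request.result()
        except FutureTimeout:
            # The nested call never got a handler on the peer within the timeout.
            deadlocked = True
    for pool in pools:
        pool.shutdown(wait=True)
    return deadlocked


if __name__ == "__main__":
    print("1 handler per node deadlocks:", simulate(1))
    print("2 handlers per node deadlock:", simulate(2))
```

With one handler per node, each handler waits on a peer whose only handler is also waiting, so the nested calls time out. With a spare handler the nested calls complete, but only because this toy issues a single outer request per node. Under enough concurrent requests a larger pool would fill up the same way.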