[ https://issues.apache.org/jira/browse/HADOOP-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624727#action_12624727 ]
Hudson commented on HADOOP-3673:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])

> Deadlock in Datanode RPC servers
> --------------------------------
>
>                 Key: HADOOP-3673
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3673
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: dhruba borthakur
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: 3673_20080702.patch, 3673_20080702b.patch,
> 3673_20080702c.patch, 3673_20080702d.patch, 3673_20080702e.patch,
> 3673_20080707.patch, 3673_20080707b.patch, 3673_20080707b_0.18.patch
>
>
> There is a deadlock scenario in the way Lease Recovery is triggered using the
> Datanode RPC server via HADOOP-3310.
> Each Datanode has dfs.datanode.handler.count handler threads (default of 3).
> These handler threads are used to support the generation-stamp-dance protocol
> as described in HADOOP-1700.
> Let me try to explain the scenario with an example. Suppose a cluster has
> two datanodes, and assume that dfs.datanode.handler.count is set to 1.
> Suppose that there are two clients, each writing to a separate file with a
> replication factor of 2. Assume that both clients encounter an IO error
> and trigger the generation-stamp-dance protocol. The first client may invoke
> recoverBlock on the first datanode while the second client may invoke
> recoverBlock on the second datanode. Each datanode will then try to make a
> getBlockMetaDataInfo() call to the other datanode. But each datanode has
> only one server handler thread, and that thread is already busy inside
> recoverBlock, so neither getBlockMetaDataInfo() call can be served and both
> threads will block for eternity.
> Deadlock!
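
The scenario in the description can be reproduced in miniature with plain Java thread pools. The sketch below is not Hadoop code and says nothing about how the attached patches resolve the issue. The Node class, the latch, and the one-second timeout are all hypothetical stand-ins: each fixed-size pool plays the role of one datanode's RPC handler threads, and the method names are borrowed from the description only for readability.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HandlerPoolDeadlockSketch {

    /** Hypothetical stand-in for a datanode: a fixed pool of handler threads. */
    static final class Node {
        final ExecutorService handlers;
        Node peer;

        Node(int handlerCount) {
            // Plays the role of dfs.datanode.handler.count.
            handlers = Executors.newFixedThreadPool(handlerCount);
        }

        /** Occupies one handler, then makes a blocking call to the peer. */
        Future<String> recoverBlock(CountDownLatch bothStarted) {
            return handlers.submit(() -> {
                // Wait until both nodes have a handler inside recoverBlock,
                // so the interleaving from the description is deterministic.
                bothStarted.countDown();
                bothStarted.await();
                // Blocks until one of the peer's handlers serves the call.
                return peer.getBlockMetaDataInfo().get();
            });
        }

        /** Needs a free handler on this node before it can run. */
        Future<String> getBlockMetaDataInfo() {
            return handlers.submit(() -> "meta");
        }
    }

    /** Returns true if the two recoveries fail to finish within the timeout. */
    static boolean deadlocks(int handlerCount) throws Exception {
        Node a = new Node(handlerCount);
        Node b = new Node(handlerCount);
        a.peer = b;
        b.peer = a;
        CountDownLatch bothStarted = new CountDownLatch(2);
        Future<String> recoveryOnA = a.recoverBlock(bothStarted);
        Future<String> recoveryOnB = b.recoverBlock(bothStarted);
        try {
            recoveryOnA.get(1, TimeUnit.SECONDS);
            recoveryOnB.get(1, TimeUnit.SECONDS);
            return false;
        } catch (TimeoutException e) {
            // Neither handler made progress: each is waiting on the other's pool.
            return true;
        } finally {
            // Interrupt any stuck handlers so the JVM can exit.
            a.handlers.shutdownNow();
            b.handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 handler per node deadlocks: " + deadlocks(1));
        System.out.println("2 handlers per node deadlock: " + deadlocks(2));
    }
}
```

With one handler per node the sketch reports a deadlock, and with two it does not, because a spare handler is free to serve the peer's call. In this toy model a larger pool only raises the number of concurrent recoveries needed to exhaust it, so it should not be read as a general remedy.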