[ https://issues.apache.org/jira/browse/HDFS-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643246#comment-13643246 ]
Hudson commented on HDFS-4721:
------------------------------

Integrated in Hadoop-trunk-Commit #3673 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3673/])
HDFS-4721. Speed up lease recovery by avoiding stale datanodes and choosing the datanode with the most recent heartbeat as the primary. Contributed by Varun Sharma (Revision 1476399)

Result = SUCCESS
szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476399
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfoUnderConstruction.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestHeartbeatHandling.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java

> Speed up lease/block recovery when DN fails and a block goes into recovery
> --------------------------------------------------------------------------
>
>                 Key: HDFS-4721
>                 URL: https://issues.apache.org/jira/browse/HDFS-4721
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.3-alpha
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>             Fix For: 2.0.5-beta
>
>         Attachments: 4721-branch2.patch, 4721-trunk.patch,
> 4721-trunk-v2.patch, 4721-trunk-v3.patch, 4721-trunk-v4.patch, 4721-v2.patch,
> 4721-v3.patch, 4721-v4.patch, 4721-v5.patch, 4721-v6.patch, 4721-v7.patch,
> 4721-v8.patch
>
>
> This was observed while doing HBase WAL recovery. HBase uses append to write
> to its write-ahead log. So initially the pipeline is set up as
> DN1 --> DN2 --> DN3
> This WAL needs to be read when DN1 fails, since DN1 houses the HBase
> regionserver for the WAL.
> HBase first recovers the lease on the WAL file. During recovery, we choose
> DN1 as the primary DN to do the recovery even though DN1 has failed and is
> no longer heartbeating.
> Avoiding the stale DN1 would speed up recovery and reduce HBase MTTR. There
> are two options.
> a) Ride on HDFS-3703: if stale node detection is turned on, we do not
> choose stale datanodes (typically those that have not heartbeated for 20-30
> seconds) as primary DN(s).
> b) We sort the replicas in order of last heartbeat and always pick the one
> that gave the most recent heartbeat.
> Going to the dead datanode increases lease + block recovery time, since the
> block goes into UNDER_RECOVERY state even though no one is actively
> recovering it.
> Please let me know if this makes sense and, if so, whether we should move
> forward with a) or b).
> Thanks
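
The two options discussed in the description can be combined in a small selection routine. The following is only an illustrative sketch, not the code from the committed patch: the class `PrimaryDatanodeChooser`, the `Replica` type, and the `staleIntervalMillis` parameter are hypothetical names introduced here. It skips replicas whose last heartbeat is older than the stale interval (option a) and, among the rest, picks the one with the most recent heartbeat (option b).

```java
import java.util.List;

/** Hypothetical sketch; not the code from the HDFS-4721 patch. */
public class PrimaryDatanodeChooser {

    /** Minimal stand-in for a replica location; not an HDFS class. */
    public static final class Replica {
        public final String name;
        public final long lastHeartbeatMillis;

        public Replica(String name, long lastHeartbeatMillis) {
            this.name = name;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    /**
     * Returns the name of the non-stale replica with the most recent
     * heartbeat, or null if every replica is stale.
     */
    public static String choosePrimary(List<Replica> replicas,
                                       long nowMillis,
                                       long staleIntervalMillis) {
        Replica best = null;
        for (Replica r : replicas) {
            // Option a): ignore datanodes that have not heartbeated recently.
            boolean stale = nowMillis - r.lastHeartbeatMillis > staleIntervalMillis;
            if (stale) {
                continue;
            }
            // Option b): prefer the most recent heartbeat among the rest.
            if (best == null || r.lastHeartbeatMillis > best.lastHeartbeatMillis) {
                best = r;
            }
        }
        return best == null ? null : best.name;
    }
}
```

With the DN1 --> DN2 --> DN3 pipeline from the description and a failed DN1, such a routine would pass over DN1 and hand recovery to whichever of DN2 or DN3 heartbeated last. What to do when every replica is stale is a separate design decision; this sketch simply returns null in that case.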