[ 
https://issues.apache.org/jira/browse/HDFS-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638295#comment-13638295
 ] 

Nicolas Liochon commented on HDFS-4721:
---------------------------------------

The algorithm, as I understand it, is:
 - the namenode sends, with the heartbeat, the request to start the recovery 
to one datanode. The recovery is finished when the file is no longer under 
construction.
 - the chosen datanode calls all the datanodes in the pipeline sequentially, 
including itself, to synchronize on the block size.
 - the chosen datanode then updates the namenode, and the file is no longer 
under construction.
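
To make the steps above concrete, here is a minimal sketch in Java. It is a toy model, not HDFS code: the class and field names (BlockRecoverySketch, Replica, Result) are invented for illustration, and the rule "agree on the shortest reachable replica" is my assumption about what "synchronize on the block size" means. A dead node is modelled as a counter increment where a real primary would sit through a socket timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: none of these types are real HDFS classes.
final class BlockRecoverySketch {

    /** One replica of the block being recovered (hypothetical type). */
    static final class Replica {
        final String datanode;
        final boolean reachable;
        long length;

        Replica(String datanode, boolean reachable, long length) {
            this.datanode = datanode;
            this.reachable = reachable;
            this.length = length;
        }
    }

    /** What the primary would report back to the namenode (hypothetical type). */
    static final class Result {
        final long agreedLength;
        final List<String> participants;
        final int timeouts;

        Result(long agreedLength, List<String> participants, int timeouts) {
            this.agreedLength = agreedLength;
            this.participants = participants;
            this.timeouts = timeouts;
        }
    }

    /** Runs on the datanode chosen by the namenode as primary. */
    static Result recover(List<Replica> pipeline) {
        List<Replica> alive = new ArrayList<>();
        int timeouts = 0;
        // Contact every datanode in the pipeline, one after the other.
        for (Replica r : pipeline) {
            if (r.reachable) {
                alive.add(r);
            } else {
                timeouts++; // stands in for a socket timeout on a dead node
            }
        }
        if (alive.isEmpty()) {
            throw new IllegalStateException("no reachable replica");
        }
        // Assumption: the replicas agree on the shortest reachable length.
        long agreed = Long.MAX_VALUE;
        for (Replica r : alive) {
            agreed = Math.min(agreed, r.length);
        }
        List<String> participants = new ArrayList<>();
        for (Replica r : alive) {
            r.length = agreed;
            participants.add(r.datanode);
        }
        // This result is what the primary would send to the namenode,
        // after which the file is no longer under construction.
        return new Result(agreed, participants, timeouts);
    }

    public static void main(String[] args) {
        List<Replica> pipeline = List.of(
                new Replica("DN1", false, 120),
                new Replica("DN2", true, 100),
                new Replica("DN3", true, 110));
        Result r = recover(pipeline);
        System.out.println(r.participants + " agreed on length " + r.agreedLength
                + " after " + r.timeouts + " timeout(s)");
    }
}
```

Because the loop is sequential, every unreachable datanode adds its full timeout to the total recovery time.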
 
The issue is: if one of the DNs is dead, we will have to wait for a few socket 
timeouts or more, as we will try to contact it.
In this JIRA, it's fixed by skipping the stale datanode. But:
 - if the server is only stale, it won't participate in the recovery (not 
sure of the impact. If it's acceptable, that's great).
 - if the server is dead, we're in for a wait of at least 30s.
 
Would it be possible to consider the file as no longer under construction as 
soon as the chosen datanode has updated its own replica?
Then we would no longer depend on the others: with one datanode we would be 
done.
                
> Speed up lease/block recovery when DN fails and a block goes into recovery
> --------------------------------------------------------------------------
>
>                 Key: HDFS-4721
>                 URL: https://issues.apache.org/jira/browse/HDFS-4721
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.3-alpha
>            Reporter: Varun Sharma
>             Fix For: 2.0.4-alpha
>
>         Attachments: 4721-hadoop2.patch
>
>
> This was observed while doing HBase WAL recovery. HBase uses append to write 
> to its write-ahead log. So initially the pipeline is set up as
> DN1 --> DN2 --> DN3
> This WAL needs to be read when DN1 fails since it houses the HBase 
> regionserver for the WAL.
> HBase first recovers the lease on the WAL file. During recovery, we choose 
> DN1 as the primary DN to do the recovery even though DN1 has failed and is 
> not heartbeating any more.
> Avoiding the stale DN1 would speed up recovery and reduce hbase MTTR. There 
> are two options.
> a) Ride on HDFS-3703 and, if stale node detection is turned on, do not 
> choose stale datanodes (typically no heartbeat for 20-30 seconds) as 
> primary DN(s)
> b) Sort the replicas in order of last heartbeat and always pick the ones 
> which gave the most recent heartbeat
> Going to the dead datanode increases lease + block recovery time since the 
> block goes into UNDER_RECOVERY state even though no one is recovering it 
> actively. 
> Please let me know if this makes sense. If yes, whether we should move 
> forward with a) or b).
> Thanks
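
For comparison, options a) and b) from the description above can be sketched as two selection functions. This is a hedged illustration only: PrimarySelectionSketch, Node and the method names are hypothetical and are not the namenode's actual code, and the 30-second stale interval in main is simply taken from the "20-30 seconds" range mentioned in the description.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: these names do not come from the HDFS code base.
final class PrimarySelectionSketch {

    /** A datanode holding a replica, with the time of its last heartbeat. */
    static final class Node {
        final String name;
        final long lastHeartbeatMillis;

        Node(String name, long lastHeartbeatMillis) {
            this.name = name;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    /** Option a): keep pipeline order, but skip nodes considered stale. */
    static Optional<Node> firstNonStale(List<Node> replicas, long nowMillis,
                                        long staleIntervalMillis) {
        return replicas.stream()
                .filter(n -> nowMillis - n.lastHeartbeatMillis <= staleIntervalMillis)
                .findFirst();
    }

    /** Option b): pick the node with the most recent heartbeat. */
    static Optional<Node> mostRecentHeartbeat(List<Node> replicas) {
        return replicas.stream()
                .max(Comparator.comparingLong((Node n) -> n.lastHeartbeatMillis));
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Node> replicas = List.of(
                new Node("DN1", 0L),       // failed, no heartbeat for 100s
                new Node("DN2", 95_000L),
                new Node("DN3", 99_000L));
        System.out.println("a) " + firstNonStale(replicas, now, 30_000L).map(n -> n.name));
        System.out.println("b) " + mostRecentHeartbeat(replicas).map(n -> n.name));
    }
}
```

In this model, option a) returns nothing when every replica is stale, so a caller would need a fallback, while option b) always returns a candidate as long as the replica list is not empty.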

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
