[ https://issues.apache.org/jira/browse/HDFS-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911069#comment-13911069 ]

Brandon Li commented on HDFS-5924:
----------------------------------

The suggested improvement sounds good to me. A couple more comments on the 
patch:
* By default dfs.client.datanode-restart.timeout is 30 seconds. However, the 
code hardcodes 4 seconds as the maximum delay, so the 30-second value here may 
confuse users.
* Should DFS_DATANODE_RESTART_REPLICA_EXPIRY_DEFAULT be less than 
min(DFS_CLIENT_DATANODE_RESTART_TIMEOUT_DEFAULT, 4 seconds)? If the datanode 
takes too long to start up, it loses the chance to be included in the original 
pipeline. (See the sketch below for the timing relationship.)
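
To make the timing relationship concrete, here is a minimal, self-contained 
sketch. It is not the patch's code; the constant names only mirror the two 
config keys above, and the 50-second expiry value is an illustrative 
placeholder, not the actual default.

{code:java}
public class RestartTimeoutSketch {
  // Default of dfs.client.datanode-restart.timeout (30 seconds).
  static final long CLIENT_RESTART_TIMEOUT_MS = 30_000L;
  // The 4-second maximum delay hardcoded in the patch.
  static final long HARDCODED_MAX_DELAY_MS = 4_000L;
  // Stand-in for DFS_DATANODE_RESTART_REPLICA_EXPIRY_DEFAULT; value is illustrative only.
  static final long REPLICA_EXPIRY_MS = 50_000L;

  public static void main(String[] args) {
    // Point 1: a single wait is capped at 4 seconds regardless of the
    // 30-second config default, which is the potential source of confusion.
    long effectiveDelayMs = Math.min(CLIENT_RESTART_TIMEOUT_MS, HARDCODED_MAX_DELAY_MS);
    System.out.println("effective per-wait delay: " + effectiveDelayMs + " ms");

    // Point 2: the bound the second comment asks about,
    // min(client restart timeout, 4-second cap).
    long boundMs = Math.min(CLIENT_RESTART_TIMEOUT_MS, HARDCODED_MAX_DELAY_MS);
    System.out.println("is replica expiry below the bound? "
        + (REPLICA_EXPIRY_MS < boundMs));
  }
}
{code}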

> Utilize OOB upgrade message processing for writes
> -------------------------------------------------
>
>                 Key: HDFS-5924
>                 URL: https://issues.apache.org/jira/browse/HDFS-5924
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-5924_RBW_RECOVERY.patch, 
> HDFS-5924_RBW_RECOVERY.patch
>
>
> After HDFS-5585 and HDFS-5583, clients and datanodes can coordinate 
> shutdown-restart in order to minimize failures or locality loss.
> In this jira, the HDFS client is made aware of the restart OOB ack and 
> performs special write pipeline recovery. The datanode is also modified to 
> load marked RBW replicas as RBW instead of RWR, as long as the restart did 
> not take too long. 
> The client considers this kind of recovery only when there is a single node 
> left in the pipeline or the restarting node is a local datanode.  
> For both clients and datanodes, the timeout or expiration is configurable, 
> so the feature can be turned off by setting the timeout values to 0.
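
As a rough illustration of the client-side policy described in the quoted 
description, a hypothetical sketch follows. The method and class names are 
invented for illustration and are not the actual DFSOutputStream code; only 
the decision rules come from the description above.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RestartRecoveryPolicySketch {

  /**
   * @param restartTimeoutMs    dfs.client.datanode-restart.timeout; 0 disables the feature
   * @param nodesLeftInPipeline datanodes still alive in the write pipeline
   * @param restartingNodeHost  host of the datanode that sent the restart OOB ack
   */
  static boolean shouldWaitForRestart(long restartTimeoutMs,
                                      int nodesLeftInPipeline,
                                      String restartingNodeHost) {
    if (restartTimeoutMs <= 0) {
      return false;                        // feature turned off by configuration
    }
    if (nodesLeftInPipeline == 1) {
      return true;                         // losing the last node would fail the write
    }
    return isLocalHost(restartingNodeHost); // otherwise only wait to preserve locality
  }

  // Simplified locality check, for the sketch only.
  static boolean isLocalHost(String host) {
    try {
      InetAddress addr = InetAddress.getByName(host);
      return addr.isLoopbackAddress() || addr.equals(InetAddress.getLocalHost());
    } catch (UnknownHostException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(shouldWaitForRestart(30_000L, 1, "dn3.example.com")); // true: last node
    System.out.println(shouldWaitForRestart(0L, 1, "localhost"));            // false: disabled
  }
}
{code}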



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
