[ 
https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-5586.
------------------------------

    Resolution: Duplicate

Most of the planned changes will be covered after HDFS-5498. A few are still 
missing, but I don't think they are critical at this point. To name a few for 
later reference:

- Quick registration with NN. When NN gets a registration request from a 
datanode that isn't "dead" (i.e. a restart), the blocks on the node will be 
removed from the blocksmap and re-added when the initial block report is 
received. If the DN isn't going to change its content significantly and its 
identity (storage ID) stays the same, NN may be better off keeping the block 
list for the DN and updating it a few minutes later when the block report is 
received.

- DN to persist more state so that it can start serving sooner. Even if a DN is 
up, it won't be able to serve clients before registering with NN, because it 
cannot verify block tokens until then. Saving the shared secret is risky, though.
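
For later reference, the quick registration idea in the first item could be 
sketched roughly as below. This is only a toy model under stated assumptions, 
not NameNode code: QuickRegistrationSketch, NodeRecord, register, 
processBlockReport, markDead and blocksOf are all hypothetical names made up 
for illustration, and the real blocksmap and datanode descriptor handling is 
far more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the NN-side block list per datanode. Hypothetical, not HDFS code. */
public class QuickRegistrationSketch {

    /** Hypothetical per-datanode record kept by the NN, keyed by storage ID. */
    static final class NodeRecord {
        final Set<Long> blockIds = new HashSet<>();
        boolean dead;
        boolean awaitingBlockReport;
    }

    private final Map<String, NodeRecord> nodesByStorageId = new HashMap<>();

    /**
     * Handles a registration request.
     * Returns true if the existing block list was kept (the quick path).
     */
    public boolean register(String storageId) {
        NodeRecord existing = nodesByStorageId.get(storageId);
        if (existing != null && !existing.dead) {
            // Same storage ID and the node was never declared dead:
            // keep the block list and wait for the next block report.
            existing.awaitingBlockReport = true;
            return true;
        }
        // Unknown or dead node: start from an empty block list.
        NodeRecord fresh = new NodeRecord();
        fresh.awaitingBlockReport = true;
        nodesByStorageId.put(storageId, fresh);
        return false;
    }

    /** A full block report replaces whatever block list the NN was holding. */
    public void processBlockReport(String storageId, Set<Long> reported) {
        NodeRecord rec = nodesByStorageId.get(storageId);
        if (rec == null) {
            throw new IllegalStateException("unregistered node: " + storageId);
        }
        rec.blockIds.clear();
        rec.blockIds.addAll(reported);
        rec.awaitingBlockReport = false;
    }

    /** Declares the node dead and drops its blocks. */
    public void markDead(String storageId) {
        NodeRecord rec = nodesByStorageId.get(storageId);
        if (rec != null) {
            rec.dead = true;
            rec.blockIds.clear();
        }
    }

    /** Returns a copy of the block IDs currently attributed to the node. */
    public Set<Long> blocksOf(String storageId) {
        NodeRecord rec = nodesByStorageId.get(storageId);
        return rec == null ? new HashSet<Long>() : new HashSet<>(rec.blockIds);
    }
}
```

In this sketch a restarted node that re-registers with the same storage ID 
keeps its blocks until the next block report corrects them, while a node that 
was marked dead starts over with an empty list.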

The quick DN registration change will lower the DN restart overhead on NN, but 
reasonably paced DN rolling upgrades should still be acceptable even without 
this.  This will be more useful in the case where DNs are restarted en masse. 
So I will not call it a necessary improvement for rolling upgrades.

> Add quick-restart option for datanode
> -------------------------------------
>
>                 Key: HDFS-5586
>                 URL: https://issues.apache.org/jira/browse/HDFS-5586
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>
> This feature, combined with the graceful shutdown feature, will enable data 
> nodes to come back up and start serving quickly.  This is likely a command 
> line option for the data node, which triggers it to look for saved state 
> information in its local storage.  If the information is present and 
> reasonably up-to-date, the data node would skip some of the startup steps.
> Ideally it should be able to do quick registration without requiring removal 
> of all blocks from the data node descriptor on the name node and 
> reconstructing it with the initial full block report. This implies that all 
> RBW blocks are recorded during shutdown and on start-up they are not turned 
> into RWR. Other than the quick registration, the name node should treat the 
> restart as if a few heartbeats were lost from the node. There should be no 
> unexpected replica state changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
