[ https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee resolved HDFS-5586.
------------------------------
    Resolution: Duplicate

Most of the planned changes will be covered after HDFS-5498. A few are still missing, but I don't think they are critical at this point. To name a few for later reference:

- Quick registration with NN. When the NN gets a registration request from a datanode that isn't "dead" (i.e., a restart), the blocks on the node are removed from the blocksmap and re-added when the initial block report is received. If the DN isn't going to change its content significantly and its identity (storage ID) stays the same, the NN may be better off keeping the block list for the DN and updating it a few minutes later when the block report is received.

- DN to persist more state so that it can start serving sooner. Even if a DN is up, it won't be able to serve clients before registering with the NN, because it cannot verify the block token. Saving the shared secret is risky, though.

The quick DN registration change will lower the DN restart overhead on the NN, but reasonably paced DN rolling upgrades should still be acceptable even without it. It will be more useful in the case where DNs are restarted en masse. So I will not call it a necessary improvement for rolling upgrades.

> Add quick-restart option for datanode
> -------------------------------------
>
>                 Key: HDFS-5586
>                 URL: https://issues.apache.org/jira/browse/HDFS-5586
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ha, hdfs-client, namenode
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>
> This feature, combined with the graceful shutdown feature, will enable data
> nodes to come back up and start serving quickly. This is likely a command
> line option for data node, which triggers it to look for saved state
> information in its local storage. If the information is present and
> reasonably up-to-date, data node would skip some of the startup steps.
> Ideally it should be able to do quick registration without requiring removal
> of all blocks from the data node descriptor on the name node and
> reconstructing it with the initial full block report. This implies that all
> RBW blocks are recorded during shutdown and on start-up they are not turned
> into RWR. Other than the quick registration, name node should treat the
> restart as if a few heartbeats were lost from the node. There should be no
> unexpected replica state changes.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
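
For later reference, the quick registration idea discussed above can be sketched roughly as follows. This is only a toy model in Python, not HDFS code: ToyNameNode, register, block_report and the quick flag are all hypothetical names made up for illustration, and the real blocksmap and datanode descriptor handling is far more involved. It only shows the intended difference: on re-registration of a known, non-dead storage ID the block list is kept, and it is reconciled later when the block report arrives.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for the NN's per-datanode block list (hypothetical, not the HDFS API)."""

    # storage ID -> set of block IDs currently attributed to that datanode
    blocks_by_storage: dict = field(default_factory=dict)
    # storage IDs of datanodes the NN has already declared dead
    dead: set = field(default_factory=set)

    def register(self, storage_id, quick=True):
        """Handle a registration request and return the block IDs that were dropped."""
        known = storage_id in self.blocks_by_storage
        if quick and known and storage_id not in self.dead:
            # Quick path: same identity, node not dead, so keep the block list
            # and wait for the block report to correct it.
            return set()
        # Regular path: forget the node's blocks until the initial block report.
        dropped = self.blocks_by_storage.get(storage_id, set())
        self.blocks_by_storage[storage_id] = set()
        self.dead.discard(storage_id)
        return dropped

    def block_report(self, storage_id, reported):
        """Reconcile the stored list with a full block report; return (added, removed)."""
        current = self.blocks_by_storage.get(storage_id, set())
        reported = set(reported)
        added, removed = reported - current, current - reported
        self.blocks_by_storage[storage_id] = reported
        return added, removed
```

With the quick path, a restart of a DN whose content barely changed costs the NN only the small diff computed in block_report, instead of removing and re-adding every block on the node.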