[ https://issues.apache.org/jira/browse/HDFS-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345098#comment-14345098 ]
Kai Zheng commented on HDFS-7876:
---------------------------------

The optimization allows the NN's fsimage loading and the DNs' block scanning to run in parallel, and sounds good. I guess you need to be careful to avoid race conditions, such as RPC operations that rely on the fsimage but arrive before it has finished loading. Is it OK if a DN finishes scanning blocks and starts to report them while the NN is still loading the image?

> DataNodes start to scan blocks earlier
> --------------------------------------
>
>                 Key: HDFS-7876
>                 URL: https://issues.apache.org/jira/browse/HDFS-7876
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Xinwei Qin
>            Assignee: Xinwei Qin
>
> When a Hadoop cluster restarts, DataNodes scan their local blocks and report
> this information to the NameNode. DataNodes start to scan local blocks only
> after obtaining the NamespaceInfo from the NameNode via the RPC call
> versionRequest(), which requires the NameNode RPC server to be up.
> Currently, the RPC server is not created and started until the FsImage has
> finished loading. So DataNodes cannot start to scan blocks immediately and
> must wait for the NameNode to load the FsImage. This wastes DataNode time
> when the FsImage is very large.
> Since the RPC server has very little dependence on the FsImage, and the
> NamespaceInfo (namespaceID, clusterID, blockpoolID, cTime, etc.) can be
> constructed from the VERSION file, we can create and start the RPC server
> before loading the FsImage, so that DataNodes can get the NamespaceInfo from
> the NameNode via RPC as soon as possible and start to scan blocks earlier,
> which will shorten restart time.
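
The ordering concern raised in the comment can be sketched in Java. This is a minimal illustration under stated assumptions, not the HDFS-7876 patch and not Hadoop code: `EarlyRpcSketch`, `NameNodeStub`, `NamespaceInfoStub`, `finishLoadingImage()` and the timeout-based `blockReport()` are hypothetical names made up for this example. It shows `versionRequest()` being answered from VERSION-derived data before the image is loaded, while a block report is turned away until loading completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these types are real Hadoop classes. */
public class EarlyRpcSketch {

    /** Stand-in for the fields the issue says can be read from the VERSION file. */
    public static final class NamespaceInfoStub {
        public final int namespaceID;
        public final String clusterID;
        public final String blockPoolID;
        public final long cTime;

        public NamespaceInfoStub(int namespaceID, String clusterID,
                                 String blockPoolID, long cTime) {
            this.namespaceID = namespaceID;
            this.clusterID = clusterID;
            this.blockPoolID = blockPoolID;
            this.cTime = cTime;
        }
    }

    /** Stand-in NameNode whose RPC methods are callable before the image is loaded. */
    public static final class NameNodeStub {
        private final NamespaceInfoStub info;
        // Opens exactly once, when image loading has finished.
        private final CountDownLatch imageLoaded = new CountDownLatch(1);

        public NameNodeStub(NamespaceInfoStub info) {
            this.info = info;
        }

        /** Safe before loading completes: depends only on VERSION-derived data. */
        public NamespaceInfoStub versionRequest() {
            return info;
        }

        /**
         * Depends on the loaded namespace, so it waits up to timeoutMs for the
         * image. Returns false if the image is still loading (caller should retry).
         */
        public boolean blockReport(String datanodeId, long timeoutMs) {
            try {
                return imageLoaded.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }

        /** Called by the (simulated) loader thread once the image is in memory. */
        public void finishLoadingImage() {
            imageLoaded.countDown();
        }
    }

    public static void main(String[] args) {
        NameNodeStub nn = new NameNodeStub(
                new NamespaceInfoStub(1, "CID-example", "BP-example", 0L));

        // A DataNode can fetch NamespaceInfo and begin scanning right away.
        System.out.println("clusterID = " + nn.versionRequest().clusterID);

        // A report that arrives while the image is still loading is not accepted.
        System.out.println("accepted before load: " + nn.blockReport("dn-1", 10));

        nn.finishLoadingImage();
        System.out.println("accepted after load: " + nn.blockReport("dn-1", 10));
    }
}
```

Whether an early block report should block, be rejected for retry, or be queued is exactly the design question the comment asks; the latch above only picks the simplest of those options for illustration.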