Xinwei Qin  created HDFS-7876:
---------------------------------

             Summary: DataNodes start to scan blocks earlier
                 Key: HDFS-7876
                 URL: https://issues.apache.org/jira/browse/HDFS-7876
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode, namenode
    Affects Versions: 3.0.0
            Reporter: Xinwei Qin 
            Assignee: Xinwei Qin 


When a Hadoop cluster restarts, each DataNode scans its local blocks and
reports this information to the NameNode. A DataNode starts scanning local
blocks only after obtaining the NamespaceInfo from the NameNode via the RPC
call versionRequest(), which requires the NameNode RPC server to be running.

Currently, the RPC server is not created and started until the FsImage has
finished loading. DataNodes therefore cannot start scanning blocks
immediately and must wait for the NameNode to load the FsImage. When the
FsImage is very large, DataNodes waste a significant amount of time waiting.

Since the RPC server depends very little on the FsImage, and the
NamespaceInfo (namespaceID, clusterID, blockpoolID, cTime, etc.) can be
constructed from the VERSION file, we can create and start the RPC server
before loading the FsImage. DataNodes can then get the NamespaceInfo from the
NameNode via RPC as soon as possible and start scanning blocks earlier, which
will shorten restart time.
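
To illustrate why the NamespaceInfo fields do not need the FsImage, here is a
minimal sketch that reads them from a VERSION file. It assumes the VERSION
file is in java.util.Properties format with keys named namespaceID, clusterID,
blockpoolID and cTime. The class VersionInfoSketch and its methods are
hypothetical names for illustration only; this is not Hadoop's NamespaceInfo
class and not the proposed patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical holder for the fields listed above; not Hadoop's NamespaceInfo. */
final class VersionInfoSketch {
    final int namespaceID;
    final String clusterID;
    final String blockpoolID;
    final long cTime;

    VersionInfoSketch(int namespaceID, String clusterID,
                      String blockpoolID, long cTime) {
        this.namespaceID = namespaceID;
        this.clusterID = clusterID;
        this.blockpoolID = blockpoolID;
        this.cTime = cTime;
    }

    /**
     * Builds the info from VERSION file contents alone.
     * Assumption: the file uses key=value lines with the key names below.
     */
    static VersionInfoSketch fromVersionFile(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return new VersionInfoSketch(
            Integer.parseInt(require(props, "namespaceID")),
            require(props, "clusterID"),
            require(props, "blockpoolID"),
            Long.parseLong(require(props, "cTime")));
    }

    /** Returns the trimmed value for key, or fails if the key is absent. */
    private static String require(Properties props, String key)
            throws IOException {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IOException("VERSION file is missing key: " + key);
        }
        return value.trim();
    }
}
```

In this sketch nothing touches the FsImage, so an RPC handler answering
versionRequest() could in principle be served from such an object before the
image load begins. Whether other RPC methods can safely be exposed that early
is a separate question that the sketch does not address.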



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
