sodonnel commented on a change in pull request #1028: HDFS-14617 - Improve fsimage load time by writing sub-sections to the fsimage index
URL: https://github.com/apache/hadoop/pull/1028#discussion_r313799031
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatPBINode.java
##########
@@ -217,33 +272,147 @@ void loadINodeDirectorySection(InputStream in) throws IOException {
       INodeDirectory p = dir.getInode(e.getParent()).asDirectory();
       for (long id : e.getChildrenList()) {
         INode child = dir.getInode(id);
-        addToParent(p, child);
+        if (addToParent(p, child)) {
+          if (child.isFile()) {
+            inodeList.add(child);
+          }
+          if (inodeList.size() >= 1000) {
+            addToCacheAndBlockMap(inodeList);
+            inodeList.clear();
+          }
+        }

Review comment:
I have added a warning like the following in both places, where an inode is added to the directory and where an inode reference is added to the directory:

```
LOG.warn("Failed to add the inode reference {} to the directory {}", ref.getId(), p.getId());
```

I opted to log only the inode IDs of the inode and the directory, as I am not sure whether the system can resolve the full path of an inode or directory at this stage, since it is still loading the image.

Also, this "should never happen", so hopefully we will not see these messages in practice. If we do, it will likely require manually investigating an image corruption, and the ID numbers should be enough to start that investigation.
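For illustration, the batching and failure-logging pattern discussed in this comment can be sketched in isolation. This is a minimal, self-contained sketch and not the Hadoop code: `BatchedChildLoader`, its `addToParent` stub (which simply fails for negative IDs), `flush`, and the in-memory `warnings` list standing in for `LOG.warn` are all hypothetical names. It also batches every child rather than only files, and the final flush after the loop is an assumption, since the quoted hunk does not show what happens to a partially filled list.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedChildLoader {
    // Same threshold as the hunk above; everything else is illustrative.
    static final int BATCH_SIZE = 1000;

    // Stand-in for LOG.warn so the sketch needs no logging library.
    final List<String> warnings = new ArrayList<>();
    int flushCount = 0;
    int flushedTotal = 0;

    // Hypothetical stub: pretend the add fails for negative child IDs.
    boolean addToParent(long parentId, long childId) {
        return childId >= 0;
    }

    // Hypothetical stand-in for addToCacheAndBlockMap followed by clear().
    void flush(List<Long> batch) {
        if (batch.isEmpty()) {
            return;
        }
        flushCount++;
        flushedTotal += batch.size();
        batch.clear();
    }

    void loadChildren(long parentId, long[] childIds) {
        List<Long> batch = new ArrayList<>();
        for (long id : childIds) {
            if (addToParent(parentId, id)) {
                batch.add(id);
                if (batch.size() >= BATCH_SIZE) {
                    flush(batch);
                }
            } else {
                // Record only the numeric IDs; no path resolution is attempted.
                warnings.add(String.format(
                    "Failed to add the inode %d to the directory %d", id, parentId));
            }
        }
        // Assumed here: flush whatever remains once the loop ends.
        flush(batch);
    }
}
```

A failed add is recorded and skipped rather than aborting the load, which matches the intent of logging a warning for a condition that "should never happen".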