[ https://issues.apache.org/jira/browse/HDFS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127414#comment-13127414 ]
Praveen Kumar K J V S commented on HDFS-1447:
---------------------------------------------

Good work. In the old code the processing time is roughly O(n^2): every call re-walks the full directory listing, so execution time grows quadratically as the number of files in a directory increases. In the new code the behavior is much closer to linear. I wonder if there is a way to make the function's time complexity logarithmic. +1 if all the tests pass.

> Make getGenerationStampFromFile() more efficient, so it doesn't reprocess
> full directory listing for every block
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1447
>                 URL: https://issues.apache.org/jira/browse/HDFS-1447
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.20.2
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>         Attachments: HDFS-1447.patch, Test_HDFS_1447_NotForCommitt.java.patch
>
>
> Make getGenerationStampFromFile() more efficient. Currently this routine is
> called by addToReplicasMap() for every blockfile in the directory tree, and
> it walks each file's containing directory on every call. There is a simple
> refactoring that should make it more efficient.
>
> This work item is one of four sub-tasks for HDFS-1443, Improve Datanode
> startup time.
>
> The fix will probably be folded into sibling task HDFS-1446, which is already
> refactoring the method that calls getGenerationStampFromFile().
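The quadratic-vs-linear point can be sketched as follows. This is a hypothetical illustration, not the actual HDFS-1447 patch: the class name `GenStampLookup`, the method names, and the use of plain `String[]` listings are all assumptions for the sketch. It only relies on the HDFS meta-file naming convention (`blk_<id>_<genstamp>.meta`). The old style scans the directory listing once per block (O(n) per call, O(n^2) overall); the refactored style builds a block-name-to-genstamp map once (O(n)) so each subsequent lookup is O(1).

```java
import java.util.HashMap;
import java.util.Map;

public class GenStampLookup {

    // Old style (hypothetical): for each block, re-scan the full directory
    // listing to find its matching .meta file. Called once per block, this
    // makes startup O(n^2) in the number of files in the directory.
    static long genStampOld(String[] listing, String blockName) {
        for (String name : listing) {
            if (name.startsWith(blockName + "_") && name.endsWith(".meta")) {
                return parseGenStamp(name);
            }
        }
        return 0; // no meta file found; HDFS would use a "grandfather" stamp here
    }

    // Refactored style (hypothetical): scan the listing once, building a
    // map from block name to generation stamp. Each block's lookup then
    // becomes an O(1) map access instead of an O(n) directory walk.
    static Map<String, Long> buildGenStampMap(String[] listing) {
        Map<String, Long> map = new HashMap<>();
        for (String name : listing) {
            if (name.endsWith(".meta")) {
                int us = name.lastIndexOf('_');
                map.put(name.substring(0, us), parseGenStamp(name));
            }
        }
        return map;
    }

    // Extract <genstamp> from a name of the form blk_<id>_<genstamp>.meta
    static long parseGenStamp(String metaName) {
        int us = metaName.lastIndexOf('_');
        return Long.parseLong(
            metaName.substring(us + 1, metaName.length() - ".meta".length()));
    }
}
```

A truly logarithmic lookup (as wondered above) would need the listing kept in a sorted structure with binary search, but once the map is built per directory the amortized cost per block is already constant, so the linear pass is the practical win.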