[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223943#comment-14223943 ]
Colin Patrick McCabe commented on HDFS-7433:
--------------------------------------------

I might be missing something here, but it looks like the map is keyed on storageID. So how does {{DatanodeDescriptor#hashCode}} enter into it? We have a map of string to DatanodeDescriptor here, seems like the only relevant hash code is {{String#hashCode}}. What am I missing?

In {{datanodeDump}}, seems like you can avoid creating the TreeMap... just create an array of storageID strings (with keys.values().toArray or something), call sort on it, and step through it. You get O(1) lookup from the hash table for each key.

> DatanodeMap lookups & DatanodeID hashCodes are inefficient
> ----------------------------------------------------------
>
>                 Key: HDFS-7433
>                 URL: https://issues.apache.org/jira/browse/HDFS-7433
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7433.patch
>
>
> The datanode map is currently a {{TreeMap}}. For many thousands of
> datanodes, tree lookups are ~10X more expensive than a {{HashMap}}.
> Insertions and removals are up to 100X more expensive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
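
The {{datanodeDump}} suggestion in the comment above could look roughly like the sketch below. This is not the code from HDFS-7433.patch: the class {{DatanodeDumpSketch}}, the stand-in {{Node}} type (in place of {{DatanodeDescriptor}}), and the {{dump}} method are all hypothetical names, and the map is assumed to be a {{HashMap}} keyed on the storageID string. Instead of copying the entries into a {{TreeMap}}, it copies only the keys into an array, sorts that, and looks each one up in the hash table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DatanodeDumpSketch {
    // Hypothetical stand-in for DatanodeDescriptor; only a name is kept.
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Returns one line per datanode, ordered by storageID, without
    // building a sorted map: sort the keys, then do hash lookups.
    static List<String> dump(Map<String, Node> datanodeMap) {
        // Copy the keys (assumed here to be storageID strings) into an array.
        String[] keys = datanodeMap.keySet().toArray(new String[0]);
        // One O(n log n) sort of the strings.
        Arrays.sort(keys);
        List<String> lines = new ArrayList<>(keys.length);
        for (String key : keys) {
            // Expected O(1) lookup per key in the hash table.
            lines.add(key + " -> " + datanodeMap.get(key));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Node> map = new HashMap<>();
        map.put("DS-3", new Node("dn3"));
        map.put("DS-1", new Node("dn1"));
        map.put("DS-2", new Node("dn2"));
        for (String line : dump(map)) {
            System.out.println(line);
        }
    }
}
```

The sort still costs O(n log n), but it is paid only when a dump is requested, while regular lookups, insertions, and removals on the map stay at hash-table cost.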