[jira] [Commented] (HDFS-7433) DatanodeMap lookups & DatanodeID hashCodes are inefficient

Colin Patrick McCabe (JIRA) Mon, 24 Nov 2014 18:15:54 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223943#comment-14223943
 ]


Colin Patrick McCabe commented on HDFS-7433:
--------------------------------------------

I might be missing something here, but it looks like the map is keyed on 
storageID.  So how does {{DatanodeDescriptor#hashCode}} enter into it?  Since 
this is a map of String to DatanodeDescriptor, it seems like the only relevant 
hash code is {{String#hashCode}}.  What am I missing?
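
To illustrate the point, here is a minimal sketch, not the actual HDFS code. {{Node}} is a hypothetical stand-in for DatanodeDescriptor that counts calls to its own hashCode, and the storageID strings are made up. With a plain {{java.util.HashMap}} keyed on String, only the key's hash code is consulted:

```java
import java.util.HashMap;
import java.util.Map;

class KeyHashDemo {
    /** Hypothetical stand-in for DatanodeDescriptor; counts hashCode() calls. */
    static final class Node {
        static int hashCodeCalls = 0;
        final String name;

        Node(String name) { this.name = name; }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return name.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).name.equals(name);
        }
    }

    /** Uses a String-keyed map and returns how often Node#hashCode ran. */
    static int valueHashCallsDuringMapUse() {
        Node.hashCodeCalls = 0;
        Map<String, Node> byStorageId = new HashMap<>();
        byStorageId.put("DS-1", new Node("host1"));   // hashes the String key only
        byStorageId.put("DS-2", new Node("host2"));
        byStorageId.get("DS-1");                      // lookup hashes the key again
        byStorageId.remove("DS-2");
        return Node.hashCodeCalls;
    }

    public static void main(String[] args) {
        System.out.println(valueHashCallsDuringMapUse()); // prints 0
    }
}
```

If the value type's hashCode matters in the patch, it would have to be through some other collection that uses the descriptor itself as a key or set element.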

In {{datanodeDump}}, it seems like you can avoid creating the TreeMap: just 
create an array of the storageID strings (with {{keySet().toArray}} or 
something similar), call sort on it, and step through it.  You get O(1) lookup 
from the hash table for each key.
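
Roughly what I have in mind, as a sketch only. The map's value type is simplified to String here, and {{dumpSorted}} and the sample storageIDs are hypothetical names, not what is in the patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortedDumpDemo {
    /**
     * Produces one line per entry, ordered by storageID, without building
     * a TreeMap. The values are Strings here purely for illustration.
     */
    static List<String> dumpSorted(Map<String, String> datanodeMap) {
        // Copy the keys out and sort them once: O(n log n).
        String[] ids = datanodeMap.keySet().toArray(new String[0]);
        Arrays.sort(ids);

        List<String> lines = new ArrayList<>(ids.length);
        for (String id : ids) {
            // Each lookup is an expected O(1) hash probe.
            lines.add(id + " -> " + datanodeMap.get(id));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> datanodeMap = new HashMap<>();
        datanodeMap.put("DS-3", "host3");
        datanodeMap.put("DS-1", "host1");
        datanodeMap.put("DS-2", "host2");
        for (String line : dumpSorted(datanodeMap)) {
            System.out.println(line);
        }
    }
}
```

The sort cost is paid only when the dump runs, so the regular lookup, insert, and remove paths keep the HashMap behavior.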

> DatanodeMap lookups & DatanodeID hashCodes are inefficient
> ----------------------------------------------------------
>
>                 Key: HDFS-7433
>                 URL: https://issues.apache.org/jira/browse/HDFS-7433
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7433.patch
>
>
> The datanode map is currently a {{TreeMap}}.  For many thousands of 
> datanodes, tree lookups are ~10X more expensive than a {{HashMap}}.  
> Insertions and removals are up to 100X more expensive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
