[ https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878462#action_12878462 ]
Scott Carey commented on HDFS-1114: ----------------------------------- if you are using a power of two hash table, you can avoid problems caused by hash value clustering by using a Fibonacci Hash. Essentially, use the multiplicative hash with a special value g: (h * g) & mask where h is the hash value and g is the 'golden ratio' number for the size of the table used. Since multiplication on today's processors is far faster than division or remainders, this can be used to 'uncluster' hash values. A single consecutive run of values gets maximally distributed into the space, and high and low bits are redistributed evenly so that the mask does not increase collisions. Whether this is a desired property or not will depend on the properties of the hash values and whether or not an open addressing solution is used. Open addressing can further reduce the memory footprint by allowing the raw object to be placed in the map instead of a container object or list. some links found from a few searches: http://www.brpreiss.com/books/opus4/html/page214.html http://staff.newport.ac.uk/ctubb01/ct/advp/hashtables.pdf > Reducing NameNode memory usage by an alternate hash table > --------------------------------------------------------- > > Key: HDFS-1114 > URL: https://issues.apache.org/jira/browse/HDFS-1114 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node > Reporter: Tsz Wo (Nicholas), SZE > Assignee: Tsz Wo (Nicholas), SZE > Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch > > > NameNode uses a java.util.HashMap to store BlockInfo objects. When there are > many blocks in HDFS, this map uses a lot of memory in the NameNode. We may > optimize the memory usage by a light weight hash table implementation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.