[ 
https://issues.apache.org/jira/browse/HDFS-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878462#action_12878462
 ] 

Scott Carey commented on HDFS-1114:
-----------------------------------

if you are using a power of two hash table, you can avoid problems caused by 
hash value clustering by using a Fibonacci Hash.   Essentially, use the 
multiplicative hash with a special value g:

(h * g) & mask

where h is the hash value and g is the 'golden ratio' number for the size of 
the table used.  Since multiplication on today's processors is far faster than 
division or remainders, this can be used to 'uncluster' hash values.  A single 
consecutive run of values gets maximally distributed into the space, and high 
and low bits are redistributed evenly so that the mask does not increase 
collisions.  Whether this is a desired property or not will depend on the 
properties of the hash values and whether or not an open addressing solution is 
used.

Open addressing can further reduce the memory footprint by allowing the raw 
object to be placed in the map instead of a container object or list.

some links found from a few searches:
http://www.brpreiss.com/books/opus4/html/page214.html
http://staff.newport.ac.uk/ctubb01/ct/advp/hashtables.pdf

> Reducing NameNode memory usage by an alternate hash table
> ---------------------------------------------------------
>
>                 Key: HDFS-1114
>                 URL: https://issues.apache.org/jira/browse/HDFS-1114
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: GSet20100525.pdf, gset20100608.pdf, h1114_20100607.patch
>
>
> NameNode uses a java.util.HashMap to store BlockInfo objects.  When there are 
> many blocks in HDFS, this map uses a lot of memory in the NameNode.  We may 
> optimize the memory usage by a light weight hash table implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to