[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1242:
------------------------------------

    Attachment: MAHOUT-1242.patch

Hi [~dawidweiss], I'm currently working on this issue. For the time being, I 
have attached a simple implementation of the final step of murmurhash3, as you 
suggested. 
Your feedback would be highly appreciated.
Thanks.
P.S. The patch has not been tested yet.

> No key redistribution function for associative maps
> ---------------------------------------------------
>
>                 Key: MAHOUT-1242
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1242
>             Project: Mahout
>          Issue Type: Improvement
>          Components: collections, Math
>            Reporter: Dawid Weiss
>         Attachments: MAHOUT-1242.patch
>
>
> All integer-based maps currently use HashFunctions.hash(int) which just 
> returns the key value:
> {code}
>   /**
>    * Returns a hashcode for the specified value.
>    *
>    * @return a hash code value for the specified value.
>    */
>   public static int hash(int value) {
>     return value;
>     //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
>     /*
>     value &= 0x7FFFFFFF; // make it >=0
>     int hashCode = 0;
>     do hashCode = 31*hashCode + value%10;
>     while ((value /= 10) > 0);
>     return 28629151*hashCode; // spread even further; h*31^5
>     */
>   }
>  {code}
> This easily leads to very degenerate behavior (long collision chains) on 
> keys that have constant lower bits. A simple (and strong) hash function like 
> the final step of murmurhash3 goes a long way toward ensuring that the key 
> distribution is more uniform regardless of the input distribution.
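
For illustration, the redistribution idea described above can be sketched in
Java as follows. This is a minimal sketch, not the contents of
MAHOUT-1242.patch: the class name Murmur3MixSketch and the helper
distinctSlots are hypothetical, and picking a slot by masking with a
power-of-two table size is an assumption made only for the demonstration (it
is not claimed to be how the Mahout maps index their tables). The mix method
uses the constants of the MurmurHash3 32-bit finalizer.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the code attached to the issue. */
final class Murmur3MixSketch {

  /** Final mixing step (fmix32) of MurmurHash3, applied to an int key. */
  static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /**
   * Counts how many distinct slots the keys occupy in a table whose size is
   * a power of two (assumed here for illustration only).
   */
  static int distinctSlots(int[] keys, int tableSize, boolean redistribute) {
    Set<Integer> slots = new HashSet<>();
    for (int key : keys) {
      // Either use the key as its own hash (current behavior) or mix it first.
      int hash = redistribute ? mix(key) : key;
      slots.add(hash & (tableSize - 1));
    }
    return slots.size();
  }

  public static void main(String[] args) {
    // Keys whose lower 8 bits are always zero.
    int[] keys = new int[1024];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = i << 8;
    }
    System.out.println("identity hash: " + distinctSlots(keys, 256, false) + " slot(s)");
    System.out.println("mixed hash:    " + distinctSlots(keys, 256, true) + " slot(s)");
  }
}
```

With the identity hash, every key in this example lands in the same slot
because the masked bits are constant. The mixing step folds the higher bits
into the lower ones, so the same keys are expected to spread over many more
slots.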



--
This message was sent by Atlassian JIRA
(v6.1#6144)
