[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837569#comment-13837569 ]
Dawid Weiss commented on MAHOUT-1242: ------------------------------------- You introduced an unused import (Random). And the code could be simplified a bit too (although it remains identical at the bytecode level it seems clearer): {code} /** * Hashes a 4-byte sequence (Java int). */ public static int hash(int k) { k ^= k >>> 16; k *= 0x85ebca6b; k ^= k >>> 13; k *= 0xc2b2ae35; k ^= k >>> 16; return k; } {code} > No key redistribution function for associative maps > --------------------------------------------------- > > Key: MAHOUT-1242 > URL: https://issues.apache.org/jira/browse/MAHOUT-1242 > Project: Mahout > Issue Type: Improvement > Components: collections, Math > Reporter: Dawid Weiss > Attachments: MAHOUT-1242.patch > > > All integer-based maps currently use HashFunctions.hash(int) which just > returns the key value: > {code} > /** > * Returns a hashcode for the specified value. > * > * @return a hash code value for the specified value. > */ > public static int hash(int value) { > return value; > //return value * 0x278DDE6D; // see > org.apache.mahout.math.jet.random.engine.DRand > /* > value &= 0x7FFFFFFF; // make it >=0 > int hashCode = 0; > do hashCode = 31*hashCode + value%10; > while ((value /= 10) > 0); > return 28629151*hashCode; // spread even further; h*31^5 > */ > } > {code} > This easily leads to very degenerate behavior on keys that have constant > lower bits (long collision chains). A simple (and strong) hash function like > the final step of murmurhash3 goes a long way at ensuring the keys > distribution is more uniform regardless of the input distribution. -- This message was sent by Atlassian JIRA (v6.1#6144)