I think the reasoning for wanting non-identity hashing comes from thoughts like this: a good hash function takes small (single bit) changes in the inputs, and produces typically 16 (32 / 2) bit changes in the output. The identity mapping does not come close to satisfying this.
Another, practical way to look at it is that often, the integers you get as inputs may come from consecutive sequences in a fixed small range (0 to N << MAX_INTEGER), and you may, for example, be partitioning your hash outputs not by mod k, but instead by consecutive ranges (0 to N/k - 1, N/k to 2N/k, ... ). The identity hash function would cause major problems here, while a good hash function (as defined above) would be fine with it. For the record, I agree with the java standard on this too, in that if someone really wants to gaurantee "nice" distribution on their hash outputs, they can do their own. Having collisionless hashes which are optimally fast is best. -jake On Mon, Jan 25, 2010 at 9:07 AM, Sean Owen <sro...@gmail.com> wrote: > (Out of curiosity what does the distribution have to do with it -- > what's a distribution for which something besides identity is better?) > > On Mon, Jan 25, 2010 at 5:05 PM, Dawid Weiss <dawid.we...@gmail.com> > wrote: > > I was also thinking about this when implementing HPPC. Like I said, > > unless you know exactly that you have a weird data distribution, >