I think the reasoning for wanting non-identity hashing comes from thoughts
like this: a good hash function takes small (single-bit) changes in the
input and produces, on average, about 16 (32 / 2) changed bits in a 32-bit
output. The identity mapping does not come close to satisfying this.
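A rough way to see that criterion in code: flip each input bit in turn and
count how many output bits change. This is only a sketch. The class name is
hypothetical, and MurmurHash3's 32-bit finalizer (fmix32) stands in as an
example of a well-mixing function; nobody in this thread proposed it
specifically.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

public class AvalancheDemo {
    // Identity "hash": Integer.hashCode(int) returns its argument unchanged.
    static int identityHash(int x) {
        return Integer.hashCode(x);
    }

    // MurmurHash3's 32-bit finalizer (fmix32), used here only as an example
    // of a well-mixing function; any good mixer would serve.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Flip each of the 32 input bits in turn and return the average number
    // of output bits that change, over `samples` random inputs.
    static double averageFlippedBits(IntUnaryOperator hash, int samples) {
        Random rnd = new Random(42); // fixed seed for repeatable runs
        long total = 0;
        for (int i = 0; i < samples; i++) {
            int x = rnd.nextInt();
            int hx = hash.applyAsInt(x);
            for (int bit = 0; bit < 32; bit++) {
                int hy = hash.applyAsInt(x ^ (1 << bit));
                total += Integer.bitCount(hx ^ hy);
            }
        }
        return (double) total / (samples * 32L);
    }

    public static void main(String[] args) {
        System.out.printf("identity: %.2f bits%n",
                averageFlippedBits(AvalancheDemo::identityHash, 10_000));
        System.out.printf("fmix32:   %.2f bits%n",
                averageFlippedBits(AvalancheDemo::fmix32, 10_000));
    }
}
```

The identity line always prints 1.00 bits (a one-bit input change is a
one-bit output change), while the mixer should land close to 16.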

Another, practical way to look at it: the integers you get as inputs
often come from consecutive sequences in a small fixed range (0 to N, with
N << MAX_INTEGER), and you may, for example, be partitioning your hash
outputs not by mod k but by consecutive ranges (0 to N/k - 1, N/k to
2N/k - 1, ...). The identity hash function would cause major problems
here, while a good hash function (as defined above) would handle it fine.
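A small sketch of that failure mode, assuming (my reading, not stated
above) that the k consecutive ranges are taken over the full unsigned
32-bit hash output space. Keys 0..n-1 are hashed and bucketed by range;
the class and method names are hypothetical, and fmix32 (MurmurHash3's
finalizer) is again just an example mixer.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

public class RangePartitionDemo {
    // MurmurHash3's 32-bit finalizer (fmix32), an example of a good mixer.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumption: partitions are k equal, consecutive ranges of the
    // unsigned 32-bit hash output space [0, 2^32).
    static int rangePartition(int hash, int k) {
        long unsigned = Integer.toUnsignedLong(hash);
        return (int) ((unsigned * k) >>> 32);
    }

    // Count how many of the keys 0..n-1 land in each of the k partitions.
    static int[] histogram(IntUnaryOperator hash, int n, int k) {
        int[] counts = new int[k];
        for (int key = 0; key < n; key++) {
            counts[rangePartition(hash.applyAsInt(key), k)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 10_000, k = 8;
        System.out.println("identity: "
                + Arrays.toString(histogram(x -> x, n, k)));
        System.out.println("fmix32:   "
                + Arrays.toString(histogram(RangePartitionDemo::fmix32, n, k)));
    }
}
```

With the identity hash every small key falls in the first range, so the
first line prints [10000, 0, 0, 0, 0, 0, 0, 0]; the mixed hash spreads the
same keys roughly evenly, around 1250 per partition.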

For the record, I agree with the Java standard on this too: if someone
really wants to guarantee a "nice" distribution of their hash outputs,
they can do their own hashing. Having collision-free hashes that are
optimally fast is best.

  -jake

On Mon, Jan 25, 2010 at 9:07 AM, Sean Owen <sro...@gmail.com> wrote:

> (Out of curiosity what does the distribution have to do with it --
> what's a distribution for which something besides identity is better?)
>
> On Mon, Jan 25, 2010 at 5:05 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> > I was also thinking about this when implementing HPPC. Like I said,
> > unless you know exactly that you have a weird data distribution,
>
