Jim Meyering <jim <at> meyering.net> writes:
> Unless I become exceptionally unresponsive,
> please post before pushing changes to hash.c.
>
> I would have requested that you not mix clean-up like this with
> a semantics-changing change set:
Sorry for jumping the gun on that. I'll be a little more patient for the rest
of my series, then.
> > + hash: provide default callback functions
> > + * lib/hash.c (raw_hasher, raw_comparator): New functions.
> > + (hash_initialize): Use them as defaults.
> > + * tests/test-hash.c (main): Test this.
>
> This looks fine, too.
> Yes, I've often hashed pointer values.
> However, note that the low bits of a pointer are often zero,
> so that for small table sizes, this may be a very poor choice.
Agreed (and these days, pointers are usually aligned to at least 8 bytes, not
just 4).
> size_t
> hash_address (const void *addr, size_t table_size)
> {
> size_t k = (size_t) addr;
> assert (k % 4 == 0);
> return (k / 4) % table_size;
> }
That discards bits, in the case where non-malloced values are being used. The
assertion is nice when you know that all values are malloc'd, but is harsh for
a default. So, how about I squash this slight modification into my patch
before pushing this commit?
diff --git i/lib/hash.c w/lib/hash.c
index 52a211e..c9866c5 100644
--- i/lib/hash.c
+++ w/lib/hash.c
@@ -483,7 +483,14 @@ hash_reset_tuning (Hash_tuning *tuning)
static size_t
raw_hasher (const void *data, size_t n)
{
- return (size_t) data % n;
+ /* When hashing unique pointers, it is often the case that they were
+ generated by malloc and thus have the property that the low-order
+ bits are 0. As this tends to give poorer performance with small
+ tables, we rotate the pointer value before performing division,
+ in an attempt to improve hash quality. */
+ size_t val = data;
+ val = (val >> 3) | (val << (CHAR_BIT * sizeof val - 3));
+ return val % n;
}
/* If the user passes a NULL comparator, we use pointer comparison. */