On Fri, 17 May 2013, Benjamin Kaduk wrote:

On Fri, 17 May 2013, Simon Wilkinson wrote:

hash to be used to for the authentication name hash table isn't specified.

Reusing the old NameHash is probably not a good choice. I'll think about think about this, too.

Consulting Knuth, it sounds like the state of the art is a pretty old solution -- for a multiword/multicharacter key, use
h(K) = (h_1(x_1) + h_2(x_2) + ... + h_n(x_n)) mod M
where each h_i() applies to word i and is a hash function independent of the other h_j() (for us, M should probably remain 8191).

We could use either 8- or 32-bit words (since our data structure is inherently 32-bit word-aligned), but he notes that using characters allows the hash to be just a lookup table.

That formulation has very nice properties when the h_i() are chosen randomly, but we probably only want to specify a fixed number and reuse them cyclically for longer keys (as opposed to specifying an algorithm for defining an arbitrary h_i()). It seems convenient to specify enough h_i() to handle a full block/entry's worth of name data -- the 32k of storage needed to hold all the hash function tables in memory is not excessively large. This would also provide a convenient opportunity to apply the mod-8191 operation before overflow is possible. Hmm, on an i7 that would even fit in L1 cache; maybe locality concerns are not particularly important.

Using a fixed set of repeating h_i() will mean that names which differ only by having a pair of characters swapped which are 128 positions apart will hash to the same value (and names which differ by arbitrarily many such swaps or permutations). I am not particularly familiar with the structure of x500 names, but I hope that 128 is large enough for these congruencies to be effectively random.

-Ben
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to