On Saturday 11 October 2008 14:53:53 Christoph Otto wrote:

> Calling string_hash with a seed value other than the one used in src/hash.c
> (3793) can cause strange and wonderful failures if the STRING is reused by
> imcc.
>
> What happens is that after the STRING's hash is computed, it's cached in
> s->hashval.  This works fine unless the first caller of string_hash on a
> given STRING uses a seed other than 3793 *and* the STRING is reused by imcc
> as a hash key.  When this happens, the second call to string_hash sees a
> cached hash which was computed with an unexpected seed.  When this STRING
> is used as a hash key, parrot_hash_get_bucket looks in the wrong bucket and
> fails to find the associated value.
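
To make the failure mode concrete, here is a minimal C sketch.  It is not
Parrot's code: toy_string, toy_string_hash, the multiply-by-33 hash, and the
16-bucket table are all made up for illustration.  Only the seed 3793 and the
idea of caching the result on the string come from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_SEED 3793u   /* the seed quoted above for src/hash.c */
#define NUM_BUCKETS  16u     /* arbitrary table size for the sketch */

/* Hypothetical stand-in for a STRING: just enough fields for the demo. */
typedef struct {
    const char *buf;
    size_t      len;
    unsigned    hashval;   /* cached hash */
    int         hashed;    /* nonzero once hashval is valid */
} toy_string;

/* Illustrative hash that caches its first result, whatever the seed was. */
static unsigned toy_string_hash(toy_string *s, unsigned seed)
{
    unsigned h = seed;
    size_t   i;

    assert(s != NULL);
    if (s->hashed)
        return s->hashval;          /* the seed argument is ignored here */

    for (i = 0; i < s->len; i++)
        h = h * 33u + (unsigned char)s->buf[i];

    s->hashval = h;
    s->hashed  = 1;
    return h;
}

/* Returns 1 if a reused string lands in the bucket the key was stored in. */
static int lookup_finds_stored_bucket(void)
{
    toy_string stored = { "do_stuff", 8, 0, 0 };
    toy_string reused = { "do_stuff", 8, 0, 0 };
    unsigned   stored_bucket, lookup_bucket;

    /* The key is stored using the default seed. */
    stored_bucket = toy_string_hash(&stored, DEFAULT_SEED) % NUM_BUCKETS;

    /* Some other caller hashes the reused string first, with its own seed. */
    (void)toy_string_hash(&reused, 42u);

    /* The later lookup asks for the default seed but gets the stale cache. */
    lookup_bucket = toy_string_hash(&reused, DEFAULT_SEED) % NUM_BUCKETS;

    return stored_bucket == lookup_bucket;
}
```

With these made-up parameters lookup_finds_stored_bucket() returns 0: the
same characters end up in two different buckets, depending only on which
caller hashed the string first.
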
>
> This leads to various levels of badness.  In Pipp's case, it means that
> with the following PIR code, the lookup of the hypothetical do_stuff METHOD
> would fail because the STRING 'do_stuff' would be hashed by
> Parrot_PhpArray_get_string_keyed_str with an unexpected seed.

I'd rather remove the hash seed from the key calculation.  Instead, let's use 
a global seed (#defined somewhere) as the initial seed, cache the calculated 
key value, then hash against any hash seed and the string's length if the 
hash has a non-zero seed.  That doesn't spread the hash seed's entropy 
throughout the key as much, but it lets us cache calculated keys and should
give us more entropy to reduce collisions.
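
Roughly what I have in mind, as a sketch only.  The names (GLOBAL_HASH_SEED,
toy_string, toy_hash, toy_string_key, toy_bucket_index) are hypothetical, the
multiply-by-33 hash is a placeholder, and the way the per-hash seed and the
length get mixed in is just one arbitrary choice, not a worked-out design.

```c
#include <assert.h>
#include <stddef.h>

#define GLOBAL_HASH_SEED 3793u  /* assumed: one fixed seed for all strings */

/* Hypothetical stand-ins for a STRING and a Hash. */
typedef struct {
    const char *buf;
    size_t      len;
    unsigned    hashval;   /* cached, seed-independent key */
    int         hashed;    /* nonzero once hashval is valid */
} toy_string;

typedef struct {
    unsigned seed;          /* per-hash seed; 0 means "no extra mixing" */
    unsigned num_buckets;
} toy_hash;

/* Key depends only on the string contents, so caching it is always safe. */
static unsigned toy_string_key(toy_string *s)
{
    assert(s != NULL);
    if (!s->hashed) {
        unsigned h = GLOBAL_HASH_SEED;
        size_t   i;

        for (i = 0; i < s->len; i++)
            h = h * 33u + (unsigned char)s->buf[i];

        s->hashval = h;
        s->hashed  = 1;
    }
    return s->hashval;
}

/* The per-hash seed is applied after the fact and is never cached. */
static unsigned toy_bucket_index(const toy_hash *hash, toy_string *s)
{
    unsigned key = toy_string_key(s);

    assert(hash != NULL && hash->num_buckets > 0);
    if (hash->seed)
        key = (key ^ hash->seed) * 33u + (unsigned)s->len;  /* arbitrary mix */

    return key % hash->num_buckets;
}
```

Because the cached value never depends on a per-hash seed, it no longer
matters which hash saw the string first; two strings with the same contents
always get the same bucket within a given hash.
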

Any mathematician is welcome to prove that this makes things worse, however.

-- c
