# New Ticket Created by Christoph Otto # Please include the string: [perl #59810] # in the subject line of all future correspondence about this issue. # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=59810 >
Calling string_hash with a seed value other than the one used in src/hash.c (3793) can cause strange and wonderful failures if the STRING is reused by imcc. What happens is that after the STRING's hash is computed, it's cached in s->hashval. This is works fine unless the first caller of string_hash on a given STRING uses a seed other than 3793 *and* the STRING is reused by imcc as a hash key. When this happens, the second call to string_hash sees a cached hash which was computed with an unexpected seed. When this STRING is used as a hash key, parrot_hash_get_bucket looks in the wrong bucket and fails to find the associated value. This leads to various levels of badness. In Pipp's case, it means that with the following PIR code, the lookup of the hypothetical do_stuff METHOD would fail because the STRING 'do_stuff' would be hashed by Parrot_PhpArray_get_string_keyed_str with an unexpected seed. p = new 'PhpArray' p['do_stuff'] = 'x' #caches an unexpected hash in s->hashval p.'do_stuff'() #method lookup fails with reused STRING 'do_stuff' Because string_hash is marked PARROT_API and presumably intended for external use, a caller should be able to call it with an arbitrary seed without messing up the rest of Parrot. The attached patch allows this by adding a seedval field to parrot_string_t and checking both the seedval and the hashval before returning the cached hashval of a STRING. I'd appreciate comments on whether this is the right approach and if there is anything the patch misses.
Index: src/hash.c =================================================================== --- src/hash.c (revision 31889) +++ src/hash.c (working copy) @@ -146,7 +146,7 @@ static size_t key_hash_STRING(PARROT_INTERP, ARGMOD(STRING *s), size_t seed) { - if (s->hashval) { + if (s->hashval && s->seedval == seed) { return s->hashval; } Index: src/string.c =================================================================== --- src/string.c (revision 31889) +++ src/string.c (working copy) @@ -2245,6 +2245,7 @@ h = CHARSET_COMPUTE_HASH(interp, s, seed); s->hashval = h; + s->seedval = seed; return h; } Index: include/parrot/pobj.h =================================================================== --- include/parrot/pobj.h (revision 31889) +++ include/parrot/pobj.h (working copy) @@ -124,6 +124,7 @@ UINTVAL bufused; UINTVAL strlen; UINTVAL hashval; /* cached hash value computation */ + size_t seedval; /* cached hash seed */ /* parrot_string_representation_t representation;*/ const struct _encoding *encoding; Index: docs/pdds/pdd28_strings.pod =================================================================== --- docs/pdds/pdd28_strings.pod (revision 31889) +++ docs/pdds/pdd28_strings.pod (working copy) @@ -265,6 +265,7 @@ UINTVAL bufused; UINTVAL strlen; UINTVAL hashval; + size_t seedval; const struct _encoding *encoding; const struct _charset *charset; const struct _normalization *normalization; @@ -303,6 +304,12 @@ A cache of the hash value of the string, for rapid lookups when the string is used as a hash key. +=item seedval + +A cache of the seed used to compute the hash value of the string. This value +ensures that hashes computed with different seeds will not interfere with +eachother. + =item encoding How the data is encoded (e.g. fixed 8-bit characters, UTF-8, or UTF-32). Note