# New Ticket Created by  Christoph Otto 
# Please include the string:  [perl #59810]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=59810 >


Calling string_hash with a seed value other than the one used in src/hash.c 
(3793) can cause strange and wonderful failures if the STRING is reused by imcc.

What happens is that after the STRING's hash is computed, it's cached in 
s->hashval.  This is works fine unless the first caller of string_hash on a 
given STRING uses a seed other than 3793 *and* the STRING is reused by imcc as 
a hash key.  When this happens, the second call to string_hash sees a cached 
hash which was computed with an unexpected seed.  When this STRING is used as 
a hash key, parrot_hash_get_bucket looks in the wrong bucket and fails to find 
the associated value.

This leads to various levels of badness.  In Pipp's case, it means that with 
the following PIR code, the lookup of the hypothetical do_stuff METHOD would 
fail because the STRING 'do_stuff' would be hashed by 
Parrot_PhpArray_get_string_keyed_str with an unexpected seed.

p = new 'PhpArray'
p['do_stuff'] = 'x' #caches an unexpected hash in s->hashval
p.'do_stuff'()      #method lookup fails with reused STRING 'do_stuff'

Because string_hash is marked PARROT_API and presumably intended for external 
use, a caller should be able to call it with an arbitrary seed without messing 
up the rest of Parrot.  The attached patch allows this by adding a seedval 
field to parrot_string_t and checking both the seedval and the hashval before 
returning the cached hashval of a STRING.

I'd appreciate comments on whether this is the right approach and if there is 
anything the patch misses.
Index: src/hash.c
===================================================================
--- src/hash.c	(revision 31889)
+++ src/hash.c	(working copy)
@@ -146,7 +146,7 @@
 static size_t
 key_hash_STRING(PARROT_INTERP, ARGMOD(STRING *s), size_t seed)
 {
-    if (s->hashval) {
+    if (s->hashval && s->seedval == seed) {
         return s->hashval;
     }
 
Index: src/string.c
===================================================================
--- src/string.c	(revision 31889)
+++ src/string.c	(working copy)
@@ -2245,6 +2245,7 @@
 
     h          = CHARSET_COMPUTE_HASH(interp, s, seed);
     s->hashval = h;
+    s->seedval = seed;
 
     return h;
 }
Index: include/parrot/pobj.h
===================================================================
--- include/parrot/pobj.h	(revision 31889)
+++ include/parrot/pobj.h	(working copy)
@@ -124,6 +124,7 @@
     UINTVAL     bufused;
     UINTVAL     strlen;
     UINTVAL     hashval; /* cached hash value computation */
+    size_t      seedval; /* cached hash seed */
 
     /*    parrot_string_representation_t representation;*/
     const struct _encoding *encoding;
Index: docs/pdds/pdd28_strings.pod
===================================================================
--- docs/pdds/pdd28_strings.pod	(revision 31889)
+++ docs/pdds/pdd28_strings.pod	(working copy)
@@ -265,6 +265,7 @@
       UINTVAL                       bufused;
       UINTVAL                       strlen;
       UINTVAL                       hashval;
+      size_t                        seedval;
       const struct _encoding       *encoding;
       const struct _charset        *charset;
       const struct _normalization  *normalization;
@@ -303,6 +304,12 @@
 A cache of the hash value of the string, for rapid lookups when the string is
 used as a hash key.
 
+=item seedval
+
+A cache of the seed used to compute the hash value of the string.  This value
+ensures that hashes computed with different seeds will not interfere with
+eachother.
+
 =item encoding
 
 How the data is encoded (e.g. fixed 8-bit characters, UTF-8, or UTF-32).  Note

Reply via email to