Re: [druntime] TypeInfo.getHash problems

Dmitry Olshansky Thu, 01 Mar 2012 08:35:30 -0800

On 01.03.2012 18:56, H. S. Teoh wrote:

On Thu, Mar 01, 2012 at 12:36:31PM +0400, Dmitry Olshansky wrote:

On 01.03.2012 1:45, H. S. Teoh wrote:

I found the following bugs with the current implementations of
getHash in the various TypeInfo classes:


1) TypeInfo_Array.getHash doesn't work correctly with class objects.
with custom toHash() methods. Or, for that matter, arrays of AA's
(which have their own problems--see (3a)). It just does a hash of the
binary representation of the array.

2) TypeInfo_StaticArray.getHash doesn't suffer from (1), but it does
suffer from another problem: it simply sums the hashes from each
element, but this is prone to reordering hash collisions: an int[3]
with elements [1,2,3] will have the same hash value as [3,2,1] or any
reordering thereof.

3) TypeInfo_AssociativeArray.getHash is not implemented, so it just
defaults to the base class getHash, which just does a hash of the
binary representation of the AA. Unfortunately, this is fraught with
problems:

        a) Two AA's with exactly the same key/value pairs may be binary
        unequal, because the underlying size of the hash table may be
        different (this happens when you initialize an AA with a literal
        vs. manually setting each element).

        b) If values are objects that implement toHash(), then the
        current implementation will have the wrong behaviour.

4) TypeInfo_Class.getHash will call the object's toHash method, if
implemented, otherwise it hashes the binary representation of the
object. But if the object has members that are references/pointers or
other objects/structs that implement getHash, this will be wrong.

[...]

To fix (1) may cause noticeable slowdown when the key is an array of
basic types, such as dstring (we'd have to compute the hash of each
dchar, then compute the hash of the hashes).  Is there an easy way to
tell whether or not an array's elements are binary-hashable? If so,
we can still use hashing of the binary representation of the array
for the simple cases (dstring, wstring, array of structs without
nested refs/objects), and save the slower code for more complicated
arrays.


std.traits hasIndirections?


This has to be fixed in druntime, though. Is there a druntime equivalent
to std.traits?

The fact that std.traits belongs to phobos doesn't mean it's impossibleto smuggle few artifacts to druntime :)


[..]

For (3), we can collect the hashes of the keys (already stored in the
bucket) and the hashes of the values, and doing a hash on them. (This
seems to be the easiest one to fix?)

Or hash them as array of pairs, but whatever is faster.


I did a quick implementation of this:

        https://github.com/D-Programming-Language/druntime/pull/171

Basically do a hash of the respective hash values of the key and value,
and sum them over all pairs. The sum is commutative, so it avoids the
problem of two AA's with identical contents but different internal
hashtable sizes causing pair reordering.


Looks like a clean solution.

To fix (4) will require compile-time reflection... but may introduce
slowdowns? Is there a way to detect the easy cases (binary-hashable)
so that we only incur slowdown on the complicated cases?


see  point 1.

[...]

It would be good if there was a clean way to do this in druntime.




--
Dmitry Olshansky

Re: [druntime] TypeInfo.getHash problems

Reply via email to