Re: [Pharo-project] Hashed collection changes, the performance graphs

Andres Valloud Wed, 28 Oct 2009 19:15:18 -0700

Martin,

One of the constituencies I thought of when I decided to leave 
identityHash alone was folks like you.  Now, as a representative, if you 
are ok with dealing with broken identityHash senders (which I hope will 
be few), then most of my motivation for leaving identityHash unchanged 
is gone.  Thus, I would not mind changing identityHash and implementing 
primIdentityHash.


What about others?  Would anybody mind if identityHash was changed?

Some comments below...

> I took a survey of the senders of #identityHash in the latest web image. 
> There aren't that many. The largest category is those that want the 
> printString of the identityHash.
>   

These would probably need to be changed to get the printString of the 
primIdentityHash.

> Of those that care about the value of the identityHash, there are 
> several that use it in #hash methods. The most common is this definition:
>
> hash
>    ^self identityHash
>
> These are presumably overriding superclass behavior to restore Object 
> behavior.

I'd like to take a look at these, I suspect there may be low hanging 
fruit waiting to be fixed.

> If the authors knew about the limited range of #identityHash, that is 
> entirely possible. I tend to think it more likely that in most cases 
> these implementations are just the simplest way to follow the dictate 
> that 'a=b -> a hash = b hash', and that they didn't really think about 
> the impact on collection performance.
>   

Or maybe they chose identityHash because they can assume uniqueness (= 
effectively being ==)...

> 5 improved, 2 harmed. And one of the listed harmed is MethodDictionary, 
> whose performance would not be harmed, but I assume the VM would not be 
> happy if their hashing was changed (anybody know for sure whether that's 
> true?)
>   

The VM probably knows a lot about identityHash values, and most likely 
uses the primIdentityHash values because then it doesn't have to shift 
on access.
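
To make the split concrete, here is a rough sketch of what the two
methods might look like. This is hypothetical and not actual Pharo
source: the shift amount (18) is an assumed value chosen only for
illustration, and primitive 75 is, as far as I recall, the identity
hash primitive in Squeak-derived VMs.

```smalltalk
"Hypothetical sketch of the split under discussion; not actual Pharo source.
 The shift amount (18) is an assumption chosen only for illustration."

ProtoObject >> primIdentityHash
	"Answer the raw identity hash bits stored in the object header."
	<primitive: 75>
	self primitiveFailed

ProtoObject >> identityHash
	"Answer the raw bits scaled up, for use by hashed collections."
	^self primIdentityHash bitShift: 18
```

Under this arrangement the VM would keep working with the unscaled
header bits, and only image-side senders of identityHash would see the
scaled values.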

> They could, and I admit to having written this kind of code in the past, 
> but I doubt that I'm typical in doing so. Do you know of any Pharo code 
> that actually *does* this sort of thing? There isn't any in the 
> distributed web image, but I didn't look at every package that is meant 
> to be loadable in Pharo.
>   

I might suspect that Magma does this kind of stuff... but that's just a 
guess.  I didn't immediately see any code doing so.  As long as package 
maintainers are fine with two quite different versions of Pharo with 
very different identityHash method behaviors, then I do not have a problem.

>> Clever hacks such as
>>
>> SomeObject>>hash
>>
>>   ^(self variableA identityHash bitShift: 12) + self variableB identityHash
>>
>>
>> would also remain undisturbed.  
>>     
>
> Yes, if #identityHash is changed it's the clever hacks that will have to 
> change. This could be a disadvantage of this approach, but often, as in 
> the case of IdentityDictionary, IdentitySet, and 
> WeakIdentityKeyDictionary, the necessary change is simply to remove the 
> clever hack, get simpler code, and enjoy better performance than you got 
> with the clever hack, so making the change is IMO an improvement.
>   

We agree, except that I wouldn't want to impose version maintenance homework on 
maintainers of large packages.  For the sake of illustration only, and 
using Magma without knowing if it would be affected, I wouldn't want 
whoever is maintaining Magma to maintain two branches... one for Pharo 
1.xyz, and one for Pharo 1.xyz++.
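
For concreteness, a hack like the quoted SomeObject>>hash depends on
each identityHash fitting in 12 bits, so that the two values occupy
disjoint bit fields. A workspace sketch using the extreme 12-bit
values (the variable names are made up for illustration):

```smalltalk
"Workspace sketch; a and b stand in for two 12-bit identityHash values."
| a b packed |
a := 4095.                        "largest 12-bit value"
b := 4095.
packed := (a bitShift: 12) + b.   "a in bits 12..23, b in bits 0..11"
packed.                           "16777215, i.e. (2 raisedTo: 24) - 1"
packed bitShift: -12.             "4095, recovers a"
packed bitAnd: 4095.              "4095, recovers b"
```

If identityHash answered larger, scaled values, the same expression
could leave the SmallInteger range on a 32-bit image, which is the
kind of thing a package maintainer would have to revisit.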

>> Finally, I do not know of any Smalltalk 
>> in which identityHash does not answer the actual object header bits for 
>> the identity hash. If we change identityHash, then AFAIK Pharo would
>> become the only Smalltalk in which identityHash does not answer 
>> consecutive values between 0 and (2^k)-1 (k=12 for Squeak/Pharo, k=14 
>> for VisualWorks 32 bits, k=20 for VisualWorks 64 bits, IIRC k=15 for VA 
>> and VisualSmalltalk).  
>>     
>
> GemStone is a Smalltalk that does not answer consecutive values for 
> identityHash.

Haha, I was thinking of "regular" image based Smalltalks...

> In GemStone the identityHash is computed from the object's 
> OOP, and OOPs are not consecutive.

Not necessarily, although I suspect identityHash values map to an 
integer interval along the lines of [0, 2^40-1].  So, if you look at 
hash(x) as a function, the image of hash(x) is an interval of consecutive 
integers.  Using bitShift: to scale identityHash values would make the 
image of hash(x) sparse (with the exception of small integers, 
characters and, to some extent in VW 64 bit, small doubles).
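
As a rough illustration of that sparseness, here is a workspace sketch.
The shift amount of 18 is only an assumed example; nothing above fixes
the actual value.

```smalltalk
"Workspace sketch; the shift amount 18 is an illustrative assumption."
| raw scaled |
raw := 0 to: 7.                                      "consecutive header values"
scaled := raw collect: [:each | each bitShift: 18].  "scaled values"
scaled first.                                        "0"
scaled second - scaled first.                        "262144, i.e. 2 raisedTo: 18"
```

Consecutive raw values end up 2^18 apart, so no two scaled values are
adjacent integers.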

>  And Smalltalk-80 basically used the 
> same scheme, though you could only have 32K objects, every one had a 
> different identityHash based on OOP.
>   

These are also consecutive values... [0, 2^15-1], basically.

> Also, most (all?) Smalltalks with limited ranges for identityHash do 
> have a larger range of identityHash for SmallIntegers (usually ^self), 
> so you can't use the clever hacks if you might have any SmallIntegers in 
> your collection. So any general-purpose collection must already deal 
> with the full SmallInteger range of identity hashes as keys, cannot use 
> the clever hacks, and so is likely to only be improved by changing 
> #identityHash. This is a key point that I forgot to bring up last night.
>   

Well, more or less, because with scaledIdentityHash you'd need to 
implement it in SmallInteger as ^self... but yes, I think hashed 
collections shouldn't be put into a position where they judge what's a 
good hash value and what isn't (and spend CPU time doing so at 
runtime!!!).  Java does this, and as far as I could see back when I 
studied Java's hashing implementation, IMO it's not a good idea.

Andres.

_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
