Martin,

One of the constituencies I thought of when I decided to leave identityHash alone was folks like you. Now, as a representative, if you are ok with dealing with broken identityHash senders (which I hope will be few), then most of my motivation for leaving identityHash unchanged is gone. Thus, I would not mind changing identityHash and implementing primIdentityHash.

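To make the proposal concrete, here is a minimal sketch of the split being discussed, written as a Python model rather than Pharo code. It assumes that primIdentityHash answers the raw header bits and that a changed identityHash answers those bits scaled by a left shift. The 12-bit width is the Squeak/Pharo figure quoted later in this message; the shift amount of 18 and the Python function names are assumptions chosen only for illustration.

```python
HEADER_BITS = 12   # identity hash bits in the object header (Squeak/Pharo figure from this thread)
SCALE_SHIFT = 18   # assumed shift amount, chosen only for illustration


def prim_identity_hash(header_bits: int) -> int:
    """Model of primIdentityHash: answer the raw header bits unchanged."""
    if not 0 <= header_bits < (1 << HEADER_BITS):
        raise ValueError("header bits out of range")
    return header_bits


def identity_hash(header_bits: int) -> int:
    """Model of a changed identityHash: the raw bits scaled by a left shift."""
    return prim_identity_hash(header_bits) << SCALE_SHIFT


def raw_from_scaled(scaled: int) -> int:
    """Recover the header bits from a scaled hash by shifting back."""
    return scaled >> SCALE_SHIFT
```

Under these assumptions, a sender that prints the hash would show a different number after the change, which is why such senders would have to switch to the raw selector.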
What about others? Would anybody mind if identityHash was changed? Some comments below...

> I took a survey of the senders of #identityHash in the latest web image.
> There aren't that many. The largest category is those that want the
> printString of the identityHash.
>

These would probably need to be changed to get the printString of the primIdentityHash.

> Of those that care about the value of the identityHash, there are
> several that use it in #hash methods. The most common is this definition:
>
> hash
>     ^self identityHash
>
> These are presumably overriding superclass behavior to restore Object
> behavior.

I'd like to take a look at these, I suspect there may be low hanging fruit waiting to be fixed.

> If the authors knew about the limited range of #identityHash, that is
> entirely possible. I tend to think it more likely that in most cases
> these implementations are just the simplest way to follow the dictate
> that 'a=b -> a hash = b hash', and that they didn't really think about
> the impact on collection performance.
>

Or maybe they chose identityHash because they can assume uniqueness (= effectively being ==)...

> 5 improved, 2 harmed. And one of the listed harmed is MethodDictionary,
> whose performance would not be harmed, but I assume the VM would not be
> happy if their hashing was changed (anybody know for sure whether that's
> true?)
>

The VM probably knows a lot about identityHash values, and most likely uses the primIdentityHash values because then it doesn't have to shift on access.

> They could, and I admit to having written this kind of code in the past,
> but I doubt that I'm typical in doing so. Do you know of any Pharo code
> that actually *does* this sort of thing? There isn't any in the
> distributed web image, but I didn't look at every package that is meant
> to be loadable in Pharo.
>

I might suspect that Magma does this kind of stuff... but that's just a guess. I didn't immediately see any code doing so.
As long as package maintainers are fine with two quite different versions of Pharo with very different identityHash method behaviors, then I do not have a problem.

>> Clever hacks such as
>>
>> SomeObject>>hash
>>
>>     ^(self variableA identityHash bitShift: 12) + self variableB identityHash
>>
>>
>> would also remain undisturbed.
>>
>
> Yes, if #identityHash is changed it's the clever hacks that will have to
> change. This could be a disadvantage of this approach, but often, as in
> the case of IdentityDictionary, IdentitySet, and
> WeakIdentityKeyDictionary, the necessary change is simply to remove the
> clever hack, get simpler code, and enjoy better performance than you got
> with the clever hack, so making the change is IMO an improvement.
>

We agree, mod I wouldn't want to impose version maintenance homework on maintainers of large packages. For the sake of illustration only, and using Magma without knowing if it would be affected, I wouldn't want whoever is maintaining Magma to maintain two branches... one for Pharo 1.xyz, and one for Pharo 1.xyz++.

>> Finally, I do not know of any Smalltalk
>> in which identityHash does not answer the actual object header bits for
>> the identity hash. If we change identityHash, then AFAIK Pharo would
>> become the only Smalltalk in which identityHash does not answer
>> consecutive values between 0 and (2^k)-1 (k=12 for Squeak/Pharo, k=14
>> for VisualWorks 32 bits, k=20 for VisualWorks 64 bits, IIRC k=15 for VA
>> and VisualSmalltalk).
>>
>
> GemStone is a Smalltalk that does not answer consecutive values for
> identityHash.

Haha, I was thinking of "regular" image based Smalltalks...

> In GemStone the identityHash is computed from the object's
> OOP, and OOPs are not consecutive.

Not necessarily, although I suspect identityHash values map to an integer interval along the lines of [0, 2^40-1]. So, if you look at hash(x) as a function, the image of hash(x) is a set of consecutive intervals.
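
The difference between a consecutive image and a scaled one can be checked numerically. The sketch below is a Python model, not Pharo code. The 12-bit width is the Squeak/Pharo figure quoted above; the shift of 18 and the helper name `image_density` are assumptions for illustration. It computes the fraction of integers between the smallest and largest hash value that are actually produced.

```python
HEADER_BITS = 12   # Squeak/Pharo identity hash width quoted in this thread
SCALE_SHIFT = 18   # assumed scaling shift, for illustration only


def image_density(values):
    """Fraction of the integers in [min, max] that occur in `values`."""
    distinct = set(values)
    span = max(distinct) - min(distinct) + 1
    return len(distinct) / span


# Raw header values: every integer from 0 to 2**12 - 1.
raw_image = range(1 << HEADER_BITS)

# Scaled values: the same 4096 hashes, now spaced 2**18 apart.
scaled_image = [h << SCALE_SHIFT for h in raw_image]

raw_density = image_density(raw_image)
scaled_density = image_density(scaled_image)
```

Under these assumptions the number of distinct hashes stays at 4096 in both cases; only the spacing between them changes.
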
Using bitShift: to scale identityHash values would make the image of hash(x) sparse (with the exception of small integers, characters and, to some extent in VW 64 bit, small doubles).

> And Smalltalk-80 basically used the
> same scheme, though you could only have 32K objects, every one had a
> different identityHash based on OOP.
>

These are also consecutive values... [0, 2^15-1], basically.

> Also, most (all?) Smalltalks with limited ranges for identityHash do
> have a larger range of identityHash for SmallIntegers (usually ^self),
> so you can't use the clever hacks if you might have any SmallIntegers in
> your collection. So any general-purpose collection must already deal
> with the full SmallInteger range of identity hashes as keys, cannot use
> the clever hacks, and so is likely to only be improved by changing
> #identityHash. This is a key point that I forgot to bring up last night.
>

Well, more or less, because with scaledIdentityHash you'd need to implement it in SmallInteger as ^self... but yes, I think hashed collections shouldn't be put into a position where they judge what's a good hash value and what isn't (and spend CPU time doing so at runtime!!!). Java does this, and as far as I could see back when I studied Java's hashing implementation, IMO it's not a good idea.

Andres.
_______________________________________________
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project