[sqlalchemy] Re: Python 2.6 hash behavior change
On Mar 5, 2008, at 3:35 AM, Denis S. Otkidach wrote: All hash table implementations (including one in Python) work the same way: first it looks up a list of key-value pairs by hash, and then iterate through the list to find a needed key by comparing it with = operator. Thus __eq__ method _is_ used to lookup values in dictionaries. So, this won't work in some cases: 1) when __eq__ may return False when comparing to identical (but not the same, since Python always checks with is first), 2) when __eq__ may return True for objects we want to be assumed different - in some rare cases when they produce the same hash. Although both are not our cases (we always return True, and using id() for hash guarantees they will never clash), I believe your suggestion to use IdentitySet and IdentityDict is a right direction. I want to establish a Expression-friendly token in the source code for sets and dicts such that we can change underlying implementations as needed. We were doing some source code browsing yesterday and it seems to use __cmp__() for the equality check - so we might not need to move to IdentitySet/Dict just yet. My knowledge of hashtables says that two objects with different hash values can still be subject to the equality comparison if they are placed in the same bucket, so we do need to worry about equality comparison in any case. We have filed http://bugs.python.org/issue2235 to deal with __hash__ not being checked along an inheritance hierarchy so at least as far as 2.6, we will be able to make a base class that provides a default __hash__() method, but it seems like Guido in py3k wants __hash__() and __eq__() to always be redefined at the same level.It seems overly rigid to me but I'm not the person to take up an argument about that (any takers ? :) ). It seems like the long term approach here is to get really nice IdentitySet and IdentityDicts going, and this may even be a place I want to start looking into optional, native extensions for performance purposes. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: Python 2.6 hash behavior change
On Mon, Mar 3, 2008 at 8:23 PM, Michael Bayer [EMAIL PROTECTED] wrote: We define __eq__() all over the place so that would be a lot of __hash__() methods to add, all of which return id(self). I wonder if we shouldn't just make a util.Mixin called Hashable so that we can centralize the idea. Are you sure this is a correct way? Below is an example demonstrating the problem with it: class C(object): ... def __init__(self, value): ... self._value = value ... def __eq__(self, other): ... return self._value==other._value ... def __hash__(self): ... return id(self) ... c1 = C(1) c2 = C(1) c1==c2 True d = {c1: None} c1 in d True c2 in d False I.e. although c2 is equal to c1 and thus should be found in dictionary, it is not. The defined __hash__ method must return equal numbers for equal object. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: Python 2.6 hash behavior change
On Mar 4, 2008, at 4:26 AM, Denis S. Otkidach wrote: On Mon, Mar 3, 2008 at 8:23 PM, Michael Bayer [EMAIL PROTECTED] wrote: We define __eq__() all over the place so that would be a lot of __hash__() methods to add, all of which return id(self). I wonder if we shouldn't just make a util.Mixin called Hashable so that we can centralize the idea. Are you sure this is a correct way? Below is an example demonstrating the problem with it: class C(object): ... def __init__(self, value): ... self._value = value ... def __eq__(self, other): ... return self._value==other._value ... def __hash__(self): ... return id(self) ... c1 = C(1) c2 = C(1) c1==c2 True d = {c1: None} c1 in d True c2 in d False I.e. although c2 is equal to c1 and thus should be found in dictionary, it is not. The defined __hash__ method must return equal numbers for equal object. Well actually, in our particular case that's the behavior that we *do* want; pretty much everywhere we've defined __eq__(), we've done it not to redefine what it means for a==b, but to produce SQL expressions - so in that sense __eq__() is entirely broken for its normal usage in SQLAlchemy (as well as in all the other SQL tools out there using this approach). For this reason, internally we can't do things like d in [a,b,c] if those are SQL expressions, since __eq__() evaluates to true in all cases - we use sets when we need a collection of SQL expressions where we can test for presence, so that their hash value is used. However, while Im not familiar with the internals of Python dictionaries, depending on how they implemented it we still may need to use IdentitySet and IdentityDict, two classes (well we have the first one at least) which ignore the __hash__() and __eq__() methods entirely and hash their contents strictly based on id(obj). This is because a hashtable usually stores items in buckets based on a modulus of the __hash__() value; if two items are in the same bucket, an equality comparison is used to locate the correct object. If Python's dict uses __eq__() for the equality comparison, we'd be in trouble. I have a strong suspicion that they do not (since I think we would have noticed by now), and that they use __hash__() for the equality comparison as well, but I'm not sure; and also not sure if this is slated to change in py2.6. I think I might want to look into defining in util ExpressionSet / ExpressionKeyDict set symbols (subject to the new names jek is sure to propose... ;) ) which would be used throughout the source code to store SQL expression constructs as keys. That way at least we can change the underlying implementation based on observed quirks of the version in use. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: Python 2.6 hash behavior change
On Mar 3, 2008, at 11:23 AM, Michael Bayer wrote: On Mar 3, 2008, at 11:56 AM, Patrick Hartling wrote: hash(Works()) hash(Works2()) # Raises TypeError with Python 2.6 because Fails is deemed unhashable. hash(Fails()) Does anyone know of a workaround for this issue? So far, sqlalchemy.schema.PrimaryKeyConstraint and sqlalchemy.schema.Column are the two classes that have tripped me up. I have added __hash__() methods to both, but I cannot vouch for the correctness of the implementations. hmm, subtle difference between the 2.5 docs: If a class does not define a __cmp__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable as dictionary keys. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since the dictionary implementation requires that a key's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket). and the 2.6 docs: If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable as dictionary keys. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since the dictionary implementation requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket). both claim that if we define __eq__() but not __hash__(), it wont be useable as a dictionary key. but in the case of 2.5 this seems to be incorrect, or at least not enforced. The only difference here is that 2.6 says if we dont define __cmp__() or __eq__(), then we shouldn't define __hash__() either whereas 2.5 only mentions __cmp__() in that regard. Subtle indeed. We define __eq__() all over the place so that would be a lot of __hash__() methods to add, all of which return id(self). I wonder if we shouldn't just make a util.Mixin called Hashable so that we can centralize the idea. That sounds like a reasonable approach. I have not looked much at the SQLAlchemy internals before today, but if I follow your idea correctly, I could apply that solution wherever I run into problems during testing. -Patrick -- Patrick L. Hartling Senior Software Engineer, Priority 5 http://www.priority5.com/ PGP.sig Description: This is a digitally signed message part