[sqlalchemy] Re: Python 2.6 hash behavior change

2008-03-05 Thread Michael Bayer


On Mar 5, 2008, at 3:35 AM, Denis S. Otkidach wrote:

 All hash table implementations (including one in Python) work the same
 way: first it looks up a list of key-value pairs by hash, and then
 iterate through the list to find a needed key by comparing it with =
 operator. Thus __eq__ method _is_ used to lookup values in
 dictionaries. So, this won't work in some cases:
 1) when __eq__ may return False when comparing to identical (but not
 the same, since Python always checks with is first),
 2) when __eq__ may return True for objects we want to be assumed
 different - in some rare cases when they produce the same hash.

 Although both are not our cases (we always return True, and using id()
 for hash guarantees they will never clash), I believe your suggestion
 to use IdentitySet and IdentityDict is a right direction.

I want to establish a Expression-friendly token in the source code  
for sets and dicts such that we can change underlying implementations  
as needed.  We were doing some source code browsing yesterday and it  
seems to use __cmp__() for the equality check - so we might not need  
to move to IdentitySet/Dict just yet.

My knowledge of hashtables says that two objects with different hash  
values can still be subject to the equality comparison if they are  
placed in the same bucket, so we do need to worry about equality  
comparison in any case.

We have filed http://bugs.python.org/issue2235 to deal with __hash__  
not being checked along an inheritance hierarchy so at least as far as  
2.6, we will be able to make a base class that provides a default  
__hash__() method, but it seems like Guido in py3k wants __hash__()  
and __eq__() to always be redefined at the same level.It seems  
overly rigid to me but I'm not the person to take up an argument about  
that (any takers ? :)  ).

It seems like the long term approach here is to get really nice  
IdentitySet and IdentityDicts going, and this may even be a place I  
want to start looking into optional, native extensions for performance  
purposes.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Python 2.6 hash behavior change

2008-03-04 Thread Denis S. Otkidach

On Mon, Mar 3, 2008 at 8:23 PM, Michael Bayer [EMAIL PROTECTED] wrote:
  We define __eq__() all over the place so that would be a lot of
  __hash__() methods to add, all of which return id(self).  I wonder if
  we shouldn't just make a util.Mixin called Hashable so that we can
  centralize the idea.

Are you sure this is a correct way? Below is an example demonstrating
the problem with it:

 class C(object):
... def __init__(self, value):
... self._value = value
... def __eq__(self, other):
... return self._value==other._value
... def __hash__(self):
... return id(self)
...
 c1 = C(1)
 c2 = C(1)
 c1==c2
True
 d = {c1: None}
 c1 in d
True
 c2 in d
False

I.e. although c2 is equal to c1 and thus should be found in
dictionary, it is not. The defined __hash__ method must return equal
numbers for equal object.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Python 2.6 hash behavior change

2008-03-04 Thread Michael Bayer


On Mar 4, 2008, at 4:26 AM, Denis S. Otkidach wrote:


 On Mon, Mar 3, 2008 at 8:23 PM, Michael Bayer [EMAIL PROTECTED] 
  wrote:
 We define __eq__() all over the place so that would be a lot of
 __hash__() methods to add, all of which return id(self).  I wonder if
 we shouldn't just make a util.Mixin called Hashable so that we can
 centralize the idea.

 Are you sure this is a correct way? Below is an example demonstrating
 the problem with it:

 class C(object):
 ... def __init__(self, value):
 ... self._value = value
 ... def __eq__(self, other):
 ... return self._value==other._value
 ... def __hash__(self):
 ... return id(self)
 ...
 c1 = C(1)
 c2 = C(1)
 c1==c2
 True
 d = {c1: None}
 c1 in d
 True
 c2 in d
 False

 I.e. although c2 is equal to c1 and thus should be found in
 dictionary, it is not. The defined __hash__ method must return equal
 numbers for equal object.


Well actually, in our particular case that's the behavior that we *do*  
want; pretty much everywhere we've defined __eq__(), we've done it not  
to redefine what it means for a==b, but to produce SQL expressions -  
so in that sense __eq__() is entirely broken for its normal usage in  
SQLAlchemy (as well as in all the other SQL tools out there using this  
approach).  For this reason, internally we can't do things like d in  
[a,b,c] if those are SQL expressions, since __eq__() evaluates to  
true in all cases - we use sets when we need a collection of SQL  
expressions where we can test for presence, so that their hash value  
is used.

However, while Im not familiar with the internals of Python  
dictionaries, depending on how they implemented it we still may need  
to use IdentitySet and IdentityDict, two classes (well we have the  
first one at least) which ignore the __hash__() and __eq__() methods  
entirely and hash their contents strictly based on id(obj).   This is  
because a hashtable usually stores items in buckets based on a  
modulus of the __hash__() value; if two items are in the same bucket,  
an equality comparison is used to locate the correct object.  If  
Python's dict uses __eq__() for the equality comparison, we'd be in  
trouble.  I have a strong suspicion that they do not (since I think we  
would have noticed by now), and that they use __hash__() for the  
equality comparison as well, but I'm not sure; and also not sure if  
this is slated to change in py2.6.

I think I might want to look into defining in util  ExpressionSet /  
ExpressionKeyDict set symbols (subject to the new names jek is sure to  
propose... ;)  ) which would be used throughout the source code to  
store SQL expression constructs as keys.  That way at least we can  
change the underlying implementation based on observed quirks of the  
version in use.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Python 2.6 hash behavior change

2008-03-03 Thread Patrick Hartling

On Mar 3, 2008, at 11:23 AM, Michael Bayer wrote:




On Mar 3, 2008, at 11:56 AM, Patrick Hartling wrote:



hash(Works())
hash(Works2())
# Raises TypeError with Python 2.6 because Fails is deemed  
unhashable.

hash(Fails())

Does anyone know of a workaround for this issue? So far,
sqlalchemy.schema.PrimaryKeyConstraint and sqlalchemy.schema.Column
are the two classes that have tripped me up. I have added __hash__()
methods to both, but I cannot vouch for the correctness of the
implementations.



hmm, subtle difference between the 2.5 docs:

If a class does not define a __cmp__() method it should not define a
__hash__() operation either; if it defines __cmp__() or __eq__() but
not __hash__(), its instances will not be usable as dictionary keys.
If a class defines mutable objects and implements a __cmp__() or
__eq__() method, it should not implement __hash__(), since the
dictionary implementation requires that a key's hash value is
immutable (if the object's hash value changes, it will be in the wrong
hash bucket).

and the 2.6 docs:

If a class does not define a __cmp__() or __eq__() method it should
not define a __hash__() operation either; if it defines __cmp__() or
__eq__() but not __hash__(), its instances will not be usable as
dictionary keys. If a class defines mutable objects and implements a
__cmp__() or __eq__() method, it should not implement __hash__(),
since the dictionary implementation requires that a key’s hash value
is immutable (if the object’s hash value changes, it will be in the
wrong hash bucket).

both claim that if we define __eq__() but not __hash__(), it wont be
useable as a dictionary key.  but in the case of 2.5 this seems to be
incorrect, or at least not enforced.  The only difference here is that
2.6 says  if we dont define __cmp__() or __eq__(), then we shouldn't
define __hash__() either whereas 2.5 only mentions __cmp__() in that
regard.


Subtle indeed.


We define __eq__() all over the place so that would be a lot of
__hash__() methods to add, all of which return id(self).  I wonder if
we shouldn't just make a util.Mixin called Hashable so that we can
centralize the idea.



That sounds like a reasonable approach. I have not looked much at the  
SQLAlchemy internals before today, but if I follow your idea  
correctly, I could apply that solution wherever I run into problems  
during testing.


 -Patrick


--
Patrick L. Hartling
Senior Software Engineer, Priority 5
http://www.priority5.com/



PGP.sig
Description: This is a digitally signed message part