Re: hashkey/digest for a complex object

2010-10-10 Thread Hrvoje Niksic
Steven D'Aprano st...@remove-this-cybersource.com.au writes:

 On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
 for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

 It's a beautiful theory, but, alas, it is not the case.

 hash(-1) == -1
 False

This is a counter-example for the (invalid) premise that hash(x) == x,
but not for the invariant of hash(hash(x)) == hash(x).

 hash(hash(-1)) == hash(-1)
True
 hash(hash(2**64)) == hash(2**64)
True

 Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions 
 before, or after, ints and longs were partially integrated?

I would take it to mean the type 2.x calls 'int', i.e. fixed-width
integer type.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread Arnaud Delobelle
Steven D'Aprano st...@remove-this-cybersource.com.au writes:

 On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
 for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

 It's a beautiful theory, but, alas, it is not the case.

 hash(-1) == -1
 False
 hash(2**64) == 2**64
 False

 to give only two of an infinite number of counter-examples.

I can only see one counterexample, (-1).  2**64 is of type 'long' in 2.X
on your machine (or, to be pedantic, 2.x where x = 2).  And, in fact,
(-1) is the only int such that hash(x) != x.

In can only guess that (-1) is a value that has a special meaning when
hashing.  Try this (Python 2.6):

 class A(object):
... def __hash__(self): return -1
... 
 a = A()
 hash(a)
-2


 Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions 
 before, or after, ints and longs were partially integrated?

Either.

 [st...@sylar ~]$ python2.1
 Python 2.1.3 (#1, Aug 12 2010, 01:53:57)
 [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
 Type copyright, credits or license for more information.
 2**64
 Traceback (most recent call last):
   File stdin, line 1, in ?
 OverflowError: integer exponentiation



 People keep forgetting that 2.2 introduced nearly as many far-reaching 
 changes as 3.0.

I didn't forget, but what bearing does it have on this particular issue?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread kj
In 87hbgu8irb@gmail.com Arnaud Delobelle arno...@gmail.com writes:

Steven D'Aprano st...@remove-this-cybersource.com.au writes:

 On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
 for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

 It's a beautiful theory, but, alas, it is not the case.

 hash(-1) == -1
 False
 hash(2**64) == 2**64
 False

 to give only two of an infinite number of counter-examples.

(Infinite???)

And, in fact,
(-1) is the only int such that hash(x) != x.

Arnaud, how did you determine that -1 is the only such int?  I
can't imagine any other approach other than a brute-force check of
all ints...  When I tried this I ran into unforeseen limitations
in xrange, etc.  It's all very doable, but still, it would take at
least about 3 hours on my laptop.

In can only guess that (-1) is a value that has a special meaning when
hashing.  Try this (Python 2.6):

 class A(object):
... def __hash__(self): return -1
... 
 a = A()
 hash(a)
-2

Very cool.

BTW, thank you for the explanation in your previous post.  It makes
a lot of sense.  I find it interesting (as Hrvoje pointed out) that
the hash function is (or appears to be) idempotent on integers
(long or not), even though it is not the identity on the integers.
Thanks to Steven for the counterexamples to show the latter.  I've
learned tons from this exchange.

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread Steven D'Aprano
On Sun, 10 Oct 2010 11:06:00 +0100, Arnaud Delobelle wrote:

 Steven D'Aprano st...@remove-this-cybersource.com.au writes:
 
 On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x)
 hold for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

 It's a beautiful theory, but, alas, it is not the case.

 hash(-1) == -1
 False
 hash(2**64) == 2**64
 False

 to give only two of an infinite number of counter-examples.
 
 I can only see one counterexample, (-1).  2**64 is of type 'long' in 2.X
 on your machine (or, to be pedantic, 2.x where x = 2).  And, in fact,
 (-1) is the only int such that hash(x) != x.

Fair point. I was mistaken.

I had the impression that the integration between ints and longs since 
Python 2.2 was more extensive than it actually is. I made the above 
comments based on the idea that since Python 2.2, longs are a subclass of 
ints, e.g. that isinstance(2**64, int) would return True. Turns out that 
I'm wrong, in which case I agree with you that -1 is the only int counter-
example.

As an aside, this makes me glad that I have continued writing 
isinstance(x, (int, long)) in my code, even though I knew it was 
unnecessary. Not the first time, and probably not the last, that a this 
can never happen test saved the day.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread Steven D'Aprano
On Sun, 10 Oct 2010 13:49:09 +, kj wrote:

 to give only two of an infinite number of counter-examples.
 
 (Infinite???)

I was mistaken. Given Arnaud's specification that we look only at the 
Python 2.x ints, I believe that there is only one counter-example, namely 
-1.

However, in Python 3.x I would be correct. The reasoning is a simple 
application of the pigeon-hole principle. Since hash(n) is limited to 
returning a finite number of distinct results, but there are an infinite 
number of Python 3 ints, it follows that there must be an infinite number 
of collisions.


And, in fact,
(-1) is the only int such that hash(x) != x.
 
 Arnaud, how did you determine that -1 is the only such int?  I can't
 imagine any other approach other than a brute-force check of all ints...

Reading the source code is also a good approach.

I don't read C very well, but even I can work out what this is doing:

static long
int_hash(PyIntObject *v)
{
/* XXX If this is changed, you also need to change the way
   Python's long, float and complex types are hashed. */
long x = v - ob_ival;
if (x == -1)
x = -2;
return x;
}


(from intobject.c in Python 2.6.1). It takes a Python int object, 
extracts the value of it as a C long, returns -2 if the value is -1 
otherwise returns the value.


  When I tried this I ran into unforeseen limitations in xrange, etc. 
 It's all very doable, but still, it would take at least about 3 hours on
 my laptop.


What are you doing? This takes less than 20 seconds to test up to 2**24:

import time
def test(n):
t = time.time()
assert hash(0) == 0
assert hash(1) == 1
assert hash(-1) == -2
for i in xrange(2, n):
assert hash(i) == i, failed %d % i
assert hash(-i) == -i, failed %d % -i
t = time.time() - t
return t


 test(2**24)
18.076412916183472

Since 2**31 is 2**7 = 128 times bigger, I estimate that testing 
everything up to 2**31 should take less than 45 minutes. And my PC is not 
exactly the fastest machine on the block.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread Arnaud Delobelle
kj no.em...@please.post writes:

 In 87hbgu8irb@gmail.com Arnaud Delobelle arno...@gmail.com writes:

[...]

And, in fact,
(-1) is the only int such that hash(x) != x.

 Arnaud, how did you determine that -1 is the only such int?  I
 can't imagine any other approach other than a brute-force check of
 all ints...  When I tried this I ran into unforeseen limitations
 in xrange, etc.  It's all very doable, but still, it would take at
 least about 3 hours on my laptop.

I looked at the source, namely the get_hash() function in the
intobject.c file in the 2.x source, and the get_hash() function in the
longobject.c file in the 3.x source.

In can only guess that (-1) is a value that has a special meaning when
hashing.  Try this (Python 2.6):

 class A(object):
... def __hash__(self): return -1
... 
 a = A()
 hash(a)
-2

 Very cool.

 BTW, thank you for the explanation in your previous post.  It makes
 a lot of sense.  I find it interesting (as Hrvoje pointed out) that
 the hash function is (or appears to be) idempotent on integers
 (long or not), even though it is not the identity on the integers.
 Thanks to Steven for the counterexamples to show the latter.  I've
 learned tons from this exchange.

It still seems to hold that hash() is idempotent on all objects.

I have learnt too that hash(-1) is not (-1), and that it seems that a
hash value is not allowed to be (-1).  There is one thing left to find
out.  Why can't it be (-1)?  Maybe looking at the source for the hash()
builtin would yield a clue, or maybe someone who knows would be able to
tell us?

-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread Hrvoje Niksic
Arnaud Delobelle arno...@gmail.com writes:

 I have learnt too that hash(-1) is not (-1), and that it seems that a
 hash value is not allowed to be (-1).  There is one thing left to find
 out.  Why can't it be (-1)?

Because -1 has a special meaning in the C function equivalent to
Python's hash().  PyObject_Hash returning -1 means that an exception was
raised somewhere inside the object's __hash__ method.  For that reason
hash functions that really return -1 must change that to another value,
and -2 is as good a replacement as any.

This is documented in
http://docs.python.org/c-api/object.html?highlight=pyobject_hash#PyObject_Hash
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-10 Thread kj
In 4cb1dc9a$0$29970$c3e8da3$54964...@news.astraweb.com Steven D'Aprano 
st...@remove-this-cybersource.com.au writes:

Reading the source code is also a good approach.

Every time I have attempted to answer a question by looking at the
Python C source, all I've achieved was wasting time, sometimes a
*lot* of time, so by now I've developed what can only be described
as a phobia to it.  I probably need professional help at this point.

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-09 Thread kj
In 87y6a9lqnj@gmail.com Arnaud Delobelle arno...@gmail.com writes:

You could do something like this:

deep_methods = {   
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
# Add more if needed
}

def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)

def deepfreeze(obj):
Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)

def deephash(obj):
Return hash(deepfreeze(obj)) without deepfreezing
return hash(apply_method(deephash, obj))

# Example of deepfreezable object:
obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}]
   ^   ^
   |   |
   `---`--- what's this?


 deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
 deephash(obj)
1341422540
 hash(deepfreeze(obj))
1341422540


After fixing the missing  in deepfreeze this code works as
advertised, but I'm mystified by the identity between hash(deepfreeze(...))
and deephash(...).  Without some knowledge of the Python internals,
I don't see how this follows.

More specifically, it is not obvious to me that, for example,

hash(frozenset((whatever,)))

would be identical to

hash(frozenset((hash(whatever),)))

but this identity has held every time I've checked it.  Similarly
for other more complicated variations on this theme.

Anyway, thanks for the code.  It's very useful.

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-09 Thread Arnaud Delobelle
kj no.em...@please.post writes:

 In 87y6a9lqnj@gmail.com Arnaud Delobelle arno...@gmail.com writes:

You could do something like this:

deep_methods = {   
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
# Add more if needed
}

def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)

def deepfreeze(obj):
Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)

def deephash(obj):
Return hash(deepfreeze(obj)) without deepfreezing
return hash(apply_method(deephash, obj))

# Example of deepfreezable object:
obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}]
^   ^
|   |
`---`--- what's this?

This is set literal notation, introduced in Python 3!
In 2.X, you would write set([7, 5, 4])



 deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
 deephash(obj)
1341422540
 hash(deepfreeze(obj))
1341422540


 After fixing the missing  in deepfreeze this code works as
 advertised, 

Oops, I must admit I added the docstrings after pasting the code :)

 but I'm mystified by the identity between hash(deepfreeze(...))  and
 deephash(...).  Without some knowledge of the Python internals, I
 don't see how this follows.

OK.  I haven't actually proved it, but it follows from the following
facts:

1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
for any hashable x (this is a simple consequence of the fact that
hash(x) == x for any int x (by 'int' I mean 2.X int)).

2. Container hashable objects compute their hash from the hash of their
elements.

I don't think either of these two facts is documented, but it would be quite
easy to verify them in the Python source (I must admit I have not done
it).  And it is difficult to conceive how it could be otherwise.

-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-09 Thread Steven D'Aprano
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote:

 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold
 for any hashable x (this is a simple consequence of the fact that
 hash(x) == x for any int x (by 'int' I mean 2.X int)).

It's a beautiful theory, but, alas, it is not the case.

 hash(-1) == -1
False
 hash(2**64) == 2**64
False

to give only two of an infinite number of counter-examples.

Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions 
before, or after, ints and longs were partially integrated?

[st...@sylar ~]$ python2.1
Python 2.1.3 (#1, Aug 12 2010, 01:53:57)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type copyright, credits or license for more information.
 2**64
Traceback (most recent call last):
  File stdin, line 1, in ?
OverflowError: integer exponentiation



People keep forgetting that 2.2 introduced nearly as many far-reaching 
changes as 3.0.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread Terry Reedy

On 10/6/2010 2:58 PM, kj wrote:


These objects are non-mutable once they are created,


See below.


like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object.  The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.


I believe Python strings do this (cache the hash). Equality comparison 
can check length, hashes, and only then chars.



So the problem is to come up with a reasonable hashkey for each of
these objects.  The objects have two significant attributes, and
two of these objects should be regarded as equal if these attributes
are the same in both.  The first attribute is a simple dictionary
whose keys are integers and values are strings.  The second attribute
is more complicated.  It is a tree structure, represented as a
dictionary of dictionaries of dictionaries... until we get to the
leaf elements, which are frozensets of strings.  The keys at every
level of this structure are strings.  E.g. a simple example of such
an attribute would look like:

{'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
  'B': set(['6', '7', '8'])}


If these two attributes, and hence the dicts, are public, then your 
instances are mutable.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
In mailman.1415.1286438617.29448.python-l...@python.org Terry Reedy 
tjre...@udel.edu writes:

If these two attributes, and hence the dicts, are public, then your 
instances are mutable.

I guess I should have written immutable among consenting adults.

As far as I know, Python does not support private attributes, so
I guess the dicts are public no matter what I do.  I suppose that
I can implement frozendict, but I can't think of any way to
enforce the immutability of these frozendicts that would not be
trivial to circumvent (it better be, in fact, otherwise I wouldn't
be able to initialize the damn things).

If you had something else in mind, please let me know.

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
In m2fwwjazs2@web.de de...@web.de (Diez B. Roggisch) writes:

kj no.em...@please.post writes:

 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

Surprisingly, in the source:

http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc

Thanks to you, and to all who pointed me to the place in the source
where this is.

How exactly did you search for this?  Taking a hint from the url
above, I went to Google Code Search and searched for python tuple
hash (minus the quotes, of course), which produced a ton of
irrelevant stuff (almost 80K hits).  Searching for python tuple
hash lang:c cut down the number of hits to ~8K, but still too much
to wade through.  Clearly I need a smarter search strategy (one
that does not include foreknowledge of the name of the actual
function in the C source, of course).

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread Diez B. Roggisch
kj no.em...@please.post writes:

 In m2fwwjazs2@web.de de...@web.de (Diez B. Roggisch) writes:

kj no.em...@please.post writes:

 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

Surprisingly, in the source:

http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc

 Thanks to you, and to all who pointed me to the place in the source
 where this is.

 How exactly did you search for this?  Taking a hint from the url
 above, I went to Google Code Search and searched for python tuple
 hash (minus the quotes, of course), which produced a ton of
 irrelevant stuff (almost 80K hits).  Searching for python tuple
 hash lang:c cut down the number of hits to ~8K, but still too much
 to wade through.  Clearly I need a smarter search strategy (one
 that does not include foreknowledge of the name of the actual
 function in the C source, of course).

I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look. 

From that I learned the tuplehash function name.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread kj
In 87pqvmp611@web.de de...@web.de (Diez B. Roggisch) writes:

I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look. 

From that I learned the tuplehash function name.

You must be at least somewhat familiar with the Python source.
Everytime I've peeked into it I just feel lost, but it's clearly
something I need to master sooner or later...  I can't wait for
the next one of those occasional introductions to the Python
internals at our local Python club.

Thanks,

~kj
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread Diez B. Roggisch
kj no.em...@please.post writes:

 In 87pqvmp611@web.de de...@web.de (Diez B. Roggisch) writes:

I tried codesearch first. Not satisfied after 30 seconds with the
results, I did the most straight forward thing - I downloaded and
un-packed the python source... and took a look. 

From that I learned the tuplehash function name.

 You must be at least somewhat familiar with the Python source.
 Everytime I've peeked into it I just feel lost, but it's clearly
 something I need to master sooner or later...  I can't wait for
 the next one of those occasional introductions to the Python
 internals at our local Python club.

No, I'm not the tiniest bit. I just followed my instincts in looking
into the Objects folder, because that's where I suspected the
definition of objects to be

And grep has been proven useful in these cases as well.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread MRAB

On 07/10/2010 11:42, kj wrote:

Inmailman.1415.1286438617.29448.python-l...@python.org  Terry 
Reedytjre...@udel.edu  writes:


If these two attributes, and hence the dicts, are public, then your
instances are mutable.


I guess I should have written immutable among consenting adults.

As far as I know, Python does not support private attributes, so
I guess the dicts are public no matter what I do.  I suppose that
I can implement frozendict, but I can't think of any way to
enforce the immutability of these frozendicts that would not be
trivial to circumvent (it better be, in fact, otherwise I wouldn't
be able to initialize the damn things).


You would initialise them by creating them from a list of tuples (or an
iterable which yields tuples), like with a dict:

 dict([(1, foo), (2, bar)])
{1: 'foo', 2: 'bar'}
--
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-07 Thread Arnaud Delobelle
kj no.em...@please.post writes:

 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

 Now, for the long version of this question, I'm working with some
 complext Python objects that I want to be able to compare for
 equality easily.

 These objects are non-mutable once they are created, so I would
 like to use a two-step comparison for equality, based on the
 assumption that I can compute (either at creation time, or as needed
 and memoized) a hashkey/digest for each object.  The test for
 equality of two of these objects would first compare their hashkeys.
 If they are different, the two objects are declared different; if
 they match, then a more stringent test for equality is performed.

 So the problem is to come up with a reasonable hashkey for each of
 these objects.  The objects have two significant attributes, and
 two of these objects should be regarded as equal if these attributes
 are the same in both.  The first attribute is a simple dictionary
 whose keys are integers and values are strings.  The second attribute
 is more complicated.  It is a tree structure, represented as a
 dictionary of dictionaries of dictionaries... until we get to the
 leaf elements, which are frozensets of strings.  The keys at every
 level of this structure are strings.  E.g. a simple example of such
 an attribute would look like:

 {'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
  'B': set(['6', '7', '8'])}

 I'm looking for a good algorithm for computing a hash key for
 something like this?  (Basically I'm looking for a good way to
 combine hashkeys.)

You could do something like this:


deep_methods = {   
list: lambda f, l: tuple(map(f, l)),
dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()),
set: lambda f, s: frozenset(map(f, s)),
# Add more if needed
}

def apply_method(f, obj):
try:
method = deep_methods[type(obj)]
except KeyError:
return obj
return method(f, obj)

def deepfreeze(obj):
Return a 'hashable version' of an object
return apply_method(deepfreeze, obj)

def deephash(obj):
Return hash(deepfreeze(obj)) without deepfreezing
return hash(apply_method(deephash, obj))

# Example of deepfreezable object:
obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}]


 deepfreeze(obj)
(1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))}))
 deephash(obj)
1341422540
 hash(deepfreeze(obj))
1341422540


-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-06 Thread Duncan Booth
kj no.em...@please.post wrote:

 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?
 

http://svn.python.org/view/python/trunk/Objects/tupleobject.c?revision=81029view=markup

static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 103L;
x = 0x345678L;
p = v-ob_item;
while (--len = 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}



 I'm looking for a good algorithm for computing a hash key for
 something like this?  (Basically I'm looking for a good way to
 combine hashkeys.)

If you want to combine the hashes from several objects then the 
easiest way is just to create a tuple of the objects and hash it.

-- 
Duncan Booth http://kupuguy.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-06 Thread geremy condra
On Wed, Oct 6, 2010 at 11:58 AM, kj no.em...@please.post wrote:


 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

From Objects/tuple.c, line 315 in Python3.2:

static long
tuplehash(PyTupleObject *v)
{
register long x, y;
register Py_ssize_t len = Py_SIZE(v);
register PyObject **p;
long mult = 103L;
x = 0x345678L;
p = v-ob_item;
while (--len = 0) {
y = PyObject_Hash(*p++);
if (y == -1)
return -1;
x = (x ^ y) * mult;
/* the cast might truncate len; that doesn't change hash stability */
mult += (long)(82520L + len + len);
}
x += 97531L;
if (x == -1)
x = -2;
return x;
}

 Now, for the long version of this question, I'm working with some
 complext Python objects that I want to be able to compare for
 equality easily.

 These objects are non-mutable once they are created, so I would
 like to use a two-step comparison for equality, based on the
 assumption that I can compute (either at creation time, or as needed
 and memoized) a hashkey/digest for each object.  The test for
 equality of two of these objects would first compare their hashkeys.
 If they are different, the two objects are declared different; if
 they match, then a more stringent test for equality is performed.

 So the problem is to come up with a reasonable hashkey for each of
 these objects.  The objects have two significant attributes, and
 two of these objects should be regarded as equal if these attributes
 are the same in both.  The first attribute is a simple dictionary
 whose keys are integers and values are strings.  The second attribute
 is more complicated.  It is a tree structure, represented as a
 dictionary of dictionaries of dictionaries... until we get to the
 leaf elements, which are frozensets of strings.  The keys at every
 level of this structure are strings.  E.g. a simple example of such
 an attribute would look like:

 {'A': {'a': set(['1', '2', '3']),
       'b': set(['4', '5'])},
  'B': set(['6', '7', '8'])}

 I'm looking for a good algorithm for computing a hash key for
 something like this?  (Basically I'm looking for a good way to
 combine hashkeys.)

Sounds like you're trying to hash mutable data structures, which is a
no-no, but assuming you've got that part all figured out then the easy
way out would be to print the whole thing and take the hash of the
resulting string. A (possibly) better way would be to do a Schwartzian
transform[1] and freeze everything. Other approaches may be better,
but those are the first two out of my head.

Geremy Condra

[1] http://en.wikipedia.org/wiki/Schwartzian_transform
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-06 Thread Diez B. Roggisch
kj no.em...@please.post writes:

 The short version of this question is: where can I find the algorithm
 used by the tuple class's __hash__ method?

Surprisingly, in the source:

http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc

 Now, for the long version of this question, I'm working with some
 complext Python objects that I want to be able to compare for
 equality easily.

 These objects are non-mutable once they are created, so I would
 like to use a two-step comparison for equality, based on the
 assumption that I can compute (either at creation time, or as needed
 and memoized) a hashkey/digest for each object.  The test for
 equality of two of these objects would first compare their hashkeys.
 If they are different, the two objects are declared different; if
 they match, then a more stringent test for equality is performed.

 So the problem is to come up with a reasonable hashkey for each of
 these objects.  The objects have two significant attributes, and
 two of these objects should be regarded as equal if these attributes
 are the same in both.  The first attribute is a simple dictionary
 whose keys are integers and values are strings.  The second attribute
 is more complicated.  It is a tree structure, represented as a
 dictionary of dictionaries of dictionaries... until we get to the
 leaf elements, which are frozensets of strings.  The keys at every
 level of this structure are strings.  E.g. a simple example of such
 an attribute would look like:

 {'A': {'a': set(['1', '2', '3']),
'b': set(['4', '5'])},
  'B': set(['6', '7', '8'])}

 I'm looking for a good algorithm for computing a hash key for
 something like this?  (Basically I'm looking for a good way to
 combine hashkeys.)

Creating tuples from dicts, recursively, and stabilized by using sorted
on items.

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: hashkey/digest for a complex object

2010-10-06 Thread Robert Kern

On 10/6/10 1:58 PM, kj wrote:


The short version of this question is: where can I find the algorithm
used by the tuple class's __hash__ method?


The function tuplehash() in Objects/tupleobject.c, predictably enough.


Now, for the long version of this question, I'm working with some
complext Python objects that I want to be able to compare for
equality easily.

These objects are non-mutable once they are created, so I would
like to use a two-step comparison for equality, based on the
assumption that I can compute (either at creation time, or as needed
and memoized) a hashkey/digest for each object.  The test for
equality of two of these objects would first compare their hashkeys.
If they are different, the two objects are declared different; if
they match, then a more stringent test for equality is performed.


The most straightforward way to implement __hash__ for a complicated object is 
to normalize its relevant data to a tuple of hashable objects and then call 
hash() on that tuple. A straightforward way to compare such objects is to 
calculate that very same normalized tuple for each one and compare those tuples. 
Cache it if necessary. Don't bother hashing to implement __eq__ unless if you 
are really optimizing for space and time.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list