Re: hashkey/digest for a complex object
Steven D'Aprano st...@remove-this-cybersource.com.au writes: On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). It's a beautiful theory, but, alas, it is not the case. hash(-1) == -1 False This is a counter-example for the (invalid) premise that hash(x) == x, but not for the invariant of hash(hash(x)) == hash(x). hash(hash(-1)) == hash(-1) True hash(hash(2**64)) == hash(2**64) True Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions before, or after, ints and longs were partially integrated? I would take it to mean the type 2.x calls 'int', i.e. fixed-width integer type. -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
Steven D'Aprano st...@remove-this-cybersource.com.au writes: On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). It's a beautiful theory, but, alas, it is not the case. hash(-1) == -1 False hash(2**64) == 2**64 False to give only two of an infinite number of counter-examples. I can only see one counterexample, (-1). 2**64 is of type 'long' in 2.X on your machine (or, to be pedantic, 2.x where x = 2). And, in fact, (-1) is the only int such that hash(x) != x. In can only guess that (-1) is a value that has a special meaning when hashing. Try this (Python 2.6): class A(object): ... def __hash__(self): return -1 ... a = A() hash(a) -2 Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions before, or after, ints and longs were partially integrated? Either. [st...@sylar ~]$ python2.1 Python 2.1.3 (#1, Aug 12 2010, 01:53:57) [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2 Type copyright, credits or license for more information. 2**64 Traceback (most recent call last): File stdin, line 1, in ? OverflowError: integer exponentiation People keep forgetting that 2.2 introduced nearly as many far-reaching changes as 3.0. I didn't forget, but what bearing does it have on this particular issue? -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In 87hbgu8irb@gmail.com Arnaud Delobelle arno...@gmail.com writes: Steven D'Aprano st...@remove-this-cybersource.com.au writes: On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). It's a beautiful theory, but, alas, it is not the case. hash(-1) == -1 False hash(2**64) == 2**64 False to give only two of an infinite number of counter-examples. (Infinite???) And, in fact, (-1) is the only int such that hash(x) != x. Arnaud, how did you determine that -1 is the only such int? I can't imagine any other approach other than a brute-force check of all ints... When I tried this I ran into unforeseen limitations in xrange, etc. It's all very doable, but still, it would take at least about 3 hours on my laptop. In can only guess that (-1) is a value that has a special meaning when hashing. Try this (Python 2.6): class A(object): ... def __hash__(self): return -1 ... a = A() hash(a) -2 Very cool. BTW, thank you for the explanation in your previous post. It makes a lot of sense. I find it interesting (as Hrvoje pointed out) that the hash function is (or appears to be) idempotent on integers (long or not), even though it is not the identity on the integers. Thanks to Steven for the counterexamples to show the latter. I've learned tons from this exchange. ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On Sun, 10 Oct 2010 11:06:00 +0100, Arnaud Delobelle wrote: Steven D'Aprano st...@remove-this-cybersource.com.au writes: On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). It's a beautiful theory, but, alas, it is not the case. hash(-1) == -1 False hash(2**64) == 2**64 False to give only two of an infinite number of counter-examples. I can only see one counterexample, (-1). 2**64 is of type 'long' in 2.X on your machine (or, to be pedantic, 2.x where x = 2). And, in fact, (-1) is the only int such that hash(x) != x. Fair point. I was mistaken. I had the impression that the integration between ints and longs since Python 2.2 was more extensive than it actually is. I made the above comments based on the idea that since Python 2.2, longs are a subclass of ints, e.g. that isinstance(2**64, int) would return True. Turns out that I'm wrong, in which case I agree with you that -1 is the only int counter- example. As an aside, this makes me glad that I have continued writing isinstance(x, (int, long)) in my code, even though I knew it was unnecessary. Not the first time, and probably not the last, that a this can never happen test saved the day. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On Sun, 10 Oct 2010 13:49:09 +, kj wrote: to give only two of an infinite number of counter-examples. (Infinite???) I was mistaken. Given Arnaud's specification that we look only at the Python 2.x ints, I believe that there is only one counter-example, namely -1. However, in Python 3.x I would be correct. The reasoning is a simple application of the pigeon-hole principle. Since hash(n) is limited to returning a finite number of distinct results, but there are an infinite number of Python 3 ints, it follows that there must be an infinite number of collisions. And, in fact, (-1) is the only int such that hash(x) != x. Arnaud, how did you determine that -1 is the only such int? I can't imagine any other approach other than a brute-force check of all ints... Reading the source code is also a good approach. I don't read C very well, but even I can work out what this is doing: static long int_hash(PyIntObject *v) { /* XXX If this is changed, you also need to change the way Python's long, float and complex types are hashed. */ long x = v - ob_ival; if (x == -1) x = -2; return x; } (from intobject.c in Python 2.6.1). It takes a Python int object, extracts the value of it as a C long, returns -2 if the value is -1 otherwise returns the value. When I tried this I ran into unforeseen limitations in xrange, etc. It's all very doable, but still, it would take at least about 3 hours on my laptop. What are you doing? This takes less than 20 seconds to test up to 2**24: import time def test(n): t = time.time() assert hash(0) == 0 assert hash(1) == 1 assert hash(-1) == -2 for i in xrange(2, n): assert hash(i) == i, failed %d % i assert hash(-i) == -i, failed %d % -i t = time.time() - t return t test(2**24) 18.076412916183472 Since 2**31 is 2**7 = 128 times bigger, I estimate that testing everything up to 2**31 should take less than 45 minutes. And my PC is not exactly the fastest machine on the block. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: In 87hbgu8irb@gmail.com Arnaud Delobelle arno...@gmail.com writes: [...] And, in fact, (-1) is the only int such that hash(x) != x. Arnaud, how did you determine that -1 is the only such int? I can't imagine any other approach other than a brute-force check of all ints... When I tried this I ran into unforeseen limitations in xrange, etc. It's all very doable, but still, it would take at least about 3 hours on my laptop. I looked at the source, namely the get_hash() function in the intobject.c file in the 2.x source, and the get_hash() function in the longobject.c file in the 3.x source. In can only guess that (-1) is a value that has a special meaning when hashing. Try this (Python 2.6): class A(object): ... def __hash__(self): return -1 ... a = A() hash(a) -2 Very cool. BTW, thank you for the explanation in your previous post. It makes a lot of sense. I find it interesting (as Hrvoje pointed out) that the hash function is (or appears to be) idempotent on integers (long or not), even though it is not the identity on the integers. Thanks to Steven for the counterexamples to show the latter. I've learned tons from this exchange. It still seems to hold that hash() is idempotent on all objects. I have learnt too that hash(-1) is not (-1), and that it seems that a hash value is not allowed to be (-1). There is one thing left to find out. Why can't it be (-1)? Maybe looking at the source for the hash() builtin would yield a clue, or maybe someone who knows would be able to tell us? -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
Arnaud Delobelle arno...@gmail.com writes: I have learnt too that hash(-1) is not (-1), and that it seems that a hash value is not allowed to be (-1). There is one thing left to find out. Why can't it be (-1)? Because -1 has a special meaning in the C function equivalent to Python's hash(). PyObject_Hash returning -1 means that an exception was raised somewhere inside the object's __hash__ method. For that reason hash functions that really return -1 must change that to another value, and -2 is as good a replacement as any. This is documented in http://docs.python.org/c-api/object.html?highlight=pyobject_hash#PyObject_Hash -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In 4cb1dc9a$0$29970$c3e8da3$54964...@news.astraweb.com Steven D'Aprano st...@remove-this-cybersource.com.au writes: Reading the source code is also a good approach. Every time I have attempted to answer a question by looking at the Python C source, all I've achieved was wasting time, sometimes a *lot* of time, so by now I've developed what can only be described as a phobia to it. I probably need professional help at this point. ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In 87y6a9lqnj@gmail.com Arnaud Delobelle arno...@gmail.com writes: You could do something like this: deep_methods = { list: lambda f, l: tuple(map(f, l)), dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()), set: lambda f, s: frozenset(map(f, s)), # Add more if needed } def apply_method(f, obj): try: method = deep_methods[type(obj)] except KeyError: return obj return method(f, obj) def deepfreeze(obj): Return a 'hashable version' of an object return apply_method(deepfreeze, obj) def deephash(obj): Return hash(deepfreeze(obj)) without deepfreezing return hash(apply_method(deephash, obj)) # Example of deepfreezable object: obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}] ^ ^ | | `---`--- what's this? deepfreeze(obj) (1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))})) deephash(obj) 1341422540 hash(deepfreeze(obj)) 1341422540 After fixing the missing in deepfreeze this code works as advertised, but I'm mystified by the identity between hash(deepfreeze(...)) and deephash(...). Without some knowledge of the Python internals, I don't see how this follows. More specifically, it is not obvious to me that, for example, hash(frozenset((whatever,))) would be identical to hash(frozenset((hash(whatever),))) but this identity has held every time I've checked it. Similarly for other more complicated variations on this theme. Anyway, thanks for the code. It's very useful. ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: In 87y6a9lqnj@gmail.com Arnaud Delobelle arno...@gmail.com writes: You could do something like this: deep_methods = { list: lambda f, l: tuple(map(f, l)), dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()), set: lambda f, s: frozenset(map(f, s)), # Add more if needed } def apply_method(f, obj): try: method = deep_methods[type(obj)] except KeyError: return obj return method(f, obj) def deepfreeze(obj): Return a 'hashable version' of an object return apply_method(deepfreeze, obj) def deephash(obj): Return hash(deepfreeze(obj)) without deepfreezing return hash(apply_method(deephash, obj)) # Example of deepfreezable object: obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}] ^ ^ | | `---`--- what's this? This is set literal notation, introduced in Python 3! In 2.X, you would write set([7, 5, 4]) deepfreeze(obj) (1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))})) deephash(obj) 1341422540 hash(deepfreeze(obj)) 1341422540 After fixing the missing in deepfreeze this code works as advertised, Oops, I must admit I added the docstrings after pasting the code :) but I'm mystified by the identity between hash(deepfreeze(...)) and deephash(...). Without some knowledge of the Python internals, I don't see how this follows. OK. I haven't actually proved it, but it follows from the following facts: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). 2. Container hashable objects compute their hash from the hash of their elements. I don't think either of these two facts is documented, but it would be quite easy to verify them in the Python source (I must admit I have not done it). And it is difficult to conceive how it could be otherwise. -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On Sat, 09 Oct 2010 21:39:51 +0100, Arnaud Delobelle wrote: 1. hash() is an idempotent function, i.e. hash(hash(x)) == hash(x) hold for any hashable x (this is a simple consequence of the fact that hash(x) == x for any int x (by 'int' I mean 2.X int)). It's a beautiful theory, but, alas, it is not the case. hash(-1) == -1 False hash(2**64) == 2**64 False to give only two of an infinite number of counter-examples. Aside: what do you mean by '2.x int'? Do you mean an int in 2.x versions before, or after, ints and longs were partially integrated? [st...@sylar ~]$ python2.1 Python 2.1.3 (#1, Aug 12 2010, 01:53:57) [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2 Type copyright, credits or license for more information. 2**64 Traceback (most recent call last): File stdin, line 1, in ? OverflowError: integer exponentiation People keep forgetting that 2.2 introduced nearly as many far-reaching changes as 3.0. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On 10/6/2010 2:58 PM, kj wrote: These objects are non-mutable once they are created, See below. like to use a two-step comparison for equality, based on the assumption that I can compute (either at creation time, or as needed and memoized) a hashkey/digest for each object. The test for equality of two of these objects would first compare their hashkeys. If they are different, the two objects are declared different; if they match, then a more stringent test for equality is performed. I believe Python strings do this (cache the hash). Equality comparison can check length, hashes, and only then chars. So the problem is to come up with a reasonable hashkey for each of these objects. The objects have two significant attributes, and two of these objects should be regarded as equal if these attributes are the same in both. The first attribute is a simple dictionary whose keys are integers and values are strings. The second attribute is more complicated. It is a tree structure, represented as a dictionary of dictionaries of dictionaries... until we get to the leaf elements, which are frozensets of strings. The keys at every level of this structure are strings. E.g. a simple example of such an attribute would look like: {'A': {'a': set(['1', '2', '3']), 'b': set(['4', '5'])}, 'B': set(['6', '7', '8'])} If these two attributes, and hence the dicts, are public, then your instances are mutable. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In mailman.1415.1286438617.29448.python-l...@python.org Terry Reedy tjre...@udel.edu writes: If these two attributes, and hence the dicts, are public, then your instances are mutable. I guess I should have written immutable among consenting adults. As far as I know, Python does not support private attributes, so I guess the dicts are public no matter what I do. I suppose that I can implement frozendict, but I can't think of any way to enforce the immutability of these frozendicts that would not be trivial to circumvent (it better be, in fact, otherwise I wouldn't be able to initialize the damn things). If you had something else in mind, please let me know. ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In m2fwwjazs2@web.de de...@web.de (Diez B. Roggisch) writes: kj no.em...@please.post writes: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? Surprisingly, in the source: http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc Thanks to you, and to all who pointed me to the place in the source where this is. How exactly did you search for this? Taking a hint from the url above, I went to Google Code Search and searched for python tuple hash (minus the quotes, of course), which produced a ton of irrelevant stuff (almost 80K hits). Searching for python tuple hash lang:c cut down the number of hits to ~8K, but still too much to wade through. Clearly I need a smarter search strategy (one that does not include foreknowledge of the name of the actual function in the C source, of course). ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: In m2fwwjazs2@web.de de...@web.de (Diez B. Roggisch) writes: kj no.em...@please.post writes: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? Surprisingly, in the source: http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc Thanks to you, and to all who pointed me to the place in the source where this is. How exactly did you search for this? Taking a hint from the url above, I went to Google Code Search and searched for python tuple hash (minus the quotes, of course), which produced a ton of irrelevant stuff (almost 80K hits). Searching for python tuple hash lang:c cut down the number of hits to ~8K, but still too much to wade through. Clearly I need a smarter search strategy (one that does not include foreknowledge of the name of the actual function in the C source, of course). I tried codesearch first. Not satisfied after 30 seconds with the results, I did the most straight forward thing - I downloaded and un-packed the python source... and took a look. From that I learned the tuplehash function name. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
In 87pqvmp611@web.de de...@web.de (Diez B. Roggisch) writes: I tried codesearch first. Not satisfied after 30 seconds with the results, I did the most straight forward thing - I downloaded and un-packed the python source... and took a look. From that I learned the tuplehash function name. You must be at least somewhat familiar with the Python source. Everytime I've peeked into it I just feel lost, but it's clearly something I need to master sooner or later... I can't wait for the next one of those occasional introductions to the Python internals at our local Python club. Thanks, ~kj -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: In 87pqvmp611@web.de de...@web.de (Diez B. Roggisch) writes: I tried codesearch first. Not satisfied after 30 seconds with the results, I did the most straight forward thing - I downloaded and un-packed the python source... and took a look. From that I learned the tuplehash function name. You must be at least somewhat familiar with the Python source. Everytime I've peeked into it I just feel lost, but it's clearly something I need to master sooner or later... I can't wait for the next one of those occasional introductions to the Python internals at our local Python club. No, I'm not the tiniest bit. I just followed my instincts in looking into the Objects folder, because that's where I suspected the definition of objects to be And grep has been proven useful in these cases as well. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On 07/10/2010 11:42, kj wrote: Inmailman.1415.1286438617.29448.python-l...@python.org Terry Reedytjre...@udel.edu writes: If these two attributes, and hence the dicts, are public, then your instances are mutable. I guess I should have written immutable among consenting adults. As far as I know, Python does not support private attributes, so I guess the dicts are public no matter what I do. I suppose that I can implement frozendict, but I can't think of any way to enforce the immutability of these frozendicts that would not be trivial to circumvent (it better be, in fact, otherwise I wouldn't be able to initialize the damn things). You would initialise them by creating them from a list of tuples (or an iterable which yields tuples), like with a dict: dict([(1, foo), (2, bar)]) {1: 'foo', 2: 'bar'} -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? Now, for the long version of this question, I'm working with some complext Python objects that I want to be able to compare for equality easily. These objects are non-mutable once they are created, so I would like to use a two-step comparison for equality, based on the assumption that I can compute (either at creation time, or as needed and memoized) a hashkey/digest for each object. The test for equality of two of these objects would first compare their hashkeys. If they are different, the two objects are declared different; if they match, then a more stringent test for equality is performed. So the problem is to come up with a reasonable hashkey for each of these objects. The objects have two significant attributes, and two of these objects should be regarded as equal if these attributes are the same in both. The first attribute is a simple dictionary whose keys are integers and values are strings. The second attribute is more complicated. It is a tree structure, represented as a dictionary of dictionaries of dictionaries... until we get to the leaf elements, which are frozensets of strings. The keys at every level of this structure are strings. E.g. a simple example of such an attribute would look like: {'A': {'a': set(['1', '2', '3']), 'b': set(['4', '5'])}, 'B': set(['6', '7', '8'])} I'm looking for a good algorithm for computing a hash key for something like this? (Basically I'm looking for a good way to combine hashkeys.) You could do something like this: deep_methods = { list: lambda f, l: tuple(map(f, l)), dict: lambda f, d: frozenset((k, f(v)) for k, v in d.items()), set: lambda f, s: frozenset(map(f, s)), # Add more if needed } def apply_method(f, obj): try: method = deep_methods[type(obj)] except KeyError: return obj return method(f, obj) def deepfreeze(obj): Return a 'hashable version' of an object return apply_method(deepfreeze, obj) def deephash(obj): Return hash(deepfreeze(obj)) without deepfreezing return hash(apply_method(deephash, obj)) # Example of deepfreezable object: obj = [1, foo, {(2, 4): {7, 5, 4}, bar: baz}] deepfreeze(obj) (1, 'foo', frozenset({('bar', 'baz'), ((2, 4), frozenset({4, 5, 7}))})) deephash(obj) 1341422540 hash(deepfreeze(obj)) 1341422540 -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post wrote: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? http://svn.python.org/view/python/trunk/Objects/tupleobject.c?revision=81029view=markup static long tuplehash(PyTupleObject *v) { register long x, y; register Py_ssize_t len = Py_SIZE(v); register PyObject **p; long mult = 103L; x = 0x345678L; p = v-ob_item; while (--len = 0) { y = PyObject_Hash(*p++); if (y == -1) return -1; x = (x ^ y) * mult; /* the cast might truncate len; that doesn't change hash stability */ mult += (long)(82520L + len + len); } x += 97531L; if (x == -1) x = -2; return x; } I'm looking for a good algorithm for computing a hash key for something like this? (Basically I'm looking for a good way to combine hashkeys.) If you want to combine the hashes from several objects then the easiest way is just to create a tuple of the objects and hash it. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On Wed, Oct 6, 2010 at 11:58 AM, kj no.em...@please.post wrote: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? From Objects/tuple.c, line 315 in Python3.2: static long tuplehash(PyTupleObject *v) { register long x, y; register Py_ssize_t len = Py_SIZE(v); register PyObject **p; long mult = 103L; x = 0x345678L; p = v-ob_item; while (--len = 0) { y = PyObject_Hash(*p++); if (y == -1) return -1; x = (x ^ y) * mult; /* the cast might truncate len; that doesn't change hash stability */ mult += (long)(82520L + len + len); } x += 97531L; if (x == -1) x = -2; return x; } Now, for the long version of this question, I'm working with some complext Python objects that I want to be able to compare for equality easily. These objects are non-mutable once they are created, so I would like to use a two-step comparison for equality, based on the assumption that I can compute (either at creation time, or as needed and memoized) a hashkey/digest for each object. The test for equality of two of these objects would first compare their hashkeys. If they are different, the two objects are declared different; if they match, then a more stringent test for equality is performed. So the problem is to come up with a reasonable hashkey for each of these objects. The objects have two significant attributes, and two of these objects should be regarded as equal if these attributes are the same in both. The first attribute is a simple dictionary whose keys are integers and values are strings. The second attribute is more complicated. It is a tree structure, represented as a dictionary of dictionaries of dictionaries... until we get to the leaf elements, which are frozensets of strings. The keys at every level of this structure are strings. E.g. a simple example of such an attribute would look like: {'A': {'a': set(['1', '2', '3']), 'b': set(['4', '5'])}, 'B': set(['6', '7', '8'])} I'm looking for a good algorithm for computing a hash key for something like this? (Basically I'm looking for a good way to combine hashkeys.) Sounds like you're trying to hash mutable data structures, which is a no-no, but assuming you've got that part all figured out then the easy way out would be to print the whole thing and take the hash of the resulting string. A (possibly) better way would be to do a Schwartzian transform[1] and freeze everything. Other approaches may be better, but those are the first two out of my head. Geremy Condra [1] http://en.wikipedia.org/wiki/Schwartzian_transform -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
kj no.em...@please.post writes: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? Surprisingly, in the source: http://google.com/codesearch/p?hl=de#-2BKs-LW4I0/trunk/python/src/Objects/tupleobject.cq=python%20tuplehashsa=Ncd=1ct=rc Now, for the long version of this question, I'm working with some complext Python objects that I want to be able to compare for equality easily. These objects are non-mutable once they are created, so I would like to use a two-step comparison for equality, based on the assumption that I can compute (either at creation time, or as needed and memoized) a hashkey/digest for each object. The test for equality of two of these objects would first compare their hashkeys. If they are different, the two objects are declared different; if they match, then a more stringent test for equality is performed. So the problem is to come up with a reasonable hashkey for each of these objects. The objects have two significant attributes, and two of these objects should be regarded as equal if these attributes are the same in both. The first attribute is a simple dictionary whose keys are integers and values are strings. The second attribute is more complicated. It is a tree structure, represented as a dictionary of dictionaries of dictionaries... until we get to the leaf elements, which are frozensets of strings. The keys at every level of this structure are strings. E.g. a simple example of such an attribute would look like: {'A': {'a': set(['1', '2', '3']), 'b': set(['4', '5'])}, 'B': set(['6', '7', '8'])} I'm looking for a good algorithm for computing a hash key for something like this? (Basically I'm looking for a good way to combine hashkeys.) Creating tuples from dicts, recursively, and stabilized by using sorted on items. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: hashkey/digest for a complex object
On 10/6/10 1:58 PM, kj wrote: The short version of this question is: where can I find the algorithm used by the tuple class's __hash__ method? The function tuplehash() in Objects/tupleobject.c, predictably enough. Now, for the long version of this question, I'm working with some complext Python objects that I want to be able to compare for equality easily. These objects are non-mutable once they are created, so I would like to use a two-step comparison for equality, based on the assumption that I can compute (either at creation time, or as needed and memoized) a hashkey/digest for each object. The test for equality of two of these objects would first compare their hashkeys. If they are different, the two objects are declared different; if they match, then a more stringent test for equality is performed. The most straightforward way to implement __hash__ for a complicated object is to normalize its relevant data to a tuple of hashable objects and then call hash() on that tuple. A straightforward way to compare such objects is to calculate that very same normalized tuple for each one and compare those tuples. Cache it if necessary. Don't bother hashing to implement __eq__ unless if you are really optimizing for space and time. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list