On Wed, Oct 6, 2010 at 11:58 AM, kj <no.em...@please.post> wrote: > > > The short version of this question is: where can I find the algorithm > used by the tuple class's __hash__ method?
>From Objects/tuple.c, line 315 in Python3.2: static long tuplehash(PyTupleObject *v) { register long x, y; register Py_ssize_t len = Py_SIZE(v); register PyObject **p; long mult = 1000003L; x = 0x345678L; p = v->ob_item; while (--len >= 0) { y = PyObject_Hash(*p++); if (y == -1) return -1; x = (x ^ y) * mult; /* the cast might truncate len; that doesn't change hash stability */ mult += (long)(82520L + len + len); } x += 97531L; if (x == -1) x = -2; return x; } > Now, for the long version of this question, I'm working with some > complext Python objects that I want to be able to compare for > equality easily. > > These objects are non-mutable once they are created, so I would > like to use a two-step comparison for equality, based on the > assumption that I can compute (either at creation time, or as needed > and memoized) a hashkey/digest for each object. The test for > equality of two of these objects would first compare their hashkeys. > If they are different, the two objects are declared different; if > they match, then a more stringent test for equality is performed. > > So the problem is to come up with a reasonable hashkey for each of > these objects. The objects have two significant attributes, and > two of these objects should be regarded as equal if these attributes > are "the same" in both. The first attribute is a simple dictionary > whose keys are integers and values are strings. The second attribute > is more complicated. It is a tree structure, represented as a > dictionary of dictionaries of dictionaries... until we get to the > leaf elements, which are frozensets of strings. The keys at every > level of this structure are strings. E.g. a simple example of such > an attribute would look like: > > {'A': {'a': set(['1', '2', '3']), > 'b': set(['4', '5'])}, > 'B': set(['6', '7', '8'])} > > I'm looking for a good algorithm for computing a hash key for > something like this? (Basically I'm looking for a good way to > combine hashkeys.) Sounds like you're trying to hash mutable data structures, which is a no-no, but assuming you've got that part all figured out then the easy way out would be to print the whole thing and take the hash of the resulting string. A (possibly) better way would be to do a Schwartzian transform[1] and freeze everything. Other approaches may be better, but those are the first two out of my head. Geremy Condra [1] http://en.wikipedia.org/wiki/Schwartzian_transform -- http://mail.python.org/mailman/listinfo/python-list