But, I think that the problem with adding `__hash__` to collections.abc.Iterable is that not all iterables are immutable -- And if they aren't immutable, then allowing them to be hashed is likely to be a pretty bad idea...
I'm still having a hard time being convinced that this is very much of an optimization at all ... If you start hashing tuples that are large enough that memory is a concern, then that's going to also take a *really* long time and probably be prohibitive anyway. Just for kicks, I decided to throw together a simple script to time how much penalty you pay for hashing a tuple: class F(object): def __init__(self, arg): self.arg = arg def __hash__(self): return hash(tuple(self.arg)) class T(object): def __init__(self, arg): self.arg = tuple(arg) def __hash__(self): return hash(self.arg) class C(object): def __init__(self, arg): self.arg = tuple(arg) self._hash = None def __hash__(self): if self._hash is None: self._hash = hash(tuple(self.arg)) return self._hash import timeit print(timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(500)))')) print(timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(500)))')) print(timeit.timeit('hash(c)', 'from __main__ import C; c = C(list(range(500)))')) results = [] for i in range(1, 11): n = i * 100 t1 = timeit.timeit('hash(f)', 'from __main__ import F; f = F(list(range(%d)))' % i) t2 = timeit.timeit('hash(t)', 'from __main__ import T; t = T(list(range(%d)))' % i) results.append(t1/t2) print(results) F is going to create a new tuple each time and then hash it. T already has a tuple, so we'll only pay the cost of hashing a tuple, not the cost of constructing a tuple and C caches the hash value and re-uses it once it is known. C is the winner by a factor of 10 or more (no surprise there). But the real interesting thing is that the the ratio of the timing results from hashing `F` vs. `T` is relatively constant in the range of my test (up to 1000 elements) and that ratio's value is approximately 1.3. For most applications, that seems reasonable. If you really need a speed-up, then I suppose you could recode the thing in Cython and see what happens, but I doubt that will be frequently necessary. If you _do_ code it up in Cython, put it up on Pypi and see if people use it... On Wed, Jan 4, 2017 at 5:04 PM, Neil Girdhar <mistersh...@gmail.com> wrote: > Couldn't you add __hash__ to collections.abc.Iterable ? Essentially, > expose __hash__ there; then all iterables automatically have a default hash > that hashes their ordered contents. > > On Wednesday, January 4, 2017 at 7:37:26 PM UTC-5, Steven D'Aprano wrote: >> >> On Wed, Jan 04, 2017 at 04:38:05PM -0500, j...@math.brown.edu wrote: >> > Instead of the proposals like "hash.from_iterable()", would it make >> sense >> > to allow tuple.__hash__() to accept any iterable, when called as a >> > classmethod? >> >> The public API for calculating the hash of something is to call the >> hash() builtin function on some object, e.g. to call tuple.__hash__ you >> write hash((a, b, c)). The __hash__ dunder method is implementation, not >> interface, and normally shouldn't be called directly. >> >> Unless I'm missing something obvious, your proposal would require the >> caller to call the dunder methods directly: >> >> class X: >> def __hash__(self): >> return tuple.__hash__(iter(self)) >> >> I consider that a poor interface design. >> >> But even if we decide to make an exception in this case, tuple.__hash__ >> is currently an ordinary instance method right now. There's probably >> code that relies on that fact and expects that: >> >> tuple.__hash__((a, b, c)) >> >> is currently the same as >> >> (a, b, c).__hash__() >> >> >> (Starting with the hash() builtin itself, I expect, although that is >> easy enough to fix if needed.) Your proposal will break backwards >> compatibility, as it requires a change in semantics: >> >> (1) (a, b, c).__hash__() must keep the current behaviour, which >> means behaving like a bound instance method; >> >> (2) But tuple.__hash__ will no longer return an unbound method (actually >> a function object, but the difference is unimportant) and instead will >> return something that behaves like a bound class method. >> >> Here's an implementation which does this: >> >> http://code.activestate.com/recipes/577030-dualmethod-descriptor/ >> >> so such a thing is possible. But it breaks backwards-compatability and >> introduces something which I consider to be an unclean API (calling a >> dunder method directly). Unless there's a *really* strong advantage to >> >> tuple.__hash__(...) >> >> over >> >> hash.from_iterable(...) >> >> (or equivalent), I would be against this change. >> >> >> >> > (And similarly with frozenset.__hash__(), so that the fast C >> > implementation of that algorithm could be used, rather than the slow >> > collections.Set._hash() implementation. Then the duplicated >> implementation >> > in _collections_abc.py's Set._hash() could be removed completely, >> > delegating to frozenset.__hash__() instead.) >> >> This is a good point. Until now, I've been assuming that >> hash.from_iterable should consider order. But frozenset shows us that >> sometimes the hash should *not* consider order. >> >> This hints that perhaps the hash.from_iterable() should have its own >> optional dunder method. Or maybe we need two functions: an ordered >> version and an unordered version. >> >> Hmmm... just tossing out a wild idea here... let's get rid of the dunder >> method part of your suggestion, and add new public class methods to >> tuple and frozenset: >> >> tuple.hash_from_iter(iterable) >> frozenset.hash_from_iter(iterable) >> >> >> That gets rid of all the objections about backwards compatibility, since >> these are new methods. They're not dunder names, so there are no >> objections to being used as part of the public API. >> >> A possible objection is the question, is this functionality *actually* >> important enough to bother? >> >> Another possible objection: are these methods part of the sequence/set >> API? If not, do they really belong on the tuple/frozenset? Maybe they >> belong elsewhere? >> >> >> >> > Would this API more cleanly communicate the algorithm being used and >> the >> > implementation, >> >> No. If you want to communicate the algorithm being used, write some >> documentation. >> >> Seriously, the public API doesn't communicate the algorithm used for the >> implementation. How can it? We can keep the same interface and change >> the implementation, or change the interface and keep the implementation. >> The two are (mostly) independent. >> >> >> >> > while making a smaller increase in API surface area >> > compared to introducing a new function? >> >> It's difficult to quantify "API surface area". On the one hand, we have >> the addition of one or two new functions or methods. Contrast with: >> >> * introducing a new kind of method into the built-ins (one which >> behaves like a classmethod when called from the class, and like >> an instance method when called from an instance); >> >> * changing tuple.__hash__ from an ordinary method to one of the >> above special methods; >> >> * and likewise for frozenset.__hash__; >> >> * change __hash__ from "only used as implementation, not as >> interface" to "sometimes used as interface". >> >> >> To me, adding one or two new methods/functions is the smaller, or at >> least less disruptive, change. >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> python...@python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas@python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- [image: pattern-sig.png] Matt Gilson // SOFTWARE ENGINEER E: m...@getpattern.com // P: 603.892.7736 We’re looking for beta testers. Go here <https://www.getpattern.com/meetpattern> to sign up!
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/