On 2016-12-30 20:59, Matthias Bussonnier wrote:
> On Fri, Dec 30, 2016 at 5:24 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>>
>> I understood the "iterhash" suggestion to be akin to itertools.accumulate:
>>
>> >>> for value, tally in enumerate(accumulate(range(10))): print(value,
>> ...
>
> It reminds me of hmac[1]/hashlib[2], with the API: h.update(...)
> before a .digest().
> It is slightly lower level than a `from_iterable`, but would be a bit
> more flexible.
> If the API were kept similar, things would be easier to remember.
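For context, a minimal sketch of what such an hmac/hashlib-style incremental hash API might look like. The class name `IterHash` and the way hashes are combined are purely illustrative assumptions, not a proposed or existing stdlib API:

```python
class IterHash:
    """Hypothetical incremental hasher mirroring hashlib's
    update()/digest() interface. The combining scheme below is NOT
    the algorithm CPython uses for tuples; it only illustrates the
    shape of the API being discussed."""

    def __init__(self):
        # Start from the hash of an empty tuple.
        self._hash = hash(())

    def update(self, *items):
        # Fold each item's hash into the running value by hashing a pair.
        for item in items:
            self._hash = hash((self._hash, item))

    def digest(self):
        # Return the accumulated hash value.
        return self._hash
```

With such an interface, `h.update(a); h.update(b)` would yield the same digest as `h.update(a, b)`, so a large collection could be hashed without first materializing a tuple of all its members.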
Hi,

I'm the author of PEP 456 (SipHash24) and one of the maintainers of the hashlib module. Before we come up with a new API or recipe, I would like to understand the problem first. Why does the original poster consider hash(large_tuple) a performance issue?

If you have an object with lots of members that affect both __hash__ and __eq__, then __hash__ is really the least of your concerns. The hash has to be computed just once and then stays the same over the lifetime of the object; once computed, the hash can be cached. __eq__, on the other hand, is called at least once for every successful hash lookup. In the worst case it is called n-1 times for a dict of size n, for both a match *and* a hashmap miss. Every __eq__ call has to compare between 1 and m member attributes. For a dict of 1,000 elements with 1,000 members each, that's just 1,000 hash computations with roughly 8 kB of memory allocation, but almost a million comparisons in the worst case.

A hasher object adds further overhead, e.g. object allocation, creation of a bound method for each call, etc. It's also less CPU-cache friendly than the linear data structure of a tuple.

Christian

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/