Re: [Python-ideas] incremental hashing in hash

Ethan Furman Fri, 30 Dec 2016 19:09:07 -0800

On 12/30/2016 06:47 PM, j...@math.brown.edu wrote:

__eq__ only has to be called when a hash bucket is non-empty. In that case,
 it may be O(n) in pathological cases, but it could also be O(1) every time.
 On the other hand, __hash__ has to be called on every lookup, is O(n) on
 the first call, and ideally O(1) after. I'd expect that __eq__ may often
 not dominate, and avoiding an unnecessary large tuple allocation on the
 first __hash__ call could be helpful. If a __hash__ implementation can
 avoid creating an arbitrarily large tuple unnecessarily and still perform
 well, why not save the space? If the resulting hash distribution is worse,
 that's another story, but it seems worth documenting one way or the other,
 since the current docs give memory-conscious Python programmers pause for
 this use case.


A quick list of what we have agreed on:

- __hash__ is called once for every dict/set lookup
- __hash__ is calculated once because we are caching the value
- __eq__ is being called an unknown number of times, but can be quite
  expensive in terms of time
- items with the same hash are probably cheap in terms of __eq__ (?)

So maybe this will work?

    def __hash__(self):
        return hash(self.name) * hash(self.nick) * hash(self.color)

In other words, don't create a new tuple, just use the ones you already have 
and toss in a couple maths operations.  (and have somebody vet that ;)

--
~Ethan~

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] incremental hashing in __hash__

Reply via email to

Re: [Python-ideas] incremental hashing in hash