On Fri, Dec 30, 2016 at 09:47:54PM -0500, j...@math.brown.edu wrote: > __eq__ only has to be called when a hash bucket is non-empty. In that case, > it may be O(n) in pathological cases, but it could also be O(1) every time. > On the other hand, __hash__ has to be called on every lookup, is O(n) on > the first call, and ideally O(1) after. I'd expect that __eq__ may often > not dominate, and avoiding an unnecessary large tuple allocation on the > first __hash__ call could be helpful.
Sorry to be the broken record repeating himself, but this sounds *exactly* like premature optimization here. My suggestion is that you are overthinking things, or at least worrying about issues before you've got any evidence that they are going to be real issues. I expect that the amount of time and effort you've spent in analysing the theorectical problems here and writing to this list is more than the time it would have taken for you to write the simplest __hash__ method that could work (using the advice in the docs to make a tuple). You could have implemented that in five minutes, and be running code by now. Of course I understand that performance issues may not be visible when you have 100 or 100 thousand items but may be horrific when you have 100 million. I get that. But you're aware of the possibility, so you can write a performance test that generates 100 million objects and tests performance. *If* you find an actual problem, then you can look into changing your __hash__ method. You could come back here and talk about actual performance issues instead of hypothetical issues, or you could hire an expert to tune your hash function. (And if you do pay for an expert to solve this, please consider giving back to the community.) Remember that the specific details of __hash__ should be an internal implementation detail for your class. You shouldn't fear changing the hash algorithm as often as you need to, including in bug fix releases. You don't have to wait for the perfect hash function, just a "good enough for now" one to get started. I'm not trying to be dismissive of your concerns. They may be real issues that you have to solve. I'm just saying, you should check your facts first rather than solve hypothetic problems. I have seen, and even written, far too much pessimized code in the name of "this must be better, it stands to reason" to give much credence to theoretical arguments about performance. -- Steve _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/