On 29 December 2016 at 19:20, Steven D'Aprano <st...@pearwood.info> wrote:
> > With respect Josh, I feel that this thread is based on premature > optimization. It seems to me that you're *assuming* that anything less > than some theoretically ideal O(1) space O(N) time hash function is > clearly and obviously unsuitable. > > Of course I might be completely wrong. Perhaps you have implemented your > own __hash__ methods as suggested by the docs, as well as Ned's version, > and profiled your code and discovered that __hash__ is a significant > bottleneck. In which case, I'll apologise for doubting you, but in my > defence I'll say that the language you have used in this thread so far > gives no hint that you've actually profiled your code and found the > bottleneck. > In Josh's defence, the initial use case he put forward is exactly the kind of scenario where it's worthwhile optimising ahead of time. Quite often a poorly implemented hash function doesn't manifest as a problem until you scale up massively - and a developer may not have the capability to scale up to a suitable level in-house, resulting in performance issues at customer sites. I had one particular case (fortunately discovered before going to customers) where a field was included in the equality check, but wasn't part of the hash. Unfortunately, the lack of this one field resulted in objects only being allocated to a few buckets (in a Java HashMap), resulting in every access having to walk a potentially very long chain doing equality comparisons - O(N) behaviour from an amortised O(1) data structure. Unit tests - no worries. Small-scale tests - everything looked fine. Once we started our load tests though everything slowed to a crawl. 100% CPU, throughput at a standstill ... it didn't look good. Adding that one field to the hash resulted in the ability to scale up to hundreds of thousands of objects with minimal CPU. I can't remember if it was millions we tested to (it was around 10 years ago ...). Tim Delaney
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/