Re: [Python-Dev] Why not using the hash when comparing strings?

Brett Cannon Fri, 19 Oct 2012 06:59:03 -0700

On Fri, Oct 19, 2012 at 8:36 AM, Victor Stinner <[email protected]> wrote:


> 2012/10/19 Benjamin Peterson <[email protected]>:
> > It would be interesting to see how common it is for strings which have
> > their hash computed to be compared.
>
> I implemented a quick hack. When running "./python -m test test_os",
> Python calls PyUnicode_RichCompare() 15206 times with the Py_EQ or
> Py_NE operator. In 41.4% of the calls (6295), the hashes of both
> operands are known. In 41.2% (6262 of 15206), the hashes of both
> operands are known *and are different*!
>
> The hit rate may depend on how long the process has been running. For
> example, in a fresh interpreter the hit rate is only 7% (189 hits /
> 2703 calls).
>
> When running the test suite, the hit rate is around 80% (hashes are
> known in 90% of the calls) after running 70 tests. At the same time,
> the average string length is 4.1 characters and almost all strings
> are pure ASCII.
>
> I created the issue http://bugs.python.org/issue16286 to discuss this
> optimization.
>

If you want to measure the performance impact compared to a clean build,
you can use the unladen benchmarks, since the suite now contains several
Python 3-compatible benchmarks.
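
For readers following along, here is a rough pure-Python sketch of the idea
being measured above. It is not the C patch from the issue: the class
CachedHashStr, the stats dictionary, and the use of None to mean "hash not
computed yet" are all made up for illustration. It only shows the shortcut
(two known, different hashes imply the strings are unequal) and the kind of
counters behind the hit-rate numbers.

```python
# Hypothetical counters, similar in spirit to the "quick hack" statistics.
stats = {"calls": 0, "both_known": 0, "shortcut": 0}


class CachedHashStr:
    """Toy stand-in for a string object that caches its hash lazily."""

    def __init__(self, value):
        self.value = value
        self._hash = None  # None means "hash not computed yet" (an assumption)

    def __hash__(self):
        # Compute the hash on first use and remember it afterwards.
        if self._hash is None:
            self._hash = hash(self.value)
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, CachedHashStr):
            return NotImplemented
        stats["calls"] += 1
        if self._hash is not None and other._hash is not None:
            stats["both_known"] += 1
            if self._hash != other._hash:
                # Different hashes prove the values differ: skip the
                # character-by-character comparison entirely.
                stats["shortcut"] += 1
                return False
        # Equal or unknown hashes prove nothing: compare the contents.
        return self.value == other.value


a = CachedHashStr("spam")
b = CachedHashStr("eggs")
first = (a == b)    # no hash cached yet, falls back to a full comparison
hash(a); hash(b)    # e.g. both strings were used as dict keys
second = (a == b)   # both hashes are now known
```

Note that the shortcut only works in one direction: equal hashes never prove
equality, so the full comparison is still needed in that case.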
