Re: [Python-ideas] Complicate str methods

Serhiy Storchaka Thu, 08 Feb 2018 10:06:02 -0800

08.02.18 12:45, Franklin? Lee wrote:

On Feb 7, 2018 17:28, "Serhiy Storchaka" <storch...@gmail.com> wrote:
    Even for simple string search a regular expression can be more
    efficient than a str method.
    $ ./python -m timeit -s 'import re; p = re.compile("spam"); s = "spa"*100+"m"' -- 'p.search(s)'
    500000 loops, best of 5: 680 nsec per loop

    $ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")'
    200000 loops, best of 5: 1.09 usec per loop
That's an odd result. Python regexes use backtracking, not a DFA. I gave a timing test earlier in the thread:
https://mail.python.org/pipermail/python-ideas/2018-February/048879.html
I compared using repeated .find()s against a precompiled regex, then against a pure Python and unoptimized tree-based algorithm.
Could it be that re uses an optimization that can also be used in str? CPython uses a modified Boyer-Moore for str.find:
https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
http://effbot.org/zone/stringlib.htm
Maybe there's a minimum length after which it's better to precompute a table.

Yes, there is a special optimization in re here. It isn't free: you need to spend some time preparing it, and you need a special object that keeps an optimized representation for faster searching. This makes it very unlikely to be used in str, because you would need either to spend the time on compilation for every search, or to use some kind of caching, which is not free either: it adds complexity and increases memory consumption. Note also that in the case of re the compiler is implemented in Python, which reduces the complexity.
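
To illustrate the trade-off, here is a minimal sketch of such a "special object". It is only an illustration, not how re or str.find is actually implemented: the class name PreparedNeedle is hypothetical, and the table is a plain Horspool-style skip table chosen for brevity. The preparation cost is paid once in __init__, and every later search reuses the stored table.

```python
class PreparedNeedle:
    """Hypothetical helper: keeps a skip table computed once per needle."""

    def __init__(self, needle):
        if not needle:
            raise ValueError("needle must be non-empty")
        self.needle = needle
        m = len(needle)
        # Preparation step (the part that is "not free"): for every
        # character except the last, store the distance from its rightmost
        # occurrence to the end of the needle.
        self.skip = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}

    def find(self, haystack):
        """Return the lowest index of the needle in haystack, or -1."""
        needle, skip = self.needle, self.skip
        m, n = len(needle), len(haystack)
        i = 0
        while i <= n - m:
            if haystack[i:i + m] == needle:
                return i
            # Shift according to the haystack character aligned with the
            # needle's last position; characters not in the table allow a
            # shift by the whole needle length.
            i += skip.get(haystack[i + m - 1], m)
        return -1


s = "spa" * 100 + "m"
prepared = PreparedNeedle("spam")   # pay the preparation cost once
print(prepared.find(s), s.find("spam"))
```

A str method has no place to keep such an object between calls, so it would have to rebuild the table on every search or look it up in a cache, which is the cost described above.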


Patches that add optimizations for other common cases are welcome.

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
