08.02.18 12:45, Franklin? Lee wrote:
On Feb 7, 2018 17:28, "Serhiy Storchaka" <storch...@gmail.com> wrote:
    Even for simple string search a regular expression can be more
    efficient than a str method.

    $ ./python -m timeit -s 'import re; p = re.compile("spam"); s = "spa"*100+"m"' -- 'p.search(s)'
    500000 loops, best of 5: 680 nsec per loop

    $ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")'
    200000 loops, best of 5: 1.09 usec per loop


That's an odd result. Python regexes use backtracking, not a DFA. I gave a timing test earlier in the thread:
https://mail.python.org/pipermail/python-ideas/2018-February/048879.html
I compared using repeated .find()s against a precompiled regex, then against a pure Python and unoptimized tree-based algorithm.

Could it be that re uses an optimization that can also be used in str? CPython uses a modified Boyer-Moore for str.find:
https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
http://effbot.org/zone/stringlib.htm
Maybe there's a minimum length after which it's better to precompute a table.
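
To make the "precompute a table" idea concrete, here is a minimal sketch of a Horspool-style search with a skip table that can be built once and reused. It is an illustration only, not the algorithm in fastsearch.h; the names build_skip_table and horspool_find are made up for this example.

```python
def build_skip_table(needle):
    """Precompute how far the window may shift for each character.

    For every character except the last one, store its distance from the
    end of the needle. Later occurrences overwrite earlier ones, so the
    rightmost occurrence (smallest safe shift) wins.
    """
    last = len(needle) - 1
    return {ch: last - i for i, ch in enumerate(needle[:-1])}


def horspool_find(haystack, needle, table=None):
    """Return the lowest index of needle in haystack, or -1 (like str.find)."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if table is None:
        # This is the preparation cost; a caller can pay it once and pass
        # the table in for repeated searches with the same needle.
        table = build_skip_table(needle)
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        # Shift according to the haystack character aligned with the
        # needle's last position; unknown characters allow a full shift.
        i += table.get(haystack[i + m - 1], m)
    return -1
```

The table costs time and memory proportional to the needle, so for a short needle searched once it may not pay off; whether there is a useful cutoff length would have to be measured.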

Yes, there is a special optimization in re here. It isn't free: you need to spend some time preparing it, and you need a special object that keeps an optimized representation for faster searching. This makes it very unlikely to be used in str, because you would either need to spend the compilation time on every search, or use some kind of caching, which is not free either: it adds complexity and increases memory consumption. Note also that in the case of re, the compiler is implemented in Python, which reduces the complexity.
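
The trade-off can be measured with a rough sketch like the one below. It times three variants: recompiling the pattern on every search, searching with a precompiled pattern, and plain str.find. It assumes that calling re.purge() before re.compile() is a fair stand-in for "no caching"; the helper names are made up for this example, and the numbers will vary by machine and Python version.

```python
import re
import timeit

s = "spa" * 100 + "m"
p = re.compile("spam")  # preparation cost paid once, up front


def compile_and_search():
    # Clear re's internal pattern cache so that re.compile() really
    # recompiles, i.e. the preparation cost is paid on every call.
    re.purge()
    return re.compile("spam").search(s)


def search_precompiled():
    # Reuses the already prepared pattern object.
    return p.search(s)


def plain_find():
    # No preparation step and no extra object to keep around.
    return s.find("spam")


def measure(func, number=2000):
    """Return the best per-call time in seconds over three repeats."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


def report():
    for func in (compile_and_search, search_precompiled, plain_find):
        print(f"{func.__name__}: {measure(func) * 1e6:.2f} usec per call")
```

Calling report() prints one line per variant. The gap between the first two lines is the cost that str.find would have to pay on every call, or hide behind a cache.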

Patches that add optimizations for other common cases are welcome.

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
