08.02.18 12:45, Franklin? Lee wrote:
On Feb 7, 2018 17:28, "Serhiy Storchaka" <storch...@gmail.com> wrote:
Even for simple string search a regular expression can be more
efficient than a str method.
$ ./python -m timeit -s 'import re; p = re.compile("spam"); s = "spa"*100+"m"' -- 'p.search(s)'
500000 loops, best of 5: 680 nsec per loop
$ ./python -m timeit -s 's = "spa"*100+"m"' -- 's.find("spam")'
200000 loops, best of 5: 1.09 usec per loop
That's an odd result. Python regexes use backtracking, not a DFA. I gave
a timing test earlier in the thread:
https://mail.python.org/pipermail/python-ideas/2018-February/048879.html
I compared using repeated .find()s against a precompiled regex, then
against a pure Python and unoptimized tree-based algorithm.
Could it be that re uses an optimization that can also be used in str?
CPython uses a modified Boyer-Moore for str.find:
https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h
http://effbot.org/zone/stringlib.htm
Maybe there's a minimum length after which it's better to precompute a
table.
Yes, there is a special optimization in re here. It isn't free: you need
to spend some time preparing it, and you need a special object that
keeps an optimized representation for faster search. This makes it very
unlikely to be used in str, because you would need to either spend the
time on compilation for every search or use some kind of caching, which
is not free either: it adds complexity and increases memory consumption.
Note also that in the case of re the compiler is implemented in Python.
This reduces the complexity.
Patches that add optimizations for other common cases are welcome.
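
To make the trade-off above concrete, here is a minimal sketch that times
three variants on the same input as the timeit runs quoted earlier: plain
str.find, a precompiled pattern, and a pattern that is recompiled on every
call. The function names and the time_all helper are hypothetical, made up
for this illustration. Calling re.purge() to empty the re module's pattern
cache is only an approximation of "no caching at all", and the numbers will
vary by machine and Python version, so no result is claimed here.

```python
import re
import timeit

NEEDLE = "spam"
HAYSTACK = "spa" * 100 + "m"   # same input as in the timeit runs above

# Compiled once, outside the timed code: the preparation cost is paid here.
PATTERN = re.compile(re.escape(NEEDLE))


def find_with_str():
    """Plain str.find: nothing to prepare, nothing to keep around."""
    return HAYSTACK.find(NEEDLE)


def find_with_compiled():
    """Search with the precompiled pattern object."""
    match = PATTERN.search(HAYSTACK)
    return match.start() if match else -1


def find_with_fresh_compile():
    """Pay the compilation cost on every call (assumption: purging the
    module-level cache forces re.compile to do the work again)."""
    re.purge()
    match = re.compile(re.escape(NEEDLE)).search(HAYSTACK)
    return match.start() if match else -1


def time_all(number=1000):
    """Return the average seconds per call for each variant."""
    variants = {
        "str.find": find_with_str,
        "compiled re": find_with_compiled,
        "re, recompiled each call": find_with_fresh_compile,
    }
    return {
        name: timeit.timeit(func, number=number) / number
        for name, func in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name}: {seconds * 1e9:.0f} nsec per call")
```

All three variants return the same index, so only the time differs. The
third variant is the one that corresponds to doing the preparation inside
str.find without any cache.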
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/