Hi folks,

I recently refreshed the theoretical basics of regular expressions *indulging in reminiscences*, so I read https://swtch.com/~rsc/regexp/regexp1.html

However, when I reached the chart in the lower third of the article, I saw Python 2.4 measured against a naive Thompson NFA matching implementation, and I was surprised by how badly it performed compared to an unoptimized version of an older-than-dirt algorithm.

So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà: 100% CPU and no result so far. Nothing has changed at all since 2007.

>>> import re
>>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
.... (still waiting)
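
To see the blow-up without killing the interpreter, here is a small timing sketch (Python 3, stdlib only; the range of n is just a guess that should keep the total runtime bearable, and the exact numbers will differ per machine). The point is that the time roughly doubles with every extra 'a':

import re
import time

# Same pathological pattern as above, but small enough that the
# backtracking matcher actually returns.  The running time grows
# roughly by a factor of two with each additional 'a'.
for n in range(15, 25):
    start = time.perf_counter()
    re.match(r'a?' * n + r'a' * n, 'a' * n)
    print(n, '->', round(time.perf_counter() - start, 3), 'seconds')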

Quoting from the article: "Some might argue that this test is unfair to the backtracking implementations, since it focuses on an uncommon corner case. This argument misses the point: given a choice between an implementation with a predictable, consistent, fast running time on all inputs or one that usually runs quickly but can take years of CPU time (or more) on some inputs, the decision should be easy."
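
For comparison, the "predictable, consistent" approach from the article can be sketched in a few lines of pure Python. This is only a toy, not a general regex engine: a position-set simulation in the spirit of Thompson's construction, restricted to patterns built from literal characters and '?' (which is all the example above needs), and the names parse/closure/fullmatch are just mine:

# Toy Thompson-style simulation for patterns built only from literal
# characters and the '?' quantifier, e.g. 'a?'*30 + 'a'*30.  Instead of
# backtracking, it tracks the set of pattern positions still reachable,
# so matching takes O(len(pattern) * len(text)) steps.

def parse(pattern):
    # Split the pattern into (character, is_optional) units.
    units = []
    i = 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == '?'
        units.append((pattern[i], optional))
        i += 2 if optional else 1
    return units

def closure(states, units):
    # From position i we may also skip unit i if that unit is optional.
    result = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(units) and units[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result

def fullmatch(pattern, text):
    units = parse(pattern)
    states = closure({0}, units)
    for ch in text:
        states = closure({i + 1 for i in states
                          if i < len(units) and units[i][0] == ch}, units)
        if not states:
            return False
    return len(units) in states

With that, fullmatch(r'a?' * 30 + r'a' * 30, 'a' * 30) answers True immediately, which is exactly the article's point.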

Victor, as the head of the Python performance department, and Matthew, as the maintainer of the new regex module, what is your stance on this particular issue?

From my perspective, regular expressions might be worth optimizing, especially for web applications (URL matching usually relies on regexes), but also for other applications where I've seen regexes used in many tight loops. So, I am probing for interest in this topic here.

Best,
Sven
