On 2017-01-26 21:13, Sven R. Kunze wrote:
Hi folks,

I recently refreshed my knowledge of the theoretical basics of regular
expressions *indulging in reminiscences*. So, I read https://swtch.com/~rsc/regexp/regexp1.html

However, on reaching the chart in the lower third of the article, I saw
Python 2.4 measured against a naive Thompson matching implementation.
And I was surprised at how badly it performed compared to an
unoptimized version of an older-than-dirt algorithm.

So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà,
100% CPU and no results so far. Nothing has changed at all since 2007.

  >>> import re
  >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
.... (still waiting)
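
For comparison, here is a rough sketch of the Thompson-style simulation the
article describes, restricted to patterns built only from literal characters
and optional literals ("a", "a?"). The names (tokenize, nfa_match) are mine
and this is not how re (or regex) is implemented; it only illustrates why
tracking the *set* of live pattern positions keeps the running time at
O(len(pattern) * len(text)) instead of exponential (fullmatch-style
semantics, which is enough for this example):

    def tokenize(pattern):
        """Split e.g. 'a?a?aa' into (char, optional) tokens."""
        tokens, i = [], 0
        while i < len(pattern):
            ch = pattern[i]
            if i + 1 < len(pattern) and pattern[i + 1] == '?':
                tokens.append((ch, True))
                i += 2
            else:
                tokens.append((ch, False))
                i += 1
        return tokens

    def nfa_match(pattern, text):
        """Match by simulating all NFA states at once (Thompson-style)."""
        tokens = tokenize(pattern)
        n = len(tokens)

        def close(states):
            # Epsilon closure: an optional token may be skipped entirely.
            closed, stack = set(states), list(states)
            while stack:
                s = stack.pop()
                if s < n and tokens[s][1] and s + 1 not in closed:
                    closed.add(s + 1)
                    stack.append(s + 1)
            return closed

        current = close({0})
        for ch in text:
            advanced = {s + 1 for s in current if s < n and tokens[s][0] == ch}
            current = close(advanced)
            if not current:
                return False
        return n in current

    # The pathological pattern from the article finishes instantly:
    print(nfa_match('a?' * 30 + 'a' * 30, 'a' * 30))   # -> True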

Quoting from the article: " Some might argue that this test is unfair to
the backtracking implementations, since it focuses on an uncommon corner
case. This argument misses the point: given a choice between an
implementation with a predictable, consistent, fast running time on all
inputs or one that usually runs quickly but can take years of CPU time
(or more) on some inputs, the decision should be easy."

Victor, as the head of the Python performance department, and Matthew, as
the maintainer of the new regex module, what is your stance on this
particular issue?

From my perspective, I can say that regular expressions might be worth
optimizing, especially for web applications (URL matching usually uses
regexes), but also for other applications, where I've seen many tight
loops using regexes as well. So, I am probing interest in this topic here.

It's possible to add some checks to reduce the problem, as I've done in the regex module, but I'm not sure how easy it would be to retrofit them into the existing re code. (It wasn't pleasant to do in the regex module, so...)
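
Just to illustrate the general idea (this is only a sketch of one common
mitigation, not necessarily the actual checks in regex): memoize
(pattern position, text position) pairs that have already failed, so the
backtracker never re-explores them. For the same literal/optional-literal
subset as the sketch above, that turns the worst case from exponential
into polynomial:

    def backtrack_match(tokens, text):
        """Backtracking matcher over (char, optional) tokens, with a memo
        of (pattern_pos, text_pos) pairs already known to fail."""
        failed = set()

        def rec(p, t):
            if (p, t) in failed:
                return False                    # known dead end
            if p == len(tokens):
                ok = (t == len(text))           # fullmatch-style end test
            else:
                ch, optional = tokens[p]
                # Greedy: try to consume the character first...
                ok = (t < len(text) and text[t] == ch and rec(p + 1, t + 1))
                # ...then fall back to skipping the optional token.
                if not ok and optional:
                    ok = rec(p + 1, t)
            if not ok:
                failed.add((p, t))
            return ok

        return rec(0, 0)

    # The same pathological input, built directly as tokens:
    tokens = [('a', True)] * 30 + [('a', False)] * 30
    print(backtrack_match(tokens, 'a' * 30))    # -> True, immediately

Without the failed set this is the classic exponential backtracker; with
it, each (pattern_pos, text_pos) pair is fully explored at most once.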
