On 12/05/2010 10:08 PM, MRAB wrote:
> I'm looking for examples of regexes which are slow (especially those
> which seem never to finish) but whose results are known. I already have
> those reported in the bug tracker, but further ones will be welcome.
> This is for testing additional modifications to the new regex
> implementation (available on PyPI).
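Cases like these, where a pattern may run for minutes or longer but the expected result is known, are easier to keep in a test suite if a stuck match can be abandoned. Below is a minimal sketch that assumes CPython's `re` and uses a hypothetical helper named `match_with_timeout`. It runs each match in a child interpreter and kills the child at a deadline:

```python
import subprocess
import sys


def match_with_timeout(pattern, text, flags=0, timeout=5.0):
    """Run re.compile(pattern, flags).match(text) in a child interpreter.

    Returns True or False for match / no match, or None if the child had
    not finished after `timeout` seconds (it is killed in that case).
    """
    # A separate process can be killed outright, which is the simplest way
    # to give up on a match that may never finish.  repr() turns the
    # pattern and text into Python literals; int() keeps the flags literal
    # valid on every Python 3 version.
    script = (
        "import re, sys\n"
        f"m = re.compile({pattern!r}, {int(flags)!r}).match({text!r})\n"
        "sys.stdout.write('1' if m else '0')\n"
    )
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # still running when the deadline passed
    proc.check_returncode()  # surface errors such as an invalid pattern
    return proc.stdout == "1"
```

Returning None rather than False keeps "did not finish" separate from "no match". A slow case such as `(r"(a+)+$", "a" * 40 + "b")` with known result "no match" should therefore come back as None under a backtracking engine, not as a wrong answer. Changing the child's import to `import regex as re` would exercise the PyPI module instead.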
There was a denial-of-service (DoS) security issue in Django about a
year ago (fixed the day it came to light, in changeset 11603),
triggered by a regexp with heavy backtracking:
http://code.djangoproject.com/changeset/11603
which tried to match
import re

email_re = re.compile(
    # dot-atom
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"
    # quoted-string
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-011\013\014\016-\177])*"'
    # domain
    r')@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$',
    re.IGNORECASE)
against
'viewx3dtextx26q...@yahoo.comx26latlngx3d15854521645943074058'
(should return None rather than a MatchObject).
Folks were reporting that it took more than 20 minutes to run.
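The blow-up comes from the domain part. Because `-*` can match nothing, `[A-Z0-9]+(?:-*[A-Z0-9]+)*` behaves like a nested quantifier over the same characters, so a long dotless label is split every possible way before the match fails. The sketch below assumes the rest of email_re stays unchanged and swaps `-*` for `-+`. This is a hypothetical rewrite, not necessarily what changeset 11603 did. The names `LOCAL`, `SLOW_DOMAIN`, `FAST_DOMAIN` and `seconds_to_match` are made up for illustration:

```python
import re
import time

# The local-part alternatives, copied unchanged from the email_re above.
LOCAL = (
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-011\013\014\016-\177])*"'
)

# Original domain part.  "-*" can match the empty string, so a run of N
# letters/digits can be split between "[A-Z0-9]+" and "(?:-*[A-Z0-9]+)*"
# in 2**(N-1) ways, and a failing match tries every one of them.
SLOW_DOMAIN = r')@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$'

# Hypothetical rewrite: "-+" makes each repetition consume at least one
# hyphen, so every label can be split only one way.  It accepts the same
# labels: letters/digits with internal hyphens, no leading/trailing "-".
FAST_DOMAIN = r')@(?:[A-Z0-9]+(?:-+[A-Z0-9]+)*\.)+[A-Z]{2,6}$'

slow_re = re.compile(LOCAL + SLOW_DOMAIN, re.IGNORECASE)
fast_re = re.compile(LOCAL + FAST_DOMAIN, re.IGNORECASE)


def seconds_to_match(pattern, text):
    """Time a single pattern.match(text) call."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # A made-up failing address: one long dotless label after the "@".
    for n in (12, 14, 16, 18):
        text = "x@" + "a" * n
        print(n, seconds_to_match(slow_re, text), seconds_to_match(fast_re, text))
```

On short made-up addresses both versions should accept and reject the same strings. In the printed timings the slow column should grow roughly exponentially with the label length while the fast one stays small.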
-tkc