On 12/05/2010 10:08 PM, MRAB wrote:
> I'm looking for examples of regexes which are slow (especially those
> which seem never to finish) but whose results are known. I already have
> those reported in the bug tracker, but further ones will be welcome.
>
> This is for testing additional modifications to the new regex
> implementation (available on PyPI).

There was a DoS security issue in Django about a year ago (fixed the day it came to light, in changeset 11603), triggered by a regexp that does a lot of backtracking:

http://code.djangoproject.com/changeset/11603

which tried to match

email_re = re.compile(
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"  # dot-atom
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-011\013\014\016-\177])*"'  # quoted-string
    r')@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$', re.IGNORECASE)  # domain

against

'viewx3dtextx26q...@yahoo.comx26latlngx3d15854521645943074058'

(should return None rather than a MatchObject).

Folks were reporting that it was taking >20min to run.
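
If you'd rather not wait 20 minutes to see the effect, here is a rough sketch that isolates what I believe is the expensive part: the domain portion of the pattern, run against a string of alphanumerics with no dot in it. The names domain_re and time_failed_match are made up for this illustration, and the test strings are my own, not the one from the Django report. The nested quantifiers can split a run of n characters in roughly 2**(n-1) ways before giving up, so the inputs are kept short on purpose.

```python
import re
import time

# Domain portion of email_re, copied from the pattern quoted above.
# (Isolating it like this is an assumption about where the time goes.)
domain_re = re.compile(
    r'(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$', re.IGNORECASE)


def time_failed_match(n):
    """Return the seconds spent failing to match a run of n digits."""
    subject = '1' * n  # no dot anywhere, so the pattern cannot match
    start = time.perf_counter()
    result = domain_re.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


if __name__ == '__main__':
    # Keep n small: each extra character roughly doubles the work.
    for n in range(10, 19, 2):
        print(n, round(time_failed_match(n), 4))
```

The timings will vary by machine and Python version, but they should grow quickly as n increases, which is the same shape of problem as the long alphanumeric tail after "yahoo." in the string above.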

-tkc



--
http://mail.python.org/mailman/listinfo/python-list
