New submission from L. Peter Deutsch:

I've read a number of reports of exponential-time regexp matching, but this 
regexp uses no unusual features, requires no backtracking, and only loops 
"forever" on certain input strings.

I listed the Python version # as 2.6; I actually observed the behavior in 2.5.1 
and 2.5.2, but I'm almost certain it's still there, because I saw the same 
behavior in a very recent build of Google's V8 interpreter, which I believe 
uses the same regexp engine.

Here's the test case:

import re
re_utf8 = r'^([\x00-\x7f]+|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$'
s = "\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x14\x00\x00\x00\x01\x00\x00,`\x00\x00\x004\x00\x01\x8d"
print re.match(re_utf8, s)

If you pass s[0:34] or s[34:35] as the argument of re.match, it returns the 
correct answer, but the code above apparently loops forever.
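
One possible explanation, offered as an assumption rather than something this report confirms, is that the "+" nested inside the starred group lets a backtracking matcher split a run of ASCII bytes in many different ways, and it may try them all when the final byte rules out a match. A minimal sketch of a variant without the inner "+" follows. The name RE_UTF8_LINEAR is hypothetical, the group is made non-capturing, and print() is used so the snippet also runs on Python 3.

```python
import re

# Hypothetical variant of re_utf8: the inner "+" is dropped, so each
# input character can be consumed by exactly one alternative.
RE_UTF8_LINEAR = (
    r'^(?:[\x00-\x7f]'
    r'|[\xc0-\xdf][\x80-\xbf]'
    r'|[\xe0-\xef][\x80-\xbf][\x80-\xbf]'
    r'|[\xf0-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$'
)

# Same test string as in the report.
s = "\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x14\x00\x00\x00\x01\x00\x00,`\x00\x00\x004\x00\x01\x8d"

# Try the full string and the two slices mentioned in the report.
for label, text in (("s", s), ("s[0:34]", s[0:34]), ("s[34:35]", s[34:35])):
    matched = re.match(RE_UTF8_LINEAR, text) is not None
    print("%s -> %s" % (label, matched))
```

Removing the "+" should not change which strings are accepted, because the outer "*" already repeats the single-byte alternative.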

----------
components: Regular Expressions
messages: 176458
nosy: ezio.melotti, lpd, mrabarnett
priority: normal
severity: normal
status: open
title: re.match loops forever on simple regexp
type: crash
versions: Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16563>
_______________________________________
_______________________________________________
Python-bugs-list mailing list