Tim Peters added the comment:

There's actually enormous backtracking here.  Try this much shorter regexp and 
you'll see much the same behavior:

re_utf8 = r'^([\x00-\x7f]+)*$'

That's the original re_utf8 with all but the first alternative removed.

Looks like passing s[0:34] "works" because it eliminates the trailing \x8d that
prevents the regexp from matching the whole string.  Because the regexp cannot
match the whole string, the engine backtracks through every way of splitting
the leading run of ASCII characters among iterations of the outer group, and a
run of n characters can be split in on the order of 2**(n-1) ways.  As the much
simpler re_utf8 above shows, it's not the alternatives in the regexp that
matter here, it's the nested quantifiers.
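
To see the effect in isolation, here is a minimal sketch that times a failed
match for the shortened pattern against a short run of ASCII characters
followed by \x8d.  The names nested, flat and time_failed_match are made up
for this illustration, and the flat pattern is only one possible rewrite, not
something proposed in the original report.  It accepts the same strings but
drops the capture group.

```python
import re
import time

# The shortened pattern from the comment: a quantified group that itself
# contains a quantifier.
nested = re.compile(r'^([\x00-\x7f]+)*$')

# Hypothetical rewrite with a single quantifier (an assumption for this
# sketch); it accepts the same strings but captures nothing.
flat = re.compile(r'^[\x00-\x7f]*$')


def time_failed_match(pattern, n):
    """Match n ASCII characters followed by one non-ASCII character.

    Returns the match result (expected to be None) and the elapsed seconds.
    """
    s = 'a' * n + '\x8d'
    start = time.perf_counter()
    result = pattern.match(s)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == '__main__':
    # Keep n small: the nested pattern's time should grow very quickly with n.
    for n in range(10, 19, 2):
        _, t_nested = time_failed_match(nested, n)
        _, t_flat = time_failed_match(flat, n)
        print('n=%2d  nested=%.6fs  flat=%.6fs' % (n, t_nested, t_flat))
```

Both patterns reject the string, so the only difference to look for is how
the elapsed time changes as n grows.  Exact timings depend on the machine and
the Python version, so none are quoted here.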

----------
nosy: +tim_one

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16563>
_______________________________________