Tim Peters added the comment: There's actually enormous backtracking here. Try this much shorter regexp and you'll see much the same behavior:
    re_utf8 = r'^([\x00-\x7f]+)*$'

That's the original re_utf8 with all but the first alternative removed. It looks like passing s[0:34] "works" because it drops the trailing \x8d, which is what prevents the regexp from matching the whole string. Because the regexp cannot match the whole string, it takes a very long time to try all the futile combinations implied by the nested quantifiers. As the much simpler re_utf8 above shows, it's not the alternatives in the regexp that matter here, it's the nested quantifiers.

----------
nosy: +tim_one

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16563>
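A minimal sketch of why the nested quantifiers blow up. `toy_nested_match` and `attempts` are hypothetical helpers: a toy backtracker that explores `^([\x00-\x7f]+)*$` the way a naive engine would (greedy inner `+`, then shorter runs). It is not a model of sre's actual internals. `FLAT` (`^[\x00-\x7f]*$`) is an illustrative un-nested pattern, assumed here, that accepts the same strings. With n ASCII characters followed by a non-ASCII character such as \x8d, the toy tries every way of carving the run into nonempty pieces, so it reaches the end anchor 2**n times before failing.

```python
import re
import time

# The shortened pattern from the comment: a + nested inside a *.
NESTED = re.compile(r'^([\x00-\x7f]+)*$')
# Illustrative assumption: a flat pattern accepting exactly the same strings
# (any run of ASCII characters, including the empty string), with no nesting.
FLAT = re.compile(r'^[\x00-\x7f]*$')


def is_ascii_char(c):
    return c <= '\x7f'


def toy_nested_match(s, i, counter):
    """Hypothetical toy backtracker for ^(C+)*$ with C = [\\x00-\\x7f].

    counter[0] counts how often the end-anchor test is reached, i.e. how
    many ways the matcher tried to split s into runs of C.
    """
    # Greedy inner C+: find the longest run of C starting at i.
    j = i
    while j < len(s) and is_ascii_char(s[j]):
        j += 1
    # One more iteration of the outer *: longest run first, then shorter ones.
    for end in range(j, i, -1):
        if toy_nested_match(s, end, counter):
            return True
    # No further iteration succeeded: test the end anchor here.
    counter[0] += 1
    return i == len(s)


def attempts(s):
    """Number of end-anchor tests the toy backtracker performs on s."""
    counter = [0]
    toy_nested_match(s, 0, counter)
    return counter[0]


if __name__ == "__main__":
    for n in (5, 10, 15):
        s = "a" * n + "\x8d"  # trailing non-ASCII char makes the match fail
        print(n, attempts(s))  # prints 2**n: 32, 1024, 32768
    # Time the real engine on both patterns; numbers vary by machine/version.
    for n in (14, 16, 18):
        s = "a" * n + "\x8d"
        for name, pat in (("nested", NESTED), ("flat", FLAT)):
            t0 = time.perf_counter()
            pat.match(s)
            print(name, n, f"{time.perf_counter() - t0:.4f}s")
```

When the string does match (no trailing \x8d), the very first greedy split succeeds and the toy stops after a single anchor test; the exponential cost only shows up on failure, which is exactly the s[0:34] versus full-string difference described above.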