New submission from pyos: The title says it all: if a regular expression that uses backreferences is compiled with the `re.I` flag, it always fails when matched against a string that contains any character outside the U+0000-U+00FF range. I haven't been able to narrow the bug down further.
A simple example:

    >>> import re
    >>> r = re.compile(r'(a)\1', re.I)  # should match "aa", "aA", "Aa", or "AA"
    >>> r.findall('aa')  # works as expected
    ['a']
    >>> r.findall('aa bcd')  # still works
    ['a']
    >>> r.findall('aa Ā')  # ord('Ā') == 0x0100
    []

The same code works as expected in Python 3.2:

    >>> r.findall('aa Ā')
    ['a']

----------
components: Regular Expressions
messages: 177518
nosy: ezio.melotti, mrabarnett, pitrou, pyos
priority: normal
severity: normal
status: open
title: Backreferences make case-insensitive regex fail on non-ASCII strings.
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________
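The reproduction above can be turned into a small script that runs each case from the report and adds one mixed-case subject ("aA"). This is only a sketch and is not part of the original report. The helper name `run_report_cases` and the case labels are hypothetical. On an interpreter where this issue is fixed (the Python 3.2 behavior), every case should return ['a']. On an affected 3.3 build, the cases containing U+0100 would come back empty.

```python
import re

# Pattern from the report: group 1 matches "a", and \1 must repeat it
# case-insensitively because of re.I.
pattern = re.compile(r'(a)\1', re.I)


def run_report_cases():
    """Hypothetical helper: return findall() results for each subject string."""
    subjects = {
        'plain': 'aa',
        'ascii_suffix': 'aa bcd',
        'beyond_latin1': 'aa \u0100',      # U+0100 is 'Ā', outside U+0000-U+00FF
        'mixed_case_beyond_latin1': 'aA \u0100',
    }
    return {name: pattern.findall(text) for name, text in subjects.items()}


if __name__ == '__main__':
    for name, found in run_report_cases().items():
        print(name, found)
```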