New submission from pyos:
The title says it all: if a regular expression that uses backreferences
is compiled with the `re.I` flag, it always fails to match against a string
that contains characters outside the U+0000-U+00FF range. I've been unable to
narrow the bug down any further.
A simple example:
>>> import re
>>> r = re.compile(r'(a)\1', re.I) # should match "aa", "aA", "Aa", or "AA"
>>> r.findall('aa') # works as expected
['a']
>>> r.findall('aa bcd') # still works
['a']
>>> r.findall('aa Ā') # ord('Ā') == 0x0100
[]
The same code works as expected in Python 3.2:
>>> r.findall('aa Ā')
['a']
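For anyone who wants to check other interpreter versions, the session above can be turned into a small script. This is only a sketch: the `probe` helper and the `SAMPLES` list are names made up here, and the U+00FF sample is an addition meant to sit just inside the range mentioned in the report. On an unaffected interpreter every sample should report the same single match.

```python
import re
import sys

# Same pattern as in the report: a group followed by a backreference,
# compiled case-insensitively.
PATTERN = re.compile(r'(a)\1', re.I)

# Probe strings. 'aa', 'aa bcd' and the U+0100 sample come from the report;
# the U+00FF sample is an assumption added here to test the range boundary.
SAMPLES = ['aa', 'aa bcd', 'aa \u00ff', 'aa \u0100']


def probe(samples=SAMPLES):
    """Return a dict mapping each sample string to PATTERN.findall(sample)."""
    return {s: PATTERN.findall(s) for s in samples}


if __name__ == '__main__':
    print(sys.version)
    for text, found in probe().items():
        # ascii() keeps the output readable on terminals without Unicode support.
        print(ascii(text), '->', found)
```

Running it under each interpreter and comparing the lines for the U+00FF and U+0100 samples might help confirm whether the failure really starts at U+0100.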
----------
components: Regular Expressions
messages: 177518
nosy: ezio.melotti, mrabarnett, pitrou, pyos
priority: normal
severity: normal
status: open
title: Backreferences make case-insensitive regex fail on non-ASCII strings.
type: behavior
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue16688>
_______________________________________