New submission from pyos:

The title says it all: if a regular expression that uses backreferences is
compiled with the `re.I` flag, it always fails to match against a string that
contains characters outside the U+0000-U+00FF range, even when those characters
are not part of the expected match. I've been unable to narrow the bug down
further.

A simple example:

    >>> import re
    >>> r = re.compile(r'(a)\1', re.I)  # should match "aa", "aA", "Aa", or "AA"
    >>> r.findall('aa')  # works as expected
    ['a']
    >>> r.findall('aa bcd')  # still works
    ['a']
    >>> r.findall('aa Ā')  # ord('Ā') == 0x0100
    []

The same code works as expected in Python 3.2:

    >>> r.findall('aa Ā')
    ['a']
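
For anyone who wants to check a particular interpreter, the session above can be turned into a small standalone script. This is only a sketch: the names `PATTERN`, `SAMPLES`, `find_pairs` and `affected_samples` are made up for illustration, and the sample list is an assumption that just extends the strings used in this report.

```python
import re
import sys

# Pattern from the report: a captured "a" followed by a backreference
# to it, compiled case-insensitively.
PATTERN = re.compile(r'(a)\1', re.I)

# Hypothetical sample inputs. The last one contains U+0100, which lies
# outside the U+0000-U+00FF range mentioned in the report.
SAMPLES = ['aa', 'aA', 'aa bcd', 'aa \u0100']


def find_pairs(text):
    """Return the captured group for every match of PATTERN in text."""
    return PATTERN.findall(text)


def affected_samples(samples=SAMPLES):
    """Return the samples in which PATTERN finds no match at all."""
    return [s for s in samples if not find_pairs(s)]


if __name__ == '__main__':
    print(sys.version)
    for sample in SAMPLES:
        # ascii() keeps the output readable on terminals without Unicode.
        print(ascii(sample), find_pairs(sample))
    print('samples with no match:', [ascii(s) for s in affected_samples()])
```

Every sample starts with a matching pair, so an interpreter without the bug should report no affected samples, while an affected one should list the string containing U+0100.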

components: Regular Expressions
messages: 177518
nosy: ezio.melotti, mrabarnett, pitrou, pyos
priority: normal
severity: normal
status: open
title: Backreferences make case-insensitive regex fail on non-ASCII strings.
type: behavior
versions: Python 3.3
