[issue2834] re.IGNORECASE not Unicode-ready

Antoine Pitrou Sat, 28 Jun 2008 13:27:38 -0700

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Uh, actually, it works if you specify re.UNICODE. If you don't, the
getlower() function in _sre.c falls back to the plain ASCII algorithm.


>>> import re
>>> pat = re.compile('Á', re.IGNORECASE | re.UNICODE)
>>> pat.match('á')
<_sre.SRE_Match object at 0xb7c66c28>
>>> pat.match('Á')
<_sre.SRE_Match object at 0xb7c66cd0>

I wonder if re.UNICODE shouldn't be the default in Py3k, at least when
the pattern is a string and not a bytes object. There may also be a
re.ASCII flag for those cases where people want to fall back to the old
behaviour.
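
To make the proposal concrete, here is a minimal sketch of how it would
look from the user's side. It assumes Python 3 semantics, where str
patterns are Unicode-aware by default and re.ASCII opts back into the
ASCII-only case folding; the variable names are only illustrative.

```python
import re

# Assumption: Python 3, where a str pattern uses Unicode case rules
# without needing re.UNICODE.
unicode_pat = re.compile('Á', re.IGNORECASE)

# re.ASCII restricts case-insensitive matching to ASCII letters,
# i.e. the "old behaviour" described above.
ascii_pat = re.compile('Á', re.IGNORECASE | re.ASCII)

unicode_hit = unicode_pat.match('á') is not None  # lowercase matches
ascii_hit = ascii_pat.match('á') is not None      # no case folding for 'Á'
exact_hit = ascii_pat.match('Á') is not None      # literal still matches itself

print(unicode_hit, ascii_hit, exact_hit)  # prints: True False True
```

Bytes patterns are left out of the sketch since they cannot carry
non-ASCII case information in the first place.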

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2834>
_______________________________________
