[issue24896] It is undocumented that re.UNICODE affects re.IGNORECASE

2016-10-16 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
versions: +Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24896] It is undocumented that re.UNICODE affects re.IGNORECASE

2016-01-03 Thread Ezio Melotti

Changes by Ezio Melotti :


--
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett
stage:  -> needs patch
type:  -> enhancement
versions: +Python 3.5, Python 3.6 -Python 3.4

___
Python tracker 

___



[issue24896] It is undocumented that re.UNICODE affects re.IGNORECASE

2015-08-19 Thread Leif Arne Storset

New submission from Leif Arne Storset:

A non-ASCII string does not match a regular expression case-insensitively
unless the UNICODE flag is set. This seems reasonable, but the documentation
seems to imply that this is not the case.

The example:

import re
# Does not match
re.compile(u"неоднозначность", re.IGNORECASE) \
    .findall(u"Неоднозначность")
# Matches
re.compile(u"неоднозначность", re.IGNORECASE | re.UNICODE) \
    .findall(u"Неоднозначность")

(In Python 3, it does not match if re.ASCII is given.)
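
For reference, a minimal Python 3 sketch of the same comparison. It assumes
str patterns, which use Unicode matching by default in Python 3; the variable
names are only illustrative and are not part of the original report.

```python
import re

lower = "неоднозначность"
title = "Неоднозначность"

# Python 3 default for str patterns: Unicode matching, so
# case-insensitive comparison also covers Cyrillic letters.
unicode_hits = re.compile(lower, re.IGNORECASE).findall(title)

# With re.ASCII, case-insensitive matching only folds ASCII letters,
# so the capital first letter no longer matches the lowercase pattern.
ascii_hits = re.compile(lower, re.IGNORECASE | re.ASCII).findall(title)

print(unicode_hits)  # one match
print(ascii_hits)    # no matches
```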

The documentation (2.7) says:

re.UNICODE

Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character
properties database.

(https://docs.python.org/2/library/re.html#re.UNICODE)

My regex does not use any of those escapes, yet the regex changes behavior with
the UNICODE flag. This leads to confusion when the regex doesn't match. The 
documentation is very specific about the behavior that changes with the flag,
implying that behavior not mentioned is unaffected.

Of course, it's easy to guess the correct (hopefully) solution.

Still, I suggest changing the documentation to mention that re.IGNORECASE is
affected. Looking at the source code, there seem to be further consequences
(it mentions Unicode locale) which may also warrant a mention. If you do want
to avoid specifics, however, even a hand-wavy reference to something like
"match according to Unicode" would help, because it implies that not only the
escapes change behavior.



In Python 3, there is a counterpart to the 2.7 problem: re.ASCII makes our
Cyrillic string not match. Again, this behavior makes intuitive sense, but the
documentation seems to indicate something different:

re.ASCII
Make \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching instead
of full Unicode matching. This is only meaningful for Unicode patterns, and
is ignored for byte patterns.

…

re.IGNORECASE
Perform case-insensitive matching; expressions like [A-Z] will match
lowercase letters, too. This is not affected by the current locale and
works for Unicode characters as expected.

re.ASCII does appear to affect re.IGNORECASE. Since this is the non-default
case, however, I'm not sure it's worth calling it out. I'd be happy even if
only the 2.7 docs change.
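
A small Python 3 sketch that puts the documented effect of re.ASCII (on \w)
next to the effect on re.IGNORECASE described above. The sample word and the
variable names are assumptions chosen for illustration only.

```python
import re

word_lower = "слово"
word_upper = "СЛОВО"

# Documented effect: under re.ASCII, \w no longer matches non-ASCII letters.
w_default = re.fullmatch(r"\w+", word_lower) is not None
w_ascii = re.fullmatch(r"\w+", word_lower, re.ASCII) is not None

# Effect discussed in this issue: case-insensitive matching of plain
# literals is also restricted to ASCII letters under re.ASCII.
i_default = re.fullmatch(word_lower, word_upper, re.IGNORECASE) is not None
i_ascii = re.fullmatch(word_lower, word_upper,
                       re.IGNORECASE | re.ASCII) is not None

print(w_default, w_ascii)  # True False
print(i_default, i_ascii)  # True False
```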

--
assignee: docs@python
components: Documentation
messages: 248829
nosy: Leif Arne Storset, docs@python
priority: normal
severity: normal
status: open
title: It is undocumented that re.UNICODE affects re.IGNORECASE
versions: Python 2.7, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24896
___



[issue24896] It is undocumented that re.UNICODE affects re.IGNORECASE

2015-08-19 Thread R. David Murray

R. David Murray added the comment:

I think it would be reasonable to add re.IGNORECASE to the list of things 
affected, since it obviously does switch between using the unicode database and 
not doing so.

--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24896
___