New submission from Serhiy Storchaka:

Locale-specific case-insensitive regular expression matching works correctly only
when the pattern was compiled under the same locale as the one used for matching.
Because compiled patterns are cached, this can lead to unexpected results.

The attached script demonstrates this (it requires two locales: ru_RU.koi8-r and
ru_RU.cp1251). The output is:

locale ru_RU.koi8-r
  b'1\xa3' ('1ё') matches b'1\xb3' ('1Ё')
  b'1\xa3' ('1ё') doesn't match b'1\xbc' ('1╪')
locale ru_RU.cp1251
  b'1\xa3' ('1Ј') doesn't match b'1\xb3' ('1і')
  b'1\xa3' ('1Ј') matches b'1\xbc' ('1ј')
locale ru_RU.cp1251
  b'2\xa3' ('2Ј') doesn't match b'2\xb3' ('2і')
  b'2\xa3' ('2Ј') matches b'2\xbc' ('2ј')
locale ru_RU.koi8-r
  b'2\xa3' ('2ё') doesn't match b'2\xb3' ('2Ё')
  b'2\xa3' ('2ё') matches b'2\xbc' ('2╪')

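b'\xa3' matches b'\xb3' under the KOI8-R locale if the pattern was compiled under
the KOI8-R locale, but matches b'\xbc' instead if the pattern was compiled under
the CP1251 locale: the literal was lowercased at compile time using CP1251, where
\xa3 is 'Ј' and its lowercase form 'ј' is \xbc.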

I see three possible ways to solve this issue:

1. Avoid caching locale-dependent case-insensitive patterns. This will definitely
decrease the performance of locale-dependent case-insensitive regexps (unless
users do their own caching) and may slightly decrease the performance of other
regexps.

2. Clear the cache of precompiled regexps on every locale change. This may look
simpler, but is vulnerable to locale changes made by extensions.

3. Do not lowercase characters at compile time (in locale-dependent
case-insensitive patterns). This requires introducing a new opcode for
case-insensitive matching, or at least rewriting the implementation of the
current opcodes (which would be less efficient). On the other hand, this is a
more correct implementation than the current one. The problem is that it is
incompatible with distributions which update only the Python library but not a
statically linked binary (e.g. Vim with Python support). Maybe there are some
workarounds.
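The three options can be compared in a toy model. Again this is a hypothetical sketch, not sre: a "locale" is a single-byte codec name, and `Matcher`, `lower_byte` and the strategy names `no_cache`, `clear_on_change` and `lower_at_match` are made up for illustration. In each case a pattern first used under CP1251 should still match correctly after switching to KOI8-R.

```python
# Toy model comparing the three proposed fixes. Hypothetical names; this does
# not use the real re or locale modules. A "locale" is a single-byte codec name.


def lower_byte(b, encoding):
    """Lowercase one byte using the given single-byte codec as the "locale"."""
    try:
        out = bytes([b]).decode(encoding).lower().encode(encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return b
    return out[0] if len(out) == 1 else b


class Matcher:
    STRATEGIES = ('no_cache', 'clear_on_change', 'lower_at_match')

    def __init__(self, strategy, encoding='koi8_r'):
        if strategy not in self.STRATEGIES:
            raise ValueError(strategy)
        self.strategy = strategy
        self.encoding = encoding
        self.cache = {}

    def lower(self, data):
        return bytes(lower_byte(b, self.encoding) for b in data)

    def setlocale(self, encoding):
        # Option 2: flush the cache on every locale change made through this method.
        # A change that bypasses it (e.g. from an extension) leaves stale entries.
        if self.strategy == 'clear_on_change' and encoding != self.encoding:
            self.cache.clear()
        self.encoding = encoding

    def compile(self, pattern):
        if self.strategy == 'no_cache':
            # Option 1: lowercase at compile time, but never cache the result.
            return self.lower(pattern)
        if self.strategy == 'lower_at_match':
            # Option 3: keep the literal as written; the compiled form no longer
            # depends on the locale, so caching it is safe.
            return self.cache.setdefault(pattern, pattern)
        if pattern not in self.cache:  # option 2: cache the lowercased literal
            self.cache[pattern] = self.lower(pattern)
        return self.cache[pattern]

    def match(self, pattern, subject):
        compiled = self.compile(pattern)
        if self.strategy == 'lower_at_match':
            compiled = self.lower(compiled)  # fold the pattern under the current locale
        return compiled == self.lower(subject)


for strategy in Matcher.STRATEGIES:
    m = Matcher(strategy)
    m.setlocale('cp1251')
    m.match(b'2\xa3', b'2\xbc')  # compiled (and possibly cached) under CP1251
    m.setlocale('koi8_r')
    print(strategy, m.match(b'2\xa3', b'2\xb3'), m.match(b'2\xa3', b'2\xbc'))
# Every strategy prints True False, e.g. "no_cache True False".
```

In this model option 3 folds the pattern again on every match, which stands in for the extra work a new case-insensitive opcode would do. Option 2's weakness shows up if `m.encoding` is assigned directly instead of through `setlocale`: the CP1251 entry stays in the cache.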

----------
components: Extension Modules, Library (Lib), Regular Expressions
files: re_locale_caching.py
messages: 226874
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Locale dependent regexps on different locales
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file36616/re_locale_caching.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22410>
_______________________________________