STINNER Victor <[email protected]> added the comment:
Ah, I can reproduce the bug on Fedora 29 using "LANG=en_IN ./python -m test -v
test_re".
The problem is that locale.getlocale() is not reliable: it pretends that the
locale encoding is ISO8859-1, whereas the real encoding is UTF-8:
$ LANG=en_IN ./python
Python 3.8.0a2+ (heads/master:4cbea518a0, Feb 28 2019, 18:19:44)
>>> chr(224).encode('ISO8859-1')
b'\xe0'
>>> import _testcapi
>>> _testcapi.DecodeLocaleEx(b'\xe0', 0, 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: decode error: pos=0, reason=decoding error
>>> import locale
# Wrong encoding
>>> locale.getlocale(locale.LC_CTYPE)
('en_IN', 'ISO8859-1')
>>> locale.setlocale(locale.LC_CTYPE, None)
'en_IN'
>>> locale._parse_localename('en_IN')
('en_IN', 'ISO8859-1')
# Real encoding
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'
The attached PR 12099 fixes the issue.
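
To check whether another machine is affected, the comparison from the session above can be scripted. This is only a rough sketch, not the change made in the PR: same_codec() and report_locale_encodings() are hypothetical helper names, and the fallback to locale.getpreferredencoding(False) on platforms without nl_langinfo() is an assumption.

```python
import codecs
import locale


def same_codec(name_a, name_b):
    """Return True if both encoding names resolve to the same Python codec."""
    try:
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except (LookupError, TypeError):
        # Unknown codec name, or None when no encoding could be determined.
        return False


def report_locale_encodings():
    """Compare the encoding guessed by getlocale() with the C library's."""
    try:
        # Derived from the locale *name*, so it may be wrong (see above).
        guessed = locale.getlocale(locale.LC_CTYPE)[1]
    except ValueError:
        # getlocale() can fail to parse some locale names.
        guessed = None
    if hasattr(locale, "nl_langinfo"):
        # Asks the C library for the codeset of the current LC_CTYPE locale.
        actual = locale.nl_langinfo(locale.CODESET)
    else:
        # Assumption: reasonable fallback where nl_langinfo() is missing.
        actual = locale.getpreferredencoding(False)
    return guessed, actual, same_codec(guessed, actual)


if __name__ == "__main__":
    guessed, actual, consistent = report_locale_encodings()
    print("getlocale() encoding:  ", guessed)
    print("C library codeset:     ", actual)
    print("consistent:            ", consistent)
```

Run with "LANG=en_IN ./python script.py" on an affected system, the two encodings should differ in the same way as in the session above.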
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue29571>
_______________________________________