New submission from Segev Finer <[email protected]>:
Found by trying to use pip: https://github.com/pypa/pip/issues/5665.
This is likely affected by the console code page.
Python version: 2.7.15 64 bit
OS: Windows 10.0.17134.165 x64
The console locale is set to cp872.
The console font is Consolas.
Apparently, msvcrt performs charset conversion when writing to its file
descriptors, based on the current locale, and it is even special-cased to
handle the OEM console code page (you can see this in
crt/src/write.c:_write_nolock if you have MSVC 2008).
When the "C" locale is set, no conversion is done. Python encodes to the OEM
code page, and the bytes pass through to the console unscathed. But once you
call setlocale, the CRT expects you to use the ANSI code page, while Python is
still encoding to the OEM code page, which results in this error from fwrite.
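
To illustrate the kind of mismatch described above, here is a minimal sketch.
It does not reproduce the CRT's conversion; it only shows that bytes produced
for an OEM code page mean something else (or nothing valid) under an ANSI code
page. The code pages cp437 and cp1252 are assumptions chosen for illustration,
not the ones from this report.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

FULL_BLOCK = u'\u2588'  # the character printed in the reproduction script

# Assumed code pages, for illustration only (not taken from the report).
OEM_CODEPAGE = 'cp437'
ANSI_CODEPAGE = 'cp1252'

# What Python writes when it encodes for the OEM console code page.
oem_bytes = FULL_BLOCK.encode(OEM_CODEPAGE)

# How a reader that assumes the ANSI code page interprets those same bytes.
misread = oem_bytes.decode(ANSI_CODEPAGE)

# The character may not exist in the ANSI code page at all.
try:
    FULL_BLOCK.encode(ANSI_CODEPAGE)
    ansi_can_encode = True
except UnicodeEncodeError:
    ansi_can_encode = False

print(repr(oem_bytes), repr(misread), ansi_can_encode)
```

With these two code pages the same byte decodes to a different character, and
the original character cannot be encoded in the ANSI code page at all.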
file.encoding in Python 2 is also not settable directly from Python (C API
only); it is only used for stdio and is set internally on startup:
Python/pythonrun.c:349-378.
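
As a diagnostic aid, the values involved can be inspected from Python. This is
a rough sketch; the helper name describe_encodings is hypothetical, and it only
reports what Python sees, not what the CRT does internally.

```python
from __future__ import print_function
import locale
import sys


def describe_encodings():
    """Collect the encodings relevant to console output (hypothetical helper)."""
    return {
        # Encoding Python chose for stdout at startup (may be None if redirected).
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        # Encoding implied by the current locale, without changing the locale.
        'preferred_encoding': locale.getpreferredencoding(False),
        # Calling setlocale() with no locale argument only queries the setting.
        'lc_ctype': locale.setlocale(locale.LC_CTYPE),
    }


print(describe_encodings())
```

Calling it before and after locale.setlocale(locale.LC_ALL, '') should show
whether the locale changed while the stdout encoding stayed the same.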
I found a description of this behavior: Why printf can display non-ASCII
characters when "C" locale is used?.
#!/usr/bin/env python2
from __future__ import print_function
import locale
print(u' |\u2588') # Works
locale.setlocale(locale.LC_ALL, '')
print(u' |\u2588') # IOError: [Errno 42] Illegal byte sequence
----------
components: Windows
messages: 322683
nosy: Segev Finer, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: Python 2 mishandles console code page after setlocale
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34283>
_______________________________________