Eryk Sun <eryk...@gmail.com> added the comment:

> FYI, I expect cp65001 will be used more widely in near future,
[...]
> It seems use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

Unless PYTHONLEGACYWINDOWSSTDIO is defined, Python 3.6+ doesn't use the
console's codepage-based interface (except for low-level os.read and os.write).
Console files use the wide-character console API internally and have a
"utf-8" encoding, so "cp65001" isn't a factor in this context.

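As a rough way to check this on a given machine, the sketch below reports the
raw file class and the text encoding of a stream. The helper name
describe_stream is made up for illustration. In an interactive Windows console
session I would expect the raw class to be _WindowsConsoleIO and the encoding to
be "utf-8", but redirected or legacy-mode streams will report something else.

```python
import io
import sys


def describe_stream(stream):
    """Return (raw file class name, text encoding) for a text stream.

    Hypothetical helper for illustration; it only reads attributes and
    tolerates streams that lack .buffer or .raw.
    """
    buffered = getattr(stream, "buffer", None)
    raw = getattr(buffered, "raw", None)
    raw_name = type(raw).__name__ if raw is not None else None
    return raw_name, getattr(stream, "encoding", None)


if __name__ == "__main__":
    # Output depends on the platform and on whether stdout is a console.
    print(describe_stream(sys.stdout))
```

The same helper works on any TextIOWrapper, which makes it easy to compare a
console stream against a pipe or a file.
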
This issue probably occurs due to the encoding returned by 
locale.getpreferredencoding(). This calls _locale._getdefaultlocale, which 
returns a tuple that mixes the user locale with the system ANSI codepage. For 
example, with ANSI set to UTF-8 (Windows 10):

    >>> _locale._getdefaultlocale()
    ('en_GB', 'cp65001')

The Universal CRT special-cases CP_UTF8 (codepage 65001) as "utf8" and accepts 
"utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'English_United Kingdom.utf8'

Python could similarly special-case CP_UTF8 as "utf-8" in 
_locale._getdefaultlocale.
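
To make the suggestion concrete, here is a minimal pure-Python sketch of the
mapping, applied to a tuple shaped like the one _locale._getdefaultlocale
returns. The function name normalize_default_locale is hypothetical, and this is
not how the C implementation is (or would necessarily be) written. It only
illustrates the intended result.

```python
def normalize_default_locale(default_locale):
    """Map the 'cp65001' encoding name to 'utf-8' in a (language, encoding) tuple.

    Hypothetical helper for illustration only; any other encoding name is
    returned unchanged.
    """
    language, encoding = default_locale
    if encoding is not None and encoding.lower() == "cp65001":
        encoding = "utf-8"
    return language, encoding


if __name__ == "__main__":
    print(normalize_default_locale(("en_GB", "cp65001")))
    print(normalize_default_locale(("en_GB", "cp1252")))
```

With this mapping, the example tuple above would come back as
('en_GB', 'utf-8'), while other ANSI codepages such as 'cp1252' pass through
untouched.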

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36778>
_______________________________________