Eryk Sun <eryk...@gmail.com> added the comment:
> FYI, I expect cp65001 will be used more widely in near future,
[...]
> It seems use `SetConsoleOutputCP(65001)` and `SetConsoleCP(65001)`.

Unless PYTHONLEGACYWINDOWSSTDIO is defined, Python 3.6+ doesn't use the console's codepage-based interface (except for low-level os.read and os.write). Console files use the wide-character console API internally and have a "utf-8" encoding. "cp65001" isn't a factor in this context.

This issue probably occurs due to the encoding returned by locale.getpreferredencoding(). This calls _locale._getdefaultlocale, which returns a tuple that mixes the user locale with the system ANSI codepage. For example, with ANSI set to UTF-8 (Windows 10):

    >>> _locale._getdefaultlocale()
    ('en_GB', 'cp65001')

The Universal CRT special cases CP_UTF8 (codepage 65001) as "utf8" and accepts "utf-8" as an alias. For example, after setting the ANSI codepage to UTF-8:

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'English_United Kingdom.utf8'

Python could similarly special case CP_UTF8 as "utf-8" in _locale._getdefaultlocale.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36778>
_______________________________________
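
The special casing suggested in the comment might look roughly like the following. This is only a sketch in pure Python, not the actual change to the C implementation of _locale._getdefaultlocale. The helper names normalize_codepage_name and normalize_default_locale are hypothetical and do not exist in CPython; the sketch assumes the result tuple has the (language, encoding) shape shown in the comment.

```python
def normalize_codepage_name(name):
    """Map the codepage name 'cp65001' (CP_UTF8) to 'utf-8'.

    Hypothetical helper for illustration only; any other name,
    including None, is returned unchanged.
    """
    if name is not None and name.lower() == "cp65001":
        return "utf-8"
    return name


def normalize_default_locale(result):
    """Apply the CP_UTF8 special case to a (language, encoding) tuple.

    `result` is assumed to look like the value returned by
    _locale._getdefaultlocale(), e.g. ('en_GB', 'cp65001').
    """
    language, encoding = result
    return language, normalize_codepage_name(encoding)


if __name__ == "__main__":
    # Uses the tuple from the comment as sample input.
    print(normalize_default_locale(("en_GB", "cp65001")))
```

Other codepage names such as 'cp1252' pass through untouched, so only the UTF-8 case is affected.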