Eryk Sun <eryk...@gmail.com> added the comment:

I assume you're linking to the CRT dynamically, in which case your program 
shares the CRT, and thus the configured locale, with "python39.dll". Since 
you're not using an isolated configuration, Python sets the LC_CTYPE locale to 
the current user's default locale (configured in "HKCU\Control 
Panel\International"). 

If the STDOUT low I/O file is in ANSI text mode, the LC_CTYPE locale is not the 
default "C" locale, and the file is a console file, then C write() does a 
double-translated write. First, the byte string is decoded to a wide-character 
UTF-16 string using the current LC_CTYPE locale encoding. Then the 
wide-character string is encoded back to a byte string using the console output 
code page. The first step produces mojibake if the byte string is UTF-8 but the 
locale encoding isn't.
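
To illustrate how the first step goes wrong, here is a small portable sketch 
of the double translation. It is not the CRT's implementation. The helper 
names (decode_latin1, encode_utf8_bmp, double_translate) are made up for this 
example, Latin-1 is assumed as a simplified stand-in for a single-byte ANSI 
locale encoding, and UTF-8 is assumed as the console output code page.

```cpp
#include <string>

// Step 1 (stand-in): decode bytes as Latin-1, where each byte maps to the
// code point of the same value. A real ANSI code page such as 1252 differs
// for some bytes in the 0x80-0x9F range.
std::u16string decode_latin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Step 2 (stand-in): encode UTF-16 as UTF-8. Only BMP code units are
// handled; surrogate pairs are not combined in this sketch.
std::string encode_utf8_bmp(const std::u16string& s) {
    std::string out;
    for (char16_t c : s) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Both steps together: UTF-8 input is misread as Latin-1, then re-encoded.
std::string double_translate(const std::string& utf8_bytes) {
    return encode_utf8_bmp(decode_latin1(utf8_bytes));
}
```

For example, double_translate("\xC3\xA9") takes the UTF-8 encoding of "é" and 
yields the bytes C3 83 C2 A9, which display as "Ã©". Pure ASCII input passes 
through unchanged, which is why the problem only shows up with non-ASCII text.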

At a minimum, you'll need to add `cfg.configure_locale = 0` in order to prevent 
Python from configuring the LC_CTYPE locale to the default user locale. 

That said, your code should be written to work in locales other than the 
default "C" locale. Since Windows 10 version 1803, the ucrt has supported UTF-8 
as a locale encoding, such as via setlocale(LC_CTYPE, ".utf8"). Alternatively, 
or in addition, you can use std::wcout with wide-character strings and switch 
stdout to UTF-8 Unicode mode via _setmode(_fileno(stdout), _O_U8TEXT). In this 
case, the CRT writes to the console via _putwch(), which calls the 
wide-character WinAPI function WriteConsoleW(). If your code uses UTF-8 byte 
strings, you'll have to decode them to UTF-16 wide-character strings before 
writing to stdout.
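
As a rough sketch of the decode-then-write approach: the helper names 
utf8_to_utf16 and write_utf8_to_console below are hypothetical, and the 
hand-written decoder is only a portable stand-in that assumes well-formed 
input. On Windows you would more likely call MultiByteToWideChar() with 
CP_UTF8. The console part is guarded because _setmode() and _O_U8TEXT are 
specific to the Windows CRT.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <cstdio>    // stdout, _fileno
#include <fcntl.h>   // _O_U8TEXT
#include <io.h>      // _setmode
#include <iostream>  // std::wcout
#endif

// Hypothetical helper: decode well-formed UTF-8 into UTF-16 code units.
// No validation is done; malformed input yields unspecified code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)      { cp = b0;        len = 1; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
        else                { cp = b0 & 0x07; len = 4; }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

#ifdef _WIN32
// Windows only: switch stdout to UTF-8 Unicode mode, then write wide text.
void write_utf8_to_console(const std::string& utf8) {
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::u16string u16 = utf8_to_utf16(utf8);
    // wchar_t is 16 bits on Windows, so the code units copy over directly.
    std::wstring wide(u16.begin(), u16.end());
    std::wcout << wide;
}
#endif
```

Once stdout is in _O_U8TEXT mode, stick to the wide-character functions and 
streams for that file. Narrow writes such as printf() or std::cout are not 
supported in that mode.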

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43091>
_______________________________________