Eryk Sun <eryk...@gmail.com> added the comment:

I think this is a locale configuration problem, in which the locale encoding doesn't match the terminal encoding. If so, it can be closed as not a bug.

> export a="中文"

In POSIX, the shell reads "中文" from the terminal as bytes encoded in the terminal encoding, which could be UTF-8 or some legacy encoding. The value of `a` is set directly as this encoded text. There is no intermediate decode/encode stage in the shell. For a child process that decodes the value of the environment variable, as Python does, the locale's LC_CTYPE encoding should be the same as, or compatible with, the terminal encoding.

> job_name = os.environ['a']
> print(job_name)

In POSIX, sys.stdout.errors, as used by print(), will be "surrogateescape" if the default LC_CTYPE locale is a legacy locale -- which in 3.6 is the case for the "C" locale, since it's usually limited to 7-bit ASCII. "surrogateescape" is also the errors handler for decoding the bytes in os.environb (POSIX) as the text in os.environ.

When decoding, "surrogateescape" handles each non-ASCII byte value that can't be decoded by translating it into the reserved surrogate range U+DC80 - U+DCFF. When encoding, it translates each surrogate code back to the original byte value in the range 0x80 - 0xFF.

Given the above setup, byte sequences in os.environb that can't be decoded with the default LC_CTYPE locale encoding will be surrogate escaped in the decoded text. The surrogate-escaped values roundtrip back to bytes when printed, and the terminal presumably interprets those bytes in its own encoding.

> with open('name.txt', 'w', encoding='utf-8') as fw:
>     fw.write(job_name)

The default errors handler for open() is "strict" instead of "surrogateescape", so the surrogate-escaped values in job_name cause the encoding to fail.

> Your code runs for me on Windows

In Windows, Python uses the wide-character (16-bit wchar_t) environment of the process for os.environ, and, in 3.6+, it uses the console session's wide-character API for console files such as sys.std* when they aren't redirected to a pipe or disk file. Conventionally, wide-character strings should be valid UTF-16LE text.
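
The POSIX behavior described earlier can be reproduced without touching the environment. This is only a minimal sketch: it assumes the terminal encoding was UTF-8 and uses the "ascii" codec to stand in for the "C" locale encoding. The names raw, roundtrip, strict_failed and recovered are made up for the illustration.

```python
# The bytes a UTF-8 terminal would hand to the shell for "中文".
raw = "中文".encode("utf-8")

# Stand-in for how os.environ is built from os.environb under a 7-bit
# locale: each undecodable byte 0x80-0xFF becomes a lone surrogate
# in the range U+DC80-U+DCFF.
job_name = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler, as print() does in that setup,
# restores the original bytes.
roundtrip = job_name.encode("ascii", errors="surrogateescape")

# The "strict" handler, the default for open(), rejects lone surrogates.
try:
    job_name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# One possible recovery, assuming the bytes really were UTF-8:
# undo the escape, then decode with the correct encoding.
recovered = roundtrip.decode("utf-8")
```

Here roundtrip equals raw, the strict encode raises UnicodeEncodeError, and recovered is the intended text. Fixing the locale so that LC_CTYPE matches the terminal avoids the need for any such recovery. None of this applies to the Windows case quoted above, where os.environ is read from the wide-character environment.
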
So getting "中文" from os.environ and printing it should 'just work'. The output will even be displayed correctly if the console session uses a font that supports "中文", or if it's a pseudoconsole (conpty) session that's attached to a terminal that supports automatic font fallback, such as Windows Terminal.

----------
components: +IO, Interpreter Core, Library (Lib), Unicode -C API
nosy: +eryksun, ezio.melotti, vstinner

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43576>
_______________________________________