Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?
On 11/13/22, Jessica Smith <12jessicasmit...@gmail.com> wrote:
> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in
> position 0: character maps to <undefined>

If your applications and existing data files are compatible with using UTF-8, then in Windows 10+ you can modify the administrative regional settings in the control panel to force using UTF-8. In this case, GetACP() and GetOEMCP() will return CP_UTF8 (65001), and the reserved code page constants CP_ACP (0), CP_OEMCP (1), CP_MACCP (2), and CP_THREAD_ACP (3) will use CP_UTF8.

You can override this on a per-application basis via the ActiveCodePage setting in the manifest:

https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage

In Windows 10, this setting only supports "UTF-8". In Windows 11, it also supports "legacy" to allow old applications to run on a system that's configured to use UTF-8. Setting an explicit locale is also supported in Windows 11, such as "en-US", with fallback to UTF-8 if the given locale has no legacy code page.

Note that setting the system to use UTF-8 also affects the host process for console sessions (i.e. conhost.exe or openconsole.exe), since it defaults to using the OEM code page (UTF-8 in this case). Unfortunately, a legacy read from the console host does not support reading non-ASCII text as UTF-8. For example:

    >>> os.read(0, 6)
    SPĀM
    b'SP\x00M\r\n'

This is a trivial bug in the console host, which stems from the fact that UTF-8 is a multibyte encoding (1-4 bytes per code point), but for some reason the console team at Microsoft still hasn't fixed it.

You can use chcp.com to set the console's input and output code pages to something other than UTF-8 if you have to read non-ASCII input in a legacy console app. By default, this problem doesn't affect Python's sys.stdin, which internally uses wide-character ReadConsoleW() with the system's native text encoding, UTF-16LE.

--
https://mail.python.org/mailman/listinfo/python-list
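To see which of the code pages discussed above are actually in effect, something like the following may help. It is only a sketch: the function name describe_encodings is made up for illustration, and the Windows-only part assumes the kernel32 functions GetACP, GetOEMCP and GetConsoleOutputCP are reached through ctypes. With the system-wide UTF-8 setting enabled, the first two should report 65001.

```python
import locale
import sys


def describe_encodings():
    """Collect the settings that decide how print() encodes text."""
    info = {
        # Encoding Python picked for stdout (changes when output is redirected).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "stdout_is_tty": sys.stdout.isatty(),
        # Encoding used by default for files and pipes.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        info["GetACP"] = kernel32.GetACP()
        info["GetOEMCP"] = kernel32.GetOEMCP()
        # Returns 0 when the process has no console attached.
        info["GetConsoleOutputCP"] = kernel32.GetConsoleOutputCP()
    return info


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

Running it once at the console and once with output redirected to a file shows the difference in stdout_encoding that triggers the error in the original report.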
Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?
On 11/13/2022 9:49 AM, Jessica Smith wrote:
> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in
> position 0: character maps to <undefined>
>
> Is this a known limitation of Windows + Unicode? I understand that
> using -X utf8 would fix this, or modifying various environment
> variables. But is this expected for a standard Python installation on
> Windows?
>
> Jessica

This also fails with the same error:

$ python -c "print('└')" |clip
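The pipe case and the file case are the same from Python's point of view: stdout is not a console, so the text has to be encoded to bytes. A rough way to check the two workarounds mentioned in the question (the -X utf8 option and an environment variable, here PYTHONIOENCODING) is to run a child interpreter with its stdout captured through a pipe. The helper name run_child is hypothetical, and the child code uses the \u2514 escape so the command line itself stays ASCII.

```python
import os
import subprocess
import sys

# Child program: prints the box-drawing character from the report.
CODE = r"print('\u2514')"


def run_child(*interpreter_args, env_overrides=None):
    """Run a child Python with stdout connected to a pipe.

    Returns (returncode, raw stdout bytes).
    """
    env = dict(os.environ)
    if env_overrides:
        env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CODE],
        capture_output=True,
        env=env,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print("default:        ", run_child())
    print("-X utf8:        ", run_child("-X", "utf8"))
    print("PYTHONIOENCODING:", run_child(env_overrides={"PYTHONIOENCODING": "utf-8"}))
```

On a Windows system with a cp1252 ANSI code page, the default run is expected to exit with a non-zero return code, while the other two should produce the UTF-8 bytes for the character.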
Re: Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?
> On 13 Nov 2022, at 14:52, Jessica Smith <12jessicasmit...@gmail.com> wrote:
>
> Consider the following code run in PowerShell or cmd.exe:
>
> $ python -c "print('└')"
> └
>
> $ python -c "print('└')" > test_file.txt
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in
> position 0: character maps to <undefined>
>
> Is this a known limitation of Windows + Unicode? I understand that
> using -X utf8 would fix this, or modifying various environment
> variables. But is this expected for a standard Python installation on
> Windows?

Your other thread has a reply that explained this.
It is a problem with Windows and character sets.
You have to set things up to allow Unicode to work.

Barry

>
> Jessica
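As an illustration of what "setting things up" can mean inside a script, here is a minimal sketch that reproduces the failure without touching the real stdout. It assumes an in-memory io.TextIOWrapper is a fair stand-in for a redirected stdout; write_box is a made-up helper name.

```python
import io

BOX = "\u2514"  # the box-drawing character from the report


def write_box(encoding, errors="strict"):
    """Print BOX to an in-memory stream standing in for a redirected stdout.

    Returns the bytes that would have landed in the file.
    """
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes match on every OS.
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="")
    print(BOX, file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    try:
        write_box("cp1252")
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc)
    print("utf-8:", write_box("utf-8"))
    print("cp1252 + backslashreplace:", write_box("cp1252", errors="backslashreplace"))
```

In a real program the corresponding step would be calling sys.stdout.reconfigure(encoding="utf-8") (available since Python 3.7) before the first print, or choosing a lenient error handler such as errors="backslashreplace" if the output must stay in the legacy code page.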
Python 3.7+ cannot print unicode characters when output is redirected to file - is this a bug?
Consider the following code run in PowerShell or cmd.exe:

$ python -c "print('└')"
└

$ python -c "print('└')" > test_file.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in
position 0: character maps to <undefined>

Is this a known limitation of Windows + Unicode? I understand that
using -X utf8 would fix this, or modifying various environment
variables. But is this expected for a standard Python installation on
Windows?

Jessica