Re: [Tutor] logging to cmd.exe
On 26/09/2017 12:22, Albert-Jan Roskam wrote:
> PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is Gmail better?

Get a decent email client and it'll do the work for you. I use Thunderbird on Windows with hotmail, gmail and yahoo addresses and never have a problem.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

---
This email has been checked for viruses by AVG.
http://www.avg.com

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] logging to cmd.exe
On Tue, Sep 26, 2017 at 6:22 AM, Albert-Jan Roskam wrote:
> PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is
> Gmail better?

Yeah, in Gmail it will handle the quote markers when doing plain text.

-- 
boB
Re: [Tutor] logging to cmd.exe
Dear Mats, Peter and Eryk, THANK YOU for your replies. What a wealth of information! Have a great weekend! Albert-Jan
Re: [Tutor] logging to cmd.exe
Albert-Jan Roskam wrote:

[me]
> Or you follow the convention and log to stderr:
>
> $ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
> \udc85
> $
> $ python3 -c 'import logging; logging.basicConfig(); logging.getLogger().warning("\udc85")' > to_prove_it_s_not_stdout
> WARNING:root:\udc85

[Albert-Jan]
> That's perhaps the best choice. But will messages with logging
> level warning and lower also be logged to stderr?

These are two distinct aspects: you may specify both what is logged and where it is logged. The easiest way to set up the filter is again basicConfig():

$ python3 -c 'from logging import *; basicConfig(); warning("important"); info("nice to know")'
WARNING:root:important

While the default level is WARNING, you may specify something else:

$ python3 -c 'from logging import *; basicConfig(level=INFO); warning("important"); info("nice to know")'
WARNING:root:important
INFO:root:nice to know
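The same separation of "what" (the level) and "where" (the handler's stream) can be set up without basicConfig(), on a dedicated logger. A minimal sketch; the logger name "demo" and the helper make_logger are hypothetical, and a StringIO stands in for stderr so the output can be inspected:

```python
import io
import logging


def make_logger(stream=None, level=logging.WARNING):
    """Return a logger whose destination and threshold are set independently.

    stream: where records go (None means sys.stderr, the StreamHandler default).
    level:  what gets through (records below this level are dropped).
    """
    logger = logging.getLogger("demo")       # hypothetical logger name
    logger.handlers.clear()                  # allow repeated calls in one session
    logger.propagate = False                 # don't also hand records to the root logger
    logger.setLevel(level)                   # *what* is logged
    handler = logging.StreamHandler(stream)  # *where* it is logged
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = make_logger(buf, logging.INFO)
log.warning("important")
log.info("nice to know")
log.debug("noise")                           # below INFO, so it is dropped
print(buf.getvalue(), end="")
# WARNING:demo:important
# INFO:demo:nice to know
```

Calling make_logger() with no stream sends the same records to stderr, which is the convention Peter recommends; only the level argument decides which records get there.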
Re: [Tutor] logging to cmd.exe
From: Tutor on behalf of Peter Otten <__pete...@web.de>
Sent: Monday, September 25, 2017 2:59 PM
To: tutor@python.org
Subject: Re: [Tutor] logging to cmd.exe

Albert-Jan Roskam wrote:

> Hi,
>
> With Python 3.5 under Windows I am using the logging module to log
> messages to stdout (and to a file), but this occasionally causes logging
> errors because some characters cannot be represented in the codepage used
> by cmd.exe (cp850, aka OEM codepage, I think). What is the best way to
> prevent this from happening? The program runs fine, but the error is
> distracting. I know I can use s.encode(sys.stdout.encoding, 'replace') and
> log that, but this is ugly and tedious to do when there are many log
> messages. I also don't understand why %r (instead of %s) still causes an
> error. I thought that the character representation uses only ascii
> characters?!

Not in Python 3. You can enforce ascii with "%a":

>>> euro = '\u20ac'
>>> print("%r" % euro)
'€'
>>> print("%a" % euro)
'\u20ac'

> aaahh, I did not know about %a. Thank you!

Or you can set an error handler with PYTHONIOENCODING (I have to use something that is not utf-8-encodable for the demo):

$ python3 -c 'print("\udc85")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in position 0: surrogates not allowed
$ PYTHONIOENCODING=:backslashreplace python3 -c 'print("\udc85")'
\udc85

> Nice to know about this variable, though I prefer not to change the environment because others will need to do the same. For others who would like to read more: https://docs.python.org/3/using/cmdline.html

Or you follow the convention and log to stderr:

$ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
\udc85
$
$ python3 -c 'import logging; logging.basicConfig(); logging.getLogger().warning("\udc85")' > to_prove_it_s_not_stdout
WARNING:root:\udc85

> That's perhaps the best choice. But will messages with logging level warning and lower also be logged to stderr?
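The "%a" conversion also works with the logging module's deferred %-style arguments, so in the original script only "%r" would need to become "%a". A small sketch; the logger name "ascii_demo" is hypothetical and a StringIO stands in for the console:

```python
import io
import logging

euro = "\u20ac"
as_repr = "%r" % euro    # repr() keeps printable non-ASCII characters: "'€'"
as_ascii = "%a" % euro   # ascii() escapes them: "'\\u20ac'"

# The same conversion works for logging's deferred %-formatting: the
# argument is converted with ascii() only when the record is emitted.
buf = io.StringIO()                        # stands in for sys.stdout
logger = logging.getLogger("ascii_demo")   # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False                   # keep the demo out of the root logger
logger.addHandler(logging.StreamHandler(buf))
logger.warning("euro sign: %a", euro)
print(buf.getvalue(), end="")              # euro sign: '\u20ac'
```

Because everything %a produces is ASCII, the resulting message can be encoded by cp850 or any other codepage; the trade-off is that readers see escapes instead of the actual characters.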
Re: [Tutor] logging to cmd.exe
On Tue, Sep 26, 2017 at 7:35 AM, Mats Wichmann wrote:
> On 09/26/2017 05:22 AM, Albert-Jan Roskam wrote:
>
>> Rather than change your code can you change the codepage with the chcp
>> command?
>
> the way chcp takes effect is problematic for this:
>
> "Programs that you start after you assign a new code page use the new
> code page, however, programs (except Cmd.exe) that you started before
> assigning the new code page use the original code page. "

Some console applications only check the codepage at startup. If you change it while the program is running, they'll encode/decode text for the original codepage, but the console will decode/encode it for its current codepage. That's called mojibake.

Prior to 3.6, at startup Python uses the input codepage for sys.stdin, and the output codepage for sys.stdout and sys.stderr. You can of course rebind sys.std* if you change the codepage via chcp.com or SetConsoleCP() and SetConsoleOutputCP(). If you do change the codepage, it's considerate to remember the previous value and restore it in an atexit function.

> I think there's also a module you can use for pre-3.6, sorry too lazy to
> do a search.

It's win_unicode_console [1].

[1]: https://pypi.python.org/pypi/win_unicode_console
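Eryk's advice to remember the previous codepage and restore it in an atexit function could look roughly like this. It is a Windows-only sketch using ctypes and the kernel32 functions GetConsoleCP, GetConsoleOutputCP, SetConsoleCP and SetConsoleOutputCP; the helper name set_console_codepage is hypothetical. On Python 3.5, sys.stdin/stdout/stderr keep the encoding chosen at startup, so they would still need to be rebound afterwards, which this sketch does not do:

```python
import atexit
import ctypes
import sys


def set_console_codepage(codepage):
    """Switch the console's input and output codepages to *codepage* and
    register an atexit hook that restores the previous values.

    Windows-only sketch (hypothetical helper): returns the previous
    (input, output) codepages, or None on platforms without a console
    codepage.
    """
    if sys.platform != "win32":
        return None
    # ctypes.WinDLL only exists on Windows, hence the platform check above.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    old_in = kernel32.GetConsoleCP()
    old_out = kernel32.GetConsoleOutputCP()

    def restore():
        kernel32.SetConsoleCP(old_in)
        kernel32.SetConsoleOutputCP(old_out)

    if not (kernel32.SetConsoleCP(codepage) and
            kernel32.SetConsoleOutputCP(codepage)):
        error = ctypes.get_last_error()
        restore()  # undo a half-applied change before reporting the failure
        raise ctypes.WinError(error)
    atexit.register(restore)
    return old_in, old_out
```

Usage would be something like previous = set_console_codepage(65001), keeping in mind Eryk's warnings below about how broken codepage 65001 is on Windows 7 and 8.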
Re: [Tutor] logging to cmd.exe
> cmd.exe can use cp65001 aka utf8???

CMD is a Unicode application that for the most part uses WinAPI wide-character functions, including the console API functions (as does Python 3.6+). There are a few exceptions. CMD uses the console codepage when decoding batch files (line by line, so you can change the codepage in the middle of a batch script), when writing output from its internal commands (e.g. dir) to pipes and files (the /u option overrides this), and when reading output from programs in a `FOR /F` loop.

> Why does cmd.exe still use cp850?

In the above cases CMD uses the active console input or output codepage, which defaults to the system locale's OEM codepage. If it's not attached to a console (i.e. when run as a DETACHED_PROCESS), CMD uses the ANSI codepage in these cases.

Anyway, you appear to be talking about the Windows console, which people often confuse with CMD. Programs that use command-line interfaces (CLIs) and text user interfaces (TUIs), such as classic system shells, are clients of a given console or terminal interface. A TUI application typically is tightly integrated with the console or terminal interface (e.g. a curses application), while a CLI application typically just uses standard I/O (stdin, stdout, stderr). Both cmd.exe and python.exe are Windows console clients. There's nothing special about cmd.exe in this regard.

Now, there are a couple of significant problems with using codepage 65001 in the Windows console.

Prior to Windows 8, WriteFile and WriteConsoleA return the number of decoded wide characters written to the console, which is a bug because they're supposed to return the number of bytes written. It's not a problem so long as there's a one-to-one mapping between bytes and characters in the console's output codepage. But UTF-8 can have up to 4 bytes per character.
This misleads buffered writers such as C FILE streams and Python 3's io module, which in turn causes gibberish to be printed after every write of a string that includes non-ASCII characters.

Prior to Windows 10, with codepage 65001, reading input from the console via ReadConsole or ReadConsoleA fails if the input has non-ASCII characters. It gets reported as a successful read of zero bytes. This causes Python to think it's at EOF, so the REPL quits (as if Ctrl+Z had been entered) and input() raises EOFError.

Even in Windows 10, while the entire read doesn't fail, it's not much better. It replaces non-ASCII characters with NUL bytes. For example, in Windows 10.0.15063:

>>> os.read(0, 100)
abcαβγdef
b'abc\x00\x00\x00def\r\n'

Microsoft is gradually working on fixing UTF-8 support in the console (well, two developers are working on it). They appear to have fixed it at least for the private console APIs used by the new Linux subsystem in Windows 10:

Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> s = os.read(0, 100)
abcαβγdef
>>> s
b'abc\xce\xb1\xce\xb2\xce\xb3def\n'
>>> s.decode()
'abcαβγdef\n'

Maybe it's fixed in the Windows API in an upcoming update. But still, there are a lot of Windows 7 and 8 systems out there, for which codepage 65001 in the console will remain broken.

> I always thought 65001 was not a 'real' codepage, even though some locales
> (e.g. Georgia) use it [1].

Codepage 65001 isn't used by any system locale as the legacy ANSI or OEM codepage. The console allows it probably because no one thought to prevent using it in the late 1990s. It has been buggy for two decades.

Moodle seems to have special support for using UTF-8 with Georgian. But as far as Windows is concerned, there is no legacy codepage for Georgian.
For example:

import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
LD_ACP = LOCALE_IDEFAULTANSICODEPAGE = 0x1004
acp = (ctypes.c_wchar * 6)()

>>> kernel32.GetLocaleInfoEx('ka-GE', LD_ACP, acp, 6)
2
>>> acp.value
'0'

A value of zero here means no ANSI codepage is defined [1]:

    If no ANSI code page is available, only Unicode can be used for
    the locale. In this case, the value is CP_ACP (0). Such a locale
    cannot be set as the system locale. Applications that do not
    support Unicode do not work correctly with locales marked as
    "Unicode only".

Georgian (ka-GE) is a Unicode-only locale [2] that cannot be set as the system locale.

[1]: https://msdn.microsoft.com/en-us/library/dd373761
[2]: https://msdn.microsoft.com/en-us/library/ms930130.aspx
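The pre-Windows 8 byte-count bug described above (WriteFile/WriteConsoleA reporting characters written instead of bytes) is easier to see in a small simulation. This is a hypothetical model, not Windows code: BuggyConsole and write_all are made-up names, and write_all only mimics how a buffered writer retries whatever it believes was not written:

```python
class BuggyConsole:
    """Hypothetical model of a pre-Windows 8 console set to codepage 65001:
    it displays the UTF-8 bytes it receives, but returns the number of
    decoded characters as if it were the number of bytes written."""

    def __init__(self):
        self.screen = ""

    def write(self, data: bytes) -> int:
        text = data.decode("utf-8", errors="replace")  # what appears on screen
        self.screen += text
        return len(text)  # the bug: characters, not bytes


def write_all(write, data: bytes) -> None:
    """Retry short writes the way a buffered writer does: whatever the
    callee reports as not yet written is sent again."""
    while data:
        written = write(data)
        data = data[written:]


console = BuggyConsole()
write_all(console.write, "abcαβγdef\n".encode("utf-8"))
# console.screen is now "abcαβγdef\nef\n": 13 bytes were reported as 10,
# so the last 3 bytes ("ef\n") are sent a second time.
```

Each two-byte Greek letter makes the reported count one short, so the writer re-sends that many trailing bytes; when the re-sent data starts in the middle of a multi-byte sequence, the console shows replacement characters as well.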
Re: [Tutor] logging to cmd.exe
On 09/26/2017 05:22 AM, Albert-Jan Roskam wrote:
> Rather than change your code can you change the codepage with the chcp
> command?

the way chcp takes effect is problematic for this:

"Programs that you start after you assign a new code page use the new code page, however, programs (except Cmd.exe) that you started before assigning the new code page use the original code page. "

so making the change from inside the code does not seem like it will work.

>
> Good to keep in mind, but my objection would be similar to that
> with specifying the PYTHONIOENCODING variable: one needs to change the
> environment first before the script runs without errors.

quicktip: try Python 3.6. It's had a change in this area and no longer uses the code page.

I think there's also a module you can use for pre-3.6, sorry too lazy to do a search.
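Albert-Jan's objection quoted above is that PYTHONIOENCODING and chcp both require changing the environment before the script runs. The same backslashreplace error handler can instead be applied from inside the program, by giving the console handler its own stream on stdout's file descriptor. A minimal sketch aimed at Python 3.5, assuming sys.stdout.encoding is the console codepage (cp850 here); safe_stream, add_console_handler and demo are hypothetical names, and demo() uses a temporary file in place of the console:

```python
import logging
import sys
import tempfile


def safe_stream(fd, encoding):
    """Open an extra text stream on file descriptor *fd* that escapes any
    character *encoding* can't represent instead of raising."""
    # closefd=False: closing this stream must not close the shared descriptor.
    # buffering=1: line buffering, so each log record is written promptly.
    return open(fd, "w", encoding=encoding, errors="backslashreplace",
                buffering=1, closefd=False)


def add_console_handler(logger, fd=None, encoding=None):
    """Attach a StreamHandler for stdout (by default) that never fails on
    encoding errors; stdout's own encoding, e.g. cp850, is reused."""
    if fd is None:
        fd = sys.stdout.fileno()
    if encoding is None:
        encoding = sys.stdout.encoding
    handler = logging.StreamHandler(safe_stream(fd, encoding))
    logger.addHandler(handler)
    return handler


def demo(encoding="cp850"):
    """Log a euro sign through a temporary file standing in for the console
    and return the raw bytes that were written."""
    with tempfile.TemporaryFile() as f:
        logger = logging.getLogger("safe_demo")  # hypothetical logger name
        logger.setLevel(logging.INFO)
        logger.propagate = False
        handler = add_console_handler(logger, fd=f.fileno(), encoding=encoding)
        logger.warning("euro sign: %s", "\u20ac")
        logger.removeHandler(handler)
        handler.close()
        handler.stream.close()  # flushes; closefd=False leaves f usable
        f.seek(0)
        return f.read()


print(demo())  # b'euro sign: \\u20ac\n'
```

In the real script this would replace logging.StreamHandler(stream=sys.stdout) with add_console_handler(logger). One caveat: print() and the handler then use separately buffered streams on the same descriptor, so output from the two may interleave out of order unless sys.stdout is flushed.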
Re: [Tutor] logging to cmd.exe
From: Tutor on behalf of Mark Lawrence via Tutor
Sent: Monday, September 25, 2017 4:19 PM
To: tutor@python.org
Subject: Re: [Tutor] logging to cmd.exe

On 25/09/2017 14:20, Albert-Jan Roskam wrote:
> Hi,
>
> With Python 3.5 under Windows I am using the logging module to log messages
> to stdout (and to a file), but this occasionally causes logging errors
> because some characters cannot be represented in the codepage used by cmd.exe
> (cp850, aka OEM codepage, I think). [...]

Rather than change your code can you change the codepage with the chcp command?
> Good to keep in mind, but my objection would be similar to that with specifying the PYTHONIOENCODING variable: one needs to change the environment first before the script runs without errors.

C:\Users\Mark\Documents\MyPython>chcp
Active code page: 65001

> Wow! cmd.exe can use cp65001 aka utf8??? I always thought 65001 was not a 'real' codepage, even though some locales (e.g. Georgia) use it [1]. Why does cmd.exe still use cp850?

[1] https://docs.moodle.org/dev/Table_of_locales

PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is Gmail better?
Re: [Tutor] logging to cmd.exe
On 25/09/2017 14:20, Albert-Jan Roskam wrote:
> Hi,
>
> With Python 3.5 under Windows I am using the logging module to log messages
> to stdout (and to a file), but this occasionally causes logging errors
> because some characters cannot be represented in the codepage used by
> cmd.exe (cp850, aka OEM codepage, I think). [...]

Rather than change your code can you change the codepage with the chcp command?
C:\Users\Mark\Documents\MyPython>chcp
Active code page: 65001

C:\Users\Mark\Documents\MyPython>type mytest.py
import logging
import sys

assert sys.version_info.major > 2
logging.basicConfig(filename="d:/log.txt", level=logging.DEBUG,format='%(asctime)s %(message)s')
handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger(__name__)
logger.addHandler(handler)

s = '\u20ac'
logger.info("euro sign: %r", s)

C:\Users\Mark\Documents\MyPython>mytest.py
euro sign: '€'

-- 
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
Re: [Tutor] logging to cmd.exe
Albert-Jan Roskam wrote:

> Hi,
>
> With Python 3.5 under Windows I am using the logging module to log
> messages to stdout (and to a file), but this occasionally causes logging
> errors because some characters cannot be represented in the codepage used
> by cmd.exe (cp850, aka OEM codepage, I think). What is the best way to
> prevent this from happening? The program runs fine, but the error is
> distracting. I know I can use s.encode(sys.stdout.encoding, 'replace') and
> log that, but this is ugly and tedious to do when there are many log
> messages. I also don't understand why %r (instead of %s) still causes an
> error. I thought that the character representation uses only ascii
> characters?!

Not in Python 3. You can enforce ascii with "%a":

>>> euro = '\u20ac'
>>> print("%r" % euro)
'€'
>>> print("%a" % euro)
'\u20ac'

Or you can set an error handler with PYTHONIOENCODING (I have to use something that is not utf-8-encodable for the demo):

$ python3 -c 'print("\udc85")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in position 0: surrogates not allowed
$ PYTHONIOENCODING=:backslashreplace python3 -c 'print("\udc85")'
\udc85

Or you follow the convention and log to stderr:

$ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
\udc85
$
$ python3 -c 'import logging; logging.basicConfig(); logging.getLogger().warning("\udc85")' > to_prove_it_s_not_stdout
WARNING:root:\udc85
[Tutor] logging to cmd.exe
Hi,

With Python 3.5 under Windows I am using the logging module to log messages to stdout (and to a file), but this occasionally causes logging errors because some characters cannot be represented in the codepage used by cmd.exe (cp850, aka OEM codepage, I think). What is the best way to prevent this from happening? The program runs fine, but the error is distracting. I know I can use s.encode(sys.stdout.encoding, 'replace') and log that, but this is ugly and tedious to do when there are many log messages. I also don't understand why %r (instead of %s) still causes an error. I thought that the character representation uses only ascii characters?!

import logging
import sys

assert sys.version_info.major > 2
logging.basicConfig(filename="d:/log.txt", level=logging.DEBUG,format='%(asctime)s %(message)s')
handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger(__name__)
logger.addHandler(handler)

s = '\u20ac'
logger.info("euro sign: %r", s)

--- Logging error ---
Traceback (most recent call last):
  File "c:\python3.5\lib\logging\__init__.py", line 982, in emit
    stream.write(msg)
  File "c:\python3.5\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 12: character maps to <undefined>
Call stack:
  File "q:\temp\logcheck.py", line 10, in <module>
    logger.info("euro sign: %r", s)
Message: 'euro sign: %r'
Arguments: ('\u20ac',)

Thanks in advance for your replies!

Albert-Jan
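The s.encode(sys.stdout.encoding, 'replace') workaround mentioned above can be done once per handler instead of once per message, by putting it in a Formatter. A sketch under that assumption; EncodingSafeFormatter and the logger name "cp850_demo" are hypothetical, backslashreplace is used instead of replace so the escaped code point stays visible, and a StringIO stands in for the cp850 console:

```python
import io
import logging


class EncodingSafeFormatter(logging.Formatter):
    """Formatter that escapes every character *encoding* cannot represent,
    so a handler writing to a cp850 console never raises UnicodeEncodeError.
    (Hypothetical helper, not part of the logging module.)"""

    def __init__(self, fmt=None, datefmt=None, encoding="ascii"):
        super().__init__(fmt, datefmt)
        self.encoding = encoding

    def format(self, record):
        text = super().format(record)
        # Round-trip through the target encoding: representable characters
        # come back unchanged, the rest become \uXXXX-style escapes.
        return text.encode(self.encoding, "backslashreplace").decode(self.encoding)


buf = io.StringIO()                       # stands in for sys.stdout
handler = logging.StreamHandler(buf)
handler.setFormatter(EncodingSafeFormatter("%(message)s", encoding="cp850"))
logger = logging.getLogger("cp850_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.warning("euro sign: %r, e acute: %r", "\u20ac", "\u00e9")
# buf now holds: euro sign: '\u20ac', e acute: 'é'
```

Unlike "%a", this keeps characters the codepage can show (é is in cp850) and escapes only the rest. In the original script it would be attached only to the stdout handler, e.g. handler.setFormatter(EncodingSafeFormatter(encoding=sys.stdout.encoding)), so the log file written by basicConfig() keeps the unescaped text.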