Re: [Tutor] logging to cmd.exe

2017-10-01 Thread Mark Lawrence via Tutor

On 26/09/2017 12:22, Albert-Jan Roskam wrote:


> PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is Gmail
> better?

Get a decent email client and it'll do the work for you.  I use 
Thunderbird on Windows with hotmail, gmail and yahoo addresses and never 
have a problem.


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] logging to cmd.exe

2017-09-30 Thread boB Stepp
On Tue, Sep 26, 2017 at 6:22 AM, Albert-Jan Roskam wrote:


> PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is 
> Gmail better?

Yeah, Gmail handles the quote markers when you compose in plain text.


-- 
boB


Re: [Tutor] logging to cmd.exe

2017-09-29 Thread Albert-Jan Roskam
Dear Mats, Peter and Eryk,

THANK YOU for your replies. What a wealth of information!

Have a great weekend!

Albert-Jan


Re: [Tutor] logging to cmd.exe

2017-09-26 Thread Peter Otten
Albert-Jan Roskam wrote:

[me]
> Or you follow the convention and log to stderr:
> 
> $ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
> \udc85
> $ python3 -c 'import logging; logging.basicConfig();
> logging.getLogger().warn("\udc85")' > to_prove_it_s_not_stdout
> WARNING:root:\udc85

[Albert-Jan]
> That's perhaps the best choice. But will messages with logging
> level warning and lower also be logged to stderr?

Those are two distinct aspects: you can specify both what is logged and where
it is logged.

The easiest way to set up the filter is again basicConfig():

$ python3 -c 'from logging import *; basicConfig(); warn("important"); 
info("nice to know")'
WARNING:root:important

While the default level is WARNING you may specify something else:

$ python3 -c 'from logging import *; basicConfig(level=INFO); 
warn("important"); info("nice to know")'
WARNING:root:important
INFO:root:nice to know
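
To illustrate the "what" versus "where" distinction, here is a minimal sketch that gives each destination its own threshold. The logger name, the helper build_logger() and the in-memory streams are assumptions made for the demo only. In a real script the console handler would typically be created with no stream argument (StreamHandler then defaults to sys.stderr) and the second handler would be a logging.FileHandler. The sketch calls warning(), since warn() is a deprecated alias.

```python
import io
import logging


def build_logger(name, console_stream, file_stream):
    """Return a logger that sends WARNING+ to one stream and DEBUG+ to another."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)      # what the logger accepts at all
    logger.propagate = False            # keep the demo independent of the root logger

    formatter = logging.Formatter("%(levelname)s:%(name)s:%(message)s")

    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.WARNING)   # where: only warnings and above go here
    console.setFormatter(formatter)

    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(logging.DEBUG)     # where: everything goes here
    logfile.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(logfile)
    return logger


# In-memory streams stand in for the console and the log file.
console_out, file_out = io.StringIO(), io.StringIO()
log = build_logger("demo_levels", console_out, file_out)
log.warning("important")
log.info("nice to know")
```

With this setup the "console" stream receives only the warning, while the "file" stream receives both messages.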




Re: [Tutor] logging to cmd.exe

2017-09-26 Thread Albert-Jan Roskam
 From: Tutor  on behalf of 
Peter Otten <__pete...@web.de>
 Sent: Monday, September 25, 2017 2:59 PM
 To: tutor@python.org
 Subject: Re: [Tutor] logging to cmd.exe
 
 Albert-Jan Roskam wrote:

 > Hi,
 > 
 > 
 > With Python 3.5 under Windows I am using the logging module to log
 > messages to stdout (and to a file), but this occasionally causes logging
 > errors because some characters cannot be represented in the codepage used
 > by cmd.exe (cp850, aka OEM codepage, I think). What is the best way to
 > prevent this from happening? The program runs fine, but the error is
 > distracting. I know I can use s.encode(sys.stdout.encoding, 'replace') and
 > log that, but this is ugly and tedious to do when there are many log
 > messages. I also don't understand why %r (instead of %s) still causes an
 > error. I thought that the character representation uses only ascii
 > characters?!

 Not in Python 3. You can enforce ascii with "%a":

 >>> euro = '\u20ac'
 >>> print("%r" % euro)
 '€'
 >>> print("%a" % euro)
 '\u20ac'



 > aaahh, I did not know about %a. Thank you! 


 Or you can set an error handler with PYTHONIOENCODING (I have to use 
 something that is not utf-8-encodable for the demo):

 $ python3 -c 'print("\udc85")'
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
 UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in 
 position 0: surrogates not allowed

 $ PYTHONIOENCODING=:backslashreplace python3 -c 'print("\udc85")'
 \udc85


 > Nice to know about this variable, though I prefer not to change the 
environment because others will need to do the same. 
 For others who would like to read more: 
https://docs.python.org/3/using/cmdline.html


 Or you follow the convention and log to stderr:

 $ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
 \udc85
 $ python3 -c 'import logging; logging.basicConfig(); 
 logging.getLogger().warn("\udc85")' > to_prove_it_s_not_stdout
 WARNING:root:\udc85


 > That's perhaps the best choice. But will messages with logging level 
warning and lower also be logged to stderr?



Re: [Tutor] logging to cmd.exe

2017-09-26 Thread eryk sun
On Tue, Sep 26, 2017 at 7:35 AM, Mats Wichmann  wrote:
> On 09/26/2017 05:22 AM, Albert-Jan Roskam wrote:
>
>> Rather than change your code can you change the codepage with the chcp
>> command?
>
> the way chcp takes effect is problematic for this:
>
> "Programs that you start after you assign a new code page use the new
> code page, however, programs (except Cmd.exe) that you started before
> assigning the new code page use the original code page. "

Some console applications only check the codepage at startup. If you
change it while the program is running, they'll encode/decode text for
the original codepage, but the console will decode/encode it for its
current codepage. That's called mojibake.

Prior to 3.6, at startup Python uses the input codepage for sys.stdin,
and the output codepage for sys.stdout and sys.stderr. You can of
course rebind sys.std* if you change the codepage via chcp.com or
SetConsoleCP() and SetConsoleOutputCP(). If you do change the
codepage, it's considerate to remember the previous value and restore
it in an atexit function.
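
A rough sketch of the "remember and restore" idea, assuming ctypes is used to call the console API directly. The helper name use_console_codepage() is made up for this example. It does nothing outside Windows or when no console is attached, and it does not rebind sys.stdin, sys.stdout or sys.stderr, which would still be needed as described above.

```python
import atexit
import ctypes
import sys


def use_console_codepage(codepage):
    """Switch the console input/output codepage and restore it at exit.

    Returns the previous (input, output) codepages, or None if nothing
    was changed (not on Windows, or no console attached).
    """
    if sys.platform != "win32":
        return None

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    previous = (kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP())
    if 0 in previous:
        return None                      # 0 means the call failed: no console

    def restore():
        kernel32.SetConsoleCP(previous[0])
        kernel32.SetConsoleOutputCP(previous[1])

    if not (kernel32.SetConsoleCP(codepage)
            and kernel32.SetConsoleOutputCP(codepage)):
        error = ctypes.get_last_error()  # capture before restore() overwrites it
        restore()
        raise ctypes.WinError(error)

    atexit.register(restore)             # be considerate: undo the change on exit
    return previous
```

Calling use_console_codepage(1252), for example, would switch both codepages and put the old values back when the interpreter exits.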

> I think there's also a module you can use for pre-3.6, sorry too lazy to
> do a search.

It's win_unicode_console [1].

[1]: https://pypi.python.org/pypi/win_unicode_console


Re: [Tutor] logging to cmd.exe

2017-09-26 Thread eryk sun
> cmd.exe can use cp65001 aka utf8???

CMD is a Unicode application that for the most part uses WinAPI
wide-character functions, including the console API functions (as does
Python 3.6+). There are a few exceptions. CMD uses the console
codepage when decoding batch files (line by line, so you can change
the codepage in the middle of a batch script), when writing output
from its internal commands (e.g. dir) to pipes and files (the /u
option overrides this), and when reading output from programs in a
`FOR /F` loop.

> Why does cmd.exe still use cp850?

In the above cases CMD uses the active console input or output
codepage, which defaults to the system locale's OEM codepage. If it's
not attached to a console (i.e. when run as a DETACHED_PROCESS), CMD
uses the ANSI codepage in these cases.

Anyway, you appear to be talking about the Windows console, which
people often confuse with CMD. Programs that use command-line
interfaces (CLIs) and text user interfaces (TUIs), such as classic
system shells, are clients of a given console or terminal interface. A
TUI application typically is tightly integrated with the console or
terminal interface (e.g. a curses application), while a CLI
application typically just uses standard I/O (stdin, stdout, stderr).
Both cmd.exe and python.exe are Windows console clients. There's
nothing special about cmd.exe in this regard.

Now, there are a couple of significant problems with using codepage
65001 in the Windows console.

Prior to Windows 8, WriteFile and WriteConsoleA return the number of
decoded wide characters written to the console, which is a bug because
they're supposed to return the number of bytes written. It's not a
problem so long as there's a one-to-one mapping between bytes and
characters in the console's output codepage. But UTF-8 can have up to
4 bytes per character. This misleads buffered writers such as C FILE
streams and Python 3's io module, which in turn causes gibberish to be
printed after every write of a string that includes non-ASCII
characters.
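
To make the mismatch concrete, this small sketch compares the two counts for a sample string (the same text used in the read examples in this message). It only shows the arithmetic; it does not reproduce the console bug itself.

```python
text = "abc\u03b1\u03b2\u03b3def"   # 'abc' + three Greek letters + 'def'
encoded = text.encode("utf-8")

char_count = len(text)      # what the buggy WriteFile reports as "written"
byte_count = len(encoded)   # what a buffered writer actually handed over

# A writer that trusts the smaller number believes some bytes are still
# unwritten and sends the tail again, which shows up as trailing gibberish.
unwritten_according_to_writer = encoded[char_count:]
```

Here the string has 9 characters but 12 UTF-8 bytes, so the writer would believe 3 bytes are still pending.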

Prior to Windows 10, with codepage 65001, reading input from the
console via ReadConsole or ReadConsoleA fails if the input has
non-ASCII characters. It gets reported as a successful read of zero
bytes. This causes Python to think it's at EOF, so the REPL quits (as
if Ctrl+Z had been entered) and input() raises EOFError.

Even in Windows 10, while the entire read doesn't fail, it's not much
better. It replaces non-ASCII characters with NUL bytes. For example,
in Windows 10.0.15063:

>>> os.read(0, 100)
abcαβγdef
b'abc\x00\x00\x00def\r\n'

Microsoft is gradually working on fixing UTF-8 support in the console
(well, two developers are working on it). They appear to have fixed it
at least for the private console APIs used by the new Linux subsystem
in Windows 10:

Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> s = os.read(0, 100)
abcαβγdef
>>> s
b'abc\xce\xb1\xce\xb2\xce\xb3def\n'
>>> s.decode()
'abcαβγdef\n'

Maybe it's fixed in the Windows API in an upcoming update. But still,
there are a lot of Windows 7 and 8 systems out there, for which
codepage 65001 in the console will remain broken.

> I always thought 65001 was not a 'real' codepage, even though some locales 
> (e.g. Georgia) use it [1].

Codepage 65001 isn't used by any system locale as the legacy ANSI or
OEM codepage. The console allows it probably because no one thought to
prevent using it in the late 1990s. It has been buggy for two decades.

Moodle seems to have special support for using UTF-8 with Georgian.
But as far as Windows is concerned, there is no legacy codepage for
Georgian. For example:

import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

LD_ACP = LOCALE_IDEFAULTANSICODEPAGE = 0x1004
acp = (ctypes.c_wchar * 6)()

>>> kernel32.GetLocaleInfoEx('ka-GE', LD_ACP, acp, 6)
2
>>> acp.value
'0'

A value of zero here means no ANSI codepage is defined [1]:

If no ANSI code page is available, only Unicode can be used for
the locale. In this case, the value is CP_ACP (0). Such a locale
cannot be set as the system locale. Applications that do not
support Unicode do not work correctly with locales marked as
"Unicode only".

Georgian (ka-GE) is a Unicode-only locale [2] that cannot be set as
the system locale.

[1]: https://msdn.microsoft.com/en-us/library/dd373761
[2]: https://msdn.microsoft.com/en-us/library/ms930130.aspx


Re: [Tutor] logging to cmd.exe

2017-09-26 Thread Mats Wichmann
On 09/26/2017 05:22 AM, Albert-Jan Roskam wrote:

> Rather than change your code can you change the codepage with the chcp 
> command?

the way chcp takes effect is problematic for this:

"Programs that you start after you assign a new code page use the new
code page, however, programs (except Cmd.exe) that you started before
assigning the new code page use the original code page. "

so making the change from inside the code does not seem like it will work.


> > Good to keep in mind, but my objection would be similar to that 
> with specifying the PYTHONIOENCODING variable: one needs to change the 
> environment first before the script runs without errors.

quicktip: try Python 3.6.  It's had a change in this area and no longer
uses the code page.
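
A quick way to see what a given interpreter is doing is to print the encoding and error handler of the standard streams. This is only a diagnostic sketch, and describe_streams() is a name invented here. In a Windows console, Python 3.5 would typically report the console codepage (such as cp850), while 3.6 and later report utf-8.

```python
import sys


def describe_streams():
    """Return {stream name: (encoding, errors)} for the standard streams."""
    info = {}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        # Streams may be replaced or missing, so read the attributes defensively.
        info[name] = (getattr(stream, "encoding", None),
                      getattr(stream, "errors", None))
    return info


report = describe_streams()
for name, (encoding, errors) in sorted(report.items()):
    print("%s: encoding=%s errors=%s" % (name, encoding, errors))
```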

I think there's also a module you can use for pre-3.6, sorry too lazy to
do a search.




Re: [Tutor] logging to cmd.exe

2017-09-26 Thread Albert-Jan Roskam
From: Tutor  on behalf of 
Mark Lawrence via Tutor 
Sent: Monday, September 25, 2017 4:19 PM
To: tutor@python.org
Subject: Re: [Tutor] logging to cmd.exe
    
On 25/09/2017 14:20, Albert-Jan Roskam wrote:
> Hi,
> 
> 
> With Python 3.5 under Windows I am using the logging module to log messages 
> to stdout (and to a file), but this occasionally causes logging errors 
> because some characters cannot be represented in the codepage used by cmd.exe 
> (cp850, aka OEM codepage, I think).  What is the best way to prevent this 
> from happening? The program runs fine, but the error is distracting. I know I 
> can use s.encode(sys.stdout.encoding, 'replace') and log that, but this is 
> ugly and tedious to do when there are many log messages. I also don't  
> understand why %r (instead of %s) still causes an error. I thought that the 
> character representation uses only ascii characters?!
> 
> 
> import logging
> import sys
> 
> assert sys.version_info.major > 2
> logging.basicConfig(filename="d:/log.txt", 
> level=logging.DEBUG,format='%(asctime)s %(message)s')
> handler = logging.StreamHandler(stream=sys.stdout)
> logger = logging.getLogger(__name__)
> logger.addHandler(handler)
> 
> s = '\u20ac'
> logger.info("euro sign: %r", s)
> 
> 
> 
> --- Logging error ---
> Traceback (most recent call last):
>    File "c:\python3.5\lib\logging\__init__.py", line 982, in emit
>  stream.write(msg)
>    File "c:\python3.5\lib\encodings\cp850.py", line 19, in encode
>  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in 
> position 12: character maps to <undefined>
> Call stack:
>    File "q:\temp\logcheck.py", line 10, in <module>
>  logger.info("euro sign: %r", s)
> Message: 'euro sign: %r'
> Arguments: ('\u20ac',)
> 
> 
> Thanks in advance for your replies!
> 
> 
> Albert-Jan
> 

Rather than change your code can you change the codepage with the chcp 
command?


> Good to keep in mind, but my objection would be similar to that with 
specifying the PYTHONIOENCODING variable: one needs to change the environment 
first before the script runs without errors.


C:\Users\Mark\Documents\MyPython>chcp
Active code page: 65001


> Wow! cmd.exe can use cp65001 aka utf8??? I always thought 65001 was 
not a 'real' codepage, even though some locales (e.g. Georgia) use it [1]. Why 
does cmd.exe still use cp850?

[1] https://docs.moodle.org/dev/Table_of_locales

PS: sorry about the missing quote (>>) markers. Hotmail can't do this. Is Gmail 
better?


Re: [Tutor] logging to cmd.exe

2017-09-25 Thread Mark Lawrence via Tutor

On 25/09/2017 14:20, Albert-Jan Roskam wrote:

> Hi,
>
>
> With Python 3.5 under Windows I am using the logging module to log messages to
> stdout (and to a file), but this occasionally causes logging errors because
> some characters cannot be represented in the codepage used by cmd.exe (cp850,
> aka OEM codepage, I think). What is the best way to prevent this from
> happening? The program runs fine, but the error is distracting. I know I can
> use s.encode(sys.stdout.encoding, 'replace') and log that, but this is ugly and
> tedious to do when there are many log messages. I also don't understand why %r
> (instead of %s) still causes an error. I thought that the character
> representation uses only ascii characters?!
>
>
> import logging
> import sys
>
> assert sys.version_info.major > 2
> logging.basicConfig(filename="d:/log.txt",
> level=logging.DEBUG,format='%(asctime)s %(message)s')
> handler = logging.StreamHandler(stream=sys.stdout)
> logger = logging.getLogger(__name__)
> logger.addHandler(handler)
>
> s = '\u20ac'
> logger.info("euro sign: %r", s)
>
>
>
> --- Logging error ---
> Traceback (most recent call last):
>    File "c:\python3.5\lib\logging\__init__.py", line 982, in emit
>  stream.write(msg)
>    File "c:\python3.5\lib\encodings\cp850.py", line 19, in encode
>  return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 12:
> character maps to <undefined>
> Call stack:
>    File "q:\temp\logcheck.py", line 10, in <module>
>  logger.info("euro sign: %r", s)
> Message: 'euro sign: %r'
> Arguments: ('\u20ac',)
>
>
> Thanks in advance for your replies!
>
>
> Albert-Jan



Rather than change your code can you change the codepage with the chcp 
command?


C:\Users\Mark\Documents\MyPython>chcp
Active code page: 65001

C:\Users\Mark\Documents\MyPython>type mytest.py
import logging
import sys

assert sys.version_info.major > 2
logging.basicConfig(filename="d:/log.txt", 
level=logging.DEBUG,format='%(asctime)s %(message)s')

handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger(__name__)
logger.addHandler(handler)

s = '\u20ac'
logger.info("euro sign: %r", s)
C:\Users\Mark\Documents\MyPython>mytest.py
euro sign: '€'
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence




Re: [Tutor] logging to cmd.exe

2017-09-25 Thread Peter Otten
Albert-Jan Roskam wrote:

> Hi,
> 
> 
> With Python 3.5 under Windows I am using the logging module to log
> messages to stdout (and to a file), but this occasionally causes logging
> errors because some characters cannot be represented in the codepage used
> by cmd.exe (cp850, aka OEM codepage, I think). What is the best way to
> prevent this from happening? The program runs fine, but the error is
> distracting. I know I can use s.encode(sys.stdout.encoding, 'replace') and
> log that, but this is ugly and tedious to do when there are many log
> messages. I also don't understand why %r (instead of %s) still causes an
> error. I thought that the character representation uses only ascii
> characters?!

Not in Python 3. You can enforce ascii with "%a":

>>> euro = '\u20ac'
>>> print("%r" % euro)
'€'
>>> print("%a" % euro)
'\u20ac'
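
The same conversion works inside logging calls, because logging applies %-formatting to the message and its arguments. A small sketch, with an in-memory stream and a made-up logger name standing in for the real setup:

```python
import io
import logging

euro = "\u20ac"

as_repr = "%r" % euro     # repr() keeps printable non-ASCII characters in Python 3
as_ascii = "%a" % euro    # ascii() escapes everything outside ASCII

stream = io.StringIO()
logger = logging.getLogger("demo_ascii")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# %a in the message template, so the logged text is pure ASCII
logger.info("euro sign: %a", euro)
logged = stream.getvalue()

# Pure ASCII can be encoded for any console codepage, cp850 included.
logged.encode("cp850")
```

The built-in ascii() and the "{!a}" conversion of str.format() produce the same escaped text as "%a".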

Or you can set an error handler with PYTHONIOENCODING (I have to use 
something that is not utf-8-encodable for the demo):

$ python3 -c 'print("\udc85")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in 
position 0: surrogates not allowed

$ PYTHONIOENCODING=:backslashreplace python3 -c 'print("\udc85")'
\udc85
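
If changing the environment is not an option, a similar effect can probably be had from inside the script by giving the logging handler its own text wrapper with an error handler. This is a sketch under that assumption, not something tested on a Windows console. make_safe_handler() is a hypothetical helper, and the BytesIO buffer with cp850 stands in for sys.stdout.buffer and sys.stdout.encoding.

```python
import io
import logging


def make_safe_handler(binary_stream, encoding):
    """Return a StreamHandler that escapes characters the encoding lacks."""
    text_stream = io.TextIOWrapper(
        binary_stream,
        encoding=encoding,
        errors="backslashreplace",   # same handler as PYTHONIOENCODING=:backslashreplace
        line_buffering=True,
    )
    return logging.StreamHandler(text_stream)


# Stand-in for the console: a byte buffer "displayed" as cp850.
buf = io.BytesIO()
handler = make_safe_handler(buf, "cp850")

logger = logging.getLogger("demo_safe")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("euro sign: %s", "\u20ac")   # no UnicodeEncodeError
handler.flush()
output = buf.getvalue()
```

In a real script the call would be something like make_safe_handler(sys.stdout.buffer, sys.stdout.encoding), so that unencodable characters show up as \uXXXX escapes instead of triggering a logging error.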

Or you follow the convention and log to stderr:

$ python3 -c 'import sys; print("\udc85", file=sys.stderr)'
\udc85
$ python3 -c 'import logging; logging.basicConfig(); 
logging.getLogger().warn("\udc85")' > to_prove_it_s_not_stdout
WARNING:root:\udc85

> import logging
> import sys
> 
> assert sys.version_info.major > 2
> logging.basicConfig(filename="d:/log.txt",
> level=logging.DEBUG,format='%(asctime)s %(message)s')
> handler = logging.StreamHandler(stream=sys.stdout)
> logger = logging.getLogger(__name__)
> logger.addHandler(handler)
> 
> s = '\u20ac'
> logger.info("euro sign: %r", s)
> 
> 
> 
> --- Logging error ---
> Traceback (most recent call last):
>   File "c:\python3.5\lib\logging\__init__.py", line 982, in emit
> stream.write(msg)
>   File "c:\python3.5\lib\encodings\cp850.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
> position 12: character maps to <undefined>
> Call stack:
>   File "q:\temp\logcheck.py", line 10, in <module>
> logger.info("euro sign: %r", s)
> Message: 'euro sign: %r'
> Arguments: ('\u20ac',)
> 
> 
> Thanks in advance for your replies!
> 
> 
> Albert-Jan
> 




[Tutor] logging to cmd.exe

2017-09-25 Thread Albert-Jan Roskam
Hi,


With Python 3.5 under Windows I am using the logging module to log messages to 
stdout (and to a file), but this occasionally causes logging errors because 
some characters cannot be represented in the codepage used by cmd.exe (cp850, 
aka OEM codepage, I think). What is the best way to prevent this from 
happening? The program runs fine, but the error is distracting. I know I can 
use s.encode(sys.stdout.encoding, 'replace') and log that, but this is ugly and 
tedious to do when there are many log messages. I also don't understand why %r 
(instead of %s) still causes an error. I thought that the character 
representation uses only ascii characters?!


import logging
import sys

assert sys.version_info.major > 2
logging.basicConfig(filename="d:/log.txt", 
level=logging.DEBUG,format='%(asctime)s %(message)s')
handler = logging.StreamHandler(stream=sys.stdout)
logger = logging.getLogger(__name__)
logger.addHandler(handler)

s = '\u20ac'
logger.info("euro sign: %r", s)



--- Logging error ---
Traceback (most recent call last):
  File "c:\python3.5\lib\logging\__init__.py", line 982, in emit
stream.write(msg)
  File "c:\python3.5\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 
12: character maps to <undefined>
Call stack:
  File "q:\temp\logcheck.py", line 10, in <module>
logger.info("euro sign: %r", s)
Message: 'euro sign: %r'
Arguments: ('\u20ac',)


Thanks in advance for your replies!


Albert-Jan
