Eryk Sun added the comment:

Another set of counterexamples are the utilities in the GnuWin32 collection, 
which use ANSI in a pipe:

    >>> call('chcp.com')
    Active code page: 437
    0
    >>> '¡'.encode('1252')
    b'\xa1'
    >>> '\xa1'.encode('437')
    b'\xad'

    >>> os.listdir('.')
    ['¡']
    >>> check_output('ls')
    b'\xa1\r\n'
    >>> check_output('echo.exe ¡')
    b'\xa1\r\n'
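The mismatch can be reproduced without any Windows tools. This is a minimal sketch that only assumes the two code pages from the session above (1252 as ANSI, 437 as the console/OEM code page). It shows what a parent sees if it decodes an ANSI-writing child's pipe output with the console code page.

```python
# Bytes that an ANSI-writing tool (code page 1252) sends down a pipe for
# the file name '¡', matching the check_output('ls') result above.
ansi_bytes = '¡'.encode('cp1252')
print(ansi_bytes)                   # b'\xa1'

# Decoding with the ANSI code page recovers the name.
print(ansi_bytes.decode('cp1252'))  # ¡

# Decoding with the console's OEM code page (437) does not: in cp437,
# byte 0xA1 is 'í', while '¡' would have been 0xAD.
print(ansi_bytes.decode('cp437'))   # í
print('¡'.encode('cp437'))          # b'\xad'
```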

Writing ANSI to a pipe or disk file is not as uncommon as you seem to think. 
Microsoft has never dictated a standard encoding for this. It doesn't even 
follow a single convention within its own command-line utilities. IMO, it 
makes more sense for programs to use UTF-8, or even UTF-16. Codepages are a 
legacy that we need to move beyond. Internally the console uses UTF-16LE. 
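When the child is itself a Python process, one way to get UTF-8 on both ends is to set the child's stream encoding explicitly and decode with the same encoding in the parent. This is a sketch, not a general fix for arbitrary programs. It assumes `check_output`'s `encoding` parameter (Python 3.6+) and uses the standard `PYTHONIOENCODING` environment variable to override the child's default, which would otherwise be the ANSI code page when stdout is a pipe.

```python
import os
import subprocess
import sys

# Tell the child interpreter to encode its standard streams as UTF-8
# instead of using its default for a non-console stdout.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

# Decode the pipe as UTF-8 in the parent (text mode also normalizes \r\n).
out = subprocess.check_output(
    [sys.executable, '-c', "print('\\u00a1')"],
    env=env,
    encoding='utf-8',
)
print(repr(out))  # '¡\n'
```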

Note that patch 3 requires setting `encoding` even for python.exe as a child 
process, because sys.std* default to ANSI when isatty(fd) isn't true. (The 
CRT's isatty is true for any character-mode file, such as NUL or a console. 
Checking specifically for a console handle requires GetConsoleMode. To check 
for a pipe or disk file, call GetFileType to check for FILE_TYPE_PIPE or 
FILE_TYPE_DISK.)
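The checks in the parenthetical can be sketched with ctypes. This is an illustrative helper (`classify_fd` is a hypothetical name). It calls the real Win32 functions GetConsoleMode and GetFileType on the OS handle behind a C file descriptor, and it simply reports `'not-windows'` on other platforms. The FILE_TYPE_* values are the constants from the Windows SDK.

```python
import sys

# Constants from the Windows SDK (winbase.h).
FILE_TYPE_DISK = 0x0001
FILE_TYPE_CHAR = 0x0002
FILE_TYPE_PIPE = 0x0003


def classify_fd(fd):
    """Report whether fd refers to a console, pipe, disk file, or other device.

    isatty(fd) is true for any character device (a console or NUL), so test
    for a console with GetConsoleMode first, then fall back on GetFileType.
    """
    if sys.platform != 'win32':
        return 'not-windows'
    import ctypes
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetConsoleMode.argtypes = (wintypes.HANDLE,
                                        ctypes.POINTER(wintypes.DWORD))
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.GetFileType.argtypes = (wintypes.HANDLE,)
    kernel32.GetFileType.restype = wintypes.DWORD

    handle = msvcrt.get_osfhandle(fd)
    mode = wintypes.DWORD()
    # GetConsoleMode only succeeds for console input/output handles.
    if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return 'console'
    file_type = kernel32.GetFileType(handle)
    if file_type == FILE_TYPE_PIPE:
        return 'pipe'
    if file_type == FILE_TYPE_DISK:
        return 'disk'
    if file_type == FILE_TYPE_CHAR:
        return 'char'  # a character device that isn't a console, e.g. NUL
    return 'unknown'


for fd in (0, 1, 2):
    print(fd, classify_fd(fd))
```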

> I also checked that cmd /u flag is totally useless because it applies
> only to cmd itself not to any other programs

Anything else would be magic. Once a child process inherits its standard 
handles from cmd.exe [1], it can write whatever bytes it wants to them. In 
issue 27048 I proposed using the "/u" switch for shell=True only to facilitate 
getting results back from cmd's internal commands such as `set`. But it doesn't 
change anything if you're using the shell to run other programs.
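For cmd's internal commands the idea looks roughly like the following sketch (`run_cmd_internal` is a hypothetical helper, not the proposed patch). With `/u`, cmd writes the output of its internal commands to a pipe or file as UTF-16LE, so the parent can decode it without guessing a code page. `/d` skips any AutoRun commands configured in the registry. External programs launched through the shell are unaffected and still write whatever bytes they choose.

```python
import subprocess
import sys


def run_cmd_internal(command):
    """Run a cmd.exe internal command (e.g. 'set') and return its output as str.

    Assumes Windows: /u makes cmd emit UTF-16LE for internal commands that
    write to a pipe, so the bytes are decoded with 'utf-16-le'.
    """
    raw = subprocess.check_output(['cmd.exe', '/d', '/u', '/c', command])
    return raw.decode('utf-16-le')


if sys.platform == 'win32':
    print(run_cmd_internal('set').splitlines()[0])
```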

[1]: Unlike Python's Popen, cmd doesn't use STARTUPINFO for this. It
     temporarily modifies its own standard handles, which works even
     when it falls back on ShellExecuteEx to run files that are 
     neither PE executables nor .BAT/.CMD files.

> I looked if there's some function to get used encoding for 
> child process but there isn't, I would have expected something 
> like GetConsoleOutputCP(hThread). So the only way to get it, 
> is by calling GetConsoleOutputCP inside child process with
> CreateRemoteThread and it's not really pretty and quite hacky, 
> but it does work, I tested.

That's not the only way. You can also start a detached Python process (via 
pythonw.exe or DETACHED_PROCESS) to run a script that calls AttachConsole and 
returns the result of calling GetConsoleOutputCP:

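    from subprocess import *

    DETACHED_PROCESS   = 0x00000008
    CREATE_NEW_CONSOLE = 0x00000010

    # Runs without a console of its own, attaches to the console of the
    # process whose pid replaces %d, and prints that console's output
    # code page.
    cmd = ('python -c "import ctypes;'
           "kernel32 = ctypes.WinDLL('kernel32');"
           'kernel32.AttachConsole(%d);'
           'print(kernel32.GetConsoleOutputCP())"')

    call('chcp.com 1252')  # changes the code page of *this* console only
    p = Popen('python', creationflags=CREATE_NEW_CONSOLE)  # child gets a new console
    cp = int(check_output(cmd % p.pid, creationflags=DETACHED_PROCESS))
    p.terminate()  # the child was only needed for its console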

    >>> cp
    437

> anyway even with that would need to change something about 
> TextIOWrapper because we're creating it before process is even 
> started and encoding isn't changeable later.

In this case one can detach() the buffer to wrap it in a new TextIOWrapper.
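A minimal sketch of the detach-and-rewrap step, with `io.BytesIO` standing in for a child's stdout pipe (an assumption for illustration). With a real Popen object the same idea would be something like `p.stdout = io.TextIOWrapper(p.stdout.detach(), encoding=...)`. It should be done before anything is read, since text the old wrapper has already decoded is not carried over to the new one.

```python
import io

# Stand-in for a pipe: bytes a child wrote using code page 437.
raw = io.BytesIO('¡\r\n'.encode('cp437'))

# Wrapper created up front, before the child's real encoding was known.
reader = io.TextIOWrapper(raw, encoding='cp1252')

# Once the right code page is known (e.g. 437 from GetConsoleOutputCP),
# detach the underlying buffer and wrap it again with that encoding.
# The old wrapper is unusable after detach().
buffer = reader.detach()
reader = io.TextIOWrapper(buffer, encoding='cp437')

text = reader.read()
print(repr(text))  # '¡\n' (universal newlines translate \r\n)
```

On Python 3.7 and later, `TextIOWrapper.reconfigure(encoding=...)` can do the same in place, again only if nothing has been read from the stream yet.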

>    > python -c "import subprocess; 
> print(subprocess.check_output('quser', encoding='cp775'))"
>     USERNAME              SESSIONNAME
>     dāvis                 console
>
> also works correctly with any of console's encoding even if it 
> didn't showed correct encoding inside cmd itself.

A minor point of clarification: quser.exe doesn't run "inside" cmd.exe; it runs 
attached to conhost.exe. The cmd shell is just the parent process.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27179>
_______________________________________