[issue36570] ftplib timeouts for misconfigured server

2019-04-09 Thread Dāvis

Dāvis  added the comment:

The problem is that most of the time you have no control over the FTP server, 
but you still want to download files from it using Python. Currently that's 
just not possible. Maybe there could be a flag to enable workarounds for these 
cases.

--

___
Python tracker 
<https://bugs.python.org/issue36570>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36570] ftplib timeouts for misconfigured server

2019-04-09 Thread Dāvis

Change by Dāvis :


--
keywords: +patch
pull_requests: +12662
stage:  -> patch review




[issue36570] ftplib timeouts for misconfigured server

2019-04-09 Thread Dāvis

New submission from Dāvis :

It's not uncommon to encounter FTP servers that are misconfigured and return an 
unroutable host IP (e.g. an internal IP) when using passive mode.

See: https://superuser.com/a/1195591

Most FTP clients, such as FileZilla and WinSCP, use a workaround when they 
encounter such servers and connect to the host the user specified instead.

> Command: PASV
> Answer: 227 Entering Passive Mode (10,250,250,25,219,237).
> Status:  Server sent passive reply with unroutable address. Using server 
> address instead.

Currently Python's ftplib simply times out on such servers and doesn't work.
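
A minimal sketch of the client-side workaround described above, assuming it is 
acceptable to override ftplib.FTP.makepasv in a subclass. The names 
choose_data_host and LenientFTP and the host ftp.example.org are hypothetical 
and only for illustration; this is not the attached patch.

```python
import ftplib
import ipaddress


def choose_data_host(pasv_host, control_host):
    """Pick the host for the passive-mode data connection.

    If the server advertised a private, loopback or link-local address,
    fall back to the host of the control connection instead.
    """
    try:
        addr = ipaddress.ip_address(pasv_host)
    except ValueError:
        return pasv_host  # not an IP literal; leave it alone
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return control_host
    return pasv_host


class LenientFTP(ftplib.FTP):
    """Hypothetical subclass, not part of ftplib."""

    def makepasv(self):
        host, port = super().makepasv()
        # Address of the server we are already connected to.
        control_host = self.sock.getpeername()[0]
        return choose_data_host(host, control_host), port


# The reply from the log above, parsed with ftplib's own helper.
host, port = ftplib.parse227("227 Entering Passive Mode (10,250,250,25,219,237).")
print(host, port)
print(choose_data_host(host, "ftp.example.org"))
```

One caveat of this sketch: a server on the same private network would also 
have its address replaced, which is harmless as long as the control connection 
address is reachable.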

--
messages: 339712
nosy: davispuh, giampaolo.rodola
priority: normal
severity: normal
status: open
title: ftplib timeouts for misconfigured server
type: enhancement
versions: Python 3.9




[issue6135] subprocess seems to use local encoding and give no choice

2016-12-03 Thread Dāvis

Dāvis added the comment:

And it looks like the only way to specify the console's code page is with 
`encoding=os.device_encoding(2)`, which will have to be used in 99% of cases. I 
don't see another way...
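
A rough sketch of what that looks like in practice, assuming Python 3.6 or 
later, where subprocess.run accepts an `encoding` argument. The helper name 
console_encoding is hypothetical; it falls back to the locale encoding because 
os.device_encoding returns None when the descriptor is not a console.

```python
import locale
import os
import subprocess
import sys


def console_encoding(fd=2):
    """Console code page for fd, or the locale default if fd is no console."""
    try:
        return os.device_encoding(fd) or locale.getpreferredencoding(False)
    except OSError:
        # fd is not usable at all; use the locale default.
        return locale.getpreferredencoding(False)


result = subprocess.run(
    [sys.executable, "-c", "print('ok')"],
    stdout=subprocess.PIPE,
    encoding=console_encoding(),
)
print(result.stdout)
```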

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6135>
___



[issue6135] subprocess seems to use local encoding and give no choice

2016-12-03 Thread Dāvis

Dāvis added the comment:

It's still not possible to specify an encoding for getstatusoutput and 
getoutput. They also use the wrong encoding by default: most of the time the 
output will be in the console's code page rather than the ANSI code page that 
is used now.

--
nosy: +davispuh




[issue1602] windows console doesn't print or input Unicode

2016-09-20 Thread Dāvis

Dāvis added the comment:

Steve Dower (steve.dower)
> [...]
> Anything else requires a real console with a real person with a real keyboard.

FYI, not really: it is possible to test the console's output/input fully 
automatically using WinAPI functions such as WriteConsoleInput, 
GetConsoleScreenBufferInfo and ReadConsoleOutputCharacter.

I wrote such a test very recently; you can look at it as an example: 
http://review.source.kitware.com/gitweb?p=KWSys.git;a=blob;f=testConsoleBuf.cxx;hb=HEAD

It tests all 3 cases: when the output is an actual console, a redirected pipe, 
and a file.

--
nosy: +davispuh

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1602>
___



[issue27179] subprocess uses wrong encoding on Windows

2016-09-03 Thread Dāvis

Dāvis added the comment:

That is a great PEP, but it will take a lot of time to implement and it 
doesn't really solve this issue.

This is a different issue from filename/path encoding. Here we need to decode 
binary output from other applications. For a lot of applications that output 
will be in the console's code page, but it could also be in any other encoding. 
This isn't about Unicode paths: an application located at an ASCII path can, 
when run as a subprocess, return text output in the console's code page, OEM, 
ANSI or some other encoding.

My proposed subprocess_fix_encoding_v4fixed.patch fixes this for the majority 
of cases, and for the other cases the encoding can be specified.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27179>
___



[issue27179] subprocess uses wrong encoding on Windows

2016-09-03 Thread Dāvis

Dāvis added the comment:

ping? Could someone review my patch?

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-08 Thread Dāvis

Changes by Dāvis <davis...@gmail.com>:


Added file: 
http://bugs.python.org/file43316/subprocess_fix_encoding_v4fixed.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-08 Thread Dāvis

Changes by Dāvis <davis...@gmail.com>:


Removed file: http://bugs.python.org/file43315/subprocess_fix_encoding_v4.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-08 Thread Dāvis

Dāvis added the comment:

> Note that patch 3 requires setting `encoding` for even python.exe as a child 
> process, because sys.std* default to ANSI when isatty(fd) isn't true.

I've updated my patch so that Python outputs in the console's encoding for 
pipes too.

So now in PowerShell

>[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
>python -c "print('ā')" | Out-String
ā
> python -c "import subprocess; print(subprocess.getoutput('python -c 
> ""print(\'ā\')""'))"
ā
>[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(775)
>python -c "print('ā')" | Out-String
ā
> python -c "import subprocess; print(subprocess.getoutput('python -c 
> ""print(\'ā\')""'))"
ā


> What I wish is for Python to default to using UTF-8 for its own pipe and disk 
> file I/O. The old behavior could be selected by setting some hypothetical 
> environment variable, such as PYTHONIOUSELOCALE.

I don't really see the need for this: if you specify 
PYTHONIOENCODING="UTF-8", it will be used for pipes.
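
A small sketch of that behaviour, assuming the child is itself a Python 
process: with PYTHONIOENCODING set in the child's environment, its stdout is 
encoded as UTF-8 even though it is a pipe, so the parent can decode the bytes 
as UTF-8 regardless of the console or ANSI code page.

```python
import os
import subprocess
import sys

# Copy the current environment and force UTF-8 for the child's std streams.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

raw = subprocess.run(
    [sys.executable, "-c", "print(chr(0x101))"],  # chr(0x101) is "ā"
    stdout=subprocess.PIPE,
    env=env,
    check=True,
).stdout

text = raw.decode("utf-8").strip()
print(text)
```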


> If subprocess defaults to the console's current codepage (when available), it 
> would be nice to have a way to conveniently select the OEM or ANSI codepage. 
> The codecs module could define string constants based on GetOEMCP() and 
> GetACP(), such as codecs.CP_OEMCP (e.g. 'cp437') and codecs.CP_ACP (e.g. 
> 'cp1252'). subprocess could import these constants on Windows.

I've also updated my patch to implement something like this, but in what is 
IMO an easier way: "ansi" and "oem" are valid encodings on Windows and can be 
used anywhere an encoding can be specified as a parameter. Look at the patch 
to see how it's implemented.


OK, so does my patch look acceptable now? What would the issues with it be? IMO 
it greatly improves the current situation (fixes #27048 and solves #6135) and I 
don't see any issues with it.

Things that are changed:
* "ansi" and "oem" are valid encodings on Windows
* console's code page is used for console and pipe (if there's no console then 
ANSI is used like now)
* subprocess uses "ansi" for DETACHED_PROCESS and "oem" for CREATE_NEW_CONSOLE, 
CREATE_NO_WINDOW
* encoding and errors parameters can be specified for Popen
* custom parameters (including encoding and errors) can be specified for 
subprocess.getstatusoutput and getoutput

Also, if needed, I can see how to easily add support for separate encodings and 
errors for stdin/stdout/stderr, for example with

    if isinstance(encoding, str):
        # one encoding shared by all three streams
        encoding_stdin = encoding_stdout = encoding_stderr = encoding
    elif isinstance(encoding, tuple):
        # (stdin, stdout, stderr) encodings given separately
        encoding_stdin, encoding_stdout, encoding_stderr = encoding
    else:
        encoding_stdin = encoding_stdout = encoding_stderr = None

Then one could use
    subprocess.check_output('', encoding='oem')
and
    subprocess.check_output('', encoding=('oem', 'ansi', 'ansi'))



Known issues (present both with and without my patch):
* when using cmd.exe and Python is writing to a pipe, an error occurs for some 
unknown reason

with cmd.exe
> python -c "print('\n')" | echo
ECHO is on.
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' 
encoding='cp775'>
OSError: [Errno 22] Invalid argument

It doesn't matter which code page is set for the console or what is being 
output.
It happens with both the released 3.5.1 and the repo's default branch, but it 
doesn't happen when PowerShell is used.

I looked into it but didn't find why it happens, only that
n = write(fd, buf, (int)count);
in _Py_write_impl (fileutils.c) returns -1 and errno is EINVAL.
I verified that all the parameters are correct: fd, buf (it isn't NULL) and 
count (the parameters are the same as when running without a pipe), 
so I have no idea what causes it.


* Python corrupts characters when reading from stdin

with PowerShell
> Out-String -InputObject "ā" | python -c "import sys; 
> print(sys.stdin.encoding,sys.stdin.read())"
cp1257 ?

It happens with both the released 3.5.1 and the repo's default branch.
With my patch the encoding used will be based on the console's code page, but 
that doesn't matter because the input seems to get corrupted even before the 
encoding is used. I tested with the console encodings oem, ansi and utf-8, and 
with the same values set via PYTHONIOENCODING too, and in all cases the 
character was corrupted and replaced with "?".

I didn't look into this any further.

--
Added file: http://bugs.python.org/file43315/subprocess_fix_encoding_v4.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-04 Thread Dāvis

Dāvis added the comment:

Of course I agree that the proper solution is to use the Unicode/Wide API, but 
that's much more work to implement, and I'd rather have this half-fix now, 
which works for most cases, than nothing until whenever Unicode support is 
implemented, which might not be any time soon.


> IMO, it makes more sense for programs to use UTF-8, or even UTF-16. Codepages 
> are a legacy that we need to move beyond. Internally the console uses 
> UTF-16LE. 

Yes, that's true, but we can't do anything about existing programs, so 
defaulting to UTF-8 would be even worse than defaulting to ANSI, because not 
many programs on Windows use UTF-8. In fact it's quite rare, because there 
isn't even good UTF-8 support in the console itself, as you mentioned. Also, 
here I'm talking only about ANSI WinAPI programs and their console/pipe 
encoding, not about internal or file encoding, which we don't really care 
about here.


> Note that patch 3 requires setting `encoding` for even python.exe as a child 
> process, because sys.std* default to ANSI when isatty(fd) isn't true.

I think Python is a bit broken here. IMO it should also use the console's 
encoding, not ANSI, when outputting to a console pipe, and use ANSI only if 
the output really is a file.


on Windows 10 with Python 3.5.1

>chcp
Active code page: 775
>python -c "print('ā')"
ā

>python -c "print('ā')" | echo
ECHO is on.
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' 
encoding='cp1257'>
OSError: [Errno 22] Invalid argument

>chcp 1257
Active code page: 1257
>python -c "print('ā')" | echo
ECHO is on.
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' 
encoding='cp1257'>
OSError: [Errno 22] Invalid argument


in PowerShell

>[Console]::OutputEncoding.CodePage
775
>python -c "print('ā')" | Out-String
Ō
>[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
>python -c "print('ā')" | Out-String
�
>[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1257)
>python -c "print('ā')" | Out-String
ā


> I proposed using the "/u" switch for shell=True only to facilitate getting 
> results back from cmd's internal commands such as `set`. But it doesn't 
> change anything if you're using the shell to run other programs.

But you can only do that if you know that the command you execute is a cmd 
built-in. If it's a user-supplied command, there isn't really a reliable way 
to detect whether it will execute inside cmd or not. For example, "cmd /u /c 
chcp.exe" will return its result in UTF-16, because no such program exists and 
cmd's own error message will be output. Also, if the user has a set.exe in 
%System32%, then the output of "cmd /u /c set" won't be in UTF-16, because 
that program will be executed instead.



>> by calling GetConsoleOutputCP inside child process with CreateRemoteThread

> That's not the only way. You can also start a detached Python process (via 
> pythonw.exe or DETACHED_PROCESS) to run a script that calls AttachConsole and 
> returns the result of calling GetConsoleOutputCP:

While that's useful to know, it's still messy, because I think you would need 
to prevent the target process from exiting before you've called AttachConsole. 
Also, you most likely want the result of GetConsoleOutputCP just before the 
program exits and not at its start (say with CREATE_SUSPENDED), as the program 
might have changed the code page somewhere in the middle of its execution. So 
it looks like this route isn't worth pursuing.

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-04 Thread Dāvis

Dāvis added the comment:

It makes no sense not to use a better default encoding in some cases, as my 
patch does. Most programs use the console's encoding, not the ANSI code page, 
so by limiting the default to the ANSI code page we basically force everyone to 
always specify the encoding. The current behavior is that the ANSI code page is 
used, and that's the reason for this issue and also #27048. If we keep it this 
way, then specifying the encoding will be required in something like 99% of 
cases, which makes it a useless default. Actually, I don't even know of any 
Windows program that does its input/output (not talking about files, because 
that's different) in the ANSI code page, because its output would be broken 
when displayed in the console, which uses the OEM code page by default. 
Anyway, my patch doesn't really change the default: it just uses the console 
encoding in most cases and then falls back to the current default, the ANSI 
code page.

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-03 Thread Dāvis

Dāvis added the comment:

> qprocess.exe (also to console)
> quser.exe (also to console)

these are broken (http://i.imgur.com/0zIhHrv.png)

>chcp 1257
>quser
 USERNAME  SESSIONNAME
 dƒvis console

> chcp 775
> quser
 USERNAME  SESSIONNAME
 dāvis console


We have to decide which code page to use as the default, and it should cover 
most cases rather than some minority of programs. So I would say that using the 
console's code page when it's available makes the most sense, falling back to 
the ANSI code page when it isn't.

For the special cases where our guess is wrong, only the user can know which 
encoding is right, so they must specify it.


I also checked, and the cmd /u flag is totally useless here because it applies 
only to cmd itself and not to any other programs. To use it we would need to 
check whether the returned output is actually UTF-16 or some other encoding, 
which might even pass as valid UTF-16.

For example:
cmd /u /c "echo ā"
will return
ā in UTF-16

but for
cmd /u /c "sc query"

the result will be encoded in the OEM code page (775 for me), with no sign of 
UTF-16.
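
To illustrate why the output can't simply be sniffed, here is a small sketch 
(the sample string SERVICE_NAME is only an assumed stand-in for single-byte 
program output): an even number of OEM-encoded bytes decodes as UTF-16 without 
any error, it just yields the wrong characters.

```python
# 12 bytes of single-byte text, as a non-Unicode program would emit them.
oem_bytes = "SERVICE_NAME".encode("cp775")

# Decoding them as UTF-16 raises no exception: every byte pair is a valid
# UTF-16 code unit, so validity alone cannot tell the encodings apart.
as_utf16 = oem_bytes.decode("utf-16-le")
print(len(oem_bytes), len(as_utf16))

# Genuine UTF-16 output, as cmd /u produces for its own messages.
real_utf16 = "\u0101".encode("utf-16-le")
print(real_utf16)
```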


I looked for a function to get the encoding used by a child process, but 
there isn't one; I would have expected something like 
GetConsoleOutputCP(hThread). So the only way to get it is by calling 
GetConsoleOutputCP inside the child process with CreateRemoteThread. It's not 
really pretty and is quite hacky, but it does work; I tested it.

Anyway, even with that we would need to change something about TextIOWrapper, 
because we create it before the process is even started and its encoding can't 
be changed later.




I've updated the patch: it fixes the issues with creationflags and also adds 
an option to change the encoding, based on subprocess3.patch (from #6135).

So now with my patch it really works for most cases.

>python -c "import subprocess; subprocess.getstatusoutput('ā')"

works correctly for me, with the correct encoding, when the console's code page 
is set to any of 775 (OEM), 1257 (ANSI) and 65001 (UTF-8).

It also works correctly with any of DETACHED_PROCESS, CREATE_NEW_CONSOLE and 
CREATE_NO_WINDOW:

>python -c "import subprocess; subprocess.getstatusoutput('ā', 
creationflags=0x0008)"


this also works correctly with console's encodings: 775, 1257, 65001

>python -c "from distutils import _msvccompiler; 
_msvccompiler._get_vc_env('')"



and finally 

   > chcp 1257
   > python -c "import subprocess; print(subprocess.check_output('quser', 
encoding='cp775'))"
USERNAME  SESSIONNAME
dāvis console

also works correctly with any console encoding, even though cmd itself didn't 
display the output correctly.

--
Added file: http://bugs.python.org/file43183/subprocess_fix_encoding_v3.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-02 Thread Dāvis

Dāvis added the comment:

If there's no console then os.device_encoding won't fail; it will just return 
None, which means that the ANSI code page will be used as it currently is, so 
nothing changes here and the current behavior stays.
Also, as I showed, TextIOWrapper already calls device_encoding even if there's 
no console. And device_encoding doesn't actually use that fd: it just checks 
that it's a valid fd and then calls GetConsoleCP/GetConsoleOutputCP to get the 
encoding.
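
A quick way to see the "returns None" behaviour, sketched with an anonymous 
pipe, which is never a console or terminal:

```python
import os

read_fd, write_fd = os.pipe()
try:
    # A pipe is not attached to a console, so no code page is reported.
    pipe_encoding = os.device_encoding(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(pipe_encoding)
```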

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-02 Thread Dāvis

Dāvis added the comment:

There is a right encoding: it's the encoding that's actually used. Here we're 
inside subprocess.Popen, which makes the actual winapi.CreateProcess call, so 
we can check for any creationflags and adjust the encoding logic accordingly. I 
would say almost all Windows console programs use the console's encoding for 
input/output, because otherwise the user wouldn't be able to read it. For 
programs that use a different encoding, it would be the caller's responsibility 
to set the encoding, because only the caller can know which encoding that 
program uses.

So I think Popen should basically accept an encoding parameter, which would 
then be passed to TextIOWrapper, preferably with a way to set different 
encodings for stdin/stdout/stderr.

Then, if no encoding is specified, we use our own logic to determine the 
default encoding, which would be _Py_device_encoding(fd); this would be right 
for almost all cases, if not all. And if some program changes the console's 
encoding after we've read it, we could get the encoding again after the program 
has run and use the newly set console encoding.
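
As an illustration of the idea (not the patch itself), the same effect can be 
sketched today by wrapping the binary pipe manually. The child here is an 
assumed stand-in for a console program that writes its output in code page 775.

```python
import io
import subprocess
import sys

# Stand-in child: writes "ā" encoded in cp775 straight to its stdout.
child_code = "import sys; sys.stdout.buffer.write('\\u0101'.encode('cp775'))"

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)

# The caller knows which encoding the program uses and passes it on to
# TextIOWrapper, instead of relying on the locale default.
with io.TextIOWrapper(proc.stdout, encoding="cp775") as reader:
    text = reader.read()
proc.wait()

print(text)
```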


Anyway, while looking further into this I found why we get the wrong encoding.

Looking at subprocess.check_output, we can see

    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
               **kwargs).stdout

that stdout is set to PIPE,

and then in subprocess.Popen.__init__

    if c2pread != -1:
        self.stdout = io.open(c2pread, 'rb', bufsize)
        if universal_newlines:
            self.stdout = io.TextIOWrapper(self.stdout)


there c2pread will be the fd for the pipe (3).

Looking inside _io_TextIOWrapper___init___impl

    fileno = _PyObject_CallMethodId(buffer, _fileno, NULL);
    [...]
    int fd = _PyLong_AsInt(fileno);
    [...]
    self->encoding = _Py_device_encoding(fd);
    [...]


we'll set the encoding with _Py_device_encoding(3), but there

    if (fd == 0)
        cp = GetConsoleCP();
    else if (fd == 1 || fd == 2)
        cp = GetConsoleOutputCP();
    else
        cp = 0;


so the encoding would be correct for stdin/stdout/stderr but not for a pipe, 
and that's the cause of this issue.

I see 2 ways to fix this and I've added patches for both options.

--
Added file: http://bugs.python.org/file43100/subprocess_fix_encoding_v2_a.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-02 Thread Dāvis

Changes by Dāvis <davis...@gmail.com>:


Added file: http://bugs.python.org/file43101/subprocess_fix_encoding_v2_b.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-02 Thread Dāvis

Changes by Dāvis <davis...@gmail.com>:


Removed file: http://bugs.python.org/file43094/subprocess_fix_encoding.patch




[issue27179] subprocess uses wrong encoding on Windows

2016-06-01 Thread Dāvis

Dāvis added the comment:

I looked at #27048 and it is indeed affected by this bug. It happens to me too 
(I have non-ASCII symbols in %PATH%), and my patch fixes it.


on my system without patch

> python -c "from distutils import _msvccompiler; _msvccompiler._get_vc_env('')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\distutils\_msvccompiler.py", line 92, in _get_vc_env
    universal_newlines=True,
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x83 in position 50: 
character maps to <undefined>

with my patch

> python -c "from distutils import _msvccompiler; _msvccompiler._get_vc_env('')"
>
no error

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-01 Thread Dāvis

Dāvis added the comment:

There's no such command as "ā"; it's just used to get non-ASCII output.

cmd will return:
'ā' is not recognized as an internal or external command,
operable program or batch file.


This will be encoded in the console's encoding (UTF-8 in my example, or 
whatever chcp is set to), which Python will fail to read because it uses 
locale.getpreferredencoding(False) instead of sys.stdout.encoding.


See the attached patch; it fixes this problem. You can try to reproduce it 
yourself.

--




[issue27179] subprocess uses wrong encoding on Windows

2016-06-01 Thread Dāvis

New submission from Dāvis:

subprocess uses wrong encoding on Windows.


On Windows 10 with Python 3.5.1
from Command Prompt (cmd.exe)
> chcp 65001
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: 
character maps to <undefined>


from PowerShell
> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: 
character maps to <undefined>



As you can see, even if the console's encoding is UTF-8, it still uses the 
Windows ANSI code page 1257.
This happens because io.TextIOWrapper is used with its default encoding, which 
is locale.getpreferredencoding(False), 
but that's wrong because it's not the console's encoding.
I've attached a patch which fixes this by using the correct console encoding, 
sys.stdout.encoding.
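
The decoding failure itself can be reproduced without Windows. This sketch 
takes the start of cmd's error message, encodes it as UTF-8 (what the console 
produced under chcp 65001) and then decodes it as cp1257 (what 
locale.getpreferredencoding(False) returned on my system):

```python
message = "'\u0101' is not recognized as an internal or external command,"
raw = message.encode("utf-8")  # bytes as produced under chcp 65001

try:
    raw.decode("cp1257")       # what TextIOWrapper did by default
except UnicodeDecodeError as exc:
    # Record which byte failed and where, to compare with the traceback.
    failure = (hex(raw[exc.start]), exc.start)
else:
    failure = None

print(failure)
print(raw.decode("utf-8"))     # decoding with the console's encoding works
```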

One note: there's a different bug where, when Python is executed inside a 
PowerShell grouping expression, sys.stdout.encoding will be wrong:

> [Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
> ([Console]::OutputEncoding.EncodingName)
Unicode (UTF-8)
> python -c "import sys; print(sys.stdout.encoding)"
cp65001
> (python -c "import sys; print(sys.stdout.encoding)")
cp1257

It should still be cp65001, and that's why subprocess will still fail in this 
case even with my patch, but that's a different bug.

--
components: IO, Library (Lib), Unicode, Windows
files: subprocess_fix_encoding.patch
keywords: patch
messages: 266852
nosy: davispuh, ezio.melotti, haypo, paul.moore, steve.dower, tim.golden, 
zach.ware
priority: normal
severity: normal
status: open
title: subprocess uses wrong encoding on Windows
type: behavior
versions: Python 3.5, Python 3.6
Added file: http://bugs.python.org/file43094/subprocess_fix_encoding.patch
