New submission from STINNER Victor <victor.stin...@haypocalc.com>:

On UNIX/BSD systems, Python decodes command line arguments with the locale
encoding, whereas subprocess encodes arguments with the filesystem encoding.
If the two encodings differ, we have a problem.

Issue #4388 already covered this, but it was closed because it was specific to
old versions of Mac OS X. With the PYTHONFSENCODING environment variable (added
in Python 3.2), it is easy to trigger this issue: run Python with a filesystem
encoding different from the locale encoding. The attached script demonstrates
the bug.
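
The mismatch can be sketched in pure Python, without launching a subprocess.
This is not the attached locale_fs_encoding.py script; the roundtrip() helper
and the choice of ISO-8859-1 and UTF-8 are assumptions made only for
illustration:

```python
def roundtrip(raw, decode_enc, encode_enc):
    """Decode bytes with one encoding and re-encode with another.

    Hypothetical helper: it mimics an argument decoded by the interpreter
    and later re-encoded before being passed to a child process.
    """
    text = raw.decode(decode_enc, "surrogateescape")
    return text.encode(encode_enc, "surrogateescape")


raw = "\u00e9".encode("utf-8")  # the two bytes C3 A9

# Same encoding on both sides: the bytes survive unchanged.
same = roundtrip(raw, "utf-8", "utf-8")

# Decoded as ISO-8859-1 but encoded as UTF-8: the bytes are altered.
mixed = roundtrip(raw, "iso-8859-1", "utf-8")

print(same == raw)   # True
print(mixed == raw)  # False
```

With mixed encodings the child process would receive four bytes instead of
the original two, so a file name passed as an argument would no longer match.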

--

I see two possible encodings to encode and decode command line arguments (with
the surrogateescape error handler):

 (a) filesystem encoding
 (b) locale encoding

Decoding the Python command line arguments is one of the first operations
executed when running Python, in the main() function. The import machinery and
the codec API are not available at that point, so I don't see how we can use
the filesystem encoding here. Read issue #9630 to see how complex it is to use
the filesystem encoding when initializing Python.

Using the locale encoding is easier because we already have the
_Py_char2wchar() and _Py_wchar2char() functions to decode/encode with the
locale encoding and the surrogateescape error handler. These functions use the
wchar_t* type, which is less practical than PyUnicodeObject*, but that is an
advantage here because wchar_t* doesn't need Python to be completely
initialized (whereas some PyUnicode methods load modules, e.g. encode and
decode).
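
For readers unfamiliar with the surrogateescape error handler, here is a
Python-level illustration of what it does. It is only a sketch of the handler
itself, not of the C functions named above, and the sample bytes are an
assumption chosen because they are not valid UTF-8:

```python
# A file name encoded in ISO-8859-1; the byte E9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Undecodable bytes are mapped to lone surrogates in U+DC80..U+DCFF.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")

# Without the handler, the lone surrogate cannot be encoded at all.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ascii(text))      # 'caf\udce9.txt'
print(restored == raw)  # True
```

The round trip is lossless only when the same encoding is used in both
directions, which is the point of this report.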

In #8775, I proposed to create a new variable to store the "command line 
encoding": sys.getcmdlineencoding(). But this issue was closed because there 
was only one use case: #4388 (which was closed but not fixed).

I don't know, or don't really care, how sys.getcmdlineencoding() should be 
initialized. The important point is that we have to use the same encoding to 
decode and encode command line arguments.
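
A quick way to check whether a given interpreter is exposed to the mismatch is
to compare the two encodings directly. This is a minimal sketch; the
normalized() helper is hypothetical and only canonicalizes codec names so that
aliases such as "UTF8" and "utf-8" compare equal:

```python
import codecs
import locale
import sys


def normalized(name):
    """Return the canonical codec name for an encoding alias."""
    return codecs.lookup(name).name


# Encoding derived from the locale settings.
locale_enc = normalized(locale.getpreferredencoding(False))

# Encoding used for file names.
fs_enc = normalized(sys.getfilesystemencoding())

encodings_match = (locale_enc == fs_enc)
print(locale_enc, fs_enc, encodings_match)
```

If encodings_match is False, arguments decoded with one encoding and encoded
with the other may not round-trip.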

--

I don't really know if using another encoding is the right solution. Maybe the
problem is that the filesystem encoding should not be controllable by the
user?

And what about environment variables: should we continue to encode and decode 
them with the filesystem encoding, or should we use the new "command line 
encoding"?

----------
components: Interpreter Core, Unicode
files: locale_fs_encoding.py
messages: 117669
nosy: haypo
priority: normal
severity: normal
status: open
title: Command line arguments are not correctly decoded if locale and filesystem 
encodings are different
versions: Python 3.2
Added file: http://bugs.python.org/file19062/locale_fs_encoding.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9992>
_______________________________________