[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

>> ... So Antoine and Martin: which encoding do you prefer?
>
> I still propose to drop the fsname encoding. Then this question goes away.

You mean that we should use the following encoding for the command line arguments, environment variables and all filenames/paths:

- Mac OS X: utf-8
- Windows: unicode for command line/env, mbcs to decode filenames
- other OSes: locale encoding

To do that, we have to:

- other OSes: delete the PYTHONFSENCODING variable
- Mac OS X: use utf-8 to decode the command line arguments (we can use PyUnicode_DecodeUTF8()+PyUnicode_AsWideCharString() before Python is initialized)

On other OSes, we continue to use the FS encoding to encode command line/env vars, because the FS encoding will always be the locale encoding. And it's more practical to use sys.getfilesystemencoding() than mbstowcs(), wcstombs(), _Py_wchar2char(), _Py_char2wchar(), etc. because the FS encoding doesn't depend on the current locale, and it uses Python codecs which support more error handlers.

I like this solution because it doesn't change a lot of things. I agree to drop PYTHONFSENCODING because it looks like PYTHONFSENCODING introduced more inconsistencies than it solved.

--
Python tracker <rep...@bugs.python.org>
http://bugs.python.org/issue9992

Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
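The per-platform proposal above can be sketched as a small lookup. This is only an illustration: `proposed_os_data_encoding` is a hypothetical helper name, not code from the patch, and the real decision would be made in C during interpreter startup.

```python
import locale
import sys


def proposed_os_data_encoding(platform=sys.platform):
    """Hypothetical helper: the encoding proposed in this thread for
    command line arguments, environment variables and filenames."""
    if platform == "darwin":
        # Mac OS X: always UTF-8, regardless of the locale.
        return "utf-8"
    if platform.startswith("win"):
        # Windows: native Unicode APIs; mbcs only to decode bytes filenames.
        return "mbcs"
    # Other OSes: the locale encoding (LC_ALL, LC_CTYPE, LANG).
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("proposed:", proposed_os_data_encoding())
    print("sys.getfilesystemencoding():", sys.getfilesystemencoding())
```

Comparing the two printed values on a given machine shows whether the running interpreter already behaves as proposed.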
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Marc-Andre Lemburg <m...@egenix.com> added the comment:

STINNER Victor wrote:
> I like this solution because it doesn't change a lot of things. I agree to drop PYTHONFSENCODING because it looks like PYTHONFSENCODING introduced more inconsistencies than it solved.

If you remove the PYTHONFSENCODING, then we have to reconsider removal of sys.setfilesystemencoding(). The main argument for removal of the sys function was having the environment variable.

If you remove both, Python will get very poor grades for OS interoperability on platforms that often deal with multiple different encodings for file names.

I am repeating myself, but please keep in mind that the locale is an application scope setting. It doesn't have anything to do with what's actually stored in file systems or what the OS uses internally.

Python therefore has to provide a way to customize the file system encoding and allow to override the locale guessing that's currently happening.

You can't just tell people to go with whatever encoding setup you prefer to make Python's guessing easier or more correct. Python has to adapt to what the users actually use, not the other way around. Where that's not easily possible, there have to be ways to explicitly tell Python what to use... telling the user to adjust his or her locale settings just to be able to run Python is not an option.

The world is still moving towards Unicode - it's not 100% there yet.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> You mean that we should use the following encoding for the command line arguments, environment variables and all filenames/paths:
> - Mac OS X: utf-8
> - Windows: unicode for command line/env, mbcs to decode filenames

No: unicode for filenames also.

> - other OSes: locale encoding

Yes, that is my proposal.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> If you remove both, Python will get very poor grades for OS interoperability on platforms that often deal with multiple different encodings for file names.

Why that? It will work very well in such a setting, much better than, say, Java.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Ronald Oussoren <ronaldousso...@mac.com> added the comment:

On 09 Oct, 2010, at 02:07 PM, Antoine Pitrou <rep...@bugs.python.org> wrote:
> Antoine Pitrou <pit...@free.fr> added the comment:
>
>> For the command line, it would mean that we introduced a new encoding: command line encoding, which will be utf-8 on OSX.
>
> Or more generally environment encoding, if it's also used for env vars. This could solve the subprocess issue neatly.

Note that the command-line and environment encoding on OSX is generally UTF-8, even if that is not always reflected in the locale settings.

On recent OSX releases LANG will be set to a UTF-8 aware locale (en_US.UTF-8 on my machine) when you start a shell using Terminal.app.

The correct locale environment variables are AFAIK not set in two important situations: on OSX 10.4 and when running code from an application bundle. In both cases the environment/command-line encoding should be treated as UTF-8.

There is one reason for not wanting to assume that the encoding is always UTF-8: the user might access the system from a non-UTF8 terminal (such as when logging in with an SSH session from a system not using UTF-8, or using an alternate terminal application). IMHO these are minor enough use-cases that we could just enforce that the encoding is UTF-8 on OSX.

That would ensure that the filesystem encoding and environment/command-line encoding are consistent and we'd no longer run into the problem that triggered this issue.

Ronald

Added file: http://bugs.python.org/file19184/unnamed
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> There is one reason for not wanting to assume that the encoding is always UTF-8: the user might access the system from a non-UTF8 terminal (such as when logging in with an SSH session from a system not using UTF-8, or using an alternate terminal application). IMHO these are minor enough use-cases that we could just enforce that the encoding is UTF-8 on OSX.

Ok, that's enough of an expert statement for me to settle the OSX case: we will always assume that environment data is UTF-8 on OSX (leaving the rest to the surrogate escape handler).
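To make "leaving the rest to the surrogate escape handler" concrete, here is a minimal sketch of the PEP 383 `surrogateescape` error handler on a file name that is not valid UTF-8. The file name is borrowed from the example used elsewhere in this thread; the Latin-1 origin of the bytes is an assumption made for the demonstration.

```python
# A Latin-1 encoded file name: the byte 0xE9 is not valid UTF-8 here.
raw = "xxxé.txt".encode("latin-1")

# Decoding as UTF-8 with surrogateescape never fails: the undecodable
# byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text))
print(roundtrip == raw)
```

The decoded string may display as mojibake, but the bytes handed back to the OS are unchanged, which is the round-tripping property the comment relies on.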
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Marc-Andre Lemburg <m...@egenix.com> added the comment:

Martin v. Löwis wrote:
> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
>> If you remove both, Python will get very poor grades for OS interoperability on platforms that often deal with multiple different encodings for file names.
>
> Why that? It will work very well in such a setting, much better than, say, Java.

Well, Java pretty much fails completely in this respect, so being better than Java is not exactly the benchmark I had in mind :-)

I think the proper benchmark would be a Python2 application that has no problems with these things, since file names are just bytes that refer to files on the disk, with no associated encoding - at least on Unix and related platforms.

Being pedantic about forcing some encoding onto things that don't have an encoding won't really work out in practice. Dealing with file names, OS environments, pipes and sockets is dirty work, so I think we should go with the 80-20 approach in making 80% easy and 20% harder, but still possible.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> Being pedantic about forcing some encoding onto things that don't have an encoding won't really work out in practice. Dealing with file names, OS environments, pipes and sockets is dirty work, so I think we should go with the 80-20 approach in making 80% easy and 20% harder, but still possible.

Unix applications can always use the byte-oriented file name APIs if they need to. Then you are back to the state that things have in Python 2. No need to have a user-tunable file system encoding there.

However, I completely fail to see the advantage that the PYTHONFSENCODING variable has over the LANG variable. If it's possible to set PYTHONFSENCODING in some application, it surely is also possible to set LANG (or LC_CTYPE), no? Setting the latter also gives you the advantage that environment variables and command line arguments use the same encoding as file names.
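A short sketch of the byte-oriented file name APIs mentioned above, using only standard `os` functions. The file name `report.txt` and the temporary directory are placeholders chosen for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build the path as bytes: on Unix no file name decoding is involved.
    raw_dir = os.fsencode(tmp)
    raw_name = b"report.txt"
    with open(os.path.join(raw_dir, raw_name), "wb") as f:
        f.write(b"data")

    # A bytes argument makes os.listdir() return bytes names.
    bytes_names = os.listdir(raw_dir)
    # A str argument returns names decoded with the filesystem encoding.
    str_names = os.listdir(tmp)

print(bytes_names, str_names)
```

An application that stays with bytes throughout never depends on the filesystem encoding; `os.fsdecode()` is only needed when a name has to be shown as text.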
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Antoine Pitrou <pit...@free.fr> added the comment:

> However, I completely fail to see the advantage that the PYTHONFSENCODING variable has over the LANG variable. If it's possible to set PYTHONFSENCODING in some application, it surely is also possible to set LANG (or LC_CTYPE), no? Setting the latter also gives you the advantage that environment variables and command line arguments use the same encoding as file names.

I guess LANG and LC_CTYPE can be used for other purposes such as internationalization.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Marc-Andre Lemburg <m...@egenix.com> added the comment:

Martin v. Löwis wrote:
> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
>> Being pedantic about forcing some encoding onto things that don't have an encoding won't really work out in practice. Dealing with file names, OS environments, pipes and sockets is dirty work, so I think we should go with the 80-20 approach in making 80% easy and 20% harder, but still possible.
>
> Unix applications can always use the byte-oriented file name APIs if they need to. Then you are back to the state that things have in Python 2. No need to have a user-tunable file system encoding there.

Right, and if you take the position of refusing to guess, which we usually do in Python, then interfacing to file names using bytes would be the appropriate way to handle the situation.

However, since Python3 has chosen to regard file names as text regardless of platform, we're now in the situation that we have to come up with some educated guess on the encoding.

> However, I completely fail to see the advantage that the PYTHONFSENCODING variable has over the LANG variable. If it's possible to set PYTHONFSENCODING in some application, it surely is also possible to set LANG (or LC_CTYPE), no? Setting the latter also gives you the advantage that environment variables and command line arguments use the same encoding as file names.

The advantage is that you can change the Python file system encoding *without* having to change your locale settings.

You can't possibly expect a user to switch to using UTF-8 for all his/her applications just because Python needs this to properly decode file names. Users of applications written in Python will most likely not even know how to change the locale encoding.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

MvL wrote:
>> - Windows: unicode for command line/env, mbcs to decode filenames
> No: unicode for filenames also.

Yes, I mean unicode for everything, but decode bytes data from the mbcs encoding.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

MAL wrote:
> If you remove the PYTHONFSENCODING, then we have to reconsider removal of sys.setfilesystemencoding().

Please, Marc, read my comments. You never consider technical problems, you just propose to ensure that Python just works, without answering my technical questions. I already explained 2 or 3 times that sys.setfilesystemencoding() was completely buggy and not usable in practice. You proposed PYTHONFSENCODING and I implemented it. But then I explained in an email to python-dev and in this issue, that this environment variable introduced many problems.

I don't see how sys.setfilesystemencoding() would solve this issue, it's out of scope.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> You can't possibly expect a user to switch to using UTF-8 for all his/her applications just because Python needs this to properly decode file names.

If the user hasn't switched to UTF-8, why would Python need that to properly decode file names?
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

MAL wrote:
> You can't just tell people to go with whatever encoding setup you prefer to make Python's guessing easier or more correct.

Python doesn't really *guess* the encoding, it just reads the encoding from the locale. What do you mean by more correct? How can Python know the right encoding better than the user? Python should not guess anything.

If the environment is not correctly configured, it's not Python's fault. The user has to fix their environment.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> I guess LANG and LC_CTYPE can be used for other purposes such as internationalization.

That's why there are different environment variables:

* LC_MESSAGES for i18n (messages)
* LC_CTYPE for the encoding
* LC_TIME for time and date
* etc.
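As a rough illustration of the point that the encoding lives in its own locale category, the sketch below adopts only the user's LC_CTYPE setting and then asks which encoding results. It uses the standard `locale` module; the printed values depend entirely on the machine it runs on.

```python
import locale

# Adopt the user's LC_CTYPE setting only. LC_MESSAGES, LC_TIME and the
# other categories are left untouched, so message language and date
# formats do not change.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    # The environment names a locale that is not installed.
    pass

ctype_encoding = locale.getpreferredencoding(False)
print("LC_CTYPE:", locale.setlocale(locale.LC_CTYPE))
print("encoding:", ctype_encoding)
```

Running it with a different `LC_CTYPE` value in the environment changes the reported encoding without touching the other categories.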
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

issue9992.patch:

- Remove PYTHONFSENCODING environment variable
- Mac OS X: Use utf-8 to decode command line arguments
- Fix issue #9992 (this issue): attached test, locale_fs_encoding.py, passes
- Fix issue #9988
- Fix issue #10014
- Fix issue #10039

$ diffstat issue9992.patch
 Doc/using/cmdline.rst       | 12
 Doc/whatsnew/3.2.rst        |  6
 Lib/test/test_os.py         | 30
 Lib/test/test_subprocess.py |  4
 Lib/test/test_sys.py        | 29
 Modules/main.c              |  3
 Modules/python.c            | 10
 Python/pythonrun.c          | 22
 8 files changed, 15 insertions(+), 101 deletions(-)

I like this kind of patch: it removes more code than it adds, but it fixes 4 different issues!

I didn't test the part of the patch specific to OSX (use utf8 to decode command line arguments).

Added file: http://bugs.python.org/file19190/issue9992.patch
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

I think that issue9992.patch also fixes #4388 because it uses the same encoding (FS encoding, utf8) on OSX to encode and to decode command line arguments.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> We run into problems because we have two inconsistent encodings, ...

What? No. We have problems because we don't use the same encoding to decode and to encode the same data type. It's not a problem to use a different encoding for each data type (stdout, filenames, environment variables, ...).

--

About the 3rd encoding: it will be just the locale encoding. Using the locale encoding to encode/decode command line arguments and environment variables is completely compatible with Python 3.1, because Python 3.1 initializes the filesystem encoding with the locale encoding. Using the locale encoding helps interoperability because other programs use the same encoding.

Mac OS X is a special case. Filesystem encoding is utf-8 on this OS, whereas the locale encoding depends on the LANG variable. If I understood MvL's proposal correctly, we should not rely on the locale on Mac OS X. So the 3rd encoding and the filesystem encoding should be hardcoded to utf-8?

--

The third encoding is no longer controllable by a special environment variable, only by classic locale environment variables (LC_ALL, LC_CTYPE, LANG). Is it a problem?

I remember a comment from MAL saying that it may be a problem for CGI for the environment variables because some (all?) variables are not encoded with the locale encoding (but the HTML encoding?).

I don't know if Python should work around CGI specific issues. In Python 3.2, we now have os.environb: it's now possible to use a different encoding for each variable.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

On 10.10.2010 17:51, STINNER Victor wrote:
> STINNER Victor <victor.stin...@haypocalc.com> added the comment:
>
>> We run into problems because we have two inconsistent encodings, ...
>
> What? No. We have problems because we don't use the same encoding to decode and to encode the same data type. It's not a problem to use a different encoding for each data type (stdout, filenames, environment variables, ...).

This is exactly the very problem that we face. In particular, the question is what encoding to use if something is *both* a filename and an environment variable value, or both a filename and a command line argument.

> Mac OS X is a special case. Filesystem encoding is utf-8 on this OS, whereas the locale encoding depends on LANG variable. If I understood MvL proposition correctly, we should not rely on the locale on Mac OS X.

"Not rely on" is perhaps a bit harsh. It's not clear (to me) under what conditions the locale's encoding will be more correct than just assuming UTF-8 - there may actually be use cases for it.

However, with the surrogate escapes, we could just always decode using UTF-8, and leave any mojibake problems that may arise from this to the application. I do think that these problems will be rare, since a) many OSX installations use UTF-8, anyway, and b) those that don't likely experience the proper round-tripping of the escape mechanism.

> So the 3rd encoding and the filesystem encodings should be hardcoded to utf-8?

That's an option to consider, yes - I'd like an OSX expert to comment.

> The third encoding is no more controlable by a special environment variable, only by classic locale environment variables (LC_ALL, LC_CTYPE, LANG). Is it a problem?
>
> I remember a comment from MAL saying that it may be a problem for CGI for the environment variables because some (all?) variables are not encoded with the locale encoding (but the HTML encoding?).
>
> I don't know if Python should workaround CGI specific issues. In Python 3.2, we have now os.environb: it's now possible to use a different encoding for each variable.

I think these problems are sufficiently resolved now: either by PEP , PEP 444, PEP 383, or os.environb.

I think you misunderstood MAL's comment, though: the environment variables are not encoded in *any* specific encoding. Instead, they are copied literally from the HTTP request, using whatever bytes the browser originally put in there - which may or may not have followed a particular encoding. HTTP is silent on this most of the time, and HTML is out of scope.
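The per-variable decoding that `os.environb` allows could look roughly like this. The variable name `DEMO_QUERY` and its Latin-1 content are made up for the example; `os.environb` only exists where `os.supports_bytes_environ` is true, so the sketch falls back to `os.environ` elsewhere.

```python
import os

if os.supports_bytes_environ:
    # Simulate a CGI-style variable whose bytes are not in the locale
    # encoding (hypothetical name and content).
    os.environb[b"DEMO_QUERY"] = "café".encode("latin-1")
    raw = os.environb[b"DEMO_QUERY"]
    # The application chooses the encoding for this one variable.
    value = raw.decode("latin-1")
else:
    # Windows: the environment is natively Unicode, os.environb is absent.
    os.environ["DEMO_QUERY"] = "café"
    value = os.environ["DEMO_QUERY"]

print(ascii(value))
```

Because the bytes are read untouched, the application only has to decode once, with whatever encoding it knows applies to that variable.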
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

>> What? No. We have problems because we don't use the same encoding to decode and to encode the same data type. It's not a problem to use a different encoding for each data type (stdout, filenames, environment variables, ...).
>
> This is exactly the very problem that we face. In particular, the question is what encoding to use if something is *both* a filename and an environment variable value, or both a filename and a command line argument.

The question is: what is the best default encoding for a specific data type? There is no perfect answer (well, except maybe using byte strings :-)). Each solution has its own use cases and disadvantages.

If an application knows exactly the encoding of some data, and it is not the default encoding, it can still redecode the data. Using os.environb, it's a little bit better: the application just has to decode (it doesn't have to encode and to know which encoding was used to decode the data initially). For sys.argv, I still want to create sys.argvb (bytes version) ;-)

For the command line arguments and environment variables, we don't have a lot of choices: locale or filesystem encodings. So Antoine and Martin: which encoding do you prefer?

We should maybe try to find some use cases.

Here is a dummy script bla.py:
---
import sys
print(sys.argv)
try:
    open(sys.argv[1]).close()
except Exception as err:
    print("open error: %s" % err)
else:
    print("open ok")
---

Locale encoding = FS encoding = utf-8:

$ ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open ok

Locale encoding = utf8, FS encoding = ascii:

$ PYTHONFSENCODING=ascii ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open error: 'ascii' codec can't encode character '\xe9' ...

The filename is displayed correctly, but we are unable to open the file if PYTHONFSENCODING is used :-/ Should the filename be displayed differently if PYTHONFSENCODING is used?

> I think these problems are sufficiently resolved now: either by PEP , PEP 444, PEP 383, or os.environb.

Ok, cool :-)

> I think you misunderstood MAL's comment, though: the environment variables are not encoded in *any* specific encoding. Instead, they are copied literally from the HTTP request, using whatever bytes the browser originally put in there - which may or may not have followed a particular encoding. HTTP is silent on this most of the time, and HTML is out of scope.

Ah yes, thanks for your explanation. I was unable to find his comment.
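The failure shown by bla.py, and the "redecode" workaround mentioned in the same comment, can both be reproduced without touching the interpreter's configuration. The sketch below only simulates them with explicit codecs: ASCII stands in for the mismatched FS encoding, and the Latin-1 origin of the bytes in the second half is an assumption for the demonstration.

```python
name = "xxxé.txt"

# Mismatch: the argument was decoded with one encoding (UTF-8) but the
# file name is encoded with another (ASCII) before reaching the OS.
try:
    name.encode("ascii")
    mismatch_error = None
except UnicodeEncodeError as err:
    mismatch_error = err

# Redecoding: suppose the bytes were really Latin-1 but were decoded as
# UTF-8 with surrogateescape. Re-encode the same way to recover the raw
# bytes, then decode with the encoding the application knows is right.
raw = name.encode("latin-1")
wrong = raw.decode("utf-8", "surrogateescape")
fixed = wrong.encode("utf-8", "surrogateescape").decode("latin-1")

print(mismatch_error)
print(ascii(wrong), ascii(fixed))
```

The redecode step only works because the initial decoding was lossless, which is why it needs to know which encoding and error handler were used the first time.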
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> For the command line arguments and environment variables, we don't have a lot of choices: locale or filesystem encodings. So Antoine and Martin: which encoding do you prefer?

I still propose to drop the fsname encoding. Then this question goes away.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Antoine Pitrou <pit...@free.fr> added the comment:

On Sunday 10 October 2010 at 18:23 +, Martin v. Löwis wrote:
> Martin v. Löwis <mar...@v.loewis.de> added the comment:
>
>> For the command line arguments and environment variables, we don't have a lot of choices: locale or filesystem encodings. So Antoine and Martin: which encoding do you prefer?
>
> I still propose to drop the fsname encoding. Then this question goes away.

I don't know what you mean by dropping, since OS X by construction needs a filesystem encoding (utf-8) different from the locale encoding; and Windows hardwires the decoding/encoding of bytes filenames using mbcs regardless of the current codepage, IIRC.

So do you just mean the filesystem encoding should be hidden from the user? What would be the benefit?
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

> I don't know what you mean by dropping, since OS X by construction needs a filesystem encoding (utf-8) different from the locale encoding;

See above. I propose to stop using the locale encoding for command line arguments and environment variables on OSX, and use UTF-8 instead.

> and Windows hardwires the decoding/encoding of bytes filenames using mbcs regardless of the current codepage, IIRC.

I wish byte-oriented file names could be dropped on Windows. But that is probably too incompatible.

> So do you just mean the filesystem encoding should be hidden from the user? What would be the benefit?

That the very issue that this bug report (re-read the title) is about would go away.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> Perhaps. We could also declare that command line arguments and environment variables are always UTF-8-encoded on OSX (which I think would be fairly accurate)

Python uses the filesystem encoding to encode/decode environment variables, and on OSX, the fs encoding is utf-8.

For the command line, it would mean that we introduced a new encoding: command line encoding, which will be utf-8 on OSX.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Antoine Pitrou <pit...@free.fr> added the comment:

> For the command line, it would mean that we introduced a new encoding: command line encoding, which will be utf-8 on OSX.

Or more generally environment encoding, if it's also used for env vars. This could solve the subprocess issue neatly.
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> So perhaps it would be best if Python had two external default encodings: the IO one (command line arguments, environment variables, text files), and the file name encoding (defaulting to the IO encoding if not set)

Hum, I prefer to consider the FS encoding as an *internal* encoding. ... But it's not completely true: it is used for the environment variables.

Let's consider that the FS encoding is only an internal encoding. We need 3 encodings:

- FS encoding: any operation on the filesystem
- IO encoding: text file contents (including stdin, stdout, stderr which are text files)
- a 3rd encoding (let's call it the command line encoding): used for the command line arguments and the environment variables

For technical reasons (bootstrap: Python initialization issues), I would like the 3rd encoding to be set using the locale encoding. The user can only control it using the classical locale variables (LC_ALL, LC_CTYPE, LANG).
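For readers who want to see how the three roles listed above map onto a running interpreter, a quick inspection script is sketched below. The role labels are informal names taken from this comment, not official Python terminology, and the "command line" role is shown as the locale encoding only because that is what the comment proposes.

```python
import locale
import sys

encodings = {
    # Used for operations on the filesystem.
    "filesystem": sys.getfilesystemencoding(),
    # Used for text streams; stdout may be replaced by an object
    # without an encoding attribute, hence getattr().
    "io (stdout)": getattr(sys.stdout, "encoding", None),
    # Proposed here for command line arguments and environment variables.
    "locale": locale.getpreferredencoding(False),
}

for role, name in encodings.items():
    print(f"{role:12} {name}")
```

When the three values differ, the same bytes can be decoded one way on input and encoded another way on use, which is the situation this issue is about.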
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Martin v. Löwis <mar...@v.loewis.de> added the comment:

On 09.10.2010 14:07, Antoine Pitrou wrote:
> Antoine Pitrou <pit...@free.fr> added the comment:
>
>> For the command line, it would mean that we introduced a new encoding: command line encoding, which will be utf-8 on OSX.
>
> Or more generally environment encoding, if it's also used for env vars. This could solve the subprocess issue neatly.

Please no. We run into problems because we have two inconsistent encodings, and now you propose to introduce another one, allowing for even more inconsistencies???
[issue9992] Command line arguments are not correctly decoded if locale and filesystem encodings are different
Antoine Pitrou <pit...@free.fr> added the comment:

> Please no. We run into problems because we have two inconsistent encodings, and now you propose to introduce another one, allowing for even more inconsistencies???

It would not really be a third encoding, since it would replace the locale encoding for all practical purposes, if I understand Victor's proposal correctly.