STINNER Victor added the comment:

2013/12/8 Nick Coghlan <rep...@bugs.python.org>:
> Yes, that's the point. *Every* case I've seen where the locale encoding has
> been reported as ASCII on a modern Linux system has been because the
> environment has been configured to use the C locale, and that locale has a
> silly, antiquated, encoding setting.
>
> This is particularly problematic when people remotely access a system with
> ssh and get given the C locale instead of something sensible, and then can't
> properly read the filesystem on that server.

The solution is to fix the locale, not to fix Python. For example, don't set
LANG to C. From the C locale, you cannot guess the "correct" encoding. With
Unicode, the general rule is to never guess the encoding.

> The idea of using UTF-8 instead in that case is to *change* (and hopefully
> reduce) the number of cases where things go wrong.

If the OS uses ISO-8859-1, forcing the Python (filesystem) encoding to UTF-8
would produce invalid filenames, display mojibake, and more generally produce
data incompatible with other applications (which rely on the C locale, and so
on the ASCII encoding).

> - there may be other cases where ASCII actually *is* the filesystem encoding
> (in which case they're going to have trouble anyway), or the real filesystem
> encoding is something other than UTF-8

As I wrote before, sys.getfilesystemencoding() is *not* the filesystem
encoding. It's the "OS" encoding, used to decode any kind of data coming from
the OS and to encode Python data back to the OS. Just some examples:

- DNS hostnames
- Environment variables
- Command line arguments
- Filenames
- user/group entries in the grp/pwd modules
- almost all functions of the os module, which return various types of
  information (ttyname, ctermid, current working directory, login, ...)

> We're already approximating things on Linux by assuming every filesystem is
> using the *same* encoding, when that's not necessarily the case. Glib
> applications also assume UTF-8, regardless of the locale
> (http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux).

If you use a different encoding just for filenames, you will get mojibake
when you pass a filename on the command line or in an environment variable.

> At the moment, setting "LANG=C" on a Linux system *fundamentally breaks
> Python 3*, and that's not OK.

Getting an ASCII filesystem encoding is annoying, but I would not say that it
fundamentally breaks Python 3.

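To make the ISO-8859-1 versus UTF-8 mismatch concrete, here is a minimal
sketch. The filename and the helper describe_os_encoding() are made up for
illustration; only sys.getfilesystemencoding(), sys.getdefaultencoding(),
locale.getpreferredencoding() and the standard codecs are real. It shows what
happens when bytes written with one encoding are decoded with the other, and
how the surrogateescape error handler keeps undecodable bytes intact.

```python
import locale
import sys


def describe_os_encoding():
    """Hypothetical helper: report the encodings Python picked up at startup."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "default": sys.getdefaultencoding(),
    }


# A made-up filename containing one non-ASCII character (e with acute accent).
name = "caf\u00e9.txt"

latin1_bytes = name.encode("iso-8859-1")  # what an ISO-8859-1 system stores
utf8_bytes = name.encode("utf-8")         # what a UTF-8 system stores

# UTF-8 bytes read as ISO-8859-1 decode "successfully" but give mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as strict UTF-8 do not decode at all.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, the undecodable byte becomes a lone surrogate and
# can be encoded back to the original bytes without loss.
escaped = latin1_bytes.decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")

if __name__ == "__main__":
    print(describe_os_encoding())
    print(ascii(mojibake))
    print(strict_failed)
    print(ascii(escaped))
    print(roundtrip == latin1_bytes)
```

Running this once with LANG=C and once with a UTF-8 locale is an easy way to
compare what describe_os_encoding() reports; the exact values depend on the
Python version and platform.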
If you want to do something, you should write documentation explaining how to
configure Linux properly.

----------
title: Setting LANG=C breaks Python 3 -> print() and write() are relying on sys.getfilesystemencoding() instead of sys.getdefaultencoding()

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19846>
_______________________________________