Nick Coghlan added the comment:

Yes, that's the point. *Every* case I've seen where the locale encoding has 
been reported as ASCII on a modern Linux system has been because the 
environment has been configured to use the C locale, and that locale has a 
silly, antiquated encoding setting.
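
For anyone wanting to check what their own environment reports, here is a 
minimal sketch using only the standard library. The helper name `canonical` 
and the variable names are illustrative, not anything from the tracker issue, 
and the values printed depend on the Python version and on the environment 
(LANG, LC_ALL, LC_CTYPE) the interpreter was started with.

```python
import codecs
import locale
import sys


def canonical(name):
    """Return Python's canonical codec name for an encoding label."""
    # glibc reports the C locale's charset as "ANSI_X3.4-1968";
    # codecs.lookup() maps that alias to "ascii".
    return codecs.lookup(name).name


# Encoding implied by the current locale settings.
locale_encoding = canonical(locale.getpreferredencoding(False))

# Encoding Python uses to decode filenames and other OS data.
fs_encoding = canonical(sys.getfilesystemencoding())

print("locale encoding:    ", locale_encoding)
print("filesystem encoding:", fs_encoding)

# Hypothetical convenience flag: True when the locale claims ASCII.
looks_like_c_locale = locale_encoding == "ascii"
```

Running this once normally and once with LANG=C set in the environment 
should show whether the C locale changes what Python believes about the 
system.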

This is particularly problematic when people access a system remotely over 
ssh and are given the C locale instead of something sensible, and then can't 
properly read the filesystem on that server.

The idea of using UTF-8 instead in that case is to *change* (and hopefully 
reduce) the number of cases where things go wrong.

- if no non-ASCII data is encountered, the choice of ASCII vs UTF-8 doesn't 
matter
- if it's a modern Linux distro, then the real filesystem encoding is UTF-8, 
and the setting it provides for LANG=C is just plain *wrong*
- there may be other cases where ASCII actually *is* the filesystem encoding 
(in which case they're going to have trouble anyway), or the real filesystem 
encoding is something other than UTF-8
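
The first two cases above can be illustrated with a short sketch. The 
filenames are made-up examples, and the snippet only exercises the codecs 
directly rather than a real filesystem, so treat it as an approximation of 
what happens when Python decodes directory entries.

```python
# Hypothetical filenames, as raw bytes the way a Linux filesystem stores them.
ascii_name = b"report.txt"
utf8_name = "caf\u00e9.txt".encode("utf-8")  # b"caf\xc3\xa9.txt"

# Case 1: pure ASCII data decodes identically under both codecs.
same = ascii_name.decode("ascii") == ascii_name.decode("utf-8")

# Case 2: the bytes on disk are UTF-8, but the C locale claims ASCII.
# A strict ASCII decode fails outright.
try:
    utf8_name.decode("ascii")
    ascii_strict_ok = True
except UnicodeDecodeError:
    ascii_strict_ok = False

# With the surrogateescape error handler the undecodable bytes are
# smuggled through as lone surrogates: the name round-trips, but it is
# not readable text and cannot be encoded for display.
escaped = utf8_name.decode("ascii", "surrogateescape")
round_trip = escaped.encode("ascii", "surrogateescape") == utf8_name

# Decoding as UTF-8 recovers the intended name.
decoded = utf8_name.decode("utf-8")
```

The third case (a filesystem that really is in some other encoding) is not 
covered here; assuming UTF-8 would produce errors or wrong text there too.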

We're already approximating things on Linux by assuming every filesystem is 
using the *same* encoding, when that's not necessarily the case. GLib 
applications also assume UTF-8, regardless of the locale 
(http://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux).

At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 
3*, and that's not OK.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19846>
_______________________________________