Toshio Kuratomi added the comment:

Ahh... added to the nosy list and bug closed all before I got up for the day ;-)

A few words:

I do think that python is broken here.

I do not think that translating everything to utf-8 if ascii is the locale's 
encoding is the solution.

As I would state it, the problem is that python's boundary with the OS is not 
yet uniform.  If you set LC_ALL=C (note that LC_ALL=C is just one of multiple 
ways to break things; for instance, LC_ALL=en_US.utf8 when dealing with latin-1 
data will also break), then python will still *read* non-ascii data from the OS 
through some interfaces but it won't output it back to the OS.  For example:

$ mkdir unicode && cd unicode
$ python3 -c 'open("ñ.txt".encode("latin-1"), "w").close()'
$ LC_ALL=en_US.utf8 python3
>>> import os
>>> dir_listing = os.listdir('.')
>>> for entry in dir_listing: print(entry)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf1' in position 0: surrogates not allowed
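
To make the asymmetry concrete without touching the filesystem, here is a 
minimal sketch of what I understand to be happening at the boundary.  It 
assumes a UTF-8 locale and uses explicit decode/encode calls as a stand-in for 
what os.listdir() and print() do internally; the variable names are only 
illustrative.

```python
# The latin-1 bytes of the file name created above; not valid UTF-8.
raw_name = "ñ.txt".encode("latin-1")

# Reading side: undecodable bytes are smuggled in as lone surrogates
# (assumption: this mirrors how file names are decoded under a UTF-8 locale).
name = raw_name.decode("utf-8", errors="surrogateescape")

# The same error handler can turn the string back into the original bytes...
round_tripped = name.encode("utf-8", errors="surrogateescape")

# ...but a strict encode, which is what the output side uses here, refuses.
try:
    name.encode("utf-8")
    error_reason = None
except UnicodeEncodeError as exc:
    error_reason = exc.reason
```

So the data gets in through one interface with surrogateescape and then cannot 
get back out through another that encodes strictly.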

Note that currently, input() and sys.stdin.read() won't read undecodable data, 
so this is somewhat symmetrical, but it seems to me that saying "everything 
that interfaces with the OS except the standard streams will use 
surrogateescape on undecodable bytes" is drawing a line in an unintuitive 
location.
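
A rough illustration of the two policies side by side.  This is only a sketch: 
an in-memory io.BytesIO stands in for the real stdin, and read_text() is a 
hypothetical helper, not anything in the stdlib.

```python
import io

# One line of latin-1 input, which is not valid UTF-8.
raw = "ñ\n".encode("latin-1")

def read_text(data, errors):
    """Decode data as UTF-8 through a text stream using the given error handler."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()

# Strict decoding (the standard-stream behaviour described above) rejects the data.
try:
    read_text(raw, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape (the behaviour of the other OS interfaces) lets it through.
escaped = read_text(raw, "surrogateescape")
```

With the strict handler the read fails up front; with surrogateescape the byte 
comes through as a lone surrogate and the failure is deferred until something 
tries to encode it strictly.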

(A further note to serhiy.storchaka.... Your examples are not showing anything 
broken in other programs.  xterm is refusing both input and output that is 
non-ascii.  This is symmetric behaviour.  ls is doing its best to display a 
*human-readable* representation of bytes that it cannot convert in the current 
encoding.  It also provides the -b switch to see the octal values if you 
actually care.  Think of this like opening a binary file in less or another 
pager.)

(Further note for haypo -- On Fedora, the default encoding for en_US is utf8, 
not ISO8859-1.)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19846>
_______________________________________