On 21 August 2014 23:58, Marko Rauhamaa <ma...@pacujo.net> wrote:
>
> My point is that the poor programmer cannot ignore the possibility of
> "funny" character sets. If Python tried to protect the programmer from
> that possibility, the result might be even more intractable: how to act
> on a file with an non-UTF-8 filename if you are unable to express it as
> a text string?

That's what the "surrogateescape" codec is for - we use it by default
on most OS interfaces, and it's implicit in the use of "os.fsencode"
and "os.fsdecode". Starting with Python 3, it's also enabled on
sys.stdout by default, so that "print(os.listdir(dirname))" will pass
the original raw bytes through to the terminal the same way Python 2
does.

The docs could use additional details as to which interfaces do and
don't have surrogateescape enabled by default, but for the time being,
the description of the codec error handler just links out to the
original definition in PEP 383.

It may also be useful to have some tools for detecting and cleaning
strings containing surrogate escaped data, but there hasn't been a
concrete proposal along those lines as yet. Personally, I'm currently
waiting to see if the Fedora or OpenStack folks indicate a need for
such tools before proposing any additions.

Regards,
Nick.

>
>
> Marko
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com



-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to