STINNER Victor added the comment:

> We should not overcomplicate this. I suggest that we simply use utf-8 under 
> the C locale.

Please open a new issue if you would prefer UTF-8. You will have to solve 
several other technical issues. I tried to list some of them in issues #19846 
and #19847.

In short, you should always decode and encode "OS data" with the same encoding. 
Python's "file system encoding" is the locale encoding because, in some places, 
PyUnicode_DecodeLocale[AndSize]() is used (e.g. to decode the PYTHONWARNINGS 
environment variable). A common place is PyUnicode_DecodeFSDefaultAndSize(), 
which decodes with the locale encoding before the Python codec is loaded. See 
also the _Py_wchar2char() and _Py_char2wchar() functions, which use the locale 
encoding and are used in many places.
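To illustrate why the same encoding must be used in both directions, here is a minimal sketch in pure Python. The helper roundtrip() and the sample file name are hypothetical; the sketch only assumes the standard str.decode()/str.encode() API with the "surrogateescape" error handler, which maps undecodable bytes to lone surrogates (U+DC80..U+DCFF) and back.

```python
# Hypothetical "OS data": a file name that is not valid ASCII or UTF-8
# (0xE9 is 'é' in Latin-1).
raw = b"caf\xe9.txt"

def roundtrip(data: bytes, decode_enc: str, encode_enc: str) -> bytes:
    """Decode OS bytes to str, then encode them back (hypothetical helper).

    surrogateescape turns each undecodable byte into a lone surrogate on
    decode, and turns that surrogate back into the same byte on encode.
    """
    text = data.decode(decode_enc, "surrogateescape")
    return text.encode(encode_enc, "surrogateescape")

# Same encoding on both sides: the original bytes come back unchanged.
assert roundtrip(raw, "ascii", "ascii") == raw
assert roundtrip(raw, "utf-8", "utf-8") == raw

# Mismatched encodings silently produce different bytes, i.e. a different
# file name than the one the OS gave us.
print(roundtrip(raw, "latin-1", "utf-8"))  # b'caf\xc3\xa9.txt'
```

The mismatched case does not raise; it just returns other bytes, which is why a silent mix of the locale encoding and another encoding is dangerous.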

I'm now closing the issue because the initial point (using the surrogateescape 
error handler) is implemented in Python 3.5, and backporting such a major change 
to the Python 3.4 branch is too risky right now.
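The effect of the surrogateescape error handler on a text stream can be sketched as follows. This is an illustration under assumptions, not the actual sys.stdout setup: an io.BytesIO buffer stands in for the process's stdout, "ascii" stands in for the C locale encoding, and write_text() is a hypothetical helper.

```python
import io

def write_text(text: str, errors: str) -> bytes:
    """Write text through a TextIOWrapper over an in-memory buffer.

    Hypothetical stand-in for sys.stdout under the C locale (ASCII).
    """
    buf = io.BytesIO()
    out = io.TextIOWrapper(buf, encoding="ascii", errors=errors)
    out.write(text)
    out.flush()
    return buf.getvalue()

# A file name received from the OS under the C locale: the non-ASCII
# byte was decoded to the lone surrogate U+DCE9.
name = b"caf\xe9.txt".decode("ascii", "surrogateescape")

# With surrogateescape, the original byte is written back unchanged.
print(write_text(name, "surrogateescape"))  # b'caf\xe9.txt'

# With the strict handler, the same write raises UnicodeEncodeError.
try:
    write_text(name, "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
```

With surrogateescape, printing a name obtained from the OS reproduces the exact bytes instead of failing with UnicodeEncodeError.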

----------
resolution:  -> fixed
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19977>
_______________________________________