[issue23993] Use surrogateescape error handler by default in open() if the locale is C
STINNER Victor added the comment:

> I am -1 on this. (Or maybe more). What's the rationale?

See issue #19977.

In many cases you get the C locale by mistake. For example, by setting the LANG environment variable to an empty string to run a program in English (whereas LC_MESSAGES is the appropriate variable).

For daemons, in many cases you get the C locale, and it's hard to configure all systems to run the daemon with the user locale. I read that systemd runs daemons with the user locale, but I'm not sure.

The idea is to reduce the pain caused by this locale.

When porting an application from Python 2 to Python 3, it's annoying to start getting Unicode errors everywhere. This issue starts to make Python 3 more convenient.

> I could see using utf-8 by default if the locale is C,

This has been proposed many times, but I'm opposed to that. Python must be interoperable with other programs, and other programs use the locale encoding. For example, you get the ASCII locale encoding when LC_CTYPE is the POSIX locale (C). If Python writes UTF-8, other applications will be unable to decode the UTF-8 data.

Maybe I'm wrong and you should continue to investigate this option.

This issue is very specific to OS data: environment variables, filenames, command line arguments, standard streams (stdin, stdout, stderr).

You may make other choices for other kinds of data unrelated to the locale encoding. For example, JSON must use UTF-8, it's well defined. XML announces its encoding. Etc.

--
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue23993 ___
___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
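To make the interoperability argument in the comment above concrete, here is a minimal sketch. The helper name `ascii_consumer_can_decode` is hypothetical and only stands in for "another program that decodes its input with the ASCII locale encoding"; it is not taken from the patch or from any real tool.

```python
def ascii_consumer_can_decode(data: bytes) -> bool:
    """Hypothetical stand-in for another program that reads its input
    using the ASCII locale encoding (what the C/POSIX locale implies)."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII text is the same bytes in both encodings, so it interoperates.
plain = "cafe".encode("utf-8")

# Non-ASCII text written as UTF-8 produces bytes >= 0x80, which an
# ASCII-only consumer cannot decode.
accented = "caf\u00e9".encode("utf-8")

print(ascii_consumer_can_decode(plain))
print(ascii_consumer_can_decode(accented))
```

The second call fails to decode because UTF-8 encodes U+00E9 as two bytes outside the ASCII range, which is the mismatch the comment is worried about.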
[issue23993] Use surrogateescape error handler by default in open() if the locale is C
R. David Murray added the comment:

I am -1 on this. (Or maybe more). What's the rationale?

I could see using utf-8 by default if the locale is C, but I don't think we want to encourage going back to a world where people don't pay attention to the encoding of their data.

A more productive approach to the problem I think you are trying to solve here would be to work on including chardet in the standard library, something that was brought up, and seemed to receive a positive reception (or at least not a negative one), during the Requests segment of the PyCon language summit.

--
nosy: +r.david.murray
[issue23993] Use surrogateescape error handler by default in open() if the locale is C
STINNER Victor added the comment:

Updated and better patch: version 2.

- revert the changes to fileutils.c: it's not useful to check for check_force_ascii(), because this function is stricter than checking whether LC_CTYPE is C
- fix _pyio.py: add the sys import
- complete the documentation
- tests pass

--
Added file: http://bugs.python.org/file39103/default_error_handler-2.patch
[issue23993] Use surrogateescape error handler by default in open() if the locale is C
New submission from STINNER Victor:

As a follow-up to issue #19977, I propose to also use the surrogateescape error handler in open() by default if the locale is C.

The attached patch adds a new sys.getdefaulterrorhandler() function and uses it in io.TextIOWrapper (and _pyio.TextIOWrapper).

We may use sys.getdefaulterrorhandler() in more places. I don't think that it would be correct to use it for str.encode() or bytes.decode().

--
components: Unicode
files: default_error_handler.patch
keywords: patch
messages: 241405
nosy: ezio.melotti, haypo, ncoghlan
priority: normal
severity: normal
status: open
title: Use surrogateescape error handler by default in open() if the locale is C
versions: Python 3.5

Added file: http://bugs.python.org/file39100/default_error_handler.patch
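For readers who have not used surrogateescape, the following is a rough sketch of the behaviour the proposal would make the default under the C locale. It is not the patch itself: `default_error_handler()` is a hypothetical pure-Python stand-in for the proposed sys.getdefaulterrorhandler(), and the way it detects the C locale (comparing the LC_CTYPE name to "C" or "POSIX") is an assumption. The `roundtrip()` helper is likewise illustrative and passes the handler to open() explicitly.

```python
import locale
import os
import tempfile


def default_error_handler() -> str:
    """Hypothetical stand-in for the proposed sys.getdefaulterrorhandler().

    Assumption: the C locale is detected by querying the LC_CTYPE name.
    """
    ctype = locale.setlocale(locale.LC_CTYPE, None)  # query only, no change
    return "surrogateescape" if ctype in ("C", "POSIX") else "strict"


def roundtrip(raw: bytes, errors: str) -> bytes:
    """Write raw bytes, read them as ASCII text, write the text back,
    and return the bytes that end up on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(raw)
        # newline="" disables newline translation so bytes are preserved.
        with open(path, "r", encoding="ascii", errors=errors, newline="") as f:
            text = f.read()
        with open(path, "w", encoding="ascii", errors=errors, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


raw = b"caf\xc3\xa9\n"  # UTF-8 bytes that are not valid ASCII

# With surrogateescape, undecodable bytes become lone surrogates on read
# and are turned back into the original bytes on write.
assert roundtrip(raw, "surrogateescape") == raw

# With the current default ("strict"), the read fails instead.
try:
    roundtrip(raw, "strict")
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)
```

The point of the sketch is that surrogateescape lets undecodable bytes pass through text I/O unchanged, whereas "strict" raises on the first non-ASCII byte.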
[issue23993] Use surrogateescape error handler by default in open() if the locale is C
STINNER Victor added the comment: The patch is a work-in-progress, I didn't have time to run unit tests, and the documentation is not completed.