[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-12-18 Thread STINNER Victor
STINNER Victor added the comment: Follow-up: PEP 538 (bpo-28180) and PEP 540 (bpo-29240) have been accepted and implemented in Python 3.7!
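A small sketch, not taken from the thread, of how one might check what a given interpreter ended up with after these PEPs. It assumes Python 3.7 or later (where `sys.flags.utf8_mode` exists); the helper name `stream_report` is made up for illustration.

```python
import sys

def stream_report():
    """Collect the settings that PEP 538 / PEP 540 can influence."""
    report = {
        # 1 when UTF-8 Mode is active (python -X utf8 or PYTHONUTF8=1), else 0.
        "utf8_mode": sys.flags.utf8_mode,
        "filesystem_encoding": sys.getfilesystemencoding(),
    }
    for name in ("stdin", "stdout"):
        stream = getattr(sys, name)
        # Replaced streams (for example under a test runner) may lack
        # these attributes, so fall back to None instead of failing.
        report[name] = (getattr(stream, "encoding", None),
                        getattr(stream, "errors", None))
    return report

if __name__ == "__main__":
    for key, value in stream_report().items():
        print(f"{key}: {value}")
```

Running it once normally and once with `LC_ALL=C` in the environment should show whether the standard streams switch error handler; the exact values depend on the Python version and platform.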

[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread STINNER Victor
STINNER Victor added the comment: > But maybe I'm just missing something. This issue fixed exactly one use case: "List a directory into stdout" (similar to the UNIX "ls" or Windows "dir" commands): https://www.python.org/dev/peps/pep-0540/#list-a-directory-into-stdout Your use case is more

[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread Sworddragon
Sworddragon added the comment: The point is this ticket claims to be using the surrogateescape error handler for sys.stdout and sys.stdin for the C locale. I have never used surrogateescape explicitly before and thus have no experience with it, and consulting the documentation mentions throwing

[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread STINNER Victor
STINNER Victor added the comment: "I thought with the surrogateescape error handler now being used for sys.stdout this would not throw an exception but I'm getting this: (...)" Please see the two recently proposed PEPs: Nick's PEP 538 and my PEP 540, both propose (two different) solutions to

[issue19977] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale

2017-01-06 Thread Sworddragon
Sworddragon added the comment: Bug #28180 has caused me to take a look at the "encoding" issue this and the tickets before have tried to solve more or less. Being a bit unsure what the root cause and intention for all this was I'm now at a point to actually check this ticket. Here is an

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: >> We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale. > Do you mean utf8/strict or utf8/surrogateescape? utf8/strict doesn't work (os.listdir raises a Unicode error) if your system is configured to use latin1 (ex:

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Nick Coghlan
Nick Coghlan added the comment: Victor was referring to code like print(os.listdir()). Those are the motivating cases for ensuring round trips from system APIs to the standard streams work correctly. There's also the problem that sys.argv currently relies on the locale encoding directly,

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: > The conclusion I have come to is that any further decoupling of Python 3 from the locale encoding will actually depend on getting the PEP 432 bootstrapping changes implemented, reviewed and the PEP approved, so we have more interpreter infrastructure in

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-28 Thread Nick Coghlan
Nick Coghlan added the comment: > Antoine Pitrou added the comment: > Yeah. My proposal had more to do with the fact that we should some day switch to utf-8 by default on all POSIX systems, regardless of what the system advertises as best encoding. Yeah, that seems like a plausible future to me

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan
Nick Coghlan added the comment: Additional environments where the system misreports the encoding to use (courtesy of Armin Ronacher and Graham Dumpleton on Twitter): upstart, Salt, mod_wsgi. Note that for more complex applications (e.g. integrated web UIs, socket servers, sending email), round

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan
Nick Coghlan added the comment: Issue 21368 now suggests looking for /etc/locale.conf before falling back to ASCII+surrogateescape. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue19977

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale. -- versions: +Python 3.5 -Python 3.4

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread Nick Coghlan
Nick Coghlan added the comment: If you can convince Stephen Turnbull that's a good idea, sure. It's probably more likely to be the right thing than ASCII or ASCII + surrogateescape, but in the absence of hard data, he's in a better position than we are to judge the likely impact of that, at

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread STINNER Victor
STINNER Victor added the comment: > We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale. Do you mean utf8/strict or utf8/surrogateescape? utf8/strict doesn't work (os.listdir raises a Unicode error) if your system is configured to use latin1 (ex:
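To make the utf8/strict versus utf8/surrogateescape distinction concrete, here is a minimal sketch that is not from the thread. It assumes a filename whose bytes were written on a Latin-1 system; `decode_name` is a hypothetical helper, and no real directory is touched.

```python
# "é" encoded as Latin-1 is a single byte that is not valid UTF-8 on its own.
raw_name = "abcé".encode("latin-1")   # b'abc\xe9'

def decode_name(raw, errors):
    """Decode a raw filename as UTF-8 with the given error handler."""
    return raw.decode("utf-8", errors)

try:
    decode_name(raw_name, "strict")
    strict_ok = True
except UnicodeDecodeError:
    # utf8/strict rejects the stray byte.
    strict_ok = False

# utf8/surrogateescape maps the undecodable byte 0xE9 to U+DCE9 instead.
escaped = decode_name(raw_name, "surrogateescape")
# Encoding with the same handler restores the original bytes exactly.
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_ok)             # False
print(ascii(escaped))        # 'abc\udce9'
print(restored == raw_name)  # True
```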

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-27 Thread STINNER Victor
STINNER Victor added the comment: > We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale. Please open a new issue if you would prefer UTF-8. You will have to solve different technical issues. I tried to list some of them in issues #19846 and #19847. In

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-09 Thread Nick Coghlan
Nick Coghlan added the comment: The default locale on Fedora is indeed UTF-8 these days - the problem is that *users* are used to being able to use LANG=C to force the POSIX locale (whether for testing purposes or other reasons), and that currently means system utilities written in Python may

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-09 Thread STINNER Victor
STINNER Victor added the comment: > The default locale on Fedora is indeed UTF-8 these days - the problem is that *users* are used to being able to use LANG=C to force the POSIX locale (whether for testing purposes or other reasons), and that currently means system utilities written in

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-04-08 Thread STINNER Victor
STINNER Victor added the comment: > However, I'd still like to discuss the idea of backporting this to 3.4.1. The idea of doing this change in Python 3.5 is that I have no idea of the risk of regression. To backport such a change in a minor version (3.4.1), I would feel more confident with user

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-28 Thread Nick Coghlan
Nick Coghlan added the comment: This seems to be working on the buildbots for 3.5 now (buildbot failures appear to be due to other issues). However, I'd still like to discuss the idea of backporting this to 3.4.1. From a Fedora point of view, it's still *very* easy to flip an environment

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-19 Thread Atsuo Ishimoto
Changes by Atsuo Ishimoto ishim...@gembook.org: -- nosy: +ishimoto

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot
Roundup Robot added the comment: New changeset bc06f67234d0 by Victor Stinner in branch 'default': Issue #19977: When the ``LC_CTYPE`` locale is the POSIX locale (``C`` locale), http://hg.python.org/cpython/rev/bc06f67234d0 -- nosy: +python-dev

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread STINNER Victor
STINNER Victor added the comment: Test failing on x86 OpenIndiana 3.x buildbot: http://buildbot.python.org/all/builders/x86%20OpenIndiana%203.x/builds/7939/steps/test/logs/stdio == FAIL: test_forced_io_encoding

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread STINNER Victor
STINNER Victor added the comment: New behaviour:
$ mkdir z
$ touch z/abcé
$ LC_CTYPE=C ./python -c 'import os; print(os.listdir("z")[0])'
abcé
Old behaviour, before the change (test with Python 3.3):
$ LC_CTYPE=C python3 -c 'import os; print(os.listdir("z")[0])'
Traceback (most recent call last):
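The difference between the two behaviours can be reproduced without a C-locale shell. The sketch below is an illustration rather than the patch itself: it assumes the filesystem and stdout encodings are both ASCII, as under `LC_CTYPE=C` at the time, and `encode_for_stdout` is a hypothetical stand-in for what the stdout text layer does.

```python
def encode_for_stdout(text, errors):
    """Encode text the way an ASCII (C locale) stdout would."""
    return text.encode("ascii", errors)

# What os.listdir() hands back for the UTF-8 bytes of "abcé" when the
# filesystem encoding is ASCII: each undecodable byte becomes a lone surrogate.
name = b"abc\xc3\xa9".decode("ascii", "surrogateescape")

try:
    encode_for_stdout(name, "strict")
    old_behaviour = "printed"
except UnicodeEncodeError:
    # The strict handler cannot encode the surrogates: the old traceback.
    old_behaviour = "UnicodeEncodeError"

# With surrogateescape the original bytes pass through unchanged, so a
# UTF-8 terminal renders them as "abcé" again.
new_bytes = encode_for_stdout(name, "surrogateescape")

print(ascii(name))      # 'abc\udcc3\udca9'
print(old_behaviour)    # UnicodeEncodeError
print(new_bytes)        # b'abc\xc3\xa9'
```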

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot
Roundup Robot added the comment: New changeset 3589980c98de by Victor Stinner in branch 'default': Issue #19977, #19036: Always include locale.h in pythonrun.c http://hg.python.org/cpython/rev/3589980c98de New changeset 94d5025c70a3 by Victor Stinner in branch 'default': Issue #19977: Enable

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-03-17 Thread Roundup Robot
Roundup Robot added the comment: New changeset c9905e802042 by Victor Stinner in branch 'default': Issue #19977: Fix test_capi when LC_CTYPE locale is POSIX http://hg.python.org/cpython/rev/c9905e802042

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-02-11 Thread STINNER Victor
STINNER Victor added the comment: > Reintroducing moji-bake intentionally doesn't sound like a particularly good idea, wasn't that what python3 was supposed to help prevent? Sometimes practicality beats purity :-( I tried to convince users that their computer was not well configured, they

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-05 Thread Bohuslav Slavek Kabrda
Bohuslav Slavek Kabrda added the comment: Nick: Sure, once there is an upstream solution that people have agreed on, I'll look into backporting it, NP. Thanks for letting me know about this.

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-04 Thread Larry Hastings
Larry Hastings added the comment: Yeah, unless there was a *huge* amount of support for changing this, it's way too late for 3.4.

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2014-01-02 Thread Nick Coghlan
Nick Coghlan added the comment: Larry: I'm assuming it's way too late to make a change like this for the 3.4 release? Slavek: assuming this change is made for 3.5 upstream, we may want to look at backporting it as a 3.4 patch in Fedora (as part of the Python-3-by-default project). Otherwise

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-21 Thread Jakub Wilk
Changes by Jakub Wilk jw...@jwilk.net: -- nosy: +jwilk

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-19 Thread Martin Panter
Changes by Martin Panter vadmium...@gmail.com: -- nosy: +vadmium

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor
STINNER Victor added the comment: Oh, in fact, sys.stdin is also modified by the patch (as I expected). -- title: Use surrogateescape error handler for sys.stdout on UNIX for the C locale -> Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale
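As a rough illustration of why having both streams use the same handler matters, the sketch below pipes undecodable bytes from a stand-in stdin to a stand-in stdout. It is not the patch: the in-memory `io.BytesIO` buffers and the `pipe_through` helper are assumptions made so the example runs anywhere.

```python
import io

def pipe_through(raw, encoding="ascii", errors="surrogateescape"):
    """Read raw bytes as text, write that text back out, return the bytes."""
    # newline="" disables newline translation so bytes are compared exactly.
    fake_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding,
                                  errors=errors, newline="")
    sink = io.BytesIO()
    fake_stdout = io.TextIOWrapper(sink, encoding=encoding,
                                   errors=errors, newline="")
    fake_stdout.write(fake_stdin.read())
    fake_stdout.flush()
    return sink.getvalue()

if __name__ == "__main__":
    data = b"caf\xe9\n"   # Latin-1 bytes, not decodable as ASCII
    print(pipe_through(data) == data)
```

With `errors="strict"` on either side the same call would raise a `UnicodeDecodeError` or `UnicodeEncodeError` instead of passing the bytes through.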

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- nosy: +r.david.murray

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Sworddragon
Sworddragon added the comment: What would happen if we call this example script with LANG=C on the patch?:
---
import os
for name in sorted(os.listdir('ä')):
    print(name)
---
Would it throw an exception on os.listdir('ä')?

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor
STINNER Victor added the comment: test_ls.py: test script producing invalid filenames and then trying to display them into stdout. Output with UTF-8 locale, UTF-8 terminal and Python 3.3 (or unpatched 3.4, it's the same): ascii.txt UnicodeError 'invalid_utf8:\udcff.txt' UnicodeError

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- versions: +Python 3.5 -Python 3.4

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread STINNER Victor
STINNER Victor added the comment: os.fsencode(text) always fails if text cannot be encoded to sys.getfilesystemencoding(). surrogateescape doesn't help here. Your example is artificial, you should not get 'ä'. All OS data is decoded from the filesystem encoding using the surrogateescape error
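A minimal sketch of the two halves of this point, assuming a POSIX system (where the filesystem error handler is surrogateescape; Windows behaves differently). The helper names are hypothetical. The failing input used here is a lone surrogate outside the escape range, chosen because it fails under any filesystem encoding, whereas whether 'ä' fails depends on the locale.

```python
import os

def roundtrip(raw):
    """Decode OS bytes to str and encode them back, as os.listdir users do."""
    return os.fsencode(os.fsdecode(raw))

def can_fsencode(text):
    """Report whether text can be turned into a filename at all."""
    try:
        os.fsencode(text)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    raw = b"invalid_utf8:\xff.txt"
    # Data that came from the OS always survives the round trip.
    print(roundtrip(raw) == raw)
    # Text that did not come from the OS may simply be unencodable:
    # surrogateescape only covers U+DC80..U+DCFF.
    print(can_fsencode("ascii.txt"), can_fsencode("\ud800"))
```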

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Antoine Pitrou
Antoine Pitrou added the comment: > When LANG=C is used to get the English language (which is a mistake, LC_CTYPE=C should be used instead) I think you mean LC_MESSAGES=C here. (but it's not only about the English language; it's also about other locale parameters such as number formatting) I

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread R. David Murray
R. David Murray added the comment: Reintroducing moji-bake intentionally doesn't sound like a particularly good idea, wasn't that what python3 was supposed to help prevent? It does seem like a utf-8 default is the Way of the Future. Or even the present, most places.

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Toshio Kuratomi
Toshio Kuratomi added the comment: My impression was that python3 was supposed to help get rid of UnicodeError tracebacks, not mojibake. If mojibake was the problem then we should never have gone down the surrogateescape path for input.

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Mojibake in input can cause a decoding error in another application which consumes the output of the Python script. In some cases this can be even worse than a UnicodeError in the producer. But for the C locale this makes sense. I think we should try this experiment in 3.5.

[issue19977] Use surrogateescape error handler for sys.stdin and sys.stdout on UNIX for the C locale

2013-12-13 Thread Nick Coghlan
Nick Coghlan added the comment: Getting rid of mojibake was the goal, surrogateescape was about dealing with cases where the "avoid mojibake" checks were spuriously breaking round-tripping between OS APIs due to other configuration errors (with LANG=C being set, or LANG not being set at all