Re: [Python-ideas] PEP 540: Add a new UTF-8 mode

INADA Naoki Wed, 11 Jan 2017 16:24:07 -0800

> My PEP 540 is different than Nick's PEP 538, even for the POSIX
> locale. I propose to always use the surrogateescape error handler,
> whereas Nick wants to keep the strict error handler for inputs and
> outputs.
> https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
>
> The surrogateescape error handler is useful for writing programs which
> work as pipes, like cat, grep, sed, ... UNIX programs:
> https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
>
> You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
> mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
> of my use case. The UTF-8 mode always works, but can produce mojibake,
> whereas UTF-8 Strict doesn't produce mojibake but can fail depending
> on data and the locale.
>
> IMHO most users prefer usability ("just works") over correctness
> (prevent mojibake).
>


I'm ±0 on surrogateescape by default.  I'm +1 for stdout and -1 for stdin.

In the output case, surrogateescape is weaker than strict, but the only
extra thing it lets through is surrogate-escaped bytes.  If a program is
careful about where it decodes with surrogateescape, then surrogateescape
on stdout is safe enough.
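
To illustrate the output side, here is a minimal sketch.  The sample
bytes and the ``encodes_ok`` helper are made up for this example; only
the codec calls are standard:

```python
# Bytes that are not valid UTF-8 (0xff never appears in UTF-8).
raw = b"caf\xff.txt"

# Decoding with surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns those surrogates back into
# the original bytes, so the data round-trips unchanged.
restored = text.encode("utf-8", "surrogateescape")


def encodes_ok(s, errors):
    """Return True if s can be encoded as UTF-8 with this handler."""
    try:
        s.encode("utf-8", errors)
        return True
    except UnicodeEncodeError:
        return False


# strict refuses the escaped surrogate; surrogateescape lets it through.
strict_accepts = encodes_ok(text, "strict")
escape_accepts = encodes_ok(text, "surrogateescape")

# A surrogate outside U+DC80..U+DCFF cannot have come from an escaped
# byte, so even surrogateescape rejects it on output.
escape_accepts_other = encodes_ok("\ud800", "surrogateescape")
```

So the only non-UTF-8 bytes that can reach the output are ones that
entered the program through a surrogateescape decode.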

On the other hand, surrogateescape is very weak for input: it accepts
arbitrary bytes, so it should be used carefully.
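
A small sketch of the input side, using an in-memory stream as a
stand-in for stdin.  The ``read_all`` helper and the sample data are
assumptions made for this example:

```python
import io


def read_all(data, errors):
    """Read bytes through a UTF-8 text stream, roughly as stdin would."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8",
                              errors=errors)
    return stream.read()


# Latin-1 encoded input arriving on a stream that expects UTF-8.
latin1_input = "é\n".encode("latin-1")  # b'\xe9\n'

# surrogateescape silently accepts the bad byte as U+DCE9.
escaped = read_all(latin1_input, "surrogateescape")

# strict reports the problem at the point where the data is read.
try:
    read_all(latin1_input, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With surrogateescape the wrongly encoded input is not detected when it
is read; it only shows up later, wherever the escaped string is used.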

But I agree that using different error handlers for stdin and stdout is
not beautiful.  That's why I'm ±0.


FYI, when http://bugs.python.org/issue15216 is merged, we can change the
error handler easily: ``sys.stdout.set_encoding(errors='surrogateescape')``

So it's controllable from Python.  Programs which handle filenames may
prefer surrogateescape, while programs like CGI scripts may prefer strict
UTF-8, because JSON and HTML5 shouldn't contain arbitrary bytes.
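
As a rough sketch of that per-program choice, the example below wraps an
in-memory buffer instead of touching the real ``sys.stdout``, because the
API from the issue above is not merged yet.  ``make_writer`` is a
hypothetical helper for this example, not an existing function:

```python
import io


def make_writer(errors):
    """Hypothetical helper: a UTF-8 text stream over an in-memory buffer,
    standing in for sys.stdout with the given error handler."""
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors)
    return buffer, writer


# A filename containing a byte that is not valid UTF-8, decoded the way
# a filename-handling program would see it.
filename = b"report\xff.txt".decode("utf-8", "surrogateescape")

# A filename-handling tool wants the original bytes back on output.
buf, out = make_writer("surrogateescape")
out.write(filename)
out.flush()
tool_output = buf.getvalue()

# A CGI-style program emitting JSON or HTML5 wants a hard failure.
buf, out = make_writer("strict")
try:
    out.write(filename)
    out.flush()
    cgi_failed = False
except UnicodeEncodeError:
    cgi_failed = True
```

The same string is passed through unchanged by the first writer and
rejected by the second, which is the behaviour each kind of program
wants.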