> My PEP 540 is different than Nick's PEP 538, even for the POSIX
> locale. I propose to always use the surrogateescape error handler,
> whereas Nick wants to keep the strict error handler for inputs and
> outputs.
> https://www.python.org/dev/peps/pep-0540/#encoding-and-error-handler
>
> The surrogateescape error handler is useful to write programs which
> work as pipes, as cat, grep, sed, ... UNIX program:
> https://www.python.org/dev/peps/pep-0540/#producer-consumer-model-using-pipes
>
> You can get the behaviour of Nick's PEP 538 using my UTF-8 Strict
> mode. Compare "UTF-8 mode" and "UTF-8 Strict mode" lines in the tables
> of my use case. The UTF-8 mode always works, but can produce mojibake,
> whereas UTF-8 Strict doesn't produce mojibake but can fail depending
> on data and the locale.
>
> IMHO most users prefers usability ("just work") over correctness
> (prevent mojibake).
I'm ±0 on surrogateescape by default. I feel +1 for stdout and -1 for stdin.

In the output case, surrogateescape is weaker than strict, but it only lets through bytes that were surrogate-escaped earlier. If a program is careful about where it decodes with surrogateescape, surrogateescape on stdout is safe enough.

On the other hand, surrogateescape is very weak for input: it accepts arbitrary bytes, so it should be used carefully.

But I agree that using different error handlers for stdin and stdout is not beautiful. That's why I'm ±0.

FYI, when http://bugs.python.org/issue15216 is merged, we can change the error handler easily:

``sys.stdout.set_encoding(errors='surrogateescape')``

So it's controllable from Python. Some programs that handle filenames may prefer surrogateescape, and some programs like CGI scripts may prefer strict UTF-8, because JSON and HTML5 shouldn't contain arbitrary bytes.
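To make the stdin/stdout asymmetry concrete, here is a small sketch of how the two error handlers behave. The sample bytes (a Latin-1 encoded filename) and the variable names are my own illustration, and the ``io.BytesIO`` buffer is only a stand-in for a real stdout stream.

```python
import io

# Hypothetical input: a Latin-1 encoded filename, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# Input side: strict rejects the bad byte, surrogateescape accepts any byte.
try:
    raw.decode("utf-8", errors="strict")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Output side: surrogateescape turns the surrogate back into the original byte.
round_trip = text.encode("utf-8", errors="surrogateescape")

# Strict output refuses the lone surrogate instead of emitting the raw byte.
try:
    text.encode("utf-8", errors="strict")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# A text stream with an explicit error handler (stand-in for stdout).
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="utf-8", errors="surrogateescape")
out.write(text)
out.flush()
stream_bytes = buf.getvalue()
```

On output, surrogateescape can only reproduce bytes that some earlier surrogateescape decode let in, so the permissive step is really the input side. That is why I'm more comfortable with it on stdout than on stdin.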