> https://www.python.org/dev/peps/pep-0540/
I read PEP 538, PEP 540, and the issues related to switching to UTF-8. At least I can say one thing: people have different points of view :-)

To understand why people disagree, I tried to categorize the different points of view and what each group expects from Python:

* "UNIX mode": Python 2 developers and long-time UNIX users expect that their code "just works". They like Python 3 features, but Python 3 annoys them with various encoding errors. The expectation is to be able to read data encoded in various incompatible encodings and write it to stdout or to a text file. In short, mojibake is not a bug but a feature!

* "Strict Unicode mode" for real Unicode fans: Python 3 is strict and that's a good thing! Strict codecs help to detect bugs in the code very early. These developers understand Unicode very well and are able to fix complex encoding issues. Mojibake is a no-no for them.

Python 3.6 is not exactly in the first or the latter category: "it depends".

To read data from the operating system, Python 3.6 behaves in "UNIX mode": os.listdir() *does* return invalid filenames, representing the undecodable bytes with a funny encoding based on surrogates.

To write data back to the operating system, Python 3.6 wears its "Unicode nazi" hat and becomes strict. It's no longer possible to write data that came from the operating system back to the operating system. Writing a filename read from os.listdir() to stdout or to a text file fails with an encode error.

Subtle behaviour: since Python 3.6, with the POSIX locale, Python uses the "UNIX mode", but only to write to stdout. It's possible to write a filename to stdout, but not to a text file.

In its current shape, my PEP 540 leaves the Python default unchanged, but adds two modes: UTF-8 and UTF-8 strict. The UTF-8 mode is more or less the UNIX mode generalized to all inputs and outputs: mojibake is a feature, just pass bytes through unchanged. The UTF-8 strict mode is more extreme than the current "Strict Unicode mode" since it fails on *decoding* data from the operating system.
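To make the read/write asymmetry concrete, here is a minimal sketch. It is not taken from either PEP: raw_name and write_text() are hypothetical names chosen for illustration, and the encoding is hard-coded to UTF-8 so that the result does not depend on the locale. os.listdir() and os.fsdecode() apply the same surrogateescape handler, but with the filesystem encoding.

```python
import io

# Hypothetical filename as raw bytes; 0xff can never occur in valid UTF-8.
raw_name = b"caf\xff.txt"

# "UNIX mode" on input: surrogateescape maps each undecodable byte to a
# lone surrogate (0xff becomes U+DCFF) instead of raising an error.
name = raw_name.decode("utf-8", errors="surrogateescape")


def write_text(text, errors):
    """Write text through a UTF-8 text stream and return the bytes produced."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# Strict output: the lone surrogate cannot be encoded to UTF-8.
try:
    write_text(name, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "UNIX mode" on output: surrogateescape restores the original byte.
round_tripped = write_text(name, "surrogateescape")

print(ascii(name), strict_failed, round_tripped == raw_name)
```

With a strict text stream the write fails, which is the situation described above for text files; with surrogateescape on the output side the original bytes come back unchanged.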
Now that I have a better view of what we have and what we want, the question is whether the default behaviour should be changed and, if yes, how.

Nick's PEP 538 doesn't exactly move to the "UNIX mode" (open() doesn't use surrogateescape) nor to the "Strict Unicode mode" (fsdecode() still uses surrogateescape); it's still in a grey area. Maybe Nick can elaborate on the use case or update his PEP?

I guess that all users and most developers are more in the "UNIX mode" camp. *If* we want to change the default, I suggest using the "UNIX mode" by default.

The question is whether someone relies on or likes the current Python 3.6 behaviour: reading "just works", writing is strict.

If you like this behaviour, what do you think of the tiny Python 3.6 change: use surrogateescape for stdout when the locale is POSIX?

Victor