On Sun, Jan 24, 2021 at 01:32:28AM +0000, MRAB wrote:
> On 2021-01-24 01:14, Guido van Rossum wrote:
> > I have definitely seen BOMs written by Notepad on Windows 10.
> >
> > Why can’t the future be that open() in text mode guesses the encoding?
>
> "In the face of ambiguity, refuse the temptation to guess."

"Although practicality beats purity."

The Zen is like scripture: there's a koan for any position you wish to
take :-)

If you want to be pedantic, and I certainly do *wink*, providing any
default for the encoding parameter is a guess. The encoding of all text
files is ambiguous (the intended encoding is metadata which is not
recorded in the file format). Most text files on Linux and Mac OS use
UTF-8, and many on Windows too, but not *all*, so setting the default to
UTF-8 is just a guess.

I understand that there are good heuristics for auto-detection of
encodings which are reliable and used in plenty of other software. If
auto-detection is a "guess", it's an *educated* guess and not much
different from the status quo, which usually guesses correctly on Linux
and Mac but often guesses wrongly on Windows.

This proposal is to improve the quality of the guess by inspecting the
file's contents. For example, a file opened in text mode where every
second byte is a NUL is *almost certainly* UTF-16. The chances that
somebody actually intended to write:

    H\0e\0l\0l\0o\0 \0W\0o\0r\0l\0d\0

rather than "Hello World" are negligible.

Before we consider changing the default encoding to "auto-detect", I
would like to see some estimate of how many UTF-8 encoded files will be
misclassified as something else. That is, if we make this change, how
much software that currently guesses UTF-8 correctly (the default
encoding is the actual intended encoding) will break because it guesses
something else?

That surely won't happen with mostly-ASCII files, but I suppose it could
happen with some non-English languages?
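
To make the kind of content inspection I have in mind concrete, here is
a rough sketch. The name guess_encoding and its default parameter are my
own invention for illustration; nothing like it exists in the stdlib or
in any concrete proposal. It only checks for a BOM and for the
"every second byte is NUL" pattern, and falls back to UTF-8 otherwise.
Real detectors use far more evidence than this.

```python
import codecs

# Hypothetical helper for illustration only; not a stdlib API.
# UTF-32 BOMs are checked before UTF-16 ones because the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(sample: bytes, default: str = "utf-8") -> str:
    """Return an educated guess at the encoding of the leading bytes of a file."""
    # 1. A BOM is the strongest hint we can get.
    for bom, name in _BOMS:
        if sample.startswith(bom):
            return name

    # 2. No BOM: look for the alternating-NUL pattern of UTF-16 text
    #    that consists of ASCII-range characters.
    n = len(sample) - len(sample) % 2  # ignore a trailing odd byte
    if n >= 4:
        even, odd = sample[0:n:2], sample[1:n:2]
        if all(even) and not any(odd):
            return "utf-16-le"
        if all(odd) and not any(even):
            return "utf-16-be"

    # 3. Otherwise fall back to the default, which is itself a guess.
    return default


if __name__ == "__main__":
    data = "Hello World".encode("utf-16-le")
    enc = guess_encoding(data)
    print(enc, repr(data.decode(enc)))
```

Note that step 2 only fires for text in the ASCII range, so it says
nothing about the harder question of how often genuine UTF-8 would be
misclassified by a more aggressive detector.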

-- 
Steve