On Wed, Aug 10, 2016, at 14:10, Steve Dower wrote:
> To summarise the proposals (remembering that these would only affect
> Python 3.6 on Windows):
>
> * change sys.getfilesystemencoding() to return 'utf-8'
> * automatically decode byte paths assuming they are utf-8
> * remove the deprecation warning on byte paths

Why? What's the use case?
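As far as I can tell, the practical effect would be something like this (a rough sketch of my understanding, not a statement of the implementation; it assumes os.fsencode/os.fsdecode keep following sys.getfilesystemencoding(), and a cp1252 ANSI codepage for the "before" values):

    import os

    # Today sys.getfilesystemencoding() is 'mbcs' on Windows, so byte
    # paths round-trip through the ANSI codepage; under the proposal it
    # would be 'utf-8' instead:
    os.fsencode('caf\xe9')        # b'caf\xc3\xa9', where mbcs gives b'caf\xe9'
    os.fsdecode(b'caf\xc3\xa9')   # 'caf\xe9'

    # ...and a byte path passed to e.g. os.listdir(b'.') would be
    # decoded as UTF-8 before reaching the wide-char Windows APIs.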
> * make the default open() encoding check for a BOM or else use utf-8
> * [ALTERNATIVE] make the default open() encoding check for a BOM or else
> use locale.getpreferredencoding()

For reading, I assume. When a file is opened for writing, the encoding should probably be utf-8-sig [if it's not mbcs] to match what Notepad does.

What about files opened for appending or updating? In theory the check could ingest the whole file to see whether it's valid UTF-8, but that has a time cost. Notepad, if there's no BOM, checks the first 256 bytes of the file to decide whether it's likely to be utf-16 or mbcs [utf-8 isn't considered, AFAIK], and it can get this wrong for certain very short files [e.g. the infamous "this app can break"].

What to do on opening a pipe or device? [Is os.fstat able to detect these cases? See the sketch at the end of this message.] Maybe the BOM detection phase should be deferred until the first read. What should the encoding be at that point if this is done?

Is there a "utf-any" encoding that can handle all five BOMs? If not, should there be one [a sketch of the sniffing is below]? How are "utf-16" and "utf-32" files opened for appending or updating handled today?

> * force the console encoding to UTF-8 on initialize and revert on
> finalize

Why not implement a true Unicode console? What if sys.stdin/stdout are pipes (or non-console devices such as a serial port)?
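On the "utf-any" question: nothing built in does this today as far as I know, but the sniffing itself is simple. A rough sketch (sniff_encoding is a hypothetical helper, not an existing codec; note that the UTF-32 BOMs have to be tested before UTF-16, because BOM_UTF32_LE begins with BOM_UTF16_LE):

    import codecs

    # The five BOMs, longest prefixes first. The names returned all
    # consume the BOM on decode: 'utf-16'/'utf-32' pick the endianness
    # from it, and 'utf-8-sig' strips it.
    _BOMS = [
        (codecs.BOM_UTF32_LE, 'utf-32'),
        (codecs.BOM_UTF32_BE, 'utf-32'),
        (codecs.BOM_UTF8,     'utf-8-sig'),
        (codecs.BOM_UTF16_LE, 'utf-16'),
        (codecs.BOM_UTF16_BE, 'utf-16'),
    ]

    def sniff_encoding(binary_file, default='utf-8'):
        """Guess an encoding from the first four bytes of *binary_file*,
        falling back to *default* when no BOM is present."""
        head = binary_file.read(4)
        binary_file.seek(0)   # exactly the step a pipe can't do
        for bom, name in _BOMS:
            if head.startswith(bom):
                return name
        return default

You'd use it as a two-step open, which is precisely why doing the same thing inside open() itself is awkward for non-seekable streams:

    with open(name, 'rb') as f:
        encoding = sniff_encoding(f)
    with open(name, 'r', encoding=encoding) as f:
        text = f.read()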
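As for detecting pipes and devices: os.fstat can tell them apart, at least in the common cases, since the Windows CRT reports pipes as S_IFIFO and character devices (the console, serial ports) as S_IFCHR. A rough sketch of the guard open() would need (the helper name is mine):

    import os
    import stat

    def safe_to_sniff(fd):
        """True only for regular files, where reading a BOM and seeking
        back is safe; pipes (S_ISFIFO) and character devices such as
        consoles or serial ports (S_ISCHR) would lose the bytes read."""
        return stat.S_ISREG(os.fstat(fd).st_mode)

(File objects also expose .seekable(), which answers much the same question without a stat call.)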
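And on "a true unicode console": the console already accepts arbitrary UTF-16 text through WriteConsoleW, with no codepage involved at all, so a native implementation wouldn't need the encoding-swap trick. A minimal ctypes sketch of the output half (Windows only; it deliberately fails when stdout is redirected, which is where the pipe question comes back in):

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    STD_OUTPUT_HANDLE = -11

    def console_write(text):
        """Write str to the console as UTF-16 via WriteConsoleW,
        bypassing the console codepage entirely. (For simplicity this
        assumes BMP-only text; astral characters occupy two UTF-16
        units, so len(text) undercounts them.)"""
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        written = wintypes.DWORD()
        if not kernel32.WriteConsoleW(handle, text, len(text),
                                      ctypes.byref(written), None):
            raise ctypes.WinError(ctypes.get_last_error())

A real version would also need ReadConsoleW for input and a fallback to the current behaviour for redirected handles.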