Re: [Python-ideas] Fix default encodings on Windows

Steve Dower Tue, 16 Aug 2016 08:57:50 -0700

I just want to clearly address two points, since I feel like multiple posts have been unclear on them.

1. The bytes API was deprecated in 3.3 and it is listed in https://docs.python.org/3/whatsnew/3.3.html. Lack of mention in the docs is an unfortunate oversight, but it was certainly announced and the warning has been there for three released versions. We can freely change or remove the support now, IMHO.

2. Windows file system encoding is *always* UTF-16. There's no "assuming mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding it is". We know exactly what the encoding is on every supported version of Windows. UTF-16.

This discussion is for the developers who insist on using bytes for paths within Python, and the question is, "how do we best represent UTF-16 encoded paths in bytes?"


The choices are:

* don't represent them at all (remove bytes API)
* convert and drop characters not in the (legacy) active code page
* convert and fail on characters not in the (legacy) active code page
* convert and fail on invalid surrogate pairs
* represent them as UTF-16-LE in bytes (with embedded '\0' everywhere)
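A rough illustration of how the encoding-based choices differ, using Python's own codecs. This is only a sketch under stated assumptions: cp1252 stands in for the legacy active code page (the real ACP varies by machine and locale), the fourth option is assumed to target UTF-8 since the list does not name the encoding, and the function names are hypothetical. The first option has no encoding to show.

```python
# Sketch comparing the listed options for turning a Windows path held as a
# Python str into bytes. Assumptions: cp1252 stands in for the legacy active
# code page, and UTF-8 is the (unstated) target encoding of option four.
# All function names here are hypothetical, not CPython APIs.

SAMPLE = "C:\\Temp\\\u65e5\u672c.txt"  # path containing two non-cp1252 characters


def drop_unmappable(path: str) -> bytes:
    """Option 2: encode to the legacy code page, replacing unmappable chars with '?'."""
    return path.encode("cp1252", errors="replace")


def fail_unmappable(path: str) -> bytes:
    """Option 3: encode to the legacy code page, raising on unmappable chars."""
    return path.encode("cp1252", errors="strict")


def fail_bad_surrogates(path: str) -> bytes:
    """Option 4 (UTF-8 assumed): any valid code point encodes; lone surrogates raise."""
    return path.encode("utf-8", errors="strict")


def utf16_bytes(path: str) -> bytes:
    """Option 5: UTF-16-LE, the OS's own form; ASCII chars get a NUL high byte."""
    return path.encode("utf-16-le")


if __name__ == "__main__":
    print(drop_unmappable(SAMPLE))
    try:
        fail_unmappable(SAMPLE)
    except UnicodeEncodeError as exc:
        print("option 3 raised:", exc.reason)
    print(fail_bad_surrogates(SAMPLE).decode("utf-8") == SAMPLE)
    print(utf16_bytes(SAMPLE)[:6])
```

With these definitions, the second option silently turns both CJK characters into `?` (so the path can no longer be found), the third raises `UnicodeEncodeError`, the UTF-8 form decodes back to the original string and only fails for a lone surrogate such as `"\ud800"`, and the UTF-16-LE form interleaves a NUL byte with every ASCII character.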

Currently we have the second option.

My preference is the fourth option, as it will cause the least breakage of existing code and let the most code just work in the presence of non-ACP characters.


The fifth option is the best for round-tripping within Windows APIs.
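One concrete reason the fourth option (again assuming its target is UTF-8) breaks less existing bytes code than the fifth: code that searches bytes paths for ASCII separators keeps working. In UTF-8 every byte of a multi-byte sequence is 0x80 or above, whereas in UTF-16-LE ASCII characters carry a NUL byte and some other characters contain an ASCII-valued byte. The splitting helper below is a hypothetical stand-in for such legacy code.

```python
# Sketch: why UTF-8 (assumed target of option four) is friendlier to existing
# bytes-path code than UTF-16-LE (option five). split_bytes_path is a
# hypothetical stand-in for typical legacy code, not a stdlib API.

SEP = b"\\"  # the ASCII backslash that legacy code searches for


def split_bytes_path(raw: bytes) -> list[bytes]:
    """Hypothetical legacy helper: split a bytes path on backslashes."""
    return raw.split(SEP)


PATH = "C:\\Temp\\\u5c71.txt"  # U+5C71 is a CJK character

utf8_raw = PATH.encode("utf-8")
utf16_raw = PATH.encode("utf-16-le")

# UTF-8: multi-byte sequences use only bytes >= 0x80, so a backslash byte can
# only come from a real separator and the split is correct.
utf8_parts = split_bytes_path(utf8_raw)

# UTF-16-LE: U+5C71 encodes as b"q\\", so its high byte is mistaken for a
# separator, and the NUL bytes after ASCII chars misalign every piece.
utf16_parts = split_bytes_path(utf16_raw)

if __name__ == "__main__":
    print([p.decode("utf-8") for p in utf8_parts])
    print(utf16_parts)
```

The UTF-8 split yields the three expected components, while the UTF-16-LE split yields four misaligned fragments, so code like this would need rewriting under the fifth option even though that option round-trips perfectly through the Windows APIs.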

The only code that will break with any change is code that was using an already deprecated API. Code that correctly uses str to represent "encoding agnostic text" is unaffected.

If you see an alternative choice to those listed above, feel free to contribute it. Otherwise, can we focus the discussion on these (or any new) choices?


Cheers,
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
