On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower <steve.do...@python.org> wrote:
> and using the *W APIs exclusively is the right way to go.

My proposal was to use the wide-character APIs, but transcoding CP_ACP
without best-fit characters and raising a warning whenever the default
character is used (e.g. substituting Katakana middle dot when creating a
file using a bytes path that has an invalid sequence in CP932). This
proposal was in response to the case made by Stephen Turnbull. If using
UTF-8 is getting such heavy pushback, I thought half a solution was better
than nothing, and it also sets up the infrastructure to easily switch to
UTF-8 if that idea eventually gains acceptance. It could raise exceptions
instead of warnings if that's preferred, since bytes paths on Windows are
already deprecated.

> *Any* encoding that may silently lose data is a problem, which basically
> leaves utf-16 as the only option. However, as that causes other problems,
> maybe we can accept the tradeoff of returning utf-8 and failing when a
> path contains invalid surrogate pairs

Are there any common sources of illegal UTF-16 surrogates in Windows
filenames? I see that WTF-8 (Wobbly) was developed to handle this problem.
A WTF-8 path would roundtrip back to the filesystem, but it should only be
used internally in a program.

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
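
For illustration only, here is a rough Python sketch of the two behaviors
discussed in this message: decoding a bytes path strictly and warning when a
substitution is needed, and the difference between strict UTF-8 and a
WTF-8-like encoding for a lone surrogate. All function names are
hypothetical. The sketch uses Python's own codecs rather than the Windows
*W APIs, so the substituted character is U+FFFD rather than the code page's
default character, and "surrogatepass" only approximates WTF-8 for lone
surrogates.

```python
import warnings


def decode_ansi_path(raw: bytes, codepage: str = "cp932") -> str:
    """Hypothetical helper: decode a bytes path, warning if it is invalid."""
    try:
        # Strict decoding: an invalid sequence is never silently replaced.
        return raw.decode(codepage)
    except UnicodeDecodeError as exc:
        warnings.warn(
            f"bytes path is not valid {codepage}: {exc.reason}",
            UnicodeWarning,
            stacklevel=2,
        )
        # Python's codec substitutes U+FFFD here. Windows would instead use
        # the code page's default character (Katakana middle dot for CP932,
        # as mentioned in the message above).
        return raw.decode(codepage, errors="replace")


def to_utf8_strict(path: str) -> bytes:
    """Strict UTF-8: raises UnicodeEncodeError on a lone surrogate."""
    return path.encode("utf-8")


def to_wtf8_like(path: str) -> bytes:
    """Assumption: 'surrogatepass' stands in for WTF-8 (lone surrogates only)."""
    return path.encode("utf-8", "surrogatepass")


def from_wtf8_like(data: bytes) -> str:
    """Inverse of to_wtf8_like, so the original name round-trips."""
    return data.decode("utf-8", "surrogatepass")


if __name__ == "__main__":
    # A truncated CP932 lead byte triggers the warning path.
    print(ascii(decode_ansi_path(b"abc\x81")))

    # A filename containing an unpaired high surrogate.
    name = "file_\ud800.txt"
    try:
        to_utf8_strict(name)
    except UnicodeEncodeError as exc:
        print("strict UTF-8 fails:", exc.reason)
    print(from_wtf8_like(to_wtf8_like(name)) == name)
```

Switching the warning to an exception, as suggested above, would amount to
letting the UnicodeDecodeError propagate instead of catching it.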