Re: [Python-ideas] Fix default encodings on Windows

Steve Dower Wed, 17 Aug 2016 09:40:30 -0700

On 17Aug2016 0901, Nick Coghlan wrote:

On 17 August 2016 at 02:06, Chris Barker <chris.bar...@noaa.gov> wrote:

So the Solution is to either:


(A) get everyone to use Unicode "properly", which will work on all
platforms (but only on py3.5 and above?)

or

(B) kludge some *nix-compatible support for byte paths into Windows, that
will work at least much of the time.

It's clear (to me at least) that (A) is the "Right Thing", but real world
experience has shown that it's unlikely to happen any time soon.

Practicality beats Purity and all that -- this is a judgment call.

Have I got that right?


Yep, pretty much. Based on Stephen Turnbull's concerns, I wonder if we
could make a whitelist of universal encodings that Python-on-Windows
will use in preference to UTF-8 if they're configured as the current
code page. If we accepted GB18030, GB2312, Shift-JIS, and ISO-2022-*
as overrides, then problems would be significantly less likely.
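A rough sketch of what such a whitelist check might look like. The
function name choose_fs_encoding and the TRUSTED set are hypothetical,
made up for illustration; nothing like this exists in CPython, and the
name normalization is deliberately naive:

```python
# Hypothetical whitelist of code pages that would override UTF-8.
# Names are stored lowercased with "-" and "_" removed.
TRUSTED = {"gb18030", "gb2312", "shiftjis"}


def choose_fs_encoding(code_page_name):
    """Return the configured code page if whitelisted, else "utf-8"."""
    key = code_page_name.lower().replace("-", "").replace("_", "")
    # "ISO-2022-*" is treated as a prefix match on the normalized name.
    if key in TRUSTED or key.startswith("iso2022"):
        return code_page_name
    return "utf-8"


print(choose_fs_encoding("Shift-JIS"))
print(choose_fs_encoding("cp1252"))
```

Under this sketch a machine configured for Shift-JIS keeps its code
page, while one configured for cp1252 falls back to UTF-8.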

Another alternative would be to apply a solution similar to the one we
use on Linux for the "surrogateescape" error handler: there are
some interfaces (like the standard streams) where we only enable that
error handler specifically if the preferred encoding is reported as
ASCII. In 2016, we're *very* skeptical about any properly configured
system actually being ASCII-only (rather than that value showing up
because the POSIX standards mandate it as the default), so we don't
really believe the OS when it tells us that.
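For readers unfamiliar with the handler, a small illustration of what
"surrogateescape" does when the reported encoding is ASCII. The sample
byte string is an arbitrary example chosen here, not something from
the thread:

```python
# Bytes that are not valid ASCII (0xE9 is "e acute" in Latin-1).
raw = b"caf\xe9.txt"

# A strict ASCII decode rejects the byte outright.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, the undecodable byte is smuggled through as a
# lone surrogate code point (0xDC00 + byte value) ...
text = raw.decode("ascii", errors="surrogateescape")

# ... and encoding with the same handler restores the original bytes.
restored = text.encode("ascii", errors="surrogateescape")

print(strict_ok, ascii(text), restored == raw)
```

The point is that the str is not "correct" text, but the round trip
back to bytes is lossless, which is what matters for file names.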

The equivalent for Windows would be to disbelieve the configured code
page only when it was reported as "mbcs" - for folks that had
configured their system to use something other than the default,
Python would believe them, just as we do on Linux.

The problem here is that "mbcs" is not configurable - it's a
meta-encoder that uses whatever is configured as the "language (system
locale) to use when displaying text in programs that do not support
Unicode" (quote from the dialog where administrators can configure
this). So there's nothing to disbelieve here.

And even on machines where the current code page is "reliable", UTF-16
is still the actual encoding, which means UTF-8 is still a better choice
for representing the path as a blob of bytes. Currently we have
inconsistent encoding between different Windows machines and could
either remove that inconsistency completely or simply reduce it for
(approx.) English speakers. I would rather an extreme here - either make
it consistent regardless of user configuration, or make it so broken
that nobody can use it at all. (And note that the correct way to support
*some* other FS encodings would be to change the return value from
sys.getfilesystemencoding(), which breaks people who currently ignore
that just as badly as changing it to utf-8 would.)
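To make the "blob of bytes" point concrete, here is a small sketch
comparing UTF-8 with one particular code page. It uses "cp1252" as a
stand-in for a Western European system locale (the "mbcs" codec itself
only exists on Windows), and the file name is an invented example:

```python
import sys

# A name mixing Japanese and accented Latin characters. On Windows the
# file system stores this as UTF-16 regardless of the code page.
name = "報告書_résumé.txt"

# UTF-8 can represent any such name and round-trips exactly.
utf8_bytes = name.encode("utf-8")
utf8_round_trip = utf8_bytes.decode("utf-8") == name

# A single code page cannot: a strict encode fails ...
try:
    name.encode("cp1252")
    fits_code_page = True
except UnicodeEncodeError:
    fits_code_page = False

# ... and a lossy encode replaces the unsupported characters with "?".
lossy = name.encode("cp1252", errors="replace")

print(utf8_round_trip, fits_code_page, lossy)

# What this interpreter currently reports; the value varies by platform
# and Python version, so no expected output is given.
print(sys.getfilesystemencoding())
```

Which characters survive depends entirely on the code page chosen, so
the same path can yield different bytes (or none) on two machines.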


Cheers,
Steve
