Re: [Python-Dev] File system path encoding on Windows

Steve Dower Mon, 22 Aug 2016 09:06:58 -0700

On 22Aug2016 0247, Stephen J. Turnbull wrote:

Nick Coghlan writes:
 > On 21 August 2016 at 06:31, Steve Dower <steve.do...@python.org> wrote:


 > > My biggest concern is that it then falls onto users to know how
 > > to start Python with that flag.

The users I'm most worried about belong to organizations where
concerted effort has been made to "purify" the environment so that
they *can* use bytes-oriented code.  That is, getfilesystemencoding()
== getpreferredencoding() == what is actually used throughout the
system.  Such organizations will be able to choose the flag correctly,
and implement it organization-wide, I'm pretty sure.  I doubt that all
will choose UTF-8 at this point in time, though I wish they would.
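
For reference, the condition described above can be checked from within Python. This is only a minimal sketch: the helper name `encodings_agree` is invented for illustration, and it assumes both reported encodings are names the `codecs` module recognises.

```python
import codecs
import locale
import sys


def encodings_agree():
    """Return (fs_encoding, preferred_encoding, same) for this interpreter."""
    fs = sys.getfilesystemencoding()
    preferred = locale.getpreferredencoding(False)
    # codecs.lookup() normalises aliases such as "UTF8" and "utf-8",
    # so the comparison is on canonical codec names.
    same = codecs.lookup(fs).name == codecs.lookup(preferred).name
    return fs, preferred, same


if __name__ == "__main__":
    print(encodings_agree())
```

A bytes-oriented deployment of the kind described here would expect the third value to be true on every machine it controls.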

I think that these are also the people who are likely to read a PEP and enable an environment variable to preserve the current behaviour. I'm more concerned about uncontrolled environments where a library breaks on a random user's machine because random user downloaded a file from a foreign website.

I don't recall whether I mentioned an environment variable (i.e. PYTHONUSELEGACYENCODING or similar) to switch back to mbcs:ignore by default, but it was always my intent to have one.

Python itself is already ready for UTF-8, except that on Windows
getfilesystemencoding and getpreferredencoding can't honestly return
'utf-8', AIUI.  I understand that that is exactly what Steve wants to
change, but "honestly" is the rub.  What happens if Python 3.6 is only
part of a bytes-oriented system, receives a filename forced to UTF-8-
encoded bytes, and passes that over a pipe or in shared memory or in a
file to a non-Python-3.6 application that trusts the system defaults?
"Boom!", no?  Is there any experience anywhere in any implementation
language with systems used on Windows that use this approach of
pretending the Windows world is UTF-8?  If not, why is it a good idea
for Python to go first?
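
The "Boom!" scenario can be illustrated with a short sketch. The file name and the choice of cp1252 as the consumer's system default are assumptions made for the example; the function name `misread_as_codepage` is hypothetical.

```python
def misread_as_codepage(name, codepage="cp1252"):
    """Encode a name as UTF-8, then decode it the way a consumer that
    trusts a legacy system default (assumed here to be cp1252) would."""
    utf8_bytes = name.encode("utf-8")
    return utf8_bytes.decode(codepage)


original = "r\u00e9sum\u00e9.txt"        # "resume.txt" with two e-acute characters
received = misread_as_codepage(original)

# Each e-acute became two characters (U+00C3, U+00A9), so the receiving
# application would look for a file that does not exist.
print(ascii(original), "->", ascii(received))
```

Nothing raises here; the bytes decode without error to the wrong name, which is why this kind of mismatch tends to show up later as a missing-file error.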

The Windows world is Unicode. Mostly represented in UTF-16, but UTF-8 is entirely equivalent.
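
As a rough illustration of that equivalence (for well-formed Unicode text), the sketch below converts a name between the two encodings without loss, while a legacy code page cannot represent it at all. The sample name and the helper `utf16_to_utf8` are made up for the example.

```python
def utf16_to_utf8(data):
    """Transcode UTF-16-LE bytes (the form Windows uses natively) to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# Cyrillic letters plus a character outside the Basic Multilingual Plane.
name = "\u0444\u0430\u0439\u043b-\U0001F40D.txt"

as_utf16 = name.encode("utf-16-le")
as_utf8 = utf16_to_utf8(as_utf16)

# Both byte forms carry exactly the same text.
round_trip_ok = as_utf8.decode("utf-8") == name == as_utf16.decode("utf-16-le")

# A single-byte code page such as cp1252 has no representation for this name.
try:
    name.encode("cp1252")
    fits_in_cp1252 = True
except UnicodeEncodeError:
    fits_in_cp1252 = False

print(round_trip_ok, fits_in_cp1252)
```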

All MSVC users have been pushed towards Unicode for many years. The .NET Framework has defaulted to UTF-8 its entire existence. The use of code pages has been discouraged for decades. We're not going first :)

 > > On the other hand, having code opt-in or out of the new handling
 > > requires changing code (which is presumably not going to happen,
 > > or we wouldn't consider keeping the old behaviour and/or letting
 > > the user control it),

I don't understand why this argument doesn't cut both ways equally.
If you believe that, you should also believe that the same people who
won't change code to opt in also won't use a Python containing fix #1,
and may not install it at all.  Doesn't that matter?

People already do this (e.g. Python 2.7). I don't think it should matter enough to prevent us from making changes in new versions of Python. Otherwise, why would we ever release new versions?

So I guess the question here is: for organisations who have already (incorrectly) assumed that the file system encoding and the active code page are always the same, have built solid infrastructure around this using bytes (including ensuring that their systems never encounter external paths in glob/listdir/etc.), are currently using 3.5 and want to migrate to 3.6 - is an environment variable to change back to mbcs sufficient to meet their needs?


Cheers,
Steve