On 22Aug2016 0247, Stephen J. Turnbull wrote:
Nick Coghlan writes:
 > On 21 August 2016 at 06:31, Steve Dower <steve.do...@python.org> wrote:

 > > My biggest concern is that it then falls onto users to know how
 > > to start Python with that flag.

The users I'm most worried about belong to organizations where
concerted effort has been made to "purify" the environment so that
they *can* use bytes-oriented code.  That is, getfilesystemencoding()
== getpreferredencoding() == what is actually used throughout the
system.  Such organizations will be able to choose the flag correctly,
and implement it organization-wide, I'm pretty sure.  I doubt that all
will choose UTF-8 at this point in time, though I wish they would.
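
For illustration, the "purified" condition described above (file system encoding equal to preferred encoding) could be checked roughly as follows. This is only a sketch: the helper names `same_codec` and `environment_is_consistent` are made up here, and agreement between the two values says nothing about what other programs on the system actually use.

```python
import codecs
import locale
import sys


def same_codec(a, b):
    """Return True if two encoding names resolve to the same codec."""
    # codecs.lookup() normalises aliases such as "UTF-8" and "utf8".
    return codecs.lookup(a).name == codecs.lookup(b).name


def environment_is_consistent():
    """Check getfilesystemencoding() == getpreferredencoding()."""
    fs_encoding = sys.getfilesystemencoding()
    preferred = locale.getpreferredencoding(False)
    return same_codec(fs_encoding, preferred)


if __name__ == "__main__":
    print(sys.getfilesystemencoding(), locale.getpreferredencoding(False))
    print("consistent:", environment_is_consistent())
```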

I think that these are also the people who are likely to read a PEP and enable an environment variable to preserve the current behaviour. I'm more concerned about uncontrolled environments where a library breaks on a random user's machine because that user downloaded a file from a foreign website.

I don't recall whether I mentioned an environment variable (i.e. PYTHONUSELEGACYENCODING or similar) to switch back to mbcs:ignore by default, but it was always my intent to have one.
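
As a rough sketch of what such an opt-out could look like, the snippet below picks an encoding name based on the environment. The variable name PYTHONUSELEGACYENCODING is only the placeholder floated above, the function `choose_fs_encoding` is hypothetical, and none of this reflects how the interpreter would actually implement the switch.

```python
import os

# Placeholder name taken from the discussion; not an existing variable.
LEGACY_SWITCH = "PYTHONUSELEGACYENCODING"


def choose_fs_encoding(environ=None):
    """Pick the encoding used for bytes paths (illustrative only)."""
    if environ is None:
        environ = os.environ
    # Any non-empty value opts back in to the legacy code-page behaviour.
    if environ.get(LEGACY_SWITCH):
        return "mbcs"
    return "utf-8"


if __name__ == "__main__":
    print(choose_fs_encoding())
```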

Python itself is already ready for UTF-8, except that on Windows
getfilesystemencoding and getpreferredencoding can't honestly return
'utf-8', AIUI.  I understand that that is exactly what Steve wants to
change, but "honestly" is the rub.  What happens if Python 3.6 is only
part of a bytes-oriented system, receives a filename forced to UTF-8-
encoded bytes, and passes that over a pipe or in shared memory or in a
file to a non-Python-3.6 application that trusts the system defaults?
"Boom!", no?  Is there any experience anywhere in any implementation
language with systems used on Windows that use this approach of
pretending the Windows world is UTF-8?  If not, why is it a good idea
for Python to go first?
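
The "Boom!" scenario can be reproduced in miniature. The sketch below assumes the consumer decodes with cp1252 as a stand-in for "the system default"; the real active code page varies by machine, and the helper `misread` is invented for this example.

```python
def misread(name, producer="utf-8", consumer="cp1252"):
    """Encode a filename one way and decode it another way."""
    raw = name.encode(producer)
    # A consumer that trusts the system code page decodes the same bytes
    # differently; errors="replace" keeps undecodable bytes from raising.
    return raw.decode(consumer, errors="replace")


if __name__ == "__main__":
    original = "caf\u00e9.txt"
    print(original, "->", misread(original))
```

Pure-ASCII names survive the round trip unchanged, which is why this kind of mismatch can go unnoticed until a non-ASCII filename shows up.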

The Windows world is Unicode. It is mostly represented in UTF-16, but UTF-8 is entirely equivalent.

All MSVC users have been pushed towards Unicode for many years. The .NET Framework has defaulted to UTF-8 for its entire existence. The use of code pages has been discouraged for decades. We're not going first :)
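
To illustrate the equivalence for well-formed text, here is a minimal sketch that transcodes a UTF-16 name to UTF-8 and back without loss. The function name `utf16_to_utf8` and the sample filename are made up for the example.

```python
def utf16_to_utf8(raw_utf16le):
    """Transcode little-endian UTF-16 bytes to UTF-8 bytes."""
    return raw_utf16le.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    name = "r\u00e9sum\u00e9_\u6587\u4ef6.txt"
    wide = name.encode("utf-16-le")    # the form the wide-character APIs use
    narrow = utf16_to_utf8(wide)       # the same text as UTF-8 bytes
    # Going back the other way recovers the original bytes exactly.
    assert narrow.decode("utf-8").encode("utf-16-le") == wide
    print(len(wide), "UTF-16 bytes,", len(narrow), "UTF-8 bytes")
```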

 > > On the other hand, having code opt-in or out of the new handling
 > > requires changing code (which is presumably not going to happen,
 > > or we wouldn't consider keeping the old behaviour and/or letting
 > > the user control it),

I don't understand why this argument doesn't cut both ways equally.
If you believe that, you should also believe that the same people who
won't change code to opt in also won't use a Python containing fix #1,
and may not install it at all.  Doesn't that matter?

People already do this (e.g. Python 2.7). I don't think it should matter enough to prevent us from making changes in new versions of Python. Otherwise, why would we ever release new versions?

So I guess the question here is: for organisations that have already (incorrectly) assumed that the file system encoding and the active code page are always the same, have built solid infrastructure around this assumption using bytes (including ensuring that their systems never encounter external paths in glob/listdir/etc.), are currently using 3.5, and want to migrate to 3.6, is an environment variable to change back to mbcs sufficient to meet their needs?

Cheers,
Steve
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev