On 29Aug2016 1810, Nick Coghlan wrote:
> On 30 August 2016 at 08:38, Victor Stinner <victor.stin...@gmail.com> wrote:
>> Hi,
>>
>> tl;dr: just drop byte support and help developers to use Unicode in
>> their application!
>
> My view (and Steve's) is that this approach is likely to result in
> Linux-centric projects just dropping even nominal native Windows
> support, rather than more Python software that handles Unicode on
> Windows (/the CLR/the JVM) correctly.

Yeah, this basically sums it up. If I could be sure that the Python developers who are 99% Linux/1% Windows (i.e. run unit tests once and then release) weren't going to see dropping byte support completely as a hostile action, I'd much rather go that way.

But let's definitely take note that platform-specific deprecation warnings are probably not a good idea for cross-platform functionality.

> What Steve is proposing here is essentially a way of providing more
> *nix like CPython behaviour on Windows

Yep. What actually spurred me into action on this was a Twitter rant from one of Twisted's developers about paths on Windows. So I presume that Twisted is probably okay *now* (and hopefully because they explicitly decode from network traffic into str before accessing the file system...)

Using bytes for paths on Windows has essentially always meant using an arbitrarily-encoded str. Bytes encoded with the active code page are not an equivalent of "give me the path as raw bytes" the way bytes paths are on POSIX, but my change will make them one. There'll be a performance penalty, but otherwise using bytes for paths will become reliable.
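As a rough illustration of why a code page is not the same thing as raw bytes, here is a minimal sketch. It uses cp1252 as a stand-in for an active code page (an assumption; the real code page depends on the machine, and the Windows APIs may substitute characters differently from Python's codec). The helper name round_trips is made up for this example.

```python
def round_trips(name: str, encoding: str) -> bool:
    """Return True if name survives an encode/decode cycle in this encoding."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # The encoding cannot represent every character in the name.
        return False


name = "日本語.txt"

# cp1252 stands in for an active code page here (assumption).
print(round_trips(name, "cp1252"))  # False
print(round_trips(name, "utf-8"))   # True

# Forcing the conversion loses the original name for good.
lossy = name.encode("cp1252", errors="replace")
print(lossy)  # b'???.txt'
```

A name that cannot be represented in the code page has no faithful bytes form at all, whereas UTF-8 can represent any str path.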

Unfortunately, any implicitly-encoded cross-version interoperability will have to be broken by such a change. There's just no way around it. But I've seen no evidence that it's common, and there are two workarounds available (set the environment variable, or change your code to specify the encoding used).
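The second workaround (specifying the encoding yourself) might look roughly like the sketch below. The helper decode_path, the "downloads" directory and the sample file name are all hypothetical, and UTF-8 is only an assumed wire encoding; the point is that the code states the encoding instead of relying on whatever the platform picks.

```python
import os


def decode_path(raw, encoding="utf-8"):
    """Turn a bytes path into str using a stated encoding; str passes through."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Bytes as they might arrive from a peer (hypothetical example data).
wire_name = "café.txt".encode("utf-8")

# Decode once at the boundary, then use str paths everywhere else.
target = os.path.join("downloads", decode_path(wire_name))
print(os.path.basename(target) == "café.txt")  # True
```

Once the path is a str, the file system calls no longer depend on which encoding the interpreter would have assumed for bytes.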

> However, this view is also why I don't agree with being aggressive in
> making this behaviour the default on Windows - I think we should make
> it readily available as a provisional feature through a single
> cross-platform command line switch and environment setting (e.g. "-X
> utf8" and "PYTHONASSUMEUTF8") so folks that need it can readily opt in
> to it, but we can defer making it the default until 3.7 after folks
> have had a full release cycle's worth of experience with it in the
> wild.

Given that the people who would need to opt in to the behaviour are merely the recipients of a library written by someone else, I don't think this is the right approach. Stephen Turnbull in an earlier post referred to organisations that fully control their systems in order to ensure that the implicit encodings all match. These are also the people who can apply an environment variable to avoid a behaviour change.

However, someone who just installed an HTTP library that was developed on POSIX and perhaps not even tested on Windows should not have to flick the switch themselves. In contrast, if it is known that 3.6 *definitely* changed something here, we will certainly see more effort applied to making sure libraries are updated. (Compare these two bug reports: "your library breaks on Python 3.6" vs "your library breaks on Python 3.6 when I set this environment variable". The fix for the latter is quite reasonably going to be "don't do that".)

The other discussion about OpenSSL and LTS systems is also interesting. Do we really expect users to take their fully functioning systems and blindly upgrade to a new major version of Python expecting everything to just work? That seems very unlikely to me, and also doesn't match my experience (but I can't quantify that in any useful way, so take it as you wish).

Cheers,
Steve
