On 28Aug2016 2043, Stephen J. Turnbull wrote:
tritium-l...@sdamon.com writes:

 > Once you get to var lengths like that, arcane single character flags start
 > looking preferable.  How about "PYTHONWINLEGACY" to just turn it all on or
 > off.  If the code breaks on one thing, it obviously isn't written to use the
 > other two, so might as well shut them all off.

Since Steve is thinking about three separate PEPs (among other things,
they might be implemented on different timelines), that's not really
possible (placing the features under control of one switch at
different times would be an unacceptable compatibility break).

Yeah, the likelihood of different timelines basically means three PEPs are going to be necessary. But I think we can have a single "PYTHONWINDOWSANSI" (or ...MBCS) flag to cover all three whenever they come in without it being a compatibility break, especially if (as Nick suggested) there are _PYTHONWINDOWSANSI(CONSOLE|PATH|LOCALE) flags too. But it does give us the ability to say "all ANSI or all UTF-8 are supported; mix-and-match at your own risk".

Anyway, it's not *obvious* that your premise is true, because code
isn't written to do any of those things.  It's written to process
bytes agnostically.  The question is what does the environment look
like.  Steve obviously has a perspective on environment which suggests
that these aspects are often decoupled because in Windows the actual
filesystem is never bytes-oriented.  I don't know if it's possible to
construct a coherent environment where these aspects are decoupled,
but I can't say it's impossible, either.

Actually, the three items are almost entirely decoupled, though it isn't obvious.

* stdin/stdout/stderr are text wrappers by default (under my changes, using the console encoding when it's a console and the locale encoding when it's a file/pipe). There's no point reading bytes from the console, and redirected files or pipes are unaffected by the change.
* the file system encoding only affects paths passed into/returned from the OS as bytes, and...
* the locale encoding affects files opened in text mode, which means...
* if you open('rb') and read paths, the locale encoding has no effect on whether the bytes are the right encoding to be used as paths
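
To make the last bullet concrete, here is a minimal sketch (not part of the proposed changes). The helper name read_path_both_ways is hypothetical, and the explicit encoding arguments stand in for whatever the locale encoding happens to be. It shows that the bytes read with 'rb' are identical regardless of the text-mode encoding, while the str result depends on it:

```python
import os
import tempfile


def read_path_both_ways(path_text, write_encoding, read_encoding):
    """Write one path to a scratch file, then read it back as bytes and as str.

    Hypothetical helper: the explicit encodings are stand-ins for the
    locale encoding that text mode would otherwise pick up by default.
    """
    with tempfile.TemporaryDirectory() as tmp:
        listing = os.path.join(tmp, "paths.txt")
        with open(listing, "w", encoding=write_encoding) as f:
            f.write(path_text)
        # Binary mode: no decoding happens, so no text encoding is involved.
        with open(listing, "rb") as f:
            raw = f.read()
        # Text mode: the result depends on the encoding used to decode.
        with open(listing, "r", encoding=read_encoding) as f:
            text = f.read()
    return raw, text


raw, text = read_path_both_ways("caf\u00e9.txt", "utf-8", "latin-1")
```

Whether raw is then usable as a path is a question for the filesystem encoding alone, which is the decoupling described above.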

So while there are scenarios that use multiple pieces of this, there should only be one change impacting any scenario:
* reading str paths from a file - locale encoding
* reading bytes paths from a file - filesystem encoding
* reading str paths from a pipe/redirected file - locale encoding
* reading bytes paths from a pipe/redirected file - filesystem encoding
* reading str paths from the console - console encoding
* reading bytes paths from the console (i.e. sys.stdin.buffer.raw.read()) - filesystem encoding
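
The list above boils down to a small lookup, sketched below. The names governing_encoding and current_encodings are hypothetical, made up for illustration. The calls used to report the current values (sys.stdin.encoding, locale.getpreferredencoding, sys.getfilesystemencoding) exist today, but what they return on Windows under the proposed changes is not something this sketch asserts:

```python
import locale
import sys


def governing_encoding(source, path_type):
    """Return which setting applies to one scenario from the list above.

    source is "file", "pipe" or "console"; path_type is "str" or "bytes".
    Hypothetical helper that simply restates the list as code.
    """
    if path_type == "bytes":
        return "filesystem"
    if source == "console":
        return "console"
    return "locale"


def current_encodings():
    """Report the values this interpreter currently uses for each setting."""
    stdin = sys.stdin
    return {
        # sys.stdin can be None (for example under pythonw), so guard it.
        "console": stdin.encoding if stdin is not None else None,
        "locale": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }
```

Each (source, path_type) pair maps to exactly one setting, which is the point: no single scenario is affected by more than one change.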

The last case doesn't make sense anyway right now, as sys.stdin.buffer.raw has no specified encoding and you can't reliably read paths from it. Perhaps there exist examples of where this is put to good use (bearing in mind it must be an actual console - not a redirection or pipe) - I would love to hear about them.

As far as I can tell, any other combination requires the Python developer to convert between str and bytes themselves. That may lead to errors if they have assumed that the encoding of the bytes would never change. Code that ignores encodings and uses bytes or str exclusively is only going to encounter one (bytes) or two (str) of the changes.
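
For the case where code does convert paths between str and bytes itself, a rough sketch of the safer pattern is to go through os.fsencode and os.fsdecode, which use the filesystem encoding, rather than hard-coding a codec name. The helper name round_trip is hypothetical, and this only illustrates the existing os functions, not the behaviour of any of the proposed flags:

```python
import os


def round_trip(path_text):
    """Convert a str path to bytes and back using the filesystem encoding.

    Hypothetical helper: because no codec name is hard-coded, the
    conversion follows whatever sys.getfilesystemencoding() reports.
    """
    as_bytes = os.fsencode(path_text)
    back = os.fsdecode(as_bytes)
    return as_bytes, back


as_bytes, back = round_trip("data.txt")
```

Code written this way does not assume the encoding of the bytes, so a change in the filesystem encoding should not break it. Code that calls path.encode('mbcs') directly is the kind that might.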

Cheers,
Steve

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev