Re: [Python-ideas] add a hash to .pyc to don't mess between .py and .pyc

2016-08-15 Thread Victor Stinner
The purpose of .pyc is to optimize Python. With your proposed change, the
number of syscalls is doubled (open, read, close) and you add extra work
(computing the hash) whenever the .pyc is used.

If your filesystem works correctly, you should not have to bother.

Victor

On 15 Aug 2016 at 01:06, "Xavier Combelle" wrote:

> Several times I have stumbled upon the following problem:
> I delete a module, the .pyc stays around, and by "magic" Python still
> uses the .pyc.
> A similar error happens (but less often) when, through some file system
> manipulation, the .pyc happens to be newer than the .py but corresponds
> to an older version of the .py. It is not a major problem, but it is
> still an existing problem.
>
> I'm not the first one to have this problem. A Stack Overflow search leads
> to quite a lot of relevant answers
> http://stackoverflow.com/search?q=old+pyc and so does a Google search
> https://www.google.fr/search?q=old+pyc
> Moreover, several of the Google results are in the bug trackers of various
> projects. (These results also include cases of .pyc files being stored
> in VCS repositories, but that is a separate, unrelated problem.)
> I even found a blog post about using .pyc as a backdoor
> http://secureallthethings.blogspot.fr/2015/11/backdooring-python-via-pyc-pi-wa-si_9.html
>
> My idea to kill two birds with one stone would be to add a hash (likely
> a cryptographic one) of the .py file to the .pyc file, then read the .py
> file and check the hash.
> The additional cost on first startup would be just the hash
> calculation, which I think is cheap compared to other factors
> (especially input/output).
> The additional cost on subsequent startups would mainly be the extra
> read of the .py files, plus the cheap hash calculations.
>
> I believe removing these bugs would be worth the performance cost.
>
> I know that some use cases rely on shipping just the .pyc and not keeping
> the .py around, for example to avoid distributing the source file.
> But in my view, this use case should be handled by an opt-in decision
> and not by default. Several opt-in mechanisms could be envisioned:
> environment variables, command line switches, or a special compilation of
> the .pyc which explicitly asks not to check the hash.
>
> --
> Xavier
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
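The check proposed in the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions: SHA-256 is an arbitrary choice of hash, and `source_digest` and `cache_is_valid` are hypothetical names, not part of any existing import machinery.

```python
import hashlib
import os
import tempfile


def source_digest(py_path):
    """Return the SHA-256 hex digest of a source file's bytes."""
    with open(py_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def cache_is_valid(py_path, stored_digest):
    """Hypothetical check: the cache is valid only if the source still
    exists and still hashes to the digest recorded at compile time."""
    if not os.path.exists(py_path):
        return False  # source deleted: a leftover .pyc must not be used
    return source_digest(py_path) == stored_digest


# Demonstration with a throwaway source file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    recorded = source_digest(src)          # what would be stored in the .pyc
    fresh = cache_is_valid(src, recorded)  # source unchanged

    with open(src, "w") as f:
        f.write("x = 2\n")
    after_edit = cache_is_valid(src, recorded)    # content changed

    os.remove(src)
    after_delete = cache_is_valid(src, recorded)  # source is gone
```

Note that the sketch has to open and read the .py file on every check, which is exactly the extra I/O the reply above objects to.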

Re: [Python-ideas] From mailing list to GitHub issues

2016-08-15 Thread Arek Bulski
But we do not care if the experimental animal dies, that is the point of
doing the experiment.

I registered at Discuss and kind of like it. Then I tried to create a new
thread, and my Android keyboard shows over the fields. Discuss as it is now
doesn't work on mobile.

On 14 Aug 2016 at 9:35 PM, "Arek Bulski" wrote:

> As I think Donald pointed out, it doesn't take a laptop to contribute.
> Did you all notice that Guido replied from a phone?
>
> Currently half of the mailing list mail is large auto-quotes or subject
> and date info. Lines are never broken the way they should be. Whoever wants
> to keep their mail toolchain can keep it. Don't make the rest of us put up
> with this shit.
>
> There is no definition of "fails", just as I don't have a definition of
> consensus. People will stick to it or not. No voting, just participation.
>
> On 14 Aug 2016 at 1:57 PM, "Arek Bulski" wrote:
>
> I throw a proposal on the table: let's create a "python-ideas" repo under
> the "python" account on GitHub and move this and only this thread onto it.
> If it fails, nothing but this thread is lost (not persisted in the mailing
> list), which would make no difference anyway. People made many points that
> are purely abstract. We need some hands-on experience to see if it works
> for us or doesn't.
>
> Guido, could you create a repo for us?
>
>
>

Re: [Python-ideas] From mailing list to GitHub issues

2016-08-15 Thread Arek Bulski
For those talking abstract points, here is a screenshot of GitHub. Works
like a charm on mobile.
https://s3.postimg.org/e30lc3tk3/Screenshot_2016_08_15_13_00_59_com_browser_inter.png

On 15 Aug 2016 at 10:34 AM, "Arek Bulski" wrote:

> But we do not care if the experimental animal dies, that is the point of
> doing the experiment.
>
> I registered at Discuss and kind of like it. Then I tried to create a new
> thread, and my Android keyboard shows over the fields. Discuss as it is now
> doesn't work on mobile.
>
> On 14 Aug 2016 at 9:35 PM, "Arek Bulski" wrote:
>
>> As I think Donald pointed out, it doesn't take a laptop to contribute.
>> Did you all notice that Guido replied from a phone?
>>
>> Currently half of the mailing list mail is large auto-quotes or subject
>> and date info. Lines are never broken the way they should be. Whoever wants
>> to keep their mail toolchain can keep it. Don't make the rest of us put up
>> with this shit.
>>
>> There is no definition of "fails", just as I don't have a definition of
>> consensus. People will stick to it or not. No voting, just participation.
>>
>> On 14 Aug 2016 at 1:57 PM, "Arek Bulski" wrote:
>>
>> I throw a proposal on the table: let's create a "python-ideas" repo under
>> the "python" account on GitHub and move this and only this thread onto it.
>> If it fails, nothing but this thread is lost (not persisted in the mailing
>> list), which would make no difference anyway. People made many points that
>> are purely abstract. We need some hands-on experience to see if it works
>> for us or doesn't.
>>
>> Guido, could you create a repo for us?
>>
>>
>>

Re: [Python-ideas] From mailing list to GitHub issues

2016-08-15 Thread Arek Bulski
One person saying "this thread doesn't belong here" doesn't make it so. Too
often in my life I have met with situations where people chased others away
because they simply didn't like the particular topic. This thread still has
its participants. You are replying on it as well. And until Discuss is no
longer broken, no posting for me there. Will file a bug report though, as
suggested.

Sorry about the etiquette, I will do better.

I meant that people not using mailing lists are a considerable group and I
was asking on their behalf. And I shall add "thou shalt not ask Guido
directly" to the commandments. The repo should be under the official python
account; I don't see much sense in starting a private one. But fine, there
you go. We can migrate it later if you want.

https://github.com/arekbulski/python-ideas/issues/1

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
I guess I'm not sure what your question is then.

Using text internally is of course the best way to deal with it. But for those 
who insist on using bytes, this change at least makes Windows a feasible target 
without requiring manual encoding/decoding at every boundary.

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull"
Sent: 8/14/2016 22:06
To: "Steve Dower"
Cc: "Victor Stinner"; "python-ideas"; "Random832"
Subject: RE: [Python-ideas] Fix default encodings on Windows

Steve Dower writes:

 > I plan to use only Unicode to interact with the OS and then utf8
 > within Python if the caller wants bytes.

This doesn't answer Victor's questions, or mine.

This proposal requires identifying and transcoding bytes that
represent text in encodings other than UTF-8.

1.  How do you propose to identify "bytes that represent text (and
might be filenames)" if they did *not* originate in a filesystem or
console API?

2.  How do you propose to identify the non-UTF-8 encoding, if you have
forced all variables signifying bytes encodings to UTF-8?

Additional considerations:

As far as I can see, this is just a recipe for a different way to get
mojibake.  *The* way to avoid mojibake is to "let text be text"
*internally*.  Developers who insist on processing text as bytes are
going to get what they deserve *in edge cases*.  But mostly (ie, in
the mono-encoding environments of most users) it just (barely ;-) works.
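For readers unfamiliar with the term, the following sketch shows how mojibake arises when bytes written under one encoding are decoded under another. The sample word and the choice of `cp1252` as the legacy code page are assumptions made purely for illustration.

```python
# Text written by a program that uses UTF-8 ...
original = "caf\u00e9"
utf8_bytes = original.encode("utf-8")

# ... and read back by a program that assumes a legacy code page.
# Decoding succeeds, but the result is silently garbled (mojibake).
mojibake = utf8_bytes.decode("cp1252")

# The opposite mismatch fails loudly instead of silently garbling.
legacy_bytes = original.encode("cp1252")
try:
    legacy_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Neither program can tell from the bytes alone which encoding was intended, which is the identification problem raised in questions 1 and 2 above.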

And there are many use cases where you *can* process bytes that happen
to encode text as "just bytes" (eg, low-level networking code).  These
cases have performance issues if the bytes-text-bytes-text-bytes
double-round-trip implied for *stream content* (vs the OS APIs you're
concerned with, which effectively round-trip text-bytes-text) is
imposed on them.


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Random832
On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote:
> I guess I'm not sure what your question is then.
> 
> Using text internally is of course the best way to deal with it. But for
> those who insist on using bytes, this change at least makes Windows a
> feasible target without requiring manual encoding/decoding at every
> boundary.

Why isn't it already? What's "not feasible" about requiring manual
encoding/decoding?

Basically your assumption is that people using Python on windows and
having to deal with files that contain filename data encoded as bytes
are more likely to be dealing with data that is either UTF-8 anyway
(coming from Linux or some other platform) or came from the current
version of Python (which will encode things in UTF-8 under the change)
than they are to deal with data that came from other Windows programs
that encoded things in the codepage used by them and by other Windows
users in the same country / who speak the same language.


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower
I'm still not sure we're talking about the same thing right now.

For `open(path_as_bytes).read()`, are we talking about the way path_as_bytes is 
passed to the file system? Or the codec used to decide the returned string?

Top-posted from my Windows Phone

-----Original Message-----
From: "Random832"
Sent: 8/15/2016 6:41
To: "Steve Dower"; "Stephen J. Turnbull"
Cc: "Victor Stinner"; "python-ideas"
Subject: Re: [Python-ideas] Fix default encodings on Windows

On Mon, Aug 15, 2016, at 09:23, Steve Dower wrote:
> I guess I'm not sure what your question is then.
> 
> Using text internally is of course the best way to deal with it. But for
> those who insist on using bytes, this change at least makes Windows a
> feasible target without requiring manual encoding/decoding at every
> boundary.

Why isn't it already? What's "not feasible" about requiring manual
encoding/decoding?

Basically your assumption is that people using Python on windows and
having to deal with files that contain filename data encoded as bytes
are more likely to be dealing with data that is either UTF-8 anyway
(coming from Linux or some other platform) or came from the current
version of Python (which will encode things in UTF-8 under the change)
than they are to deal with data that came from other Windows programs
that encoded things in the codepage used by them and by other Windows
users in the same country / who speak the same language.

Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Random832
On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote:
> I'm still not sure we're talking about the same thing right now.
> 
> For `open(path_as_bytes).read()`, are we talking about the way
> path_as_bytes is passed to the file system? Or the codec used to decide
> the returned string?

We are talking about the way path_as_bytes is passed to the filesystem,
and in particular what encoding path_as_bytes is *actually* in, when it
was obtained from a file or other stream opened in binary mode.


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower

On 15Aug2016 0954, Random832 wrote:
> On Mon, Aug 15, 2016, at 12:35, Steve Dower wrote:
>> I'm still not sure we're talking about the same thing right now.
>>
>> For `open(path_as_bytes).read()`, are we talking about the way
>> path_as_bytes is passed to the file system? Or the codec used to decide
>> the returned string?
>
> We are talking about the way path_as_bytes is passed to the filesystem,
> and in particular what encoding path_as_bytes is *actually* in, when it
> was obtained from a file or other stream opened in binary mode.


Okay good, we are talking about the same thing.

Passing path_as_bytes in that location has been deprecated since 3.3, so 
we are well within our rights (and probably overdue) to make it a 
TypeError in 3.6. While it's obviously an invalid assumption, for the 
purposes of changing the language we can assume that no existing code is 
passing bytes into any functions where it has been deprecated.


As far as I'm concerned, there are currently no filesystem APIs on 
Windows that accept paths as bytes.



Given that, I'm proposing adding support for using byte strings encoded 
with UTF-8 in file system functions on Windows. This allows Python users 
to omit switching code like:


if os.name == 'nt':
    f = os.stat(os.listdir('.')[-1])
else:
    f = os.stat(os.listdir(b'.')[-1])

Or simply using the bytes variant unconditionally because they heard it 
was faster (sacrificing cross-platform correctness, since it may not 
correctly round-trip on Windows).


My proposal is to remove all use of the *A APIs and only use the *W 
APIs. That completely removes the (already deprecated) use of bytes as 
paths. I then propose to change the (unused on Windows) 
sys.getfsdefaultencoding() to 'utf-8' and handle bytes being passed into 
filesystem functions by transcoding into UTF-16 and calling the *W APIs.
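The transcoding step described here can be sketched in pure Python. This is an illustration only: `bytes_path_to_wide` and `wide_to_bytes_path` are hypothetical helpers, the sample path is made up, and the real change would live in CPython's C code rather than in Python.

```python
def bytes_path_to_wide(path_bytes):
    """Hypothetical helper: decode a UTF-8 bytes path and re-encode it as
    the NUL-terminated UTF-16-LE buffer a *W API would expect."""
    text = path_bytes.decode("utf-8")  # strict: invalid UTF-8 raises
    return text.encode("utf-16-le") + b"\x00\x00"


def wide_to_bytes_path(wide):
    """Inverse direction: a NUL-terminated UTF-16-LE buffer back to UTF-8."""
    return wide[:-2].decode("utf-16-le").encode("utf-8")


# A made-up path containing non-ASCII characters.
name = "C:\\data\\\u65e5\u672c\u8a9e.txt".encode("utf-8")
wide = bytes_path_to_wide(name)
round_tripped = wide_to_bytes_path(wide)

# In UTF-8 the separator is still a single 0x5C byte, so bytes paths
# can be split and joined without decoding them first.
parts = name.split(b"\\")
```

The active code page never appears in either direction, which is the property the proposal is after.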


This completely removes the active codepage from the chain, allows paths 
returned from the filesystem to correctly roundtrip via bytes in Python, 
and allows those bytes paths to be manipulated at '\' characters. 
(Frankly I don't mind what encoding we use, and I'd be quite happy to 
force bytes paths to be UTF-16-LE encoded, which would also round-trip 
invalid surrogate pairs. But that would prevent basic manipulation which 
seems to be a higher priority.)


This does not allow you to take bytes from an arbitrary source and 
assume that they are correctly encoded for the file system. Python 3.3, 
3.4 and 3.5 have been warning that doing that is deprecated and the path 
needs to be decoded to a known encoding first. At this stage, it's time 
for us to either make byte paths an error, or to specify a suitable 
encoding that can correctly round-trip paths.



If this does not answer the question, I'm going to need the question to 
be explained more clearly for me.


Cheers,
Steve



Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower

On 15Aug2016 1126, Steve Dower wrote:
> My proposal is to remove all use of the *A APIs and only use the *W
> APIs. That completely removes the (already deprecated) use of bytes as
> paths. I then propose to change the (unused on Windows)
> sys.getfsdefaultencoding() to 'utf-8' and handle bytes being passed into
> filesystem functions by transcoding into UTF-16 and calling the *W APIs.


Of course, I meant sys.getfilesystemencoding() here. The C functions 
have "FSDefault" in many of the names, which is why I guessed the wrong 
Python variant.


Cheers,
Steve



[Python-ideas] Digests (was Re: From mailing list to GitHub issues)

2016-08-15 Thread Barry Warsaw
On Aug 14, 2016, at 02:01 PM, Chris Angelico wrote:

>The biggest problem I'm seeing is with digests. Can that feature be
>flagged off as "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE ASKING
>FOR"? So many people seem to select digest mode, then get extremely
>confused by it.

Yes, we can turn off digests for python-ideas, or any Mailman mailing list.

I was tempted to JFDI, but it would mean that ~25% of list members would no
longer get messages.  That's because 254 out of 979 members are currently
receiving digests.

Let's give people a grace period, say of one week.  You have until Monday
22-Aug-2016 to switch to non-digest delivery or read the mailing list through
some other outlet (e.g. Gmane's NNTP interface) if you still want to get
messages for python-ideas.

Cheers,
-Barry

P.S. I am refraining from responding to other topics in this thread, since I
think the proper place to do that is overload-sig.



Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread eryk sun
On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower  wrote:
>
> (Frankly I don't mind what encoding we use, and I'd be quite happy to
> force bytes paths to be UTF-16-LE encoded, which would also round-trip
> invalid surrogate pairs. But that would prevent basic manipulation
> which seems to be a higher priority.)

The CRT manually decodes and encodes using the private functions
__acrt_copy_path_to_wide_string and __acrt_copy_to_char. These use
either the ANSI or OEM codepage, depending on the value returned by
WinAPI AreFileApisANSI. CPython could follow suit. Doing its own
encoding and decoding would enable using filesystem functions that
will never get an [A]NSI version (e.g. GetFileInformationByHandleEx),
while still retaining backward compatibility.

Filesystem encoding could use WC_NO_BEST_FIT_CHARS and raise a warning
when lpUsedDefaultChar is true. Filesystem decoding could use
MB_ERR_INVALID_CHARS and raise a warning and retry without this flag
for ERROR_NO_UNICODE_TRANSLATION (e.g. an invalid DBCS sequence). This
could be implemented with a new "warning" handler for
PyUnicode_EncodeCodePage and PyUnicode_DecodeCodePageStateful. A new
'fsmbcs' encoding could be added that checks AreFileApisANSI to choose
between CP_ACP and CP_OEMCP.
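The "warn when the default character is used" behaviour can be approximated in pure Python as below. This is a loose sketch, not the mechanism described above: Python's table-based `cp1252` codec stands in for WideCharToMultiByte with WC_NO_BEST_FIT_CHARS, `encode_for_codepage` is a hypothetical helper, and the choice of `UnicodeWarning` is an assumption.

```python
import warnings


def encode_for_codepage(path, codepage="cp1252"):
    """Hypothetical helper: encode strictly, and warn before falling back
    to a lossy substitution when the code page cannot represent the path."""
    try:
        return path.encode(codepage)  # strict by default
    except UnicodeEncodeError:
        warnings.warn(
            "path %r is not representable in %s" % (path, codepage),
            UnicodeWarning,
        )
        return path.encode(codepage, errors="replace")  # '?' substitution


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = encode_for_codepage("caf\u00e9.txt")       # representable
    lossy = encode_for_codepage("\u65e5\u672c.txt")  # not representable
```

Only the second call triggers a warning; the first encodes cleanly because every character exists in the code page.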


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Chris Barker - NOAA Federal
> Given that, I'm proposing adding support for using byte strings encoded with 
> UTF-8 in file system functions on Windows. This allows Python users to omit 
> switching code like:
>
> if os.name == 'nt':
>     f = os.stat(os.listdir('.')[-1])
> else:
>     f = os.stat(os.listdir(b'.')[-1])

REALLY? Do we really want to encourage using bytes as paths? IIUC,
anyone that wants to platform-independentify that code just needs to
use proper strings (or pathlib) for paths everywhere, yes?

I understand that pre-surrogate-escape, there was a need for bytes
paths, but those days are gone, yes?
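As background on the surrogate-escape mechanism referred to here, the sketch below shows how the `surrogateescape` error handler lets undecodable bytes survive a trip through `str`. The file name is made up, and the codec is spelled out explicitly as UTF-8; on POSIX, `os.fsdecode` and `os.fsencode` apply the same handler with the filesystem encoding.

```python
# Bytes that are not valid UTF-8, as a POSIX file system might return
# for a file name created under a different encoding.
raw_name = b"report-\xff.txt"

# Undecodable bytes are smuggled through as lone low surrogates ...
as_text = raw_name.decode("utf-8", "surrogateescape")

# ... and turned back into the original bytes when encoding.
back_to_bytes = as_text.encode("utf-8", "surrogateescape")
```

The resulting string can be passed around and joined like any other path, as long as it is encoded with the same handler before it reaches the operating system.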

So why, at this late date, kludge what should be a deprecated pattern
into the Windows build???

-CHB

> My proposal is to remove all use of the *A APIs and only use the *W APIs. 
> That completely removes the (already deprecated) use of bytes as paths.

Yes, this is good.

> I then propose to change the (unused on Windows) sys.getfsdefaultencoding() 
> to 'utf-8' and handle bytes being passed into filesystem functions by 
> transcoding into UTF-16 and calling the *W APIs.

I'm really not sure utf-8 is magic enough to do this. Where do you
imagine that utf-8 is coming from as bytes???

AIUI, while utf-8 is almost universal in *nix for file system names,
folks do not want to count on it -- hence the use of bytes. And it is
far less prevalent in the Windows world...

> , allows paths returned from the filesystem to correctly roundtrip via bytes 
> in Python,

That you could do with native bytes (UTF-16, yes?)

> . But that would prevent basic manipulation which seems to be a higher 
> priority.)

Still think Unicode is the answer to that...

> At this stage, it's time for us to either make byte paths an error,

+1.  :-)

CHB


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Steve Dower

On 15Aug2016 1819, eryk sun wrote:
> On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote:
>>
>> (Frankly I don't mind what encoding we use, and I'd be quite happy to
>> force bytes paths to be UTF-16-LE encoded, which would also round-trip
>> invalid surrogate pairs. But that would prevent basic manipulation
>> which seems to be a higher priority.)
>
> The CRT manually decodes and encodes using the private functions
> __acrt_copy_path_to_wide_string and __acrt_copy_to_char. These use
> either the ANSI or OEM codepage, depending on the value returned by
> WinAPI AreFileApisANSI. CPython could follow suit. Doing its own
> encoding and decoding would enable using filesystem functions that
> will never get an [A]NSI version (e.g. GetFileInformationByHandleEx),
> while still retaining backward compatibility.
>
> Filesystem encoding could use WC_NO_BEST_FIT_CHARS and raise a warning
> when lpUsedDefaultChar is true. Filesystem decoding could use
> MB_ERR_INVALID_CHARS and raise a warning and retry without this flag
> for ERROR_NO_UNICODE_TRANSLATION (e.g. an invalid DBCS sequence). This
> could be implemented with a new "warning" handler for
> PyUnicode_EncodeCodePage and PyUnicode_DecodeCodePageStateful. A new
> 'fsmbcs' encoding could be added that checks AreFileApisANSI to choose
> between CP_ACP and CP_OEMCP.


None of that makes it less complicated or more reliable. Warnings based 
on values are bad (they should be based on types) and using the *W APIs 
exclusively is the right way to go. The question then is whether we 
allow file system functions to return bytes, and if so, which encoding 
to use. This then directly informs what the functions accept, for the 
purposes of round-tripping.


*Any* encoding that may silently lose data is a problem, which basically 
leaves utf-16 as the only option. However, as that causes other 
problems, maybe we can accept the tradeoff of returning utf-8 and 
failing when a path contains invalid surrogate pairs (which is extremely 
rare by comparison to characters outside of CP_ACP)?
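The tradeoff between the two encodings can be seen with a file name that contains an unpaired surrogate. The name below is made up for illustration, and the sketch uses Python's codecs rather than any Windows API.

```python
# A made-up file name containing a lone (unpaired) high surrogate.
odd_name = "bad\ud800name.txt"

# Strict UTF-8 refuses to encode it ...
try:
    odd_name.encode("utf-8")
    utf8_failed = False
except UnicodeEncodeError:
    utf8_failed = True

# ... whereas UTF-16-LE with the 'surrogatepass' handler preserves the
# 16-bit code unit exactly, so the name round-trips.
utf16 = odd_name.encode("utf-16-le", "surrogatepass")
restored = utf16.decode("utf-16-le", "surrogatepass")

# The cost of UTF-16: the separator is no longer a lone 0x5C byte, so
# naive bytes manipulation of such paths needs extra care.
sep_utf16 = "\\".encode("utf-16-le")
```

Returning UTF-8 and failing on the first case is the tradeoff being weighed in the paragraph above.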


If utf-8 is unacceptable, we're back to the current situation and should 
be removing the support for bytes that was deprecated three versions ago.


Cheers,
Steve



Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread Nick Coghlan
On 16 August 2016 at 11:34, Chris Barker - NOAA Federal
 wrote:
>> Given that, I'm proposing adding support for using byte strings encoded with 
>> UTF-8 in file system functions on Windows. This allows Python users to omit 
>> switching code like:
>>
>> if os.name == 'nt':
>>     f = os.stat(os.listdir('.')[-1])
>> else:
>>     f = os.stat(os.listdir(b'.')[-1])
>
> REALLY? Do we really want to encourage using bytes as paths? IIUC,
> anyone that wants to platform-independentify that code just needs to
> use proper strings (or pathlib) for paths everywhere, yes?

The problem is that bytes-as-paths actually *does* work for Mac OS X
and systemd based Linux distros properly configured to use UTF-8 for
OS interactions. This means that a lot of backend network service code
makes that assumption, especially when it was originally written for
Python 2, and rather than making it work properly on Windows, folks
just drop Windows support as part of migrating to Python 3.

At an ecosystem level, that means we're faced with a choice between
implicitly encouraging folks to make their code *nix only, and finding
a way to provide a more *nix like experience when running on Windows
(where UTF-8 encoded binary data just works, and either other
encodings lead to mojibake or else you use chardet to figure things
out).

Steve is suggesting that the latter option is preferable, a view I
agree with since it lowers barriers to entry for Windows based
developers to contribute to primarily *nix focused projects.

> I understand that pre-surrogate-escape, there was a need for bytes
> paths, but those days are gone, yes?

No, UTF-8 encoded bytes are still the native language of network
service development: http://utf8everywhere.org/

It also helps with cases where folks are switching back and forth
between Python and other environments like JavaScript and Go where the
UTF-8 assumption is more prevalent.

> So why, at this late date, kludge what should be a deprecated pattern
> into the Windows build???

Promoting cross-platform consistency often leads to enabling patterns
that are considered a bad idea from a native platform perspective, and
this strikes me as an example of that (just as the binary/text
separation itself is a case where Python 3 diverged from the POSIX
text model to improve consistency across *nix, Windows, JVM and CLR
environments).

Cheers,
Nick.


Re: [Python-ideas] Fix default encodings on Windows

2016-08-15 Thread eryk sun
>> On Mon, Aug 15, 2016 at 6:26 PM, Steve Dower wrote:
>
> and using the *W APIs exclusively is the right way to go.

My proposal was to use the wide-character APIs, but transcoding CP_ACP
without best-fit characters and raising a warning whenever the default
character is used (e.g. substituting Katakana middle dot when creating
a file using a bytes path that has an invalid sequence in CP932). This
proposal was in response to the case made by Stephen Turnbull. If
using UTF-8 is getting such heavy pushback, I thought half a solution
was better than nothing, and it also sets up the infrastructure to
easily switch to UTF-8 if that idea eventually gains acceptance. It
could raise exceptions instead of warnings if that's preferred, since
bytes paths on Windows are already deprecated.

> *Any* encoding that may silently lose data is a problem, which basically
> leaves utf-16 as the only option. However, as that causes other problems,
> maybe we can accept the tradeoff of returning utf-8 and failing when a
> path contains invalid surrogate pairs

Are there any common sources of illegal UTF-16 surrogates in Windows
filenames? I see that WTF-8 (Wobbly) was developed to handle this
problem. A WTF-8 path would roundtrip back to the filesystem, but it
should only be used internally in a program.
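For a lone surrogate, the round trip mentioned here can be imitated with Python's `surrogatepass` error handler on the UTF-8 codec. This is only an approximation offered as an assumption: Python has no WTF-8 codec, and `surrogatepass` is not a full WTF-8 implementation (it treats paired surrogates differently), but for an unpaired surrogate it yields the same three-byte sequence.

```python
lone = "\ud800"

# Encode the unpaired surrogate as a three-byte generalized UTF-8 sequence.
generalized = lone.encode("utf-8", "surrogatepass")
restored = generalized.decode("utf-8", "surrogatepass")

# A strict UTF-8 decoder rejects those bytes, which is why such data
# should stay internal to the program and not be exchanged as "UTF-8".
try:
    generalized.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```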