On approximately 4/29/2009 1:28 PM, came the following characters from the keyboard of Martin v. Löwis:
C. A file on disk whose name contains the invalid surrogate code,
accessed via the str interface (no decoding happens), matches in memory
the file on disk whose name contains the byte that translates to the
same surrogate, accessed via the bytes interface.  Ambiguity.
What does that mean? Which specific interface are you referring to for
obtaining file names?
os.listdir("")

os.listdir(b"")

So I guess I'd better suggest that a specific, equivalent directory name
be passed in either bytes or str form.

[Leaving the issue of the empty string apparently having different
meanings aside ...]

Ok. Now I understand the example. So you do

os.listdir("c:/tmp")
os.listdir(b"c:/tmp")

and you have a file in c:/tmp that is named "abc\uDC10".
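
For readers following along, the two call forms can be tried on any
platform. This is only a sketch: the temporary directory and the plain
ASCII file name "abc.txt" are stand-ins chosen for illustration (not the
c:/tmp directory or the surrogate-containing name discussed here), and
the helper list_both_ways is a hypothetical name. It shows only that a
str argument yields str names and a bytes argument yields bytes names.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (str_names, bytes_names) for the same directory."""
    # A str argument makes os.listdir return str file names.
    str_names = sorted(os.listdir(directory))
    # A bytes argument makes os.listdir return bytes file names.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


# Illustrative setup: a throwaway directory holding one ASCII-named file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "abc.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)
    print(str_names)
    print(bytes_names)
```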

So what you are saying here is that Python doesn't use the "A" forms of
the Windows APIs for filenames, but only the "W" forms, and uses lossy
decoding (from MS) to the current code page (which can never be UTF-8 on
Windows).

Actually, it does use the A form, in the second listdir example. This,
in turn (inside Windows), uses the lossy CP_ACP encoding. You get back
a byte string; the listdirs should give

["abc\uDC10"]
[b"abc?"]

(not quite sure about the second - I only guess that CP_ACP will replace
the half surrogate with a question mark).
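
The guess about the question mark can be approximated in pure Python.
This is a rough sketch under two assumptions: that the ANSI code page is
cp1252, and that Python's own codec with errors="replace" behaves like
the conversion Windows performs internally for the A form. Neither is
established above, so treat it as an illustration of lossy encoding, not
as a description of what Windows actually does.

```python
# The file name from the example: "abc" followed by a lone low surrogate.
name = "abc\udc10"

# Assumption: cp1252 stands in for the CP_ACP code page.
# errors="replace" substitutes "?" for anything the code page cannot
# represent, which makes the conversion lossy.
approx = name.encode("cp1252", errors="replace")
print(approx)

# A strict encode refuses the lone surrogate instead of replacing it.
try:
    name.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)
```

Because the replacement is one-way, the original name cannot be
recovered from the bytes result.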

So where is the ambiguity here?

None. But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentation is lacking in this area, it makes your concisely stated PEP rather hard to understand.

Thanks for clarifying the Windows behavior, here. A little more clarification in the PEP could have avoided lots of discussion. It would seem that a PEP, proposed to modify a poorly documented (and therefore likely poorly understood) area, should be educational about the status quo, as well as presenting the suggested change. Or is it the Python philosophy that the PEPs should be as incomprehensible as possible, to generate large discussions?


--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking