On 21Aug2014 09:20, Antoine Pitrou <anto...@python.org> wrote:
On 21/08/2014 00:52, Cameron Simpson wrote:
The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus.
There's only one special byte, the slash (code 47). There's no OS-level
need that it or anything else be ASCII compatible.

Of course there is. Try to split a UTF-16-encoded file path on the byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly mandates an ASCII-compatible encoding for file paths.

[Rolls eyes.] Looking at the UTF-16 encoding, it looks like it also embeds NUL bytes for various codes below 32768. How are they handled? As remarked, codes 0 (NUL) and 47 (ASCII slash code) _are_ special to UNIX filename byte strings.

If you imagine you can embed bare UTF-16 freely even excluding code 47, I think one of us is missing something.
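
For anyone who wants to see the bytes, here is a minimal Python 3 sketch. The sample path is made up for illustration, and U+022F was picked only because its low byte happens to be 0x2F. It counts how often bytes 0 and 47 turn up when the same string is encoded as UTF-16-LE and as UTF-8; it does not touch the filesystem.

```python
# Hypothetical sample path; U+022F encodes in UTF-16-LE as the bytes 0x2F 0x02.
path = "dir/\u022f.txt"

utf16 = path.encode("utf-16-le")
utf8 = path.encode("utf-8")

# Occurrences of the two bytes that are special in UNIX filenames.
nul_count = utf16.count(b"\x00")   # NUL bytes embedded by UTF-16
slash_count = utf16.count(b"/")    # byte 47, whether or not it is a real slash

# Splitting on byte 47 cuts through the middle of 16-bit code units.
pieces = utf16.split(b"/")

print("UTF-16-LE bytes:", utf16.hex(" "))
print("NUL bytes:", nul_count, "byte-47 occurrences:", slash_count)
print("pieces after split on byte 47:", pieces)
print("UTF-8 byte-47 occurrences:", utf8.count(b"/"),
      "contains NUL:", b"\x00" in utf8)
```

With this input the UTF-16-LE form has a NUL after every ASCII character and two occurrences of byte 47, only one of which is a path separator, while the UTF-8 form has exactly one byte 47 and no NUL.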

That's not "ASCII compatible". That's "not all byte codes can be freely used without thought", and any multibyte coding will have to consider such things when embedding itself in another coding scheme.

Cheers,
Cameron Simpson <c...@zip.com.au>

Microsoft:  Committed to putting the "backward" into "backward compatibility."