Re: [Python-Dev] Bytes path support

Cameron Simpson Wed, 20 Aug 2014 21:54:07 -0700

On 20Aug2014 16:04, Chris Barker - NOAA Federal <[email protected]> wrote:

>> but disallowing them in higher level
>> explicitly cross platform abstractions like pathlib.

> I think the trick here is that posix-using folks claim that filenames are
> just bytes, and indeed they can be passed around with a char*, so they seem
> to be.
>
> but you can't possibly do anything other than pass them around if you
> REALLY think they are just bytes.
>
> So really, people treat them as
> "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and
> maybe a couple others)-is-ascii-compatible"

As someone who fought long and hard in the surrogate-escape listdir() wars, and was won over once the scheme was thoroughly explained to me, I take issue with these assertions: they are bogus or misleading.

Firstly, POSIX filenames _are_ just byte strings. The only forbidden character is the NUL byte, which terminates a C string, and the only special character is the slash, which separates pathname components.

Second, a bare low level program cannot do _much_ more than pass them around. It certainly can do things like compute their basename, or other path related operations.
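
To make that concrete, here is a minimal Python sketch. The path is made up and is deliberately not valid UTF-8; the only thing assumed is the standard posixpath module, which accepts bytes as well as str:

```python
import posixpath

# Hypothetical path: 0xe9 and 0xff do not form valid UTF-8 here.
raw = b"/tmp/caf\xe9/report\xff.txt"

# Path operations only need to find the slash byte (code 47),
# so they work without knowing any encoding.
base = posixpath.basename(raw)
parent = posixpath.dirname(raw)
parts = raw.split(b"/")

print(base)    # b'report\xff.txt'
print(parent)  # b'/tmp/caf\xe9'
print(parts)   # [b'', b'tmp', b'caf\xe9', b'report\xff.txt']
```

None of these calls decode anything; they just split on the one special byte.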


The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything else be ASCII compatible. I think characterisations such as the one quoted are actively misleading.

The way you get UTF-8 (or some other encoding, fortunately getting less and less common) is by convention: you decide in your environment to work in some encoding (say utf-8) via the locale variables, and all your user-facing text gets used in UTF-8 encoding form when turned into bytes for the filename calls because your text<->bytes methods say to do so.
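
A sketch of that text<->bytes convention, assuming UTF-8 as the chosen encoding. It uses explicit encode/decode calls rather than the locale so that it behaves the same anywhere; on POSIX, Python's os.fsencode() and os.fsdecode() apply the same surrogateescape handler with the encoding taken from the locale. The filenames are invented for illustration:

```python
# User-facing text becomes bytes by convention, here UTF-8.
name = "caf\u00e9.txt"
as_bytes = name.encode("utf-8")
print(as_bytes)  # b'caf\xc3\xa9.txt'

# A name written under a different convention (Latin-1) is still a legal
# POSIX filename, but it is not valid UTF-8.
legacy = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate,
# so the original bytes can be recovered exactly.
text = legacy.decode("utf-8", "surrogateescape")
round_trip = text.encode("utf-8", "surrogateescape")

print(ascii(text))           # 'caf\udce9.txt'
print(round_trip == legacy)  # True
```

The point is that the encoding lives in the program's conventions, not in the filesystem: the kernel stores both names equally happily.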

I think we'd all agree it is nice to have a system where filenames are all Unicode, but since POSIX/UNIX predates it by decades it is a bit late to ignore the reality for such systems. I certainly think the Windows-side Babel of code pages and multiple code systems is far, far worse. (Disclaimer: not a Windows programmer, just based on hearing them complain.)

I'm +1000 on systems where the filesystem enforces Unicode (eg Plan 9 or Mac OSX, which forces a specific UTF-8 encoding in the bytes POSIX APIs - the underlying filesystems reject invalid byte sequences).


[...]

> Antoine Pitrou wrote:
>
>> To elaborate specifically about pathlib, it doesn't handle bytes paths
>> but allows you to generate them if desired:
>> https://docs.python.org/3/library/pathlib.html#operators
>
> but that uses
> os.fsencode:  Encode filename to the filesystem encoding
>
> As I understand it, the whole problem with some posix systems is that there
> is NO filesystem encoding -- i.e. you can't know for sure what encoding a
> filename is in. So you need to be able to pass the bytes through as they
> are.


Yes and no. I made that argument too.

There's no _external_ "filesystem encoding" in the sense of something recorded in the filesystem that anyone can inspect. But there is the expressed locale settings, available at runtime to any program that cares to pay attention. It is a workable situation.
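
For example, a Python program can ask what the current settings imply. This is only a sketch; the values printed depend entirely on the environment it runs in, so no output is shown:

```python
import locale
import sys

# Encoding Python uses to convert between str filenames and bytes.
fs_encoding = sys.getfilesystemencoding()

# Error handler used for that conversion (normally surrogateescape on POSIX).
fs_errors = sys.getfilesystemencodeerrors()

# Encoding implied by the locale variables (LANG, LC_ALL, LC_CTYPE).
locale_encoding = locale.getpreferredencoding(False)

print(fs_encoding, fs_errors, locale_encoding)
```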

Oh, and I reject Nick's characterisation of POSIX as "broken". It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)


Cheers,
Cameron Simpson <[email protected]>

God is real, unless declared integer.   - Johan Montald, [email protected]
