Nick Coghlan wrote:
> Toshio Kuratomi wrote:
>> Nonsense. A program can do tons of things with a non-decodable
>> filename. Where it's limited is non-decodable filedata.
>
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.
>
This is where we disagree. There are many ways to display the
non-decodable filename to the user because the user is not a machine.
The computer must know the unique sequence of bytes in order to access a
file. The user, OTOH, usually only needs to know that the file exists.
In most GUI-based end-user oriented desktop apps, it's enough to do
str(filename, errors='replace').

For instance, the GNOME file manager displays:
  "? (Invalid encoding)"
and Konqueror, the KDE file manager just displays:
  "?"
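To make that concrete, here is a minimal Python 3 sketch of the idea: keep the raw bytes for access and derive a separate label for display. It is not taken from any of the applications mentioned. The helper names (display_name, escaped_name, list_for_display) are hypothetical, UTF-8 is assumed as the display encoding, and the octal escaping only approximates what ls -b prints.

```python
import os


def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Lossy label for showing to a user; never use it to open the file."""
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    return str(raw, encoding=encoding, errors="replace")


def escaped_name(raw: bytes) -> str:
    """Unambiguous label: printable ASCII kept, everything else as \\ooo."""
    parts = []
    for byte in raw:
        if 0x20 <= byte <= 0x7E and byte != 0x5C:  # printable, not a backslash
            parts.append(chr(byte))
        else:
            parts.append("\\%03o" % byte)
    return "".join(parts)


def list_for_display(directory: bytes = b"."):
    """Pair each raw filename (bytes) with a label that is safe to display."""
    # Passing bytes to os.listdir() makes it return bytes filenames.
    return [(raw, display_name(raw)) for raw in os.listdir(directory)]


if __name__ == "__main__":
    for raw, label in list_for_display(b"."):
        print(label, escaped_name(raw))
```

The raw bytes stay the key for every filesystem operation; the label is only ever shown, never passed back to open() or os.rename().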
The file can still be displayed this way, accessed via the raw bytes
that the program keeps internally, and operated upon by applications.

For applications in which the user needs more information to
differentiate the files the program has the option to display the raw
byte sequences as if they were the filename. The *NIX shell and command
line tools have this ability.

  $ LANG=en_US.utf8 ls -b
  á  í
  $ LANG=C ls -b
  .  ..  \303\241  \303\255
  $ mv $'\303\241' $'\303\263'
  $ LANG=C ls -b
  \303\255  \303\263
  $ LANG=en_US.utf8 ls -b
  í  ó

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.
>
<nod> With the caveat that I haven't seen movement by Linux and other
Unix variants to enforce UTF-8. What I have seen are statements by
kernel programmers that having the filesystem use bytes and not know
about encoding is the correct thing to do.

This means that utf-8 will be a convention rather than a necessity for a
very long time and consequently programs will need to worry about the
problems of mixed encoding systems for an equally long time. (Remember,
encoding is something that can be changed per user and per file. So on
a multiuser OS, mixed encodings can be out of the control of the system
administrator for perfectly valid reasons.)

> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.
>
Oh I agree with the UTF-8 everywhere philosophy. I just know that
there's tons of real-world systems out there that don't conform to my
expectations for sanity and my code has to account for those :-)

-Toshio
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com