On Wed, 2010-12-01 at 02:14 +0000, MRAB wrote:
> If the filenames are to be shown to a user then there needs to be a
> mapping between bytes and glyphs. That's an encoding. If different
> users use different encodings then exchange of textual data becomes
> difficult.

That's presentation, and that's separate. Indeed, I have my user encoding
set to UTF-8, and if there is a filename that's not valid UTF-8 then my
GUI (GNOME) will show "(invalid encoding)" and even allow me to rename
it, and my shell (bash) will show '?' next to the invalid "characters"
(and make it a little more challenging to rename ;)). And I can freely
copy these "invalid" files across different (Unix) systems, because the
OS doesn't care about encoding.

But that's completely different from the actual name of the file. Unix
doesn't care about presentation in filenames. It just cares about the
data. There are no "glyphs" in Unix, only in the UI that runs on top of
it.

Or to put it another way, Unix's filename encoding is RAW-DATA. It's not
"textual" data. The fact that most filenames contain mainly
human-readable text is a convenient convention, but it is not required
or enforced by the OS.

> That's where encodings which can be used globally come in.
> By the time Python 4 is released I'd be surprised if Unix hadn't
> standardised on a single encoding like UTF-8.

I have serious doubts about that. At least in the Linux world the kernel
wants to stay out of encoding debates (except where it has to, as with
Windows filesystems).

But the point is this: the world does not revolve around Python. Unix
filenames were encoding-agnostic long before Python was around. If
Python3 does not support this then it's a regression on Python's part.

-- 
http://mail.python.org/mailman/listinfo/python-list
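
For anyone who wants to poke at the "filenames are raw data" claim from Python itself, here is a minimal sketch. It assumes a POSIX system whose filesystem accepts arbitrary bytes in names (typical on Linux; some other platforms reject names that are not valid UTF-8), and a Python 3 recent enough to have os.fsencode(). The helper name roundtrip_raw_filename and the sample name b"caf\xe9.txt" are made up for illustration. The idea is that passing bytes to open() and os.listdir() hands the name to the kernel untouched, while the str form is only a presentation of those bytes that os.fsencode() can turn back into the original.

```python
import os
import tempfile


def roundtrip_raw_filename(raw_name: bytes) -> dict:
    """Create a file named by arbitrary bytes and list the directory back.

    Hypothetical helper for illustration only; assumes a POSIX filesystem
    that does not validate the encoding of filenames.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        path = os.path.join(tmp_bytes, raw_name)

        # A bytes path is passed to the OS as-is, with no decoding step.
        with open(path, "wb"):
            pass

        names_as_bytes = os.listdir(tmp_bytes)  # bytes in, bytes out
        names_as_str = os.listdir(tmp)          # str in, str out

        return {
            "bytes": names_as_bytes,
            "str": names_as_str,
            # os.fsencode() reverses the decoding used for the str names.
            "recovered": [os.fsencode(name) for name in names_as_str],
        }


# 0xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.
result = roundtrip_raw_filename(b"caf\xe9.txt")
print(result["bytes"])
print(result["recovered"])
```

How the str form looks depends on the locale, so the sketch does not rely on it. What should hold, under the assumptions above, is that the bytes listing and the recovered listing both contain exactly the name that was written.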