On Mon, Sep 29, 2008 at 11:22 PM, Georg Brandl <[EMAIL PROTECTED]> wrote:
> No, that was not what I meant (although it is another possibility). As I
> wrote, Martin's proposal that I support here is using the modified UTF-8
> codec that successfully roundtrips otherwise invalid UTF-8 data.

I thought that the "successful roundtripping" pretty much stopped as soon as the Unicode data is exported somewhere else -- doesn't it contain invalid surrogate sequences?

In general, I'm very reluctant to use utf-8b, given that it doesn't seem to be well documented as a standard anywhere. Providing some minimal APIs that can process raw-bytes filenames still makes more sense -- it is mostly analogous to our treatment of text files, where the underlying binary data is also accessible.

> You seem to forget that (disregarding OSX here, since it already enforces
> UTF-8) the majority of file names on Posix systems will be encoded correctly.

Apparently under certain circumstances (an external FS mounted) OSX can also have non-UTF-8 filenames.

[...]

> With the filenames decoded by UTF-8, your files named têste, ô, dossié will
> be displayed and handled correctly. The others are *invalid* in the filesystem
> encoding UTF-8 and therefore would be represented by something like
>
> u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look
> pretty when printed, but then, what do other applications do? They e.g.
> display a question mark as you show above, which is not better in terms of
> readability.
>
> But it will work when given to a filename-handling function. Valid filenames
> can be compared to Unicode strings.
>
> A real-world example: OpenOffice can't open files with invalid bytes in their
> name. They are displayed in the "Open file" dialog, but trying to open fails.
> This regularly drives me crazy. Let's not make Python work this way too,
> or, even worse, not even display those filenames.

How can it *regularly* drive you crazy when "the majority of file names [...] encoded correctly" (as you assert above)?
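
A minimal sketch of the roundtripping question, assuming Python 3.1 or later. PEP 383 later added a `surrogateescape` error handler in the spirit of utf-8b: each undecodable byte 0xXX is mapped to the lone surrogate U+DCXX. Note that this is a surrogate, not the private-use mapping quoted above. The filename `b"dir\xffname"` is a hypothetical example. The sketch shows that the bytes roundtrip, that the decoded string cannot be exported as strict UTF-8, and that the raw-bytes alternative (passing `bytes` to `os.listdir`) hands back undecoded names.

```python
import os

# Hypothetical filename: valid UTF-8 except for one stray 0xFF byte.
raw = b"dir\xffname"

# utf-8b-style decoding (PEP 383, Python 3.1+): the undecodable byte 0xFF
# becomes the lone surrogate U+DCFF instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
assert name == "dir\udcffname"

# Roundtripping back to the original bytes works with the same handler...
assert name.encode("utf-8", "surrogateescape") == raw

# ...but exporting the string as ordinary (strict) UTF-8 fails, because
# lone surrogates are not encodable.
try:
    name.encode("utf-8")
    exported = True
except UnicodeEncodeError:
    exported = False
assert exported is False

# Correctly encoded names decode to ordinary strings and compare as usual.
assert "têste".encode("utf-8").decode("utf-8", "surrogateescape") == "têste"

# The raw-bytes alternative: a bytes argument makes os.listdir return bytes,
# so no decoding (and no lossy mapping) happens at all.
entries = os.listdir(b".")
assert all(isinstance(e, bytes) for e in entries)
```

On POSIX systems, `os.fsdecode` and `os.fsencode` apply this same `surrogateescape` handling with the filesystem encoding.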

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev