Simon Marlow <marlo...@gmail.com> writes:

>> Why only on Windows?

> Just because it's a lot easier on Windows - all the OS APIs take
> Unicode file paths, so it's obvious what to do.  In contrast on Unix I
> don't have a clear idea of how to proceed.

> On Unix, all file APIs take [Word8] rather than [Char].  By
> convention, the [Word8] is usually assumed to be a string in the
> locale encoding, but that's only a user-space convention.

If we want to incorporate a translation layer, I think it's fair to
only support UTF-8 (ignoring locales), but provide a workaround for
invalid characters.

From http://en.wikipedia.org/wiki/UTF-8:

| Therefore many modern UTF-8 converters translate errors to
| something "safe". Only one byte is changed into the error
| replacement and parsing starts again at the next byte, otherwise
| concatenating strings could change good characters into
| errors. Popular replacements for each byte are:
|
|   * nothing (the bytes vanish)
|   * '?' or '¿'
|   * The replacement character (U+FFFD)
|   * The byte from ISO-8859-1 or CP1252
|   * An invalid Unicode code point, usually U+DCxx where xx is the byte's value

How about using the last one?  This would allow 'readFile' to work on
FilePaths provided by 'getDirectoryContents', while allowing for real
Unicode string literals.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
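
To make the U+DCxx suggestion concrete, here is a rough sketch of what such a translation layer might look like. The names 'decodePath' and 'encodePath' are made up for illustration and are not part of any existing library. The sketch assumes strict UTF-8 (no overlong forms, no encoded surrogates) and maps each undecodable byte xx to the code point U+DCxx, so that encoding the result gives back the original bytes.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: decode a byte string as UTF-8.  A byte that is
-- not part of a valid sequence becomes the lone surrogate U+DC00 + byte,
-- and parsing restarts at the very next byte.
decodePath :: [Word8] -> String
decodePath [] = []
decodePath (b:bs)
  | b < 0x80               = chr (fromIntegral b) : decodePath bs
  | b >= 0xC2 && b <= 0xDF = multi 1 0x80    (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b <= 0xEF = multi 2 0x800   (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b <= 0xF4 = multi 3 0x10000 (fromIntegral b .&. 0x07)
  | otherwise              = escape
  where
    -- Only the offending byte is replaced.
    escape = chr (0xDC00 + fromIntegral b) : decodePath bs
    -- n continuation bytes expected; lo is the smallest code point that
    -- may legally use this sequence length (rejects overlong forms).
    multi n lo hi =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     hi cont
          ok = length cont == n
               && all (\c -> c .&. 0xC0 == 0x80) cont
               && cp >= lo && cp <= 0x10FFFF
               && not (cp >= 0xD800 && cp <= 0xDFFF)
      in if ok then chr cp : decodePath rest else escape

-- Hypothetical helper: the inverse direction.  Code points in
-- U+DC80..U+DCFF are turned back into the single raw byte they stand
-- for; everything else is encoded as ordinary UTF-8.  (Other lone
-- surrogates are simply encoded as three bytes here; a real
-- implementation would have to decide what to do with them.)
encodePath :: String -> [Word8]
encodePath = concatMap enc
  where
    enc c
      | n >= 0xDC80 && n <= 0xDCFF = [fromIntegral (n - 0xDC00)]
      | n < 0x80                   = [fromIntegral n]
      | n < 0x800                  = [0xC0 .|. hi 6, cont 0]
      | n < 0x10000                = [0xE0 .|. hi 12, cont 6, cont 0]
      | otherwise                  = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
      where
        n      = ord c
        hi s   = fromIntegral (n `shiftR` s)
        cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)
```

With something along these lines, a name returned by 'getDirectoryContents' that contains a stray 0xFF byte would show up as a String containing U+DCFF, and handing that String back to 'readFile' would reproduce the original byte on the way out.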