Re: [Haskell-cafe] Re: Unicode workaround for getDirectoryContents under Windows?

Ketil Malde Wed, 17 Jun 2009 06:36:47 -0700

Simon Marlow <marlo...@gmail.com> writes:

>> Why only on Windows?


> Just because it's a lot easier on Windows - all the OS APIs take
> Unicode file paths, so it's obvious what to do.  In contrast on Unix I
> don't have a clear idea of how to proceed.

> On Unix, all file APIs take [Word8] rather than [Char].  By
> convention, the [Word8] is usually assumed to be a string in the
> locale encoding, but that's only a user-space convention.

If we want to incorporate a translation layer, I think it's fair to
only support UTF-8 (ignoring locales), but provide a workaround for
byte sequences that are not valid UTF-8.

From http://en.wikipedia.org/wiki/UTF-8:

|  Therefore many modern UTF-8 converters translate errors to
|  something "safe". Only one byte is changed into the error
|  replacement and parsing starts again at the next byte, otherwise
|  concatenating strings could change good characters into
|  errors. Popular replacements for each byte are: 
|
|    * nothing (the bytes vanish)
|    * '?' or '�'
|    * The replacement character (U+FFFD)
|    * The byte from ISO-8859-1 or CP1252
|    * An invalid Unicode code point, usually U+DCxx where xx is the
|      byte's value

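How about using the last one? Each invalid byte maps to its own code
point, so the original bytes can be recovered when the path is handed
back to the OS. This would allow 'readFile' to work on FilePaths
provided by 'getDirectoryContents', while still allowing for real
Unicode string literals.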
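
To make the U+DCxx idea concrete, here is a rough sketch of what such a
translation layer could look like. The names 'decodeLenient' and
'encodeLenient' are made up for illustration and are not part of any
existing library. The sketch assumes that only the bytes 0x80..0xFF ever
need escaping (ASCII bytes are always valid UTF-8), and that encoded
surrogates in the input are treated as invalid so that the round trip
stays lossless.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Decode UTF-8. Each byte that cannot start a valid sequence is mapped
-- to the lone surrogate U+DC00 + byte, and parsing resumes at the next byte.
decodeLenient :: [Word8] -> String
decodeLenient [] = []
decodeLenient (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeLenient bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = escape
  where
    -- Only the offending byte is replaced; the rest is parsed again.
    escape = chr (0xDC00 + fromIntegral b) : decodeLenient bs
    -- n continuation bytes, accumulated lead bits, smallest legal code point.
    multi n lead minCp =
      let (cont, rest) = splitAt n bs
          cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                     lead cont
          ok = length cont == n
               && all isCont cont
               && cp >= minCp && cp <= 0x10FFFF          -- no overlong forms
               && not (cp >= 0xD800 && cp <= 0xDFFF)     -- no encoded surrogates
      in if ok then chr cp : decodeLenient rest else escape
    isCont c = c .&. 0xC0 == 0x80

-- | Encode to UTF-8, turning U+DC80..U+DCFF back into the raw byte.
-- Other lone surrogates are not handled specially in this sketch.
encodeLenient :: String -> [Word8]
encodeLenient = concatMap enc
  where
    enc c
      | cp < 0x80                    = [fromIntegral cp]
      | cp >= 0xDC80 && cp <= 0xDCFF = [fromIntegral (cp - 0xDC00)]
      | cp < 0x800                   = [0xC0 .|. hi 6, lo 0]
      | cp < 0x10000                 = [0xE0 .|. hi 12, lo 6, lo 0]
      | otherwise                    = [0xF0 .|. hi 18, lo 12, lo 6, lo 0]
      where
        cp   = ord c
        hi s = fromIntegral (cp `shiftR` s)
        lo s = 0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F)
```

With something along these lines, 'encodeLenient . decodeLenient' gives
back the original bytes for any file name, valid UTF-8 or not, while
ordinary Unicode strings still encode to the UTF-8 you would expect.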

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
