Andrea Rossato wrote:
Hi,

suppose that, on a Linux system, in a UTF-8 locale, you create a file
with non-ASCII characters. For instance:
touch abèèè

Now, I would expect that the output of a shell command such as "ls ab*"
would be a string/list of 5 chars. Instead I find it to be a list of 8
chars...;-)

The file name may have five *characters*, but if it's encoded as UTF-8, then it has eight *bytes*.
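
To make the character/byte distinction concrete, here is a minimal sketch that counts both for the same name. It uses only base; utf8Width and utf8Length are names I made up for illustration, not library functions.

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 needs for one code point.
-- (Hypothetical helper, not part of any library.)
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Total size of a String once it is encoded as UTF-8.
utf8Length :: String -> Int
utf8Length = sum . map utf8Width

main :: IO ()
main = do
  let name = "ab\232\232\232"   -- "abèèè", written with decimal escapes
  print (length name)           -- 5 characters
  print (utf8Length name)       -- 8 bytes: 'a' and 'b' take 1 each, each 'è' takes 2
```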

It appears that in spite of the locale definition, hGetContents is treating each byte as a separate character without translating the multi-byte sequences *from* UTF-8, and then putStrLn sends each of those bytes to standard output without translating the non-ASCII characters *to* UTF-8. So the second line of your program's output is correct...but only by accident.
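
If the input really does arrive as one Char per byte, the missing translation step can be done by hand. The following is only a sketch under that assumption: decodeBytes is a hypothetical helper, it does not check that continuation bytes are well formed, and it does not reject overlong forms or surrogates. A maintained UTF-8 library would be the better choice for real use.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- U+FFFD, emitted wherever the input cannot be decoded.
replacement :: Char
replacement = '\xFFFD'

-- Decode a String in which every Char holds one raw byte (0-255).
-- Hypothetical helper; minimal validation only.
decodeBytes :: String -> String
decodeBytes [] = []
decodeBytes (c:cs)
  | b < 0x80  = c : decodeBytes cs            -- plain ASCII
  | b < 0xC0  = replacement : decodeBytes cs  -- stray continuation byte
  | b < 0xE0  = multi 1 (b .&. 0x1F)          -- lead byte of a 2-byte sequence
  | b < 0xF0  = multi 2 (b .&. 0x0F)          -- lead byte of a 3-byte sequence
  | b < 0xF8  = multi 3 (b .&. 0x07)          -- lead byte of a 4-byte sequence
  | otherwise = replacement : decodeBytes cs  -- not a valid lead byte
  where
    b = ord c
    -- Combine the lead byte's payload with n continuation bytes.
    multi n lead
      | length conts < n = [replacement]      -- input ended mid-sequence
      | code > 0x10FFFF  = replacement : decodeBytes rest
      | otherwise        = chr code : decodeBytes rest
      where
        (conts, rest) = splitAt n cs
        code = foldl (\acc k -> (acc `shiftL` 6) .|. (ord k .&. 0x3F)) lead conts

main :: IO ()
main = do
  let raw = "ab\195\168\195\168\195\168"  -- the eight bytes of "abèèè", one Char each
  print (length raw)                          -- 8
  print (length (decodeBytes raw))            -- 5
  print (decodeBytes raw == "ab\232\232\232") -- True
```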

Futzing around a little bit in ghci, I see that I can define a string "\1488", but if I send that string to putStrLn, I get nothing, when I should get א (the Hebrew letter aleph).
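
For the output direction, a workaround is to encode the string to UTF-8 yourself and write the resulting bytes. This is a sketch, not a recommendation: encodeChar and encodeString are hypothetical helpers, and the main below assumes stdout is put into binary mode so that each Char is written as a single byte. If your GHC already encodes text handles according to the locale, this step is unnecessary and would double-encode without the binary-mode call.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (hSetBinaryMode, stdout)

-- UTF-8 bytes for one code point, as Ints in the range 0-255.
-- (Hypothetical helper, not part of any library.)
encodeChar :: Char -> [Int]
encodeChar c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Re-express a String as one Char per UTF-8 byte.
encodeString :: String -> String
encodeString = map chr . concatMap encodeChar

main :: IO ()
main = do
  print (encodeChar '\1488')         -- [215,144], i.e. 0xD7 0x90
  hSetBinaryMode stdout True         -- assumption: write each Char as one raw byte
  putStrLn (encodeString "\1488")    -- a UTF-8 terminal should show aleph
```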

I ♥ Unicode.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe