Andrea Rossato wrote:
Hi,
suppose that, on a Linux system in a UTF-8 locale, you create a file
with non-ASCII characters. For instance:
touch abèèè
Now, I would expect that the output of a shell command such as
"ls ab*"
would be a string/list of 5 chars. Instead I find it to be a list of 8
chars...;-)
The file name may have five *characters*, but if it's encoded as UTF-8,
then it has eight *bytes*.
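To make the characters-versus-bytes distinction concrete, here is a minimal Haskell sketch that UTF-8-encodes the name by hand. The helpers encodeChar and encodeUtf8 are hypothetical names written for this illustration, not a library API, and the encoder skips validation such as rejecting surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as UTF-8 bytes (RFC 3629 layout).
-- Sketch only: surrogate code points are not rejected.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                             -- 1 byte (ASCII)
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]                    -- 2 bytes
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]           -- 3 bytes
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]  -- 4 bytes
  where
    n = ord c
    -- high-order bits of the code point that go into the lead byte
    lead k = fromIntegral (n `shiftR` k)
    -- continuation byte 10xxxxxx carrying 6 bits of the code point
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Hypothetical helper: encode a whole String.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

main :: IO ()
main = do
  let name = "ab\232\232\232"      -- "abèèè"; '\232' is U+00E8 (è)
  print (length name)              -- 5 characters
  print (length (encodeUtf8 name)) -- 8 bytes
  print (encodeUtf8 name)          -- [97,98,195,168,195,168,195,168]
```

Each è is one Char but two bytes (0xC3 0xA8), so three of them add three extra bytes to the five characters.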
It appears that in spite of the locale definition, hGetContents is
treating each byte as a separate character without translating the
multi-byte sequences *from* UTF-8, and then putStrLn sends each of those
bytes to standard output without translating the non-ASCII characters
*to* UTF-8. So the second line of your program's output is
correct...but only by accident.
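The following sketch models both behaviours side by side, assuming the directory entry holds the eight UTF-8 bytes listed in rawName. bytesAsChars imitates the one-Char-per-byte reading described above, charsAsBytes imitates writing each Char back out as a single byte, and decodeUtf8 is a hypothetical, unvalidated decoder written only for this illustration; none of these are library functions.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed contents of the directory entry created by `touch abèèè`.
rawName :: [Word8]
rawName = [0x61, 0x62, 0xC3, 0xA8, 0xC3, 0xA8, 0xC3, 0xA8]

-- Hypothetical model of the reading side: one Char per byte (effectively Latin-1).
bytesAsChars :: [Word8] -> String
bytesAsChars = map (chr . fromIntegral)

-- Hypothetical model of the writing side: each Char goes out as its low 8 bits.
charsAsBytes :: String -> [Word8]
charsAsBytes = map (fromIntegral . ord)

-- Hypothetical minimal UTF-8 decoder: merges a lead byte with its
-- continuation bytes. Sketch only: malformed input is not validated.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F)
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F)
  | otherwise = multi 3 (fromIntegral b .&. 0x07)
  where
    -- take k continuation bytes, appending 6 payload bits from each
    multi k acc =
      let (cs, rest) = splitAt k bs
          step a c = (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
      in chr (foldl step acc cs) : decodeUtf8 rest

main :: IO ()
main = do
  print (length (bytesAsChars rawName))                  -- 8: one Char per byte
  print (length (decodeUtf8 rawName))                    -- 5: proper decoding
  print (charsAsBytes (bytesAsChars rawName) == rawName) -- True
```

The last line is the "correct by accident" part: reading one byte per Char and then writing one byte per Char is lossless, so the original UTF-8 bytes reach the terminal unchanged even though the program never understood them as UTF-8.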
Futzing around a little bit in ghci, I see that I can define a string
"\1488", but if I send that string to putStrLn, I get nothing, when I
should get א (the Hebrew letter aleph).
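One possible explanation for the missing aleph, assuming the byte-oriented output of the time kept only the low 8 bits of each Char (a model, not a confirmed description of GHC's internals): U+05D0 would be written as the single byte 0xD0, a UTF-8 lead byte with no continuation byte, which may explain why a UTF-8 terminal shows nothing. The correct UTF-8 encoding is the two bytes 0xD7 0x90. In GHC 6.12 and later, handles encode text according to the locale by default, and System.IO's hSetEncoding can force UTF-8 explicitly. The names aleph and lowByte are made up for this sketch.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import System.IO (hSetEncoding, stdout, utf8)

aleph :: Char
aleph = '\1488' -- U+05D0 HEBREW LETTER ALEPH

-- Hypothetical model of byte-oriented output: keep only the low 8 bits.
lowByte :: Char -> Word8
lowByte = fromIntegral . ord

main :: IO ()
main = do
  print (ord aleph)     -- 1488, i.e. 0x5D0
  print (lowByte aleph) -- 208, i.e. 0xD0: a lone UTF-8 lead byte
  -- With Unicode-aware I/O (GHC 6.12+), ask stdout to encode as UTF-8.
  hSetEncoding stdout utf8
  putStrLn [aleph]      -- written as the bytes 0xD7 0x90
```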
I ♥ Unicode.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe