For debugging the error, we'll need to know what your locale's encoding is. You can see this by echoing the $LANG environment variable. For example:
$ echo $LANG
en_US.UTF-8

means my encoding is UTF-8.

Haskell doesn't currently have any decoding libraries with good error handling (that I know of), so you might need to use an external library or program. My preference is Python, since it has very descriptive errors. I'll load a file, attempt to decode it with my locale encoding, and then see what errors pop up:

$ python
>>> content = open("testfile", "rb").read()
>>> text = content.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9d in position 1: unexpected code byte

The exact error will help us generate a test file to reproduce the problem.

If you don't see any error, then the bug will be more difficult to track down. Compile your program into a binary (ghc --make ReadFiles.hs) and then run it with gdb, setting a breakpoint in the malloc_error_break procedure:

$ ghc --make ReadFiles.hs
$ gdb ./ReadFiles
(gdb) break malloc_error_break
(gdb) run testfile
... program runs ...
BREAKPOINT
(gdb) bt
<stack trace here, copy and paste it into an email for us>

The stack trace might help narrow down where the memory corruption is occurring.

-----------------

If you don't care much about debugging, and just want to read the file:

The first step is to figure out what encoding the file's in. Data.Text.IO is intended for decoding files in the system's locale encoding (typically UTF-8), not general-purpose "this file has letters in it" IO.

Web browsers are pretty good at auto-detecting encodings. For example, if you load the file into Firefox and then look at the (View -> Character Encoding) menu, which option is selected?

Next, you'll need to read the file in as bytes and then decode it. Use Data.ByteString.hGetContents to read it in.

If it's encoded in one of the common UTF encodings (UTF-8, UTF-16, UTF-32), then you can use the functions in Data.Text.Encoding to convert from the file's bytes to text.
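
For the UTF-8 case, the read-as-bytes-then-decode step might look roughly like the sketch below. It assumes a version of the "text" package that provides Data.Text.Encoding.decodeUtf8' (which may be newer than what you have installed); decodeBytes and readUtf8File are hypothetical helper names made up for this example, not library functions.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Decode raw bytes as UTF-8. Invalid input yields a Left with a
-- description of the problem instead of throwing an exception.
-- (Assumes decodeUtf8' is available in the installed "text" package.)
decodeBytes :: B.ByteString -> Either String T.Text
decodeBytes bytes =
  case TE.decodeUtf8' bytes of
    Left err   -> Left (show err)
    Right text -> Right text

-- | Read a whole file as bytes (no locale decoding), then decode it.
-- 'readUtf8File' is a hypothetical helper name, not a library function.
readUtf8File :: FilePath -> IO (Either String T.Text)
readUtf8File path = do
  bytes <- withBinaryFile path ReadMode B.hGetContents
  return (decodeBytes bytes)
```

Calling readUtf8File "testfile" from GHCi should give a Left with a decoding error message for a file that isn't valid UTF-8, and a Right with the text otherwise. The UTF-16 and UTF-32 decoders in the same module (decodeUtf16LE, decodeUtf32BE, and so on) can be swapped in for the decoding step, but they throw an exception on invalid input rather than returning an Either.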
If it's an unusual encoding (windows-1250, shift_jis, gbk, etc.) then you'll need a decoding library like "text-icu". Create the proper decoder, feed in the bytes, receive text.

If all else fails, you can use this function to decode the file as iso8859-1, but it'll be too slow to use on any file larger than a few dozen megabytes. Furthermore, it will likely cause any special (non-ASCII) characters in the file to become corrupted.

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B8
import Data.Text (Text)
import qualified Data.Text as T

-- Treat each byte as the Unicode code point of the same value (Latin-1).
iso8859_1 :: ByteString -> Text
iso8859_1 = T.pack . B8.unpack

If any corruption occurs, please reply with *what* characters were corrupted; this might help us reproduce the error.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe