Glynn wrote (about my binary library, snipped):
> This is similar to UTF-8; however, UTF-8 is a standard format which
> can be read and written by a variety of other programs.
>
> If we want a mechanism for encoding arbitrary Haskell strings as octet
> lists, and we have a free choice as to the encoding, UTF-8 is
> definitely the way to go.
No, I don't think so. UTF8 is a good choice if you want a way of storing Unicode files on an 8-bit file-system, but it is not as efficient an encoding for characters in general. For example, with UTF8 you can represent character codes less than 2^11 in two bytes; with my system you can represent codes less than 2^14. In 3 bytes, UTF8 can represent codes less than 2^16; I can do anything less than 2^21. This is not an error in UTF8's design; I think it's because UTF8 includes extra bits which make it much easier to use UTF8-encoded files with tools, such as "grep", which were only written with 8-bit characters in mind. That is not a design aim for us.
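The arithmetic behind those figures can be checked with a small sketch. It assumes the library's scheme spends one flag bit per octet, leaving 7 payload bits each; the post does not spell out the format, but that assumption matches the 2^14 and 2^21 figures. The names utf8PayloadBits, septetPayloadBits and comparison are made up for this illustration.

```haskell
-- Payload bits carried by an n-octet UTF-8 sequence (n = 1..4).
-- The lead octet loses n+1 bits to its length prefix (for n >= 2),
-- and each continuation octet loses 2 bits to the "10" marker.
utf8PayloadBits :: Int -> Int
utf8PayloadBits 1 = 7
utf8PayloadBits n = (8 - (n + 1)) + 6 * (n - 1)

-- ASSUMPTION: a scheme with one "more octets follow" flag per octet,
-- so every octet carries 7 payload bits.
septetPayloadBits :: Int -> Int
septetPayloadBits n = 7 * n

-- (octets, UTF-8 payload bits, assumed 7-bit-scheme payload bits)
comparison :: [(Int, Int, Int)]
comparison = [ (n, utf8PayloadBits n, septetPayloadBits n) | n <- [1 .. 4] ]
```

Evaluating comparison gives [(1,7,7),(2,11,14),(3,16,21),(4,21,28)], i.e. 11 against 14 bits in two octets and 16 against 21 bits in three.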
Furthermore, UTF8's encoding is in fact rather more complicated to program than mine, and the implementor will need an encoding like mine in any case to encode arbitrary-size integers (something UTF8 encoding can't do, by the way).
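To give a feel for how little code such an encoding needs, here is a minimal sketch of one possible variable-length scheme for arbitrary-size non-negative integers. It is not the library's actual format, which is not shown in this post. It assumes 7 payload bits per octet, most significant group first, with the high bit set on every octet except the last; encodeNat and decodeNat are hypothetical names.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Encode a non-negative Integer as octets, 7 payload bits per octet,
-- most significant group first. The high bit marks "more octets follow".
-- ASSUMPTION: this layout is illustrative, not the library's real format.
encodeNat :: Integer -> [Word8]
encodeNat n
  | n < 0     = error "encodeNat: negative input"
  | otherwise = go (n `shiftR` 7) [fromIntegral (n .&. 0x7F)]
  where
    go 0 acc = acc
    go m acc = go (m `shiftR` 7) ((fromIntegral (m .&. 0x7F) .|. 0x80) : acc)

-- Decode one integer from the front of an octet list, returning the
-- value and the unconsumed octets, or Nothing if the input ends early.
decodeNat :: [Word8] -> Maybe (Integer, [Word8])
decodeNat = go 0
  where
    go _ [] = Nothing
    go acc (b : bs)
      | b .&. 0x80 /= 0 = go acc' bs
      | otherwise       = Just (acc', bs)
      where
        acc' = (acc `shiftL` 7) .|. fromIntegral (b .&. 0x7F)

-- A character is encoded via its code point.
encodeChar :: Char -> [Word8]
encodeChar = encodeNat . fromIntegral . fromEnum
```

Under these assumptions any code below 2^14 fits in two octets and any code below 2^21 in three, and the same two functions handle integers of any size, which UTF8 has no provision for.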
If we think for a moment about what a Haskell system using UTF8 would be like, I think it's easiest to imagine that in future there will be a way of specifying that a file contains character data stored in UTF8 format, either as a flag stored in the filing system, or as an option given to functions like Haskell's openFile. Or perhaps openFile would assume UTF8, and there would be an openBinaryFile which does not. However it's done, this is entirely orthogonal to the question of how to encode character data as binary *within* Haskell.
> However, that isn't the issue which was being discussed in this
> thread. The issue is that we need a standard mechanism for reading and
> writing *octets*, so that Haskell programs can communicate with the
> rest of the world.
Yes we do. At the moment my binary library does of course have to use standard character input, plus a couple of internal GHC functions (for writing blocks of data), and I hope that there will someday be standard functions I can use instead.