George Russell wrote:
> > OTOH, existing implementations (at least GHC and Hugs) currently read
> > and write "8-bit binary", i.e. characters 0-255 get read and written
> > "as-is" and anything else breaks, and changing that would probably
> > break a fair amount of existing code.
>
> The binary library I posted to the libraries list:
>
> http://haskell.org/pipermail/libraries/2003-June/001227.html
>
> which is for GHC, does this properly. All characters are encoded
> using a standard encoding for unsigned integers, which uses the
> bottom 7 bits of each character as data, and the top bit to signal
> that the encoding is not yet complete. Characters 0-127 (which
> include the standard ASCII ones) get encoded as themselves.
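A sketch of the encoding described above, assuming least-significant 7-bit group first; the names encodeChar/decodeChar are illustrative only, not the actual API of the posted library:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Each octet carries 7 data bits; the top bit is set on every octet
-- except the last, signalling that more octets follow.  Characters
-- 0-127 therefore encode as a single octet equal to their code point.
encodeChar :: Char -> [Word8]
encodeChar = go . ord
  where
    go n
      | n < 0x80  = [fromIntegral n]                     -- final octet: top bit clear
      | otherwise = fromIntegral (0x80 .|. (n .&. 0x7f)) -- continuation octet
                      : go (n `shiftR` 7)

decodeChar :: [Word8] -> Char
decodeChar = chr . go 0
  where
    go _ [] = error "truncated encoding"
    go shift (b:bs)
      | b < 0x80  = fromIntegral b `shiftL` shift        -- last octet
      | otherwise = (fromIntegral (b .&. 0x7f) `shiftL` shift)
                      .|. go (shift + 7) bs
```

Note that although the idea is similar to UTF-8, the octets are incompatible: under this scheme '\233' ('e' with acute accent) encodes as [0xE9, 0x01], whereas UTF-8 encodes the same code point as [0xC3, 0xA9].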
This is similar to UTF-8; however, UTF-8 is a standard format which can
be read and written by a variety of other programs. If we want a
mechanism for encoding arbitrary Haskell strings as octet lists, and we
have a free choice as to the encoding, UTF-8 is definitely the way to
go.

However, that isn't the issue which was being discussed in this thread.
The issue is that we need a standard mechanism for reading and writing
*octets*, so that Haskell programs can communicate with the rest of the
world. As things stand, if you want to read/write files which were
written by another program, you have to rely either upon extensions, or
upon behaviour which isn't mandated by the report.

--
Glynn Clements <[EMAIL PROTECTED]>
_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell