Wolfgang Jeltsch wrote:

> > Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> > padded out to 4 bytes per char.
>
> No, because this would mean that you wouldn't have chars with codes
> greater than 255, which is not the case with GHC.
However, the behaviour of codes greater than 255 is undefined. Well,
effectively undefined; I can't imagine anyone wanting to explicitly
define the current behaviour, particularly the fact that:

	putChar c

and:

	putChar (chr (ord c + n * 256))

are equivalent for all integral n.

> But, of course, I agree with you that currently the main part of
> Unicode support is missing.

I think that it goes much deeper than that. Fixing the Char functions
(to{Upper,Lower}, is*) is the easy part. The hard part is dealing with
the legacy of the I/O "fiction", i.e. the notion that the gap (or,
rather, gulf) between characters and octets can simply be waved away,
or at least made simple enough that it can be effectively hidden.

For practical purposes, you need binary I/O, and you need I/O of text
in arbitrary encodings. The correct encoding may differ between
different parts of a program, and between different parts of the data
obtained from a single source. The correct encoding may not be known
at the point where I/O occurs (at least, not for input), so you need
to be able to read octets and then translate them to Chars once you
actually know the encoding. You also need to be able to handle data
whose encoding is unknown, or which isn't correctly encoded.

This isn't something which can be hidden; at least, not without
reducing Haskell to a toy language (e.g. one which only handles UTF-8,
or only the encoding specified by the locale, etc).

-- 
Glynn Clements <[EMAIL PROTECTED]>

_______________________________________________
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell
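[A minimal sketch of the "read octets first, decode later" idea above.
The Encoding type and decode function are illustrative inventions, not
part of any standard library; real code would read the octets with
binary I/O and pick the decoder once the encoding is discovered.]

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical encoding tag, chosen at runtime once the encoding
-- of already-read octets becomes known.
data Encoding = Latin1 | Ascii

-- Translate octets to Chars according to the chosen encoding.
decode :: Encoding -> [Word8] -> String
decode Latin1 = map (chr . fromIntegral)    -- every octet is a valid codepoint
decode Ascii  = map toAscii
  where
    toAscii b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = '\xFFFD'                -- replacement char for bad input

main :: IO ()
main = do
  -- The same octets mean different Chars under different encodings.
  let octets = [0x48, 0x69, 0xE9] :: [Word8]
  print (decode Latin1 octets)
  print (decode Ascii octets)
```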