At 2002-08-08 23:10, Ken Shan wrote:
> 1. Octets.
> 2. C "char".
> 3. Unicode code points.
> 4. Unicode code values, useful only for UTF-16, which is seldom used.
> 5. "What handles handle".
...
>I suggest that the following Haskell types be used for the five items
>above:
>
> 1. Word8
> 2. CChar
> 3. CodePoint
> 4. Word16
> 5. Char
I disagree. They should be:

 1. Word8
 2. CChar
 3. Char
 4. Word16
 5. Word8

>Let me elaborate. Files are funny because the information units they
>contain can be treated as both numbers and characters.

No, a file is always a list of octets. Nothing else (ignoring metadata,
forks, etc.). Of course, you can interpret those octets as text using
"ASCII" or "UTF-8" or whatever; equally, you can interpret those octets
as an image using "PNG", "JPEG", etc. But those are secondary
transformations, separate from the business of reading from and writing
to a file.

We should have Word8-based interfaces to file and network handles.
Whether or not the old Char-based ones should be deprecated, I don't
know.

As for Unicode code points, if there's to be an internationalisation
effort for Haskell, the type of character literals, Char, should be
fixed as the type for Unicode code points, much as it already is in GHC.

-- 
Ashley Yakeley, Seattle WA

_______________________________________________
Haskell mailing list
http://www.haskell.org/mailman/listinfo/haskell
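The layering argued for in the message (handles read and write only octets; interpreting those octets as text is a separate, secondary step) can be sketched in Haskell. This is a minimal sketch, not part of the original message: it assumes the `bytestring` package (which postdates this thread) for octet-level file I/O, and `readOctets`, `writeOctets` and `decodeUtf8` are hypothetical names. The decoder is deliberately simplified and does not reject overlong encodings or surrogate code points.

```haskell
-- Sketch: file I/O deals only in octets (Word8); decoding text is a
-- separate transformation layered on top. Assumes the bytestring package.
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: reading a file yields a list of octets, nothing more.
readOctets :: FilePath -> IO [Word8]
readOctets path = fmap BS.unpack (BS.readFile path)

-- Hypothetical helper: writing a file likewise takes only octets.
writeOctets :: FilePath -> [Word8] -> IO ()
writeOctets path = BS.writeFile path . BS.pack

-- Hypothetical secondary transformation: interpret octets as UTF-8,
-- producing Chars (Unicode code points). Nothing on malformed input.
-- Simplified: overlong forms and surrogates are not rejected.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) -- 2-octet sequence
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) -- 3-octet sequence
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) -- 4-octet sequence
  | otherwise = Nothing
  where
    -- Consume n continuation octets and combine their low 6 bits
    -- with the payload bits of the lead octet.
    multi :: Int -> Word8 -> Maybe String
    multi n lead
      | length conts == n && all isCont conts =
          let cp = foldl step (fromIntegral lead) conts
           in if cp <= 0x10FFFF then (chr cp :) <$> decodeUtf8 rest else Nothing
      | otherwise = Nothing
      where
        (conts, rest) = splitAt n bs
    step :: Int -> Word8 -> Int
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
    -- Continuation octets have the bit pattern 10xxxxxx.
    isCont c = c .&. 0xC0 == 0x80

main :: IO ()
main = do
  -- "héllo" in UTF-8: 'é' (U+00E9) is encoded as the octets 0xC3 0xA9.
  let octets = [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F]
  writeOctets "example.bin" octets
  back <- readOctets "example.bin"
  print (back == octets) -- the file round-trips purely as octets
  print (decodeUtf8 back) -- only now are the octets read as text
```

Only `decodeUtf8` knows anything about UTF-8; replacing it with another decoder (or with an image parser, as in the message's PNG/JPEG example) would not touch the handle-level code at all.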
