On Mon, Mar 26, 2012 at 7:29 AM, Christian Siefkes <christ...@siefkes.net> wrote:
> On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
>> It is not the precision of Char or char that is the issue here.
>> It has been clarified at several points that Char is not a Unicode character,
>> but a Unicode code point. Not every Unicode code point represents a
>> Unicode character, and not every sequence of Unicode code points
>> represents a character or a sequence of Unicode characters.
>
> What do you mean? Every Unicode character corresponds to one code point,

Yes, but this correspondence is not a bijection -- a great source of
confusion that permeates a lot of discussions about Unicode characters and
texts, including this one (and a previous one regarding the Haskell Report).
Very much heartbreaking :-(

> and
> every code point in the range 0 to 0x10FFFF (excluding the range 0xD800 to
> 0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
> "noncharacters", see
> http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Special_code_points
> ) corresponds to one character.
>
> Maybe your criticism is that Char does not explicitly prevent these special
> code points from being assigned? While true, that seems a relatively minor
> matter. Moreover, a future revision of the Haskell standard could easily
> declare that assigning a "forbidden" character results in an error/bottom
> if that is so desired.

It is not just a matter of clarifying that certain things are forbidden,
and I believe it would be a great mistake to qualify it as minor. How do you
handle normalization if you expose texts as sequences of unrelated code
points that can be freely taken apart and combined?

- Gaby
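
To make the normalization question concrete, here is a minimal sketch that
uses only Data.Char from base. It takes "é" written two ways: as the single
code point U+00E9, and as 'e' followed by U+0301 (COMBINING ACUTE ACCENT).
The two are canonically equivalent in Unicode, yet as lists of Char they
differ. The helpers `isCombining` and `clusters` are hypothetical names
introduced for illustration; `clusters` is only a crude approximation of
user-perceived characters, not the segmentation algorithm the Unicode
Standard defines.

```haskell
module Main where

import Data.Char (GeneralCategory (..), generalCategory)

-- "é" as one precomposed code point (U+00E9).
precomposed :: String
precomposed = "\x00E9"

-- "é" as 'e' followed by COMBINING ACUTE ACCENT (U+0301).
decomposed :: String
decomposed = "e\x0301"

-- Hypothetical helper: is this code point a combining mark?
isCombining :: Char -> Bool
isCombining c =
  generalCategory c `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

-- Hypothetical helper: attach each run of combining marks to the
-- code point before it. An assumption-laden approximation only.
clusters :: String -> [String]
clusters [] = []
clusters (c : cs) = (c : marks) : clusters rest
  where
    (marks, rest) = span isCombining cs

main :: IO ()
main = do
  -- Same text to a reader, different code point sequences.
  print (precomposed == decomposed)
  -- One Char versus two.
  print (length precomposed, length decomposed)
  -- Reversing code points puts the accent before its base letter.
  print (map isCombining (reverse decomposed))
  -- Grouping marks with their base keeps "é" together.
  print (map length (clusters (decomposed ++ "x")))
```

Equality, length, reverse and splitAt on a list of Char all operate on
individual code points, so none of them respects canonical equivalence
unless the text is normalized first and kept that way.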