On Sat, Mar 24, 2012 at 5:54 PM, Gabriel Dos Reis
<g...@integrable-solutions.net> wrote:
> I think there is a confusion here. A Unicode character is an abstract
> entity. For it to exist in some concrete form in a program, you need
> an encoding. The fact that char16_t is 16 bits wide is irrelevant to
> whether it can be used in a representation of a Unicode text, just
> like uint8_t (e.g. 'unsigned char') can be used to encode a Unicode
> string despite it being only 8 bits wide. You do not need to make the
> character type exactly equal to the type of the individual element
> in the text representation.

Well, if you have a type that is at least 21 bits wide, you can simply
declare its values to be Unicode code points, since code points are
just numbers in the range 0 to 0x10FFFF. Using a char* that you merely
claim contains UTF-8-encoded data is bad for safety, as there is no
guarantee that this is actually the case.
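As a minimal Haskell sketch of that distinction (the Utf8 wrapper and
mkUtf8 below are hypothetical names, not from any library): Char is by
definition a code point, while ByteString is just bytes, so any UTF-8
guarantee has to be earned by validating at construction time:

import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')

-- Char is already an abstract code point: a number in 0..0x10FFFF.
-- B.ByteString is just bytes; its type says nothing about UTF-8.

-- Hypothetical wrapper: only constructible through validation, so
-- the type guarantees what a raw char* merely claims.
newtype Utf8 = Utf8 B.ByteString

mkUtf8 :: B.ByteString -> Maybe Utf8
mkUtf8 bs = case decodeUtf8' bs of
  Left _  -> Nothing        -- not well-formed UTF-8, reject
  Right _ -> Just (Utf8 bs) -- the bytes decoded successfully

For example, mkUtf8 (B.pack [0xC3, 0xA5]) succeeds (it is "å" in
UTF-8), while mkUtf8 (B.pack [0xFF]) is Nothing, since 0xFF can never
occur in well-formed UTF-8.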
> Note also that an encoding itself (whether UTF-8, UTF-16, etc.) is
> insufficient as far as text processing goes; you also need a
> localization at the minimum. It is the combination of the two that
> gives some meaning to text representation and operations.

The text library does that via ICU (the text-icu package). Some
operations would be possible without consulting the locale, if it
weren't for those Turkish i's. :/

-- Johan
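P.S. A small illustration of the Turkish problem (a sketch, assuming
text-icu's Data.Text.ICU.toUpper, which takes a LocaleName, and
OverloadedStrings for the Text literals):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.ICU as ICU   -- from the text-icu package
import Data.Text.ICU (LocaleName (Locale))

main :: IO ()
main = do
  -- Locale-independent default mapping: 'i' uppercases to 'I'.
  print (T.toUpper "i")
  -- Turkish locale: 'i' uppercases to U+0130 ("İ"), so the same
  -- operation gives a different answer once the locale is known.
  print (ICU.toUpper (Locale "tr") "i")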