At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote:

>Some time ago the Unicode Consortium slowly began switching to the
>point of view that abstract characters are denoted by numbers in the
>range U+0000..10FFFF.

It's worth mentioning that these are 'codepoints', not 'characters'. Sometimes a single character is made up of two or more codepoints: for instance, an 'a' with a dot above is one displayed character that can be built from the codepoints LATIN SMALL LETTER A and COMBINING DOT ABOVE.

Perhaps this makes the UTF-16 'surrogate' problem a bit less serious, since there never was a one-to-one correspondence between any kind of n-bit unit and displayed characters.

-- Ashley Yakeley, Seattle WA
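
P.S. To make the distinction concrete, here is a minimal sketch in Haskell. It assumes that a Char holds one codepoint (as in GHC) and that Data.Char.isMark is available. The names aDotAbove, countBaseChars and utf16Units are made up for illustration, and skipping combining marks is only a rough approximation of what counts as a displayed character.

```haskell
import Data.Char (isMark, ord)

-- LATIN SMALL LETTER A followed by COMBINING DOT ABOVE (U+0307):
-- two codepoints that display as a single character.
aDotAbove :: String
aDotAbove = "a\x0307"

-- Hypothetical helper: count codepoints that are not combining marks.
-- This only approximates the number of displayed characters.
countBaseChars :: String -> Int
countBaseChars = length . filter (not . isMark)

-- Hypothetical helper: how many 16-bit units UTF-16 needs for one codepoint.
-- Codepoints above U+FFFF need a surrogate pair.
utf16Units :: Char -> Int
utf16Units c
  | ord c > 0xFFFF = 2
  | otherwise      = 1

main :: IO ()
main = do
  print (length aDotAbove)                   -- codepoints: 2
  print (countBaseChars aDotAbove)           -- approximate displayed characters: 1
  print (sum (map utf16Units aDotAbove))     -- UTF-16 units: 2
  print (utf16Units '\x1D11E')               -- MUSICAL SYMBOL G CLEF: 2
```

So the number of codepoints, the number of 16-bit units and the number of displayed characters can all differ, with or without surrogates.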