"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> writes: > Fri, 5 Oct 2001 02:29:51 -0700 (PDT), Krasimir Angelov <[EMAIL PROTECTED]> pisze: > > > Why Char is 32 bit. UniCode characters is 16 bit.
> No, Unicode characters have 21 bits (range U+0000..U+10FFFF).

We've been through all this, of course, but here's a quote:

   "Unicode" originally implied that the encoding was UCS-2 and it
   initially didn't make any provisions for characters outside the BMP
   (U+0000 to U+FFFF). When it became clear that more than 64k
   characters would be needed for certain special applications
   (historic alphabets and ideographs, mathematical and musical
   typesetting, etc.), Unicode was turned into a sort of 21-bit
   character set with possible code points in the range U-00000000 to
   U-0010FFFF. The 2×1024 surrogate characters (U+D800 to U+DFFF) were
   introduced into the BMP to allow 1024×1024 non-BMP characters to be
   represented as a sequence of two 16-bit surrogate characters. This
   way UTF-16 was born, which represents the extended "21-bit" Unicode
   in a way backwards compatible with UCS-2. The term UTF-32 was
   introduced in Unicode to mean a 4-byte encoding of the extended
   "21-bit" Unicode. UTF-32 is the exact same thing as UCS-4, except
   that by definition UTF-32 is never used to represent characters
   above U-0010FFFF, while UCS-4 can cover all 2^31 code positions up
   to U-7FFFFFFF.

from the UTF-8 and Unicode FAQ at http://www.cl.cam.ac.uk/~mgk25/unicode.html

Does Haskell's support of "Unicode" mean UTF-32, or full UCS-4? Recent
messages seem to indicate the former, but I don't see any reason
against the latter.

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
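For concreteness, here is a small Haskell sketch of the surrogate-pair
arithmetic the FAQ quote describes; the names toSurrogates and
fromSurrogates are made up for illustration. The last line of main also
shows one way to ask a given compiler how wide its Char really is: a
modern GHC prints 1114111 (0x10FFFF), i.e. the UTF-32 range rather than
full UCS-4, though the GHC of 2001 may have answered differently.

    import Data.Bits (shiftL, shiftR, (.&.))
    import Data.Char (chr, ord)
    import Data.Word (Word16)

    -- Split a non-BMP code point (U+10000..U+10FFFF) into a UTF-16
    -- surrogate pair: a high surrogate in U+D800..U+DBFF carrying the
    -- top 10 payload bits, a low surrogate in U+DC00..U+DFFF carrying
    -- the bottom 10.
    toSurrogates :: Char -> (Word16, Word16)
    toSurrogates c =
        let v = ord c - 0x10000   -- 20 bits of payload
        in ( fromIntegral (0xD800 + v `shiftR` 10)
           , fromIntegral (0xDC00 + (v .&. 0x3FF)) )

    -- Reassemble a surrogate pair into the original code point.
    fromSurrogates :: Word16 -> Word16 -> Char
    fromSurrogates hi lo =
        chr ( 0x10000
            + fromIntegral (hi - 0xD800) `shiftL` 10
            + fromIntegral (lo - 0xDC00) )

    main :: IO ()
    main = do
        -- U+1D11E MUSICAL SYMBOL G CLEF, a character outside the BMP.
        print (toSurrogates '\x1D11E')       -- (55348,56606) = (0xD834,0xDD1E)
        print (fromSurrogates 0xD834 0xDD1E) -- '\119070'
        -- How wide is Char on this compiler?
        print (ord (maxBound :: Char))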