Re: UniCode

Marcin 'Qrczak' Kowalczyk Fri, 05 Oct 2001 10:53:06 -0700

On Fri, 5 Oct 2001 23:23:50 +1000, Andrew J Bromage <[EMAIL PROTECTED]> wrote:


> There is a set of one million (more correctly, 1M) Unicode characters
> which are only accessible using surrogate pairs (i.e. two UTF-16
> codes).  There are currently none of these codes assigned,

This information is out of date. AFAIR about 40000 of them are assigned,
mostly Chinese characters (current, not historic).
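For context, here is a minimal Haskell sketch of why these characters need surrogate pairs in UTF-16. The code points above U+FFFF (2^20 of them, the "1M" quoted above) do not fit in one 16-bit unit, so each is split into a high surrogate and a low surrogate of 10 bits each. The names `toUtf16` and `supplementaryCount` are hypothetical and chosen only for illustration. The bit arithmetic follows the standard UTF-16 encoding.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)
import Numeric (showHex)

-- Number of code points above the BMP: U+10000 .. U+10FFFF
supplementaryCount :: Int
supplementaryCount = 0x110000 - 0x10000   -- 1048576, i.e. 2^20 ("1M")

-- Encode a single Char as UTF-16 code units (hypothetical helper).
-- BMP characters need one unit; supplementary ones need a surrogate pair.
-- Lone surrogates (U+D800..U+DFFF) are passed through unchecked in this sketch.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   =
      [ fromIntegral (0xD800 + (m `shiftR` 10))   -- high surrogate: top 10 bits
      , fromIntegral (0xDC00 + (m .&. 0x3FF))     -- low surrogate: bottom 10 bits
      ]
  where
    n = ord c
    m = n - 0x10000

main :: IO ()
main = do
  print supplementaryCount                                  -- prints 1048576
  putStrLn (unwords [showHex u "" | u <- toUtf16 (chr 0x20000)])  -- prints d840 dc00
```

U+20000, the first character of CJK Extension B, therefore becomes the two code units D840 DC00, while a BMP character such as 'A' stays a single unit.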

> So rare, in fact, that the cost of strings taking up twice the
> space that they currently do simply isn't worth the cost.

In Haskell, strings already have a high overhead. In GHC a Char# value
(inside a Char object) always occupies a full machine word, the same size
as a pointer (32 or 64 bits), no matter how many of those bits are used.
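A small sketch of the consequence, assuming a current GHC (where `maxBound :: Char` is U+10FFFF; the GHC releases of 2001 may have differed). Because every Char already occupies a full word, a supplementary-plane character such as U+20000 fits in a single Char, so no wider representation is needed to support it. The names `maxCodePoint` and `cjkExtB` are illustrative only.

```haskell
import Data.Char (chr, ord)

-- In current GHC, Char spans every code point U+0000..U+10FFFF,
-- so supplementary-plane characters need no wider representation.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- U+20000 (first character of CJK Extension B) is a single Char,
-- not a surrogate pair.
cjkExtB :: String
cjkExtB = [chr 0x20000]

main :: IO ()
main = do
  print maxCodePoint        -- prints 1114111 (0x10FFFF)
  print (length cjkExtB)    -- prints 1
```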

> It just goes to show that strings are not merely arrays of characters
> like some languages would have you believe.

In Haskell, String = [Char]. It's true that Char values don't
necessarily correspond to glyphs, but a String is still a list of Chars.
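To illustrate the point about glyphs, here is a minimal sketch. A base letter followed by a combining accent usually renders as one glyph, but it is two Chars. Because String is only a synonym for [Char], ordinary list functions see both. The name `decomposed` is hypothetical.

```haskell
import Data.Char (ord)

-- 'e' followed by U+0301 COMBINING ACUTE ACCENT: usually rendered as one glyph.
decomposed :: String
decomposed = "e\x0301"

main :: IO ()
main = do
  -- String is a synonym for [Char], so ordinary list functions apply.
  print (decomposed == ['e', '\x0301'])  -- prints True
  print (length decomposed)              -- prints 2: two Chars, one glyph
  print (map ord decomposed)             -- prints [101,769]
```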

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK


_______________________________________________
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
