On 9/26/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
Paul Prescod wrote:
> There is at least one big difference between surrogate pairs and
> decomposed characters. The user can typically normalize away
> decompositions. How do you normalize away decompositions in a language
> that only supports 16-bit representations?
I don't see the problem: You use UTF-16; all normal forms (NFC, NFD,
NFKC, NFKD) can be represented in UTF-16 just fine.
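A minimal sketch of that claim, assuming Python 3's `unicodedata` module: it normalizes a sample string to each of the four forms and checks that each result survives a UTF-16 encode/decode cycle unchanged. The sample text, `check_forms`, and `utf16_units` are illustrative names, not from the original message. U+1D15E lies outside the BMP, so UTF-16 stores it, and its decomposition, as surrogate pairs.

```python
import unicodedata

# Illustrative sample (an assumption, not from the original message):
# "e" + U+0301 COMBINING ACUTE ACCENT, a space, and U+1D15E MUSICAL SYMBOL
# HALF NOTE, a non-BMP character that UTF-16 stores as a surrogate pair.
SAMPLE = "e\u0301 \U0001D15E"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def check_forms(text: str) -> dict:
    """Normalize text to each form, then verify the normalized string
    survives a UTF-16 encode/decode cycle unchanged.

    Returns {form: (round_trips, utf16_code_unit_count)}."""
    report = {}
    for form in FORMS:
        normalized = unicodedata.normalize(form, text)
        round_trip = normalized.encode("utf-16-le").decode("utf-16-le")
        report[form] = (round_trip == normalized, utf16_units(normalized))
    return report


if __name__ == "__main__":
    for form, (ok, units) in check_forms(SAMPLE).items():
        print(f"{form}: round-trips={ok}, UTF-16 code units={units}")
```

Every form round-trips. The normalized strings simply differ in how many 16-bit code units they need.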
It is somewhat tricky to implement a normalization algorithm in
UTF-16, since you must combine surrogate pairs first in order to
find out what the canonical decomposition of the code point is;
but it's just more code, and no problem in principle.
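The extra code can be sketched roughly as follows. This is a sketch under stated assumptions, not a reference implementation: the input is a list of 16-bit code units, and the helper names (`combine_surrogates`, `split_surrogates`, `normalize_utf16`) are hypothetical. Surrogate pairs are combined into code points first, so decomposition data is looked up per code point and never per code unit. The normalization step itself is delegated to Python's `unicodedata.normalize` rather than reimplemented.

```python
import unicodedata


def combine_surrogates(units: list[int]) -> list[int]:
    """Turn UTF-16 code units into code points. A high surrogate
    (D800-DBFF) followed by a low surrogate (DC00-DFFF) becomes one
    supplementary code point; anything else passes through unchanged."""
    points = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            points.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            points.append(u)
            i += 1
    return points


def split_surrogates(points: list[int]) -> list[int]:
    """Turn code points back into UTF-16 code units."""
    units = []
    for cp in points:
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


def normalize_utf16(form: str, units: list[int]) -> list[int]:
    """Normalize a UTF-16 code-unit sequence to the given form."""
    # Step 1: combine surrogate pairs so each lookup sees a whole code point.
    text = "".join(chr(cp) for cp in combine_surrogates(units))
    # Step 2: normalize (delegated to unicodedata in this sketch).
    normalized = unicodedata.normalize(form, text)
    # Step 3: re-encode the result as 16-bit code units.
    return split_surrogates([ord(c) for c in normalized])


if __name__ == "__main__":
    # U+1D15E MUSICAL SYMBOL HALF NOTE is D834 DD5E in UTF-16.
    print(unicodedata.decomposition("\U0001D15E"))  # full code point: has a decomposition
    print(repr(unicodedata.decomposition("\ud834")))  # lone high surrogate: none
    print([hex(u) for u in normalize_utf16("NFD", [0xD834, 0xDD5E])])
```

The last lines show why step 1 is needed. The decomposition of U+1D15E is only visible once both surrogates have been combined. Looking up the high surrogate on its own finds nothing.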
Regards,
Martin
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
