On 9/25/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
As David Hopwood pointed out, to be fully correct, you already have to
create a custom function even with BMP characters, because of
decomposed characters.  (Example:  Representing a c-cedilla as a c and
a combining cedilla, rather than as a single code point.)  Separating
those two would be wrong.  Counting them as two characters for slicing
purposes would usually be wrong.

Even 32-bit representations are permitted to use surrogate pairs; it
just doesn't often make sense.

There is at least one big difference between surrogate pairs and decomposed characters: the user can typically normalize away decompositions. How do you normalize away surrogate pairs in a language that only supports 16-bit representations?
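
To make the asymmetry concrete, here is a minimal sketch using the standard `unicodedata` module. It assumes a current Python 3, where `str` is indexed by code point, so UTF-16 code units are counted by encoding explicitly. The helper `utf16_length` and the choice of U+10400 as the sample non-BMP character are illustrative assumptions, not something from the thread.

```python
import unicodedata


def utf16_length(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16 (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


# A decomposed c-cedilla: 'c' followed by U+0327 COMBINING CEDILLA.
decomposed = "c\u0327"
# NFC composes the pair into the single code point U+00E7.
composed = unicodedata.normalize("NFC", decomposed)

# A character outside the BMP: U+10400 DESERET CAPITAL LETTER LONG I.
astral = "\U00010400"
# No normalization form maps it to a BMP character, so in UTF-16 it
# still needs a surrogate pair afterwards.
normalized_forms = {
    form: unicodedata.normalize(form, astral)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print(utf16_length(decomposed), utf16_length(composed))
print(utf16_length(astral), [utf16_length(v) for v in normalized_forms.values()])
```

The decomposed sequence shrinks to one code unit after NFC, while the non-BMP character occupies two code units under every normalization form.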

 Paul Prescod

_______________________________________________
Python-3000 mailing list