Philipp Reichmuth <[EMAIL PROTECTED]> writes:
| LGB> One glyph that takes 64 bits to encode...
|
| But not for any *technical* purpose. For all purposes of string
| processing, such as indexing, concatenation etc., this is *two*
| characters, not one.
Finding the length of the string...
| "Glyph length" can be rather arbitrary. But then you have examples of
| zero-width control characters in ISO encodings, so there is no real
| difference. The question is then how editing is understood. I don't
| think you can assume editing to work safely on the glyph level,
| because then you can't add/delete/insert combining accents.
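
To make the length question concrete, here is a small Python sketch
of my own. It assumes the "64 bits" above means two 32-bit code units
(base letter plus combining accent); the variable names are purely
illustrative.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one glyph on screen, but two code points in the string.
s = "e\u0301"

code_points = len(s)                      # counts code points, not glyphs
utf8_bytes = len(s.encode("utf-8"))       # 1 byte for "e" + 2 for U+0301
utf32_bytes = len(s.encode("utf-32-be"))  # 2 code points * 4 bytes, no BOM

# Editing at the code point level: strip only the accent.
base_only = s[:-1]

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", s)
```

So the "length" depends on what you count: 2 code points, 3 UTF-8
bytes, 8 UTF-32 bytes, or 1 code point after composition. Deleting
just the accent only works if the editor addresses code points.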
|
| LGB> | UTF-8 has a maximum character width of 4 bytes.
|
| LGB> 6, but only 4 are allowed at this stage since no Unicode char points
| LGB> above 0x10ffff
|
| Yup, and the Unicode consortium says something like they never will,
| because Unicode operates in a 20-bit character space. :-)
I thought 31-bit...
Just wait until they begin doing eastern languages for real.
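
For the byte counts, a rough Python sketch, assuming the original
31-bit UTF-8 layout (up to 6 bytes per character). The helper
utf8_len is my own name, not a library function.

```python
def utf8_len(cp: int) -> int:
    """Bytes needed for code point cp under the original 31-bit UTF-8 layout."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    # Largest code point that fits in 1, 2, 3, 4, 5 and 6 bytes respectively.
    limits = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    for nbytes, limit in enumerate(limits, start=1):
        if cp <= limit:
            return nbytes


top = 0x10FFFF                            # highest Unicode code point
bits_needed = top.bit_length()            # bits required to represent it
encoded_len = len(chr(top).encode("utf-8"))  # what Python's own encoder produces
```

Here utf8_len(0x10FFFF) gives 4 and utf8_len(0x7FFFFFFF) gives 6, so
both numbers in the thread are right for their respective ranges.
Note that 0x10FFFF itself needs 21 bits, so "20-bit" is slightly
short: it is 17 planes of 16 bits each.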
--
Lgb