Philipp Reichmuth <[EMAIL PROTECTED]> writes:

| LGB> One glyph that takes 64 bits to encode...
| 
| But not for any *technical* purpose. For all purposes of string
| processing, such as indexing, concatenation etc., this is *two*
| characters, not one.

Finding the length of the string...
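
As a rough illustration of why the answer depends on what you count,
here is a minimal Python sketch (Python and the variable names are my
own choice for the example, nothing from this thread). It builds an
accented letter from a base character plus a combining accent and
compares it with the precomposed form:

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one glyph.
decomposed = "e\u0301"

# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# len() counts code points, not glyphs.
decomposed_len = len(decomposed)
composed_len = len(composed)

# Editing at the code point level: dropping the last code point
# removes only the accent and leaves the base letter.
without_accent = decomposed[:-1]

print(decomposed_len, composed_len, without_accent)
```

Both strings would normally be drawn as the same single glyph, yet the
first one has length 2, and deleting its last code point removes only
the accent.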
 
| "Glyph length" can be rather arbitrary. But then you have examples of
| zero-width control characters in ISO encodings, so there is no real
| difference. The question is then how editing is understood. I don't
| think you can assume editing to work safely on the glyph level,
| because then you can't add/delete/insert combining accents.
| 
| LGB> | UTF-8 has a maximum character width of 4 bytes.
| 
| LGB> 6, but only 4 are allowed at this stage since there are no Unicode
| LGB>  code points above 0x10ffff
| 
| Yup, and the Unicode consortium says something like they never will,
| because Unicode operates in a 20-bit character space. :-)

I thought 31-bit...
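
For what it's worth, both numbers have a basis as far as I know: the
original UTF-8 definition (RFC 2279) allowed sequences of up to 6 bytes
covering a 31-bit space, while RFC 3629 later limited it to 4 bytes and
U+10FFFF. A small Python sketch of the byte widths (the helper name
utf8_len is made up for this example):

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one code point."""
    return len(chr(code_point).encode("utf-8"))

# Largest code point that fits in each UTF-8 sequence length.
boundaries = (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)
widths = [utf8_len(cp) for cp in boundaries]

for cp, width in zip(boundaries, widths):
    print(f"U+{cp:04X}: {width} byte(s)")
```

Python itself refuses anything above 0x10FFFF, so chr(0x110000) raises
a ValueError rather than producing a longer sequence.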

Just wait until they begin doing eastern languages for real.


-- 
        Lgb
