Philipp Reichmuth <[EMAIL PROTECTED]> writes:

| LGB> One glyph that takes 64 bits to encode...
|
| But not for any *technical* purpose. For all purposes of string
| processing, such as indexing, concatenation etc., this is *two*
| characters, not one.

Finding the length of the string...

| "Glyph length" can be rather arbitrary. But then you have examples of
| zero-width control characters in ISO encodings, so there is no real
| difference. The question is then how editing is understood. I don't
| think you can assume editing to work safely on the glyph level,
| because then you can't add/delete/insert combining accents.
|
| LGB> | UTF-8 has a maximum character width of 4 bytes.
|
| LGB> 6, but only 4 are allowed at this stage since no unicode char points
| LGB> above 0x10ffff
|
| Yup, and the Unicode consortium says something like they never will,
| because Unicode operates in a 20-bit character space. :-)

I thought 31-bit... Just wait until they begin doing eastern languages
for real.

-- 
Lgb
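
For anyone who wants to check the numbers in this exchange, here is a
small Python sketch. It is not taken from any code base discussed in the
thread. The helper name utf8_width and the sample code points are
illustrative assumptions. It shows a base letter plus a combining accent
counting as two characters for string processing, and the UTF-8 byte
width of a few code points up to 0x10ffff.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one glyph on screen,
# but two code points as far as indexing and concatenation go.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

MAX_CODE_POINT = 0x10FFFF  # highest code point mentioned in the thread


def utf8_width(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point (hypothetical helper)."""
    return len(chr(code_point).encode("utf-8"))


print(len(decomposed), len(composed))
print([utf8_width(cp) for cp in (0x41, 0xE9, 0x20AC, MAX_CODE_POINT)])
print(MAX_CODE_POINT.bit_length())
```

The last line prints the number of bits needed to represent 0x10ffff
itself, which can be compared against the figures quoted above.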