On 16/08/2017 13:37, Alexey via Lazarus wrote:
On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
A char can be composed of several code points: a base plus combining
ones (each of them, afaik, in the 32 bit range).
So a char can need 96 or more bits. (And not all such sequences have a
precomposed form.)
See my previous post: I think each S[i] should be something like a QWord
(SizeOf(one char) = SizeOf(QWord)). The element type could be TextChar
and the string type TextString. Internally it can be compressed to UTF-8.
TextString is good if I want to parse text by "chars". If a "char" needs
more bytes, let's take more (internally it is the same UTF-8).
Have a look at
https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_longest_unicode_character/
There is ONE character that comprises more than 200 codepoints.
The only way to store such a char is in a type of dynamic size (aka string).
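To put rough numbers on that, here is a minimal sketch. It is in Python
only because its standard library makes the counting easy; it is not FPC
code, and the stacked character is built by hand (one base letter plus
200 copies of U+0301 COMBINING ACUTE ACCENT) rather than copied from the
linked post. The variable names are made up for illustration.

```python
# One base letter followed by 200 combining marks: by the reasoning in
# this thread that is still a single "char".
base = "a"
marks = "\u0301" * 200          # U+0301 COMBINING ACUTE ACCENT, 200 times
stacked = base + marks

code_points = len(stacked)                  # Python counts code points
utf8_bytes = len(stacked.encode("utf-8"))   # 1 byte for "a", 2 per mark
utf32_bytes = code_points * 4               # fixed 4 bytes per code point

# A QWord is 8 bytes, so it cannot hold this char in either encoding.
fits_in_qword = utf8_bytes <= 8

print(code_points, utf8_bytes, utf32_bytes, fits_in_qword)
```

Whatever fixed size is picked for a TextChar, adding more marks will
exceed it, which is why I think only a dynamically sized type works.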
Well, I couldn't find an official doc on what makes up the boundaries of
a char. But as far as I can see: if ä is one character, and it can be
encoded as "non-combining codepoint" + "combining codepoint", then a
character is any sequence of one "non-combining codepoint" + zero or more
"combining codepoints". (AFAIK Arabic script has chars that carry several
"combining codepoints", so this is happening in actual languages.)
The example linked above, as far as I checked, fulfils this definition.
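The definition above can be sketched as a small splitter. This is only
an illustration of the rule as I stated it, not the official Unicode
segmentation rules, which handle more cases. It is written in Python
because its unicodedata module is in the standard library; split_chars
is a made-up name, and treating every code point in a general category
starting with "M" (the mark categories) as "combining" is my assumption.

```python
import unicodedata

def split_chars(text):
    """Split text into 'chars': one base code point plus any marks after it."""
    chars = []
    for cp in text:
        # Assumption: general categories Mn, Mc and Me count as "combining".
        is_mark = unicodedata.category(cp).startswith("M")
        if chars and is_mark:
            chars[-1] += cp      # attach the mark to the preceding base
        else:
            chars.append(cp)     # start a new char (also for a stray mark)
    return chars

decomposed = "a\u0308"           # "a" + U+0308 COMBINING DIAERESIS
print(split_chars(decomposed + "b"))
print(unicodedata.normalize("NFC", decomposed) == "\u00e4")
```

With this rule, both the precomposed ä (one codepoint) and the decomposed
form (two codepoints) come out as one char, and a char with 200 marks
is still one char, just a very long one.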