Torsten Bögershausen:

Some of the code points which have "0 length on the display" are called
"combining", others are called "vowels" or "accents".
E.g. 5BF is not marked any of them, but if you look at the glyph, it should
be combining (please correct me if that is wrong).

All combining characters have a non-zero combining class in http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (fourth field, called Canonical_Combining_Class in http://www.unicode.org/reports/tr44/ ). For instance, the aforementioned U+05BF is defined as follows:

  05BF;HEBREW POINT RAFE;Mn;23;NSM;;;;;N;;;;;

The combining class is 23, so this is a combining character.
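
As a rough illustration of reading that field, here is a minimal Python sketch that splits a UnicodeData.txt line on ";" and picks out the third and fourth fields. The function name parse_unicodedata_line and the dictionary keys are made up for this example; only the field positions come from the file format.

```python
def parse_unicodedata_line(line):
    """Split one UnicodeData.txt line into the fields discussed above.

    Hypothetical helper for illustration only; the key names are
    assumptions, the field order is that of UnicodeData.txt.
    """
    fields = line.strip().split(";")
    return {
        "code_point": int(fields[0], 16),       # first field, hexadecimal
        "name": fields[1],                      # second field
        "general_category": fields[2],          # third field, e.g. "Mn"
        "combining_class": int(fields[3]),      # fourth field, decimal
    }


entry = parse_unicodedata_line("05BF;HEBREW POINT RAFE;Mn;23;NSM;;;;;N;;;;;")

# A non-zero Canonical_Combining_Class marks the character as combining.
has_nonzero_class = entry["combining_class"] != 0
print(entry["name"], entry["general_category"], entry["combining_class"])
```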

There is a difference between non-spacing combining marks ("Mn" in the third column (General_Category)) and others ("Mc" for spacing marks and "Me" for enclosing marks), so they might need special handling. Additionally, you have the "zero-width" characters, such as U+200B Zero Width Space. These have the General_Category "Cf", although that category also contains visible characters IIRC.
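
To see these categories side by side, here is a small Python sketch using the standard unicodedata module, which exposes the same General_Category and Canonical_Combining_Class values. The helper names and the choice of which categories to treat as zero width are assumptions made for this example, not something the Unicode data prescribes; in particular, treating all of "Cf" as zero width is only a first approximation.

```python
import unicodedata


def classify(ch):
    """Return (General_Category, Canonical_Combining_Class) for one character.

    Hypothetical helper for illustration only.
    """
    return unicodedata.category(ch), unicodedata.combining(ch)


# Assumed heuristic: categories treated as taking no display column.
# "Mc" (spacing marks) is deliberately left out because those marks
# occupy space of their own.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def is_zero_width_candidate(ch):
    """True if the character falls into one of the assumed categories."""
    return unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES


# U+05BF Hebrew point rafe, U+0903 Devanagari sign visarga,
# U+20DD combining enclosing circle, U+200B zero width space.
for ch in ("\u05bf", "\u0903", "\u20dd", "\u200b"):
    category, combining_class = classify(ch)
    print("U+%04X %s class=%d candidate=%s"
          % (ord(ch), category, combining_class, is_zero_width_candidate(ch)))
```

The spacing and enclosing marks in this sample have combining class 0, so the class alone does not separate them from ordinary characters; the General_Category has to be consulted as well.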

--
\\// Peter - http://www.softwolves.pp.se/