On 01/14/2011 04:50 PM, Michel Fortin wrote:
This might be a good time to see whether we need to address graphemes
systematically. Could you please post a few links that would educate
me and others in the mysteries of combining characters?

As usual, Wikipedia offers a good summary and a couple of references.
Here's the part about combining characters:
<http://en.wikipedia.org/wiki/Combining_character>.

There are basically four ranges of code points which are combining:
- Combining Diacritical Marks (0300–036F)
- Combining Diacritical Marks Supplement (1DC0–1DFF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Combining Half Marks (FE20–FE2F)

A code point followed by one or more code points in these ranges is
conceptually a single character (a grapheme).
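
To make the rule above concrete, here is a rough Python sketch that
groups a string into clusters using only the four ranges listed. The
names COMBINING_RANGES, is_combining and naive_graphemes are made up
for illustration. It is deliberately naive: it knows nothing about
combining marks outside these four blocks, and it is not the UAX #29
algorithm.

```python
# The four combining-mark blocks listed above (inclusive code point ranges).
# Assumption: only these blocks count as combining; real Unicode has more.
COMBINING_RANGES = [
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
    (0x20D0, 0x20FF),  # Combining Diacritical Marks for Symbols
    (0xFE20, 0xFE2F),  # Combining Half Marks
]


def is_combining(ch):
    """Return True if the code point of ch falls in one of the four ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMBINING_RANGES)


def naive_graphemes(text):
    """Attach each combining code point to the cluster that precedes it."""
    clusters = []
    for ch in text:
        if clusters and is_combining(ch):
            clusters[-1] += ch   # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "e" followed by U+0301 (combining acute accent), then "a":
# three code points, but two clusters.
text = "e\u0301a"
print(len(text), len(naive_graphemes(text)))
```

A combining mark with nothing before it simply becomes its own cluster
here, which is one of several cases a real implementation would need
to decide on more carefully.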

Unfortunately, things are complicated by _prepend_ combining marks,
which occur in a code point sequence _before_ the base character. The
Unicode algorithm is described in section 3 of
<http://unicode.org/reports/tr29/> (human-readable ;-). See especially
the first table in section 3.1.

Denis
_________________
vita es estrany
spir.wikidot.com
