On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote:
Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinct characters. However, in natural languages two characters can be one grapheme, as in English <sh>, which represents the sound in `shower, shop, fish`. In German the same sound is represented by three characters, <sch>, as in `Schaf` ("sheep"). A bit nit-picky, but we should make clear that we are talking about "Unicode graphemes" that map to single characters on the written page. But is that at all possible across all languages?

To avoid confusion and misunderstandings we should agree on the terminology first.

No, this is well-established terminology; you are confusing several things here:

- A grapheme is a "character" as written on the page
- A phoneme is a spoken "character"
- A codepoint is the fundamental "unit" of Unicode

Graphemes are built from one or more codepoints.
Phonemes are a different topic and, AFAIK, not really covered by the Unicode standard. The one exception is IPA notation, but IPA symbols are again graphemes that represent phonemes.
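
To make the grapheme/codepoint distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `count_codepoints` and `count_graphemes_approx` are made up for illustration, and the grapheme count is only a rough approximation that folds combining marks into the preceding base character; real grapheme cluster segmentation follows the rules in UAX #29 and handles many more cases.

```python
import unicodedata


def count_codepoints(text: str) -> int:
    # A Python str is a sequence of Unicode codepoints,
    # so len() counts codepoints, not graphemes.
    return len(text)


def count_graphemes_approx(text: str) -> int:
    # Hypothetical helper, approximation only: a codepoint with a
    # non-zero canonical combining class is treated as part of the
    # preceding grapheme. This is NOT the full UAX #29 algorithm.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


precomposed = "\u00e9"   # é as a single codepoint
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(label, count_codepoints(s), count_graphemes_approx(s))

# Both spellings normalize to the same precomposed form under NFC.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Either spelling of <é> is one grapheme on the page, but it is built from one codepoint in the first case and two in the second. Note that <sch> in `Schaf` is still counted as three graphemes here, since that kind of unit belongs to the orthography of a particular language and not to Unicode.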
