On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote:
> Unicode graphemes are not always the same as graphemes in
> natural (written) languages. If <é> is composed in Unicode, it
> is still one grapheme in a written language, not two distinct
> characters. However, in natural languages two characters can be
> one grapheme, as in English <sh>, which represents the sound in
> `shower, shop, fish`. In German the same sound is represented
> by three characters <sch> as in `Schaf` ("sheep"). A bit
> nit-picky, but we should make clear that we talk about "Unicode
> graphemes" that map to single characters on the written page.
> But is that at all possible across all languages?
> To avoid confusion and misunderstandings we should agree on the
> terminology first.

No, this is well-established terminology. You are confusing
several things here:
- A grapheme is a "character" as written on the page
- A phoneme is a spoken "character"
- A codepoint is the fundamental "unit" of Unicode
Graphemes are built from one or more codepoints.
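
To make the grapheme/codepoint distinction concrete, here is a minimal sketch in Python (my choice of language for illustration; the helper name `codepoints` is made up). It only uses the standard `unicodedata` module, which knows about codepoints and normalization but does not segment graphemes:

```python
import unicodedata

def codepoints(s):
    """Return the codepoints of s in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in s]

# <é> as one precomposed codepoint
composed = "\u00e9"
# <é> as base letter "e" followed by COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# Python's len() counts codepoints, not graphemes.
print(codepoints(composed), len(composed))      # ['U+00E9'] 1
print(codepoints(decomposed), len(decomposed))  # ['U+0065', 'U+0301'] 2

# Both spellings are canonically equivalent: NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# <sh> is simply two codepoints; nothing in Unicode ties them together,
# whatever sound they spell in English.
print(codepoints("sh"))  # ['U+0073', 'U+0068']
```

Either way a reader sees a single <é> on the page, which is the sense of "grapheme" used above.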
Phonemes are a different topic and, AFAIK, not really covered by
the Unicode standard. The one exception is IPA notation, but IPA
symbols are again graphemes that represent phonemes.
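
As a small illustration of that last point (again a Python sketch, not anything from the standard itself): the IPA symbol for the sound spelled <sh> in English is an ordinary codepoint with an ordinary character name, and nothing about it encodes pronunciation.

```python
import unicodedata

# U+0283 is the IPA symbol for the sound spelled <sh> in English.
esh = "\u0283"

print(esh)                    # ʃ
print(unicodedata.name(esh))  # LATIN SMALL LETTER ESH
print(len(esh))               # 1
```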