Re: The Case Against Autodecode

Tobias Müller via Digitalmars-d Sun, 29 May 2016 04:51:07 -0700

On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote:

Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed in Unicode, it is still one grapheme in a written language, not two distinct characters. However, in natural languages two characters can be one grapheme, as in English <sh>, which represents the sound in `shower, shop, fish`. In German the same sound is represented by three characters, <sch>, as in `Schaf` ("sheep"). A bit nit-picky, but we should make clear that we are talking about "Unicode graphemes" that map to single characters on the written page. But is that at all possible across all languages?
To avoid confusion and misunderstandings we should agree on the terminology first.

No, this is well-established terminology. You are confusing several things here:

- A grapheme is a "character" as written on the page
- A phoneme is a spoken "character"
- A codepoint is the fundamental "unit" of Unicode

Graphemes are built from one or more codepoints.
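To make the codepoint/grapheme distinction concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is D-specific), using the standard `unicodedata` module. It shows the same grapheme <é> encoded either as one precomposed codepoint or as two codepoints (base letter plus combining accent), and that Unicode treats German <sch> as three ordinary codepoints with no notion of the phoneme it spells. Note that Python's `len()` counts codepoints, not graphemes; actual grapheme segmentation (UAX #29) is not in the standard library.

```python
import unicodedata

# Two encodings of the same single grapheme "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 'e' + U+0301 COMBINING ACUTE ACCENT

# len() counts codepoints, so the same grapheme gives different lengths.
print(len(precomposed))  # 1
print(len(decomposed))   # 2

# Normalization maps one form onto the other (NFC composes, NFD decomposes).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Each codepoint of the decomposed form has its own name.
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Unicode knows nothing about phonemes: "sch" is just three codepoints,
# each of which is also its own grapheme.
print(len("sch"))  # 3
```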

Phonemes are a different topic and not really covered by the Unicode standard AFAIK, except for the IPA notation, but those are again graphemes that represent phonemes.
