Re: The Case Against Autodecode

default0 via Digitalmars-d Sun, 29 May 2016 05:11:53 -0700

On Sunday, 29 May 2016 at 11:47:30 UTC, Tobias Müller wrote:

On Sunday, 29 May 2016 at 11:25:11 UTC, Chris wrote:
Unicode graphemes are not always the same as graphemes in natural (written) languages. If <é> is composed of two codepoints in Unicode, it is still one grapheme in a written language, not two distinct characters. However, in natural languages two characters can be one grapheme: English <sh>, for example, represents the sound in `shower, shop, fish`. In German the same sound is represented by three characters, <sch>, as in `Schaf` ("sheep"). A bit nit-picky, but we should make clear that we are talking about "Unicode graphemes" that map to single characters on the written page. But is that at all possible across all languages?
To avoid confusion and misunderstandings, we should agree on the terminology first.
No, this is well-established terminology; you are confusing several things here:
- A grapheme is a "character" as written on the page
- A phoneme is a spoken "character"
- A codepoint is the fundamental "unit" of Unicode

Graphemes are built from one or more codepoints.
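To make the codepoint/grapheme distinction concrete, here is a minimal Python sketch. Python is chosen only for illustration; the thread itself is about D's autodecoding, and no D API is implied. It uses the standard-library `unicodedata` module to show that the single written character <é> can be stored either as one precomposed codepoint (NFC) or as a base letter plus a combining accent (NFD), and that the count of UTF-8 code units differs again:

```python
import unicodedata

# Precomposed form (NFC): U+00E9 LATIN SMALL LETTER E WITH ACUTE
nfc = unicodedata.normalize("NFC", "e\u0301")
# Decomposed form (NFD): U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT
nfd = unicodedata.normalize("NFD", "\u00e9")

for label, s in (("NFC", nfc), ("NFD", nfd)):
    code_points = [f"U+{ord(c):04X}" for c in s]
    utf8_units = len(s.encode("utf-8"))  # UTF-8 code units are bytes
    print(label, code_points, "codepoints:", len(s), "UTF-8 code units:", utf8_units)
```

Both strings render as the same é on the page, yet NFC has 1 codepoint and 2 UTF-8 code units while NFD has 2 codepoints and 3 UTF-8 code units, so neither count alone tells you how many characters a reader sees.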
Phonemes are a different topic and not really covered by the Unicode Standard AFAIK. Except for the IPA notation, but these are again graphemes that represent phonemes.

I am pretty sure that a single grapheme in Unicode does not correspond to your notion of "character". I am pretty sure that what you think of as a "character" is officially called "Grapheme Cluster", not "Grapheme".


See here: http://www.unicode.org/glossary/#grapheme_cluster
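The exact grapheme cluster boundaries are specified in Unicode Standard Annex #29. The Python sketch below is a deliberately simplified approximation, not an implementation of UAX #29: the function name `approx_grapheme_clusters` is hypothetical, and its only rule is that a combining mark (general category Mn, Mc or Me) attaches to the preceding cluster. Emoji ZWJ sequences, Hangul syllables, regional-indicator flags and the other UAX #29 rules are ignored.

```python
import unicodedata


def approx_grapheme_clusters(s: str) -> list[str]:
    """Rough approximation of grapheme clusters (NOT full UAX #29).

    Only rule implemented: a combining mark (general category Mn, Mc or Me)
    is appended to the cluster before it. Everything else starts a new cluster.
    """
    clusters: list[str] = []
    for ch in s:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if is_mark and clusters:
            clusters[-1] += ch   # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


text = "cafe\u0301"  # "café" with é written as e + U+0301
print(len(text.encode("utf-8")), len(text), len(approx_grapheme_clusters(text)))
# prints: 6 5 4
```

For the decomposed "café" this reports 6 UTF-8 code units, 5 codepoints and 4 clusters. German <sch> still comes out as three clusters, since Unicode segmentation works on written characters, not on linguistic graphemes in Chris's sense.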