On Saturday, 28 May 2016 at 22:29:12 UTC, Andrew Godfrey wrote:
[snip]


From all the detail in this thread, I wonder now if "a grapheme" is even an unambiguous concept across different environments.

Unicode graphemes are not always the same as graphemes in natural (written) languages. Even if <é> is composed of several code points in Unicode, it is still one grapheme in the written language, not two distinct characters. Conversely, in natural languages two characters can form one grapheme: English <sh> represents a single sound, as in `shower, shop, fish`. In German the same sound is represented by three characters, <sch>, as in `Schaf` ("sheep"). This is a bit nit-picky, but we should make clear that we are talking about "Unicode graphemes" that map to single characters on the written page. But is that even possible across all languages?
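To make the distinction concrete, here is a rough sketch in Python using only the standard `unicodedata` module. The helper `approx_clusters` is a hypothetical name of mine, and it only approximates user-perceived characters by skipping combining marks; it is not the full Unicode grapheme cluster segmentation algorithm (UAX #29).

```python
import unicodedata

def approx_clusters(text):
    """Rough count of user-perceived characters.

    Assumption: a cluster starts at every code point that is not a
    combining mark. Real grapheme segmentation has many more rules.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

composed = "\u00e9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# Code point counts differ between the two encodings of <é>.
print(len(composed), len(decomposed))

# The approximate cluster count is the same for both.
print(approx_clusters(composed), approx_clusters(decomposed))

# NFC normalization maps the decomposed form to the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == composed)

# <sh> and <sch> stay two and three clusters: Unicode knows nothing
# about them being single graphemes in English or German orthography.
print(approx_clusters("sh"), approx_clusters("sch"))
```

So the Unicode-level notion can absorb the composed/decomposed difference for <é>, but it cannot express that <sh> or <sch> is one unit in a given written language.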

To avoid confusion and misunderstandings, we should agree on the terminology first.
