On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
No, this is not the point of normalization.

What is? -- Andrei

1) A grapheme may include several combining characters (such as diacritics) whose relative order is not supposed to be semantically significant. Normalization sorts them in a standardized way, so that string comparisons return the expected result for graphemes that differ only in the internal order of their constituent combining code points.
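
To make point 1 concrete, here is a minimal sketch in Python, using the standard unicodedata module. The helper name canonically_equal and the choice of "q" with a dot above and a dot below are just my illustration. I picked them on the assumption that no precomposed form exists for that combination, so only the reordering is visible.

```python
import unicodedata

# The same base letter with the same two marks, typed in opposite orders.
# U+0307 COMBINING DOT ABOVE has canonical combining class 230.
# U+0323 COMBINING DOT BELOW has canonical combining class 220.
above_first = "q\u0307\u0323"
below_first = "q\u0323\u0307"


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after putting both into NFD (hypothetical helper)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A raw code point comparison sees two different strings.
print(above_first == below_first)

# After normalization the marks are in one canonical order, so they match.
print(canonically_equal(above_first, below_first))
```

The raw comparison fails while the normalized one succeeds, because normalization puts the mark with the lower combining class (the dot below) first in both strings.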

2) Some graphemes (like accented Latin letters) can be represented either by a single code point or by a letter followed by a combining diacritic. Normalization either splits them all apart (NFD) or combines them wherever possible (NFC). Again, this is primarily intended to make things like string comparisons work as expected, and perhaps to simplify low-level tasks like graphical rendering of text.
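
A small sketch of point 2, again in Python with the standard unicodedata module. The variable names are mine, and "é" is just a convenient example of a letter that has both a precomposed and a decomposed spelling.

```python
import unicodedata

# Two spellings of the same grapheme, "e" with an acute accent.
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT

# NFD splits the precomposed letter into base letter plus combining mark.
nfd_form = unicodedata.normalize("NFD", precomposed)

# NFC combines the pair back into the single precomposed code point.
nfc_form = unicodedata.normalize("NFC", decomposed)

# The raw strings differ, but they agree once both use the same form.
print(precomposed == decomposed)
print(nfd_form == decomposed, len(nfd_form))
print(nfc_form == precomposed, len(nfc_form))
```

Either form works for comparison, as long as both sides are normalized to the same one.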

(Disclaimer: This is an oversimplification, because nothing about Unicode is ever simple.)
