On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
> On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
>> No, this is not the point of normalization.
> What is? -- Andrei
1) A grapheme may include several combining characters (such as
diacritics) whose order is not supposed to be semantically
significant. Normalization sorts them in a standardized way so
that string comparisons return the expected result for graphemes
which differ only by the internal order of their constituent
combining code points.
2) Some graphemes (like accented Latin letters) can be
represented by a single code point OR a letter followed by a
combining diacritic. Normalization either splits them all apart
(NFD), or combines them whenever possible (NFC). Again, this is
primarily intended to make things like string comparisons work as
expected, and perhaps to simplify low-level tasks like graphical
rendering of text.
(Disclaimer: This is an oversimplification, because nothing about
Unicode is ever simple.)
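
To make both points concrete, here is a minimal sketch in Python (chosen only because its standard unicodedata module is easy to try out; it is not what Phobos does). The sample strings and variable names are my own picks for illustration. The first half covers point 1 with two combining marks attached in different orders, the second half covers point 2 with a precomposed versus decomposed "é".

```python
import unicodedata

# Point 1: the same grapheme with its combining marks in different orders.
# U+0307 COMBINING DOT ABOVE and U+0323 COMBINING DOT BELOW attach to
# different sides of the base letter, so their relative order carries no
# meaning, yet the raw code point sequences differ.
dots_above_first = "q\u0307\u0323"
dots_below_first = "q\u0323\u0307"

raw_equal = dots_above_first == dots_below_first
normalized_equal = (
    unicodedata.normalize("NFD", dots_above_first)
    == unicodedata.normalize("NFD", dots_below_first)
)

# Point 2: one precomposed code point vs. base letter + combining diacritic.
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)  # combines
nfd_of_composed = unicodedata.normalize("NFD", composed)      # splits apart

print("raw comparison, reordered marks:", raw_equal)
print("NFD comparison, reordered marks:", normalized_equal)
print("raw comparison, composed vs decomposed:", composed == decomposed)
print("NFC(decomposed) == composed:", nfc_of_decomposed == composed)
print("NFD(composed) == decomposed:", nfd_of_composed == decomposed)
```

In both cases the raw comparison fails and the comparison after normalization succeeds, which is the "expected result" mentioned above. It does not matter which form you pick for comparisons, as long as both sides use the same one.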