Re: The Case Against Autodecode

tsbockman via Digitalmars-d Fri, 27 May 2016 15:07:55 -0700

On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:

On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:

No, this is not the point of normalization.


What is? -- Andrei

1) A grapheme may include several combining characters (such as diacritics) whose order is not supposed to be semantically significant. Normalization sorts them in a standardized way so that string comparisons return the expected result for graphemes which differ only by the internal order of their constituent combining code points.

2) Some graphemes (like accented Latin letters) can be represented by a single code point OR a letter followed by a combining diacritic. Normalization either splits them all apart (NFD), or combines them whenever possible (NFC). Again, this is primarily intended to make things like string comparisons work as expected, and perhaps to simplify low-level tasks like graphical rendering of text.

(Disclaimer: This is an oversimplification, because nothing about Unicode is ever simple.)
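
To make the two points above concrete, here is a minimal sketch using Python's standard unicodedata module. Python is used purely for illustration; the sample strings and variable names are my own choices, not anything from this thread.

```python
import unicodedata

# Point 1: same base letter, same two combining marks, different order.
# U+0307 COMBINING DOT ABOVE has canonical combining class 230,
# U+0323 COMBINING DOT BELOW has canonical combining class 220.
dot_above_first = "q\u0307\u0323"
dot_below_first = "q\u0323\u0307"

# A raw code point comparison sees two different strings.
raw_equal = dot_above_first == dot_below_first

# Normalization reorders the marks by combining class, so both
# spellings end up as the same code point sequence.
normalized_equal = (
    unicodedata.normalize("NFD", dot_above_first)
    == unicodedata.normalize("NFD", dot_below_first)
)

# Point 2: one grapheme, two encodings.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# NFD splits the precomposed form apart; NFC combines the decomposed form.
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

print(raw_equal, normalized_equal)
print(len(nfd_of_precomposed), len(nfc_of_decomposed))
```

Without normalization, both pairs compare unequal even though each pair renders as the same grapheme. After normalizing both sides to the same form, the comparisons behave as a user would expect.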
