Unicode Normalization (and graphemes and locales)

Walter Bright via Digitalmars-d Thu, 02 Jun 2016 17:16:40 -0700

On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
> How do you suggest that we handle the normalization issue? Should we just
> assume NFC like std.uni.normalize does and provide an optional template
> argument to indicate a different normalization (like normalize does)? Since
> without providing a way to deal with the normalization, we're not actually
> making the code fully correct, just faster.


The short answer is, we don't.

1. D is a systems programming language. Baking normalization, graphemes and Unicode locales in at a low level will have a disastrous negative effect on performance and size.


2. Very little systems programming work requires level 2 or 3 Unicode support.

3. Are they needed? Pedantically, yes. Practically, not necessarily.

4. What we must do is, for each algorithm, document how it handles Unicode.

5. Normalization, graphemes, and locales should all be explicitly opt-in with corresponding library code.


Normalization: s.normalize.algorithm()
Graphemes: may require separate algorithms, maybe std.grapheme?
Locales: I have no idea, given that I have not studied that issue
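The "s.normalize.algorithm()" pattern can be illustrated with a small D sketch. It uses Phobos's std.uni.normalize, which defaults to NFC, and std.algorithm's canFind as a stand-in for "algorithm". The strings are made-up examples: the same word "café" is spelled once with precomposed U+00E9 and once as 'e' followed by combining U+0301. The point is that the plain algorithm does no normalization of its own, and the caller opts in by normalizing both sides first.

```d
import std.algorithm.searching : canFind;
import std.uni : normalize, NFD;

void main()
{
    // Two spellings of "café": precomposed U+00E9 vs. 'e' + combining acute U+0301.
    string composed = "caf\u00E9";
    string decomposed = "cafe\u0301";
    string text = "le caf\u00E9 ouvert";

    // Without normalization, the algorithm compares code points and sees different text.
    assert(composed != decomposed);
    assert(!text.canFind(decomposed));

    // Opt in: normalize both sides (NFC by default), then run the ordinary algorithm.
    assert(text.normalize.canFind(decomposed.normalize));

    // Any normalization form works, as long as both sides use the same one.
    assert(normalize!NFD(composed) == normalize!NFD(decomposed));
}
```

As I understand the Phobos documentation, normalize returns its input unchanged, without allocating, when the string is already in the requested form, so the cost of opting in is paid mainly by text that actually needs it.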

6. std.string has many analogues for std.algorithm functions that are specific to the peculiarities of strings. I think this is a perfectly acceptable approach. For example, there are many ways to sort Unicode strings, and many of them do not fit in with std.algorithm.sort's ways. Having special std.string sort functions for them would be the most practical solution.
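To show why there is no single "right" string sort, here is a minimal D sketch that sorts the same made-up word list two ways with std.algorithm.sort. The default predicate compares strings lexicographically by code unit (for UTF-8 this matches code point order), so uppercase ASCII letters land before all lowercase ones; passing std.uni.icmp as the predicate gives a case-insensitive order instead. A locale-aware collation is a third, different answer that this sketch does not attempt, and the std.string sort functions suggested above are a proposal, not something used here.

```d
import std.algorithm.sorting : sort;
import std.uni : icmp;

void main()
{
    string[] words = ["banana", "Cherry", "apple"];

    // Ordering 1: default predicate (a < b), i.e. code-unit/code-point order.
    // 'C' (U+0043) sorts before 'a' (U+0061), so "Cherry" comes first.
    auto byCodePoint = words.dup;
    byCodePoint.sort();
    assert(byCodePoint == ["Cherry", "apple", "banana"]);

    // Ordering 2: case-insensitive comparison via std.uni.icmp (case folding).
    auto caseless = words.dup;
    caseless.sort!((x, y) => icmp(x, y) < 0);
    assert(caseless == ["apple", "banana", "Cherry"]);
}
```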

7. At some point, as the threads on autodecode amply illustrate, working with level 2 or level 3 Unicode requires a certain level of understanding on the part of the programmer writing the code, because there simply is no overarching correct way to do things. The programmer is going to have to understand what he is trying to accomplish with Unicode and select the code/algorithms accordingly.
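A small D illustration of "no overarching correct way": asking how long a string is has at least three defensible answers, and the programmer has to pick the one that matches the task. The example string is made up: "noël" with the ë written as 'e' plus combining diaeresis U+0308. It relies on Phobos treating a string as a range of dchar (autodecoding) for walkLength, and on std.uni.byGrapheme for user-perceived characters.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "noël" spelled with 'e' + combining diaeresis (U+0308, two bytes in UTF-8).
    string s = "noe\u0308l";

    assert(s.length == 6);                 // UTF-8 code units (bytes)
    assert(s.walkLength == 5);             // code points (string auto-decodes to dchar)
    assert(s.byGrapheme.walkLength == 4);  // graphemes (user-perceived characters)
}
```

Each answer is correct for some job: byte length for buffer sizes, code points for many protocol limits, graphemes for cursor movement or display.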
