On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
> How do you suggest that we handle the normalization issue? Should we just
> assume NFC like std.uni.normalize does and provide an optional template
> argument to indicate a different normalization (like normalize does)? Since
> without providing a way to deal with the normalization, we're not actually
> making the code fully correct, just faster.

The short answer is, we don't.

1. D is a systems programming language. Baking normalization, graphemes and Unicode locales in at a low level will have a disastrous negative effect on performance and size.

2. Very little systems programming work requires level 2 or 3 Unicode support.

3. Are they needed? Pedantically, yes. Practically, not necessarily.

4. What we must do is, for each algorithm, document how it handles Unicode.

5. Normalization, graphemes, and locales should all be explicitly opt-in with corresponding library code.

Normalization: s.normalize.algorithm()
Graphemes: may require separate algorithms, maybe std.grapheme?
Locales: I have no idea, given that I have not studied that issue

6. std.string has many analogues of std.algorithm functions that are specific to the peculiarities of strings. I think this is a perfectly acceptable approach. For example, there are many ways to sort Unicode strings, and many of them do not fit the model of std.algorithm.sort. Having special std.string.sort variants for them would be the most practical solution.

7. At some point, as the threads on autodecode amply illustrate, working with level 2 or level 3 Unicode requires a certain level of understanding on the part of the programmer writing the code, because there simply is no overarching correct way to do things. The programmer is going to have to understand what he is trying to accomplish with Unicode and select the code/algorithms accordingly.
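Items 5 and 6 might look something like the following minimal D sketch. It uses only existing Phobos functions (std.uni's normalize, byGrapheme and icmp, plus std.algorithm's canFind and sort). The helper names rawContains, nfcContains and caseInsensitiveSort are hypothetical. The icmp comparator is just one possible string-specific ordering. It is not locale-aware collation.

```d
import std.algorithm.searching : canFind;
import std.algorithm.sorting : sort;
import std.range : walkLength;
import std.uni : byGrapheme, icmp, normalize, NFC;

// "é" written two ways: precomposed U+00E9, and 'e' followed by combining acute U+0301.
enum composed = "caf\u00E9";
enum decomposed = "cafe\u0301";

// Hypothetical helper: the plain algorithm, no normalization, code points compared as-is.
bool rawContains(string haystack, string needle)
{
    return haystack.canFind(needle);
}

// Hypothetical helper: opt-in normalization, i.e. s.normalize.algorithm().
bool nfcContains(string haystack, string needle)
{
    return normalize!NFC(haystack).canFind(normalize!NFC(needle));
}

// Hypothetical helper: one string-specific ordering (case-insensitive via icmp).
// This is NOT locale-aware collation.
string[] caseInsensitiveSort(string[] words)
{
    auto sorted = words.dup;
    sorted.sort!((a, b) => icmp(a, b) < 0)();
    return sorted;
}

void main()
{
    // Without normalization the two spellings of "é" do not match.
    assert(composed != decomposed);
    assert(!rawContains(decomposed, "\u00E9"));
    // Opting in to NFC makes them match.
    assert(nfcContains(decomposed, "\u00E9"));

    // Autodecoded code points vs. graphemes (user-perceived characters).
    assert(decomposed.walkLength == 5);
    assert(decomposed.byGrapheme.walkLength == 4);

    // Default sort compares UTF-8 code units, so uppercase 'Z' comes before 'r'.
    auto words = ["resume", "Zebra", "r\u00E9sum\u00E9"];
    assert(words.dup.sort().release() == ["Zebra", "resume", "r\u00E9sum\u00E9"]);
    assert(caseInsensitiveSort(words) == ["resume", "r\u00E9sum\u00E9", "Zebra"]);
}
```

Nothing here changes the defaults. The caller decides when normalization, grapheme segmentation or a special ordering is worth the cost.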
