On Thursday, 8 March 2018 at 17:35:11 UTC, H. S. Teoh wrote:
Yeah, the only reason autodecoding survived in the beginning
was because Andrei (wrongly) thought that a Unicode code point
was equivalent to a grapheme. If that had been the case, the
cost associated with auto-decoding may have been justifiable.
Unfortunately, that is not the case, which greatly diminishes
most of the advantages that autodecoding was meant to have. So
it ended up being something that incurred a significant
performance hit, yet did not offer the advantages it was
supposed to. To fully live up to Andrei's original vision, it
would have to include grapheme segmentation as well.
Unfortunately, graphemes are of arbitrary length and cannot in
general fit in a single dchar (or any fixed-size type), and
grapheme segmentation is extremely costly to compute, so doing
it by default would kill D's string manipulation performance.
I remember it a bit differently from the last time it was discussed:
- removing auto-decoding would break a lot of code, since it's used
in lots of places
- the performance loss could be mitigated with .byCodeUnit every time
- Andrei correctly advocated against breakage
Personally I do use auto-decoding, often iterating by code point,
and I use it for fonts and parsers. It's correct for a large
subset of languages. You gave us a feature and now we are using
it ;)
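
The code unit / code point / grapheme distinction discussed above can be sketched quickly (in Python here rather than D, purely for brevity; the same counts apply to D's `string`, `dchar`, and `byGrapheme`). The `naive_graphemes` helper is a hypothetical, simplified segmentation, not the full UAX #29 algorithm:

```python
import unicodedata

def naive_graphemes(s):
    # Naive cluster segmentation: start a new cluster at every
    # non-combining code point. Real grapheme segmentation (UAX #29)
    # is far more involved, which is part of why it is costly.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT, rendered as one glyph
print(len(s.encode("utf-8")))   # 3 UTF-8 code units
print(len(s))                   # 2 code points (dchars, in D terms)
print(len(naive_graphemes(s)))  # 1 grapheme
```

The middle line is what auto-decoding iterates over: more than the graphemes a user perceives, fewer than the code units stored in memory, which is why it is correct for many languages yet not for all.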