On Wed, Mar 07, 2018 at 04:33:25PM +0000, Seb via Digitalmars-d wrote:
> On Wednesday, 7 March 2018 at 15:26:40 UTC, Jon Degenhardt wrote:
[...]
> > Auto-decoding is a significant issue for the applications I work on
> > (search engines). There is a lot of string manipulation in these
> > environments, and performance matters. Auto-decoding is a meaningful
> > performance hit. Otherwise, Phobos has a very nice collection of
> > algorithms for string manipulation. It would be great to have a way
> > to turn auto-decoding off in Phobos.
[...]
> Well you can use byCodeUnit, which disables auto-decoding
>
> Though it's not well-known and rather annoying to explicitly add it
> almost everywhere.
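
For readers who have not used it, the byCodeUnit suggestion quoted above can be sketched roughly as follows. This is only an illustration: countDelimiter is a hypothetical helper, not something from Phobos or from the thread, and it assumes the delimiter is an ASCII character so that comparing raw UTF-8 code units is safe.

```d
import std.algorithm.searching : count;
import std.range.primitives : ElementType;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// A plain string is iterated as decoded code points (dchar) by Phobos ranges.
static assert(is(ElementType!string == dchar));

// Wrapped in byCodeUnit, the same string is iterated as raw UTF-8 code units.
static assert(is(Unqual!(ElementType!(typeof("".byCodeUnit()))) == char));

// Hypothetical helper: count an ASCII delimiter without decoding the input.
size_t countDelimiter(string text, char delim)
{
    return text.byCodeUnit.count(delim);
}

void main()
{
    // Non-ASCII text elsewhere in the string does not affect an ASCII search.
    assert(countDelimiter("a,b,c", ',') == 2);
    assert(countDelimiter("naïve,café", ',') == 1);
}
```

The explicit .byCodeUnit call is the part that has to be repeated at every call site where decoding is not wanted.
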
And therein lies the rub: because it's *auto* decoding, rather than just
decoding, it's implicit everywhere, adding to the performance hit
without the coder necessarily being aware of it. You have to put in the
effort to add .byCodeUnit everywhere.

Worse yet, it gives the false sense of security that you're doing
Unicode "right", when actually that is *not* true at all, because a code
point is not equal to a grapheme (what people normally know as a
"character"). But because operating at the code point level *appears*
to be correct 80% of the time, bugs in string handling often go
unnoticed, unlike operating at the code unit level, where any Unicode
handling bugs are immediately obvious as soon as your string contains
non-ASCII characters.

So you're essentially paying the price of a significant performance hit
for the dubious benefit of non-100%-correct code, but with bugs
conveniently obscured so that it's harder to notice them.

Kill autodecoding, I say. Kill it with fire!!


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
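
A small sketch of the code unit / code point / grapheme distinction made in the reply above, assuming a UTF-8 string that holds "e" followed by U+0301 (combining acute accent). The three helper names are hypothetical and exist only for this illustration; walkLength, byCodeUnit and byGrapheme are the Phobos functions.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helpers that measure the same string at three levels.
size_t codeUnits(string s)  { return s.byCodeUnit.walkLength; } // raw UTF-8 units
size_t codePoints(string s) { return s.walkLength; }            // auto-decoded
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    // One user-perceived character built from two code points.
    string s = "e\u0301";

    assert(codeUnits(s) == 3);  // 'e' is 1 byte, U+0301 is 2 bytes in UTF-8
    assert(codePoints(s) == 2); // what auto-decoding iterates over
    assert(graphemes(s) == 1);  // what a reader would call one "character"
}
```

Auto-decoding yields the middle number, which matches neither the storage size nor the number of visible characters.
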