https://issues.dlang.org/show_bug.cgi?id=14519
--- Comment #14 from Vladimir Panteleev <thecybersha...@gmail.com> ---
(In reply to Walter Bright from comment #13)

> Vladimir, you bring up good points. I'll try to address them. First off, why
> do this?
>
> 1. much faster

If I understand correctly, throwing Error instead of Exception will also
solve the performance issues.

> 2. string processing can be @nogc and nothrow. If you follow external
> discussions on the merits of D, the "D is no good because Phobos requires
> the GC" ALWAYS comes up, and sucks all the energy out of the conversation.

Ditto, but the @nogc aspect can also be solved with the refcounted
exceptions spec, which will fix the problem in general.

> So, on to your points:
>
> 1. Replacement only happens when doing a UTF decoding. S+R doesn't have to
> do conversion, and that's one of the things I want to fix in std.algorithm.
> The string fixes I've done in std.string avoid decoding as much as possible.

Inevitably, it is still very easy to accidentally use something that
auto-decodes. There is no way to statically make sure that you don't (except
by using a non-string type for text, which is impractical), and with this
proposed change, there will be no run-time way to handle it either.

> 2. Same thing. (Running normalization on passwords? What the hell?)

I did not mean Unicode normalization - it was a joke (std.algorithm will
"normalize" invalid UTF characters to the replacement character). But since
.front on strings auto-decodes, feeding a string to any generic range
function in std.algorithm will cause auto-decoding (and thus, character
substitution).

> The replacement char thing was not invented by me, it is commonplace as
> users don't like their documents being wholly rejected for one or two bad
> encodings.

I know, and I agree it's useful, but it needs to be opt-in.

> I know that many programs try to guess the encoding of random text they get.
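[For context, a sketch of the substitution behavior being debated. This is an
analogy in Python, not D: Python's `errors="replace"` decoder mirrors what
silent replacement-character substitution does, while `errors="strict"`
mirrors the throw-on-invalid-UTF behavior Vladimir wants available.]

```python
# Illustration only: replacement-character substitution vs. strict decoding.
data = b"abc\xffdef"  # \xff is not valid UTF-8

# Silent substitution: the invalid byte becomes U+FFFD and the error is
# invisible to downstream code.
replaced = data.decode("utf-8", errors="replace")
assert replaced == "abc\ufffddef"

# Strict decoding raises instead, so the caller can detect the bad input.
detected = False
try:
    data.decode("utf-8", errors="strict")
except UnicodeDecodeError:
    detected = True
assert detected
```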
> Doing this by only reading a few characters, and assuming the rest, is a
> strange method if one cares about the integrity of the data.

I don't see how this is relevant, sorry.

> Having to constantly re-sanitize data, at every step in the pipeline, is
> going to make D programs uncompetitive speed-wise.

I don't understand what you mean by this. You could say that any way to
handle invalid UTF can be seen as a way of sanitizing data: there will
always be a code path for what to do when invalid UTF is encountered. I
would interpret "no sanitization" as not handling invalid UTF in any way
(i.e. treating it in an undefined way).

--
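[The "opt-in" position argued above could be sketched as follows. This is a
hypothetical helper in Python, not Phobos code; `read_text` and its
`replace_invalid` parameter are invented names for illustration.]

```python
def read_text(data: bytes, *, replace_invalid: bool = False) -> str:
    # Opt-in replacement: the caller chooses how invalid UTF-8 is handled,
    # instead of substitution being applied implicitly at every step.
    errors = "replace" if replace_invalid else "strict"
    return data.decode("utf-8", errors=errors)

# Valid input passes through unchanged either way.
assert read_text(b"hello") == "hello"

# Invalid input: explicit opt-in yields U+FFFD; the default raises, making
# the error visible at the boundary rather than propagating silently.
assert read_text(b"a\xffb", replace_invalid=True) == "a\ufffdb"
```

Validating (or explicitly replacing) once at the input boundary is also what
answers the "constantly re-sanitize at every pipeline step" concern: after
one strict decode, downstream code can assume well-formed text.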