Re: std.algorithm.remove and principle of least astonishment

Michel Fortin Sun, 21 Nov 2010 18:31:04 -0800

On 2010-11-21 20:21:27 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

> That design, with which I experimented for a while, had two drawbacks:
> 1. It had the default reversed, i.e. most often you want to regard a char[] or a wchar[] as a range of code points, not as an array of code units.
> 2. It had the unpleasant effect that most algorithms in std.algorithm and beyond did the wrong thing by default, and the right thing only if you wrapped everything with byDchar().
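To make the code-unit versus code-point distinction concrete, here is a small sketch in Python rather than D (an assumption made for portability): `bytes` plays the role of a `char[]` viewed element by element, and a list of `str` characters plays the role of iterating by `dchar`. It shows a generic algorithm, reversal, doing the wrong thing at the code-unit level.

```python
# Python analogue (not D code): bytes stands in for a char[] viewed as
# UTF-8 code units; a list of str characters stands in for a dchar range.
text = "h\u00e9llo"  # "héllo" with a precomposed U+00E9

code_units = text.encode("utf-8")  # element-by-element view of a char[]
code_points = list(text)           # the dchar (code point) view

assert len(code_units) == 6   # U+00E9 takes two bytes: 0xC3 0xA9
assert len(code_points) == 5

# A generic algorithm (here: reversal) run over code units splits the
# two-byte sequence for U+00E9 and yields invalid UTF-8.
reversed_units = code_units[::-1]
try:
    reversed_units.decode("utf-8")
    units_still_valid = True
except UnicodeDecodeError:
    units_still_valid = False

# The same algorithm over code points keeps the text valid.
reversed_points = "".join(reversed(code_points))
```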

Well, basically these two arguments are the same: iterating by code unit isn't a good default. And I agree. But I'm unconvinced that iterating by dchar is the right default either. For one thing it has more overhead, and for another it still doesn't represent a character.
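A short Python sketch (again standing in for iteration by `dchar`) of why a code point is not necessarily a character: a decomposed "é" is two code points, and an algorithm working at the code-point level can separate the accent from its letter.

```python
import unicodedata

# "café" spelled with a decomposed final letter: 'e' + COMBINING ACUTE ACCENT.
word = "cafe\u0301"

points = list(word)  # what iterating by code point (dchar) yields
names = [unicodedata.name(c) for c in points[-2:]]
# names == ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

assert len(points) == 5  # five code points, though a reader sees four characters

# Reversing by code point detaches the accent from its base letter: the
# combining mark ends up first, with no letter before it to attach to.
backwards = "".join(reversed(points))
assert backwards[0] == "\u0301"
```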

Now, add graphemes to the equation and you have a representation that matches the user-perceived character concept, but for that you add another layer of decoding overhead and a variable-size data type to manipulate (a grapheme is a sequence of code points). And you have to use Unicode normalization when comparing graphemes. So is that a good default? Probably not. It might be "correct" in some sense, but it's totally overkill for most cases.
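The extra machinery graphemes require can be sketched in Python as follows. `same_text` compares after NFC normalization, and `naive_graphemes` is a deliberately simplified, hypothetical clustering that only attaches combining marks to the preceding code point; it is not a full implementation of Unicode text segmentation (UAX #29).

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Code-point comparison treats the two spellings as different strings...
assert precomposed != decomposed

# ...so comparing user-perceived characters needs normalization first.
def same_text(a: str, b: str) -> bool:
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def naive_graphemes(s: str) -> list:
    """Hypothetical, simplified clustering: glue each combining mark onto the
    preceding code point. Full UAX #29 segmentation also covers Hangul
    syllables, emoji ZWJ sequences, regional indicators, and more."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # each cluster is a variable-length string
        else:
            clusters.append(ch)
    return clusters
```

Note that every cluster is itself a string, which is the variable-size data type mentioned above.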

My thinking is that there is no good default. If you write an XML parser, you'll probably want to work at the code point level; if you write a JSON parser, you can easily skip the overhead and work at the UTF-8 code unit level until you start parsing a string; if you write something to count the number of user-perceived characters or want to map characters to a font, then you'll want graphemes...
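The JSON point relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so ASCII delimiters such as `"` and `\` can be found by scanning raw code units. Here is a minimal Python sketch under that assumption; the helper names are hypothetical, and it leaves escape sequences undecoded and assumes every string is terminated.

```python
QUOTE = ord('"')       # 0x22
BACKSLASH = ord("\\")  # 0x5C

def read_json_string(doc: bytes, start: int):
    """Return (decoded string, index just past its closing quote).

    doc[start] must be the opening quote. Escape sequences such as \\n or
    \\u00e9 are kept as-is rather than decoded, and an unterminated string
    raises IndexError: both are simplifications of this sketch.
    """
    assert doc[start] == QUOTE
    i = start + 1
    while doc[i] != QUOTE:
        # Skip the byte after a backslash so an escaped quote doesn't end the string.
        i += 2 if doc[i] == BACKSLASH else 1
    # Only now decode, and only the bytes of this one string.
    return doc[start + 1:i].decode("utf-8"), i + 1

def json_strings(doc: bytes):
    """Collect every string token (keys and values) by scanning code units."""
    found = []
    i = 0
    while i < len(doc):
        if doc[i] == QUOTE:
            text, i = read_json_string(doc, i)
            found.append(text)
        else:
            i += 1  # structural bytes and numbers are skipped without decoding
    return found
```

For example, `json_strings('{"name": "Zoë", "city": "Zürich"}'.encode("utf-8"))` returns `['name', 'Zoë', 'city', 'Zürich']`; the non-ASCII bytes inside the values are never mistaken for delimiters.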

Perhaps there should simply be no default; perhaps you should be forced to choose explicitly at which layer you want to operate each time you apply an algorithm to a string... and to make this less painful, we could have functions in std.string acting as a thin layer over similar ones in std.algorithm that would automatically choose the right representation for the algorithm depending on the operation.
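One possible shape for the "no default" idea, sketched in Python with hypothetical names (nothing here is an existing std.string or std.algorithm API): the low-level `units` function refuses to guess and requires an explicit `Level`, while thin wrappers such as `contains` and `perceived_length` pick the layer suited to their operation. Grapheme handling is again simplified to attaching combining marks after NFC normalization.

```python
import unicodedata
from enum import Enum

class Level(Enum):
    CODE_UNIT = "UTF-8 code unit"
    CODE_POINT = "code point"   # what D calls dchar
    GRAPHEME = "grapheme"       # simplified user-perceived character

def units(s: str, level: Level) -> list:
    """Split s at the requested layer; there is deliberately no default level."""
    if level is Level.CODE_UNIT:
        return list(s.encode("utf-8"))
    if level is Level.CODE_POINT:
        return list(s)
    if level is Level.GRAPHEME:
        clusters = []
        for ch in unicodedata.normalize("NFC", s):
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters
    raise ValueError(f"unknown level: {level!r}")

def contains(haystack: str, needle: str) -> bool:
    """Substring search picks the cheapest layer: raw UTF-8 bytes."""
    return needle.encode("utf-8") in haystack.encode("utf-8")

def perceived_length(s: str) -> int:
    """Counting what a reader sees picks the grapheme layer."""
    return len(units(s, Level.GRAPHEME))
```

`contains` can stay at the code-unit level because a valid UTF-8 needle can only match a valid UTF-8 haystack at code point boundaries, so it agrees with a code-point search (though, like that search, it ignores normalization).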


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
