On 2010-11-21 20:21:27 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

> That design, with which I experimented for a while, had two drawbacks:
>
> 1. It had the default reversed, i.e. most often you want to regard a char[] or a wchar[] as a range of code points, not as an array of code units.
>
> 2. It had the unpleasant effect that most algorithms in std.algorithm and beyond did the wrong thing by default, and the right thing only if you wrapped everything with byDchar().

Well, basically these two arguments are the same: iterating by code unit isn't a good default. And I agree. But I'm unconvinced that iterating by dchar is the right default either. For one thing it adds decoding overhead, and for another a dchar (a code point) still doesn't represent a user-perceived character.

Now, add graphemes to the equation and you have a representation that matches the user-perceived character concept, but for that you add another layer of decoding overhead and a variable-size data type to manipulate (a grapheme is a sequence of code points). And you have to use Unicode normalization when comparing graphemes. So is that a good default? Probably not. It might be "correct" in some sense, but it's totally overkill for most cases.
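To make the normalization point concrete, here is a minimal sketch. It is written in Python rather than D only so that it is easy to run; the variable names are illustrative and nothing here is a Phobos API. It takes one user-perceived character, "é", in its precomposed and decomposed forms, and shows that the two differ at both the code unit and the code point level, and only compare equal after normalization.

```python
import unicodedata

# One user-perceived character, two code point sequences.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Length in UTF-8 code units (bytes) for each form.
utf8_units = (len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

# Length in code points for each form (a Python str is a sequence of code points).
code_points = (len(precomposed), len(decomposed))

# Comparing code point by code point says the two strings differ...
raw_equal = precomposed == decomposed

# ...while comparing after normalizing both to NFC says they are the same.
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(utf8_units, code_points, raw_equal, normalized_equal)
```

Neither the byte count nor the code point count is stable across the two forms, which is why a grapheme-level comparison has to pay for normalization on top of decoding.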

My thinking is that there is no good default. If you write an XML parser, you'll probably want to work at the code point level; if you write a JSON parser, you can easily skip the overhead and work at the UTF-8 code unit level until you start parsing a string; if you write something to count the number of user-perceived characters or want to map characters to a font then you'll want graphemes...
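As a rough illustration of the JSON case, the sketch below finds the end of a JSON string literal by looking only at UTF-8 code units, without decoding anything. It is Python, not D, and `find_string_end` is a hypothetical helper invented for this example. It assumes the input is valid UTF-8 and well-formed JSON. The trick it relies on is that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the ASCII quote and backslash bytes can never occur inside an encoded non-ASCII character.

```python
QUOTE = 0x22      # ASCII '"'
BACKSLASH = 0x5C  # ASCII '\'

def find_string_end(data: bytes, start: int) -> int:
    """Return the index of the closing quote of the JSON string whose
    opening quote is at data[start]. Works purely on UTF-8 code units."""
    if data[start] != QUOTE:
        raise ValueError("expected an opening quote at start")
    i = start + 1
    while i < len(data):
        b = data[i]
        if b == BACKSLASH:
            i += 2          # skip the backslash and the byte it escapes
        elif b == QUOTE:
            return i        # unescaped quote: end of the string literal
        else:
            i += 1          # any other byte, including bytes >= 0x80
    raise ValueError("unterminated string")

# The document is: {"name": "café \"q\""}
doc = '{"name": "caf\u00e9 \\"q\\""}'.encode("utf-8")

key_end = find_string_end(doc, 1)
value_start = doc.index(b'"', key_end + 1)
value_end = find_string_end(doc, value_start)

# Decoding happens only now, and only for the slice we actually need.
value_text = doc[value_start + 1:value_end].decode("utf-8")
print(value_text)
```

The scanner never decodes the "é"; it steps over its two bytes like any other byte and decodes only once the string's extent is known.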

Perhaps there should simply be no default; perhaps you should be forced to choose explicitly at which layer you want to operate each time you apply an algorithm to a string... and to make this less painful, we could have functions in std.string acting as a thin layer over similar ones in std.algorithm, automatically choosing the right representation for the operation.
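A sketch of what "no default, plus a thin convenience layer" could look like, again in Python for illustration only. Every name here (`Level`, `elements`, `index_of`, `char_count`) is hypothetical and not part of std.string or std.algorithm. The grapheme branch is a deliberate simplification that merely attaches combining marks to the preceding code point; real grapheme segmentation follows the Unicode text segmentation rules and handles many more cases.

```python
import unicodedata
from enum import Enum

class Level(Enum):
    CODE_UNIT = 1   # UTF-8 bytes
    CODE_POINT = 2  # what D calls dchar
    GRAPHEME = 3    # user-perceived characters (approximated below)

def elements(s: str, level: Level) -> list:
    """Generic layer: the caller must name the level explicitly."""
    if level is Level.CODE_UNIT:
        return list(s.encode("utf-8"))
    if level is Level.CODE_POINT:
        return list(s)
    # Simplified grapheme clustering: glue combining marks to their base.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def index_of(haystack: str, needle: str) -> int:
    """Thin wrapper: substring search picks the code unit level.
    Returns a byte offset into the UTF-8 encoding, or -1 if absent."""
    return haystack.encode("utf-8").find(needle.encode("utf-8"))

def char_count(s: str) -> int:
    """Thin wrapper: counting characters picks the grapheme level."""
    return len(elements(s, Level.GRAPHEME))

text = "na\u00efve caf\u00e9"
print(index_of(text, "caf\u00e9"), char_count("e\u0301x"))
```

Searching at the code unit level is sound for valid UTF-8 because the encoding is self-synchronizing, so a valid needle can only match on a code point boundary. It can still match in the middle of a grapheme (searching for "e" finds the base letter of a decomposed "é"), which is exactly the kind of per-operation trade-off such wrappers would have to decide.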

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
