On 2010-11-21 20:21:27 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> said:
> That design, with which I experimented for a while, had two drawbacks:
>
> 1. It had the default reversed, i.e. most often you want to regard a
> char[] or a wchar[] as a range of code points, not as an array of code
> units.
>
> 2. It had the unpleasant effect that most algorithms in std.algorithm
> and beyond did the wrong thing by default, and the right thing only if
> you wrapped everything with byDchar().
Well, basically these two arguments are the same: iterating by code
unit isn't a good default. And I agree. But I'm unconvinced that
iterating by dchar is the right default either. For one thing it adds
decoding overhead, and for another a code point still isn't necessarily
a user-perceived character (think of combining marks).
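
To make the code unit versus code point distinction concrete, here is a minimal sketch using nothing but the built-in foreach over a string. The helper names countCodeUnits and countCodePoints are made up for illustration; they are not Phobos functions.

```d
/// Counts UTF-8 code units: a plain walk over the array, no decoding.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

/// Counts code points: foreach with a dchar variable decodes UTF-8 on the fly.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

unittest
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";
    assert(countCodeUnits(s) == 3);  // 1 byte for 'e', 2 bytes for U+0301
    assert(countCodePoints(s) == 2); // still two dchars for one character
}
```

The dchar loop pays for decoding and still reports two elements for what a reader would call a single character.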
Now, add graphemes to the equation and you have a representation that
matches the user-perceived character concept, but for that you add
another layer of decoding overhead and a variable-size data type to
manipulate (a grapheme is a sequence of code points). And you have to
use Unicode normalization when comparing graphemes. So is that a good
default? Probably not. It might be "correct" in some sense, but it's
total overkill for most cases.
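
A rough sketch of what the grapheme layer costs, assuming a Phobos that ships std.uni.byGrapheme and std.uni.normalize (the discussion above does not depend on them). The helpers countGraphemes and sameText are hypothetical names for this example only.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

/// Counts user-perceived characters by segmenting into grapheme clusters.
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

/// Compares two strings after bringing both to normalization form C.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

unittest
{
    string precomposed = "\u00E9"; // é as a single code point
    string decomposed = "e\u0301"; // 'e' + combining acute accent

    assert(countGraphemes(precomposed) == 1);
    assert(countGraphemes(decomposed) == 1);

    // Same character to the user, different code point sequences.
    assert(precomposed != decomposed);
    assert(sameText(precomposed, decomposed));
}
```

Both extra steps (cluster segmentation and normalization) sit on top of the UTF-8 decoding already needed for code points, which is the layering of overhead described above.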
My thinking is that there is no good default. If you write an XML
parser, you'll probably want to work at the code point level; if you
write a JSON parser, you can easily skip the overhead and work at the
UTF-8 code unit level until you start parsing a string; if you write
something to count the number of user-perceived characters or want to
map characters to a font then you'll want graphemes...
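
As an illustration of the JSON case, here is a sketch of scanning for the end of a string literal at the code unit level. It relies on the fact that bytes below 0x80 never occur inside a multi-byte UTF-8 sequence, so comparing raw code units against '"' and '\\' is safe. The function findStringEnd is a hypothetical helper, not part of any JSON library.

```d
/// Returns the index of the closing quote of the JSON string literal that
/// opens at json[start], or -1 if the literal is unterminated.
ptrdiff_t findStringEnd(const(char)[] json, size_t start)
{
    assert(start < json.length && json[start] == '"');
    for (size_t i = start + 1; i < json.length; ++i)
    {
        immutable char c = json[i]; // indexing yields code units, no decoding
        if (c == '\\')
            ++i;                    // skip the code unit right after a backslash
        else if (c == '"')
            return cast(ptrdiff_t) i;
    }
    return -1;
}

unittest
{
    // The JSON text ["né\"e", 1] where é takes two UTF-8 code units.
    string json = "[\"n\u00E9\\\"e\", 1]";
    assert(findStringEnd(json, 1) == 8);
}
```

Decoding only has to start once the parser actually needs the contents of the string.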
Perhaps there should simply be no default; perhaps you should be forced
to choose explicitly at which layer you want to operate each time you
apply an algorithm to a string... and to make this less painful we
could have functions in std.string acting as a thin layer over similar
ones in std.algorithm that would automatically choose the right
representation for the algorithm depending on the operation.
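
To sketch what such a thin layer might look like: the two wrappers below are purely hypothetical (neither countOccurrences nor countCharacters is an existing std.string function), and each one picks the layer that suits its operation. The substring count assumes both strings are valid UTF-8 in the same normalization form, in which case matching raw code units is enough.

```d
import std.algorithm.searching : count;
import std.range : walkLength;
import std.string : representation;
import std.uni : byGrapheme;

/// Hypothetical wrapper: counts non-overlapping occurrences of needle.
/// Works on raw code units, so nothing is decoded.
size_t countOccurrences(string haystack, string needle)
{
    return haystack.representation.count(needle.representation);
}

/// Hypothetical wrapper: counts user-perceived characters, so it has to
/// work at the grapheme level.
size_t countCharacters(string s)
{
    return s.byGrapheme.walkLength;
}

unittest
{
    string s = "e\u0301te\u0301"; // "été" spelled with combining accents
    assert(countOccurrences(s, "e\u0301") == 2);
    assert(countCharacters(s) == 3);
}
```

The caller never names a layer here; the choice is baked into each operation, which is the point of the proposal.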
--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/