Re: std.algorithm.remove and principle of least astonishment

Andrei Alexandrescu Sun, 21 Nov 2010 10:40:54 -0800

On 11/20/10 9:42 PM, Rainer Deyke wrote:

On 11/20/2010 16:58, Andrei Alexandrescu wrote:

On 11/20/10 12:32 PM, Rainer Deyke wrote:

std::vector<bool> in C++ is a specialization of std::vector that packs
eight booleans into a byte instead of storing each element separately.
It doesn't behave exactly like other std::vectors and technically
doesn't meet the C++ requirements of a container, although it tries to
come as close as possible.  This means that any code that uses
std::vector<bool> needs to be extra careful to take those differences
into account.  This is especially an issue when dealing with generic code
that uses std::vector<T>, where T may or may not be bool.


The issue with Vector!char is similar.  Because char[] is not a true
array, generic code that uses T[] can unexpectedly fail when T is char.
Other containers of char behave like normal containers, iterating over
individual chars.  char[] iterates over dchars.  Vector!char can,
depending on its implementation, iterate over chars, iterate over
dchars, or fail to compile at all when instantiated with T=char.  It's
not even clear which of these is the correct behavior.


The parallel does not stand scrutiny. The problem with vector<bool> in
C++ is that it implements no formal abstraction, although it is a
specialization of one.


The problem with std::vector<bool> is that it pretends to be a
std::vector, but isn't.  If it was called dynamic_bitset instead, nobody
would have complained.  char[] has exactly the same problem.

char[] does not exhibit the same issues that vector<bool> has. The
situation is very different, and again, trying to reduce one to another
misses a lot of the picture.

vector<bool> hides representation and in doing so becomes non-compliant
with vector<T> which does expose representation. Worse, vector<bool> is
not compliant with any concept, express or implied, which makes
vector<bool> virtually unusable with generic code.
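A minimal C++ sketch of that non-compliance, for illustration only. The
helper name yields_true_reference is hypothetical and not part of any
library; it only checks whether indexing a container gives back a real
reference to its element type, which is what generic code written against
vector<T> usually assumes.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: true when c[0] on a container of type C has type
// value_type&, i.e. the container exposes its elements directly.
template <typename C>
constexpr bool yields_true_reference() {
    return std::is_same<decltype(std::declval<C&>()[0]),
                        typename C::value_type&>::value;
}

// vector<int> exposes its representation: v[0] is an int&.
static_assert(yields_true_reference<std::vector<int>>(),
              "vector<int>::operator[] returns int&");

// vector<bool> hides it: v[0] is a proxy object, not a bool&.
static_assert(!yields_true_reference<std::vector<bool>>(),
              "vector<bool>::operator[] returns a proxy");
```

Generic code that binds the result of v[0] to a T& or takes its address
compiles for every T except bool, which is the failure mode under
discussion.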

In contrast, char[] exposes a meaningful representation (array of code
units) that is often useful, and obeys a slightly weaker formal
abstraction (bidirectional range) which is also useful. It's simply a
very different setup from vector<bool>, and again attempting to use one
in predicting the fare of the other is a poor approach.

Vector!char is just an example. Any generic code that uses T[] can
unexpectedly fail to compile or behave incorrectly when used with T=char.
If I were to use D2 in its present state, I would try to avoid both
char/wchar and arrays as much as possible in order to avoid this
trap. This would mean avoiding large parts of Phobos, and providing
safe wrappers around the rest.


It may be wise in fact to start using D2 and make criticism grounded in
reality that could help us improve the state of affairs.


Sorry, but no.  It would take a huge investment of time and effort on my
part to switch from C++ to D.  I'm not going to make that leap without
looking first, and I'm not going to make it when I can see that I'm
about to jump into a spike pit.

You may rest assured that if anything, strings are not a problem. The
way the abstractions are laid out makes D's strings the best approach to
Unicode strings I know about.

The above is only fallacious presupposition. Algorithms in Phobos are
abstracted on the formal range interface, and as such you won't be
exposed to risks when using them with strings.


I'm not concerned about algorithms, I'm concerned about code that uses
arrays directly.  Like my Vector!char example, which I see you still
haven't addressed.

When you define your abstractions, you are free to decide how you want
to go about them. The D programming language makes it unequivocally
clear that char[] is an array of UTF-8 code units that offers a
bidirectional range of code points. Same about wchar[] (replace UTF-8
with UTF-16). dchar[] is an array of UTF-32 code points which are
equivalent to code units, and as such is a full random-access range.
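The code unit versus code point distinction can be sketched outside D as
well. The C++ below is only an illustration, not how D or Phobos
implements anything; the function names code_units and code_points are
made up for the example, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Array view: the number of UTF-8 code units is simply the byte count.
std::size_t code_units(const std::string& s) {
    return s.size();
}

// Range-of-code-points view: count every byte that starts a code point,
// i.e. every byte that is not a continuation byte of the form 10xxxxxx.
// Assumes s holds valid UTF-8.
std::size_t code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the byte string "h\xC3\xA9llo" (the word "héllo" in UTF-8) the two
views disagree: six code units, five code points. Finding the n-th code
point requires a scan from one end, which is why such a range can be
walked in both directions but not indexed in constant time.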

If you define your own function that uses an array directly, such as
sort(), then attempting to sort a char[] will get you exactly what you
expect - you sort the code units in the array. The sort routine in the
standard library is modeled to work with random access ranges, and will
refuse to sort a char[].

I have often reflected whether I'd do things differently if I could go
back in time and join Walter when he invented D's strings. I might have
done one or two things differently, but the gain would be marginal at
best. In fact, it's not impossible the balance of things could have been
hurt. Between speed, simplicity, effectiveness, abstraction, access to
representation, and economy of means, D's strings are the best
compromise out there that I know of, bar none by a wide margin.



Andrei
