On 11/20/10 9:42 PM, Rainer Deyke wrote:
On 11/20/2010 16:58, Andrei Alexandrescu wrote:
On 11/20/10 12:32 PM, Rainer Deyke wrote:
std::vector<bool> in C++ is a specialization of std::vector that packs
eight booleans into a byte instead of storing each element separately.
It doesn't behave exactly like other std::vectors and technically
doesn't meet the C++ requirements of a container, although it tries to
come as close as possible.  This means that any code that uses
std::vector<bool> needs to be extra careful to take those differences
into account.  This is especially an issue when dealing with generic
code that uses std::vector<T>, where T may or may not be bool.

The issue with Vector!char is similar.  Because char[] is not a true
array, generic code that uses T[] can unexpectedly fail when T is char.
Other containers of char behave like normal containers, iterating over
individual chars.  char[], when used as a range, iterates over dchars.
Vector!char can, depending on its implementation, iterate over chars,
iterate over dchars, or fail to compile at all when instantiated with
T=char.  It's not even clear which of these is the correct behavior.

The parallel does not stand up to scrutiny. The problem with
vector<bool> in C++ is that it implements no formal abstraction,
although it is a specialization of one.

The problem with std::vector<bool> is that it pretends to be a
std::vector, but isn't.  If it were called dynamic_bitset instead, nobody
would have complained.  char[] has exactly the same problem.

char[] does not exhibit the same issues that vector<bool> has. The situation is very different, and again, trying to reduce one to the other misses much of the picture.

vector<bool> hides its representation and in doing so becomes non-compliant with vector<T>, which does expose representation (for instance, v[0] is not a bool&). Worse, vector<bool> is not compliant with any concept, express or implied, which makes it virtually unusable in generic code.

In contrast, char[] exposes a meaningful representation (an array of code units) that is often useful, and obeys a slightly weaker formal abstraction (bidirectional range) which is also useful. It's simply a very different setup from vector<bool>, and again, using one to predict the fate of the other is a poor approach.

Vector!char is just an example.  Any generic code that uses T[] can
unexpectedly fail to compile or behave incorrectly when used with T=char.
If I were to use D2 in its present state, I would try to avoid both
char/wchar and arrays as much as possible in order to avoid this
trap. This would mean avoiding large parts of Phobos, and providing
safe wrappers around the rest.

It may in fact be wise to start using D2 and offer criticism grounded in
reality that could help us improve the state of affairs.

Sorry, but no.  It would take a huge investment of time and effort on my
part to switch from C++ to D.  I'm not going to make that leap without
looking first, and I'm not going to make it when I can see that I'm
about to jump into a spike pit.

You may rest assured that, if anything, strings are not a problem. The way the abstractions are laid out makes D's strings the best approach to Unicode strings I know of.

The above is only a fallacious presupposition. Algorithms in Phobos are
abstracted on the formal range interface, and as such you won't be
exposed to risks when using them with strings.

I'm not concerned about algorithms, I'm concerned about code that uses
arrays directly.  Like my Vector!char example, which I see you still
haven't addressed.

When you define your abstractions, you are free to decide how you want to go about them. The D programming language makes it unequivocally clear that char[] is an array of UTF-8 code units that offers a bidirectional range of code points. The same goes for wchar[] (replace UTF-8 with UTF-16). dchar[] is an array of UTF-32 code units, each of which is a whole code point, and as such it is a full random-access range.

If you define your own function that uses an array directly, such as sort(), then attempting to sort a char[] will get you exactly what you expect: you sort the code units in the array. The sort routine in the standard library is designed to work with random-access ranges, and since char[] offers only a bidirectional range of code points, it will refuse to sort a char[].

I have often wondered whether I'd do things differently if I could go back in time and join Walter when he invented D's strings. I might have done one or two things differently, but the gain would be marginal at best. In fact, it's not impossible that the overall balance would have been hurt. Between speed, simplicity, effectiveness, abstraction, access to representation, and economy of means, D's strings are the best compromise out there that I know of, bar none, and by a wide margin.


Andrei
