Re: std.algorithm.remove and principle of least astonishment

Rainer Deyke Sun, 21 Nov 2010 20:14:41 -0800

On 11/21/2010 17:31, Andrei Alexandrescu wrote:
> On 11/21/10 6:12 PM, Rainer Deyke wrote:
>> I agree that there are differences.  For one thing, if you iterate
>> over a std::vector<bool>  you get actual booleans, albeit through an
>> extra layer of indirection.  If you iterate over char[] you might get
>> chars or you might get dchars depending on the method you use for
>> iterating.
> 
> This is sensible because a string may be seen as a sequence of code
> points or a sequence of code units. Either view is useful.


I don't dispute that either view is useful.

>> char[] isn't the equivalent of std::vector<bool>.  It's worse.
>> char[] is the equivalent of a vector<bool>  that keeps the current
>> behavior of std::vector<bool>  when iterating through iterators, but
>> gives access to bytes of packed booleans when using operator[].
> 
> I explained why char[] is better than vector<bool>. Ignoring the
> explanation and restating a fallacious conclusion based on an
> overstretched parallel hardly does much to push the discussion forward.

I'm not interested in discussing whether char[] is overall a better data
structure than std::vector<bool>.  I'm focusing on one particular
property that both share.

std::vector<bool> fails to provide some of the guarantees of all other
instances of std::vector<T>.  This means that generic code that uses
std::vector<T> needs to take special consideration of std::vector<bool>
if it wants to work correctly when T = bool.  This is an indisputable fact.

char[] and wchar[] fail to provide some of the guarantees of all other
instances of T[].  This means that generic code that uses T[] needs to
take special consideration of char[] and wchar[] if it wants to work
correctly when T = char or T = wchar.  This is also an indisputable fact.
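
A minimal sketch of that difference, assuming a Phobos version in which
the range traits live in std.range.primitives (the variable name is
only illustrative):

```d
import std.range.primitives : ElementType, hasLength,
    isRandomAccessRange, walkLength;

// int[] viewed as a range looks exactly like the array itself.
static assert(is(ElementType!(int[]) == int));
static assert(hasLength!(int[]));
static assert(isRandomAccessRange!(int[]));

// char[] and wchar[] viewed as ranges do not: elements are dchar,
// and the range is neither random-access nor length-aware.
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

void main()
{
    char[] s = "h\u00E9".dup;   // 'h' followed by U+00E9
    assert(s.length == 3);      // array view: three UTF-8 code units
    assert(s.walkLength == 2);  // range view: two code points
}
```

Generic code written against T[] that relies on any of the first three
properties has to branch on the character types to stay correct.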

I don't think it's much of a stretch to draw an analogy from
std::vector<bool> to char[] based on this.  However, even if
std::vector<bool> did not exist, I would still consider this a design
flaw of char[].

> Again: code units _are_ well-defined, useful to have access to, and good
> for a variety of uses. Please understand this.

Again, I understand this and don't dispute it.  It's a complete
non sequitur in this discussion.  I'm not arguing against the string
type providing access to both code points and code units.  I'm arguing
against the string type being spelled as an array type when it doesn't
share the behavior of an array.

>> I'm not concerned about strings, I'm concerned about *arrays*.
>> Arrays of T, where T may or may not be a character type.  I see that
>> you ignored my Vector!char example yet again.
> 
> I sure have replied to it, but probably my reply hasn't been read.
> Please allow me to paste it again:
> 
>> When you define your abstractions, you are free to decide how you
>> want to go about them. The D programming language makes it
>> unequivocally clear that char[] is an array of UTF-8 code units that
>> offers a bidirectional range of code points. Same about wchar[]
>> (replace UTF-8 with UTF-16). dchar[] is an array of UTF-32 code
>> points which are equivalent to code units, and as such is a full
>> random-access range.
> 
> So it's up to you what Vector!char does. In D char[] is an array of code
> units that can be iterated as a bidirectional range of code points. I
> don't see anything cagey about that.

Ah, I did read that, but it doesn't address my concerns about
Vector!char at all.  I'm aware that I can write Vector!char to act like
a container of code units.  I'm also aware that I can write Vector!char
to automatically translate to code points.  My concerns are these:

  - When writing code that uses T[], it is often natural to mix
range-based access and index-based access, with the assumption that both
provide direct access to the same underlying data.  However, with char[]
this assumption is incorrect, as the underlying data is transformed when
viewing the array as a range.  This means that generic code that uses
T[] must take special consideration of char[] or it may unexpectedly
produce incorrect results when T = char.

  - char[] sets a precedent of Container!char providing a dchar range
interface.  Other containers must choose to either follow this precedent
or to avoid it.  Either choice may require extra work when implementing
the container.  Either choice can lead to surprising behavior for the
user of the container.
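
A minimal sketch of the first concern, assuming the range primitives
from std.range.primitives.  The helper stepsTo is hypothetical, written
only for this illustration: it counts range steps and the caller then
uses the result as an array index.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: number of range steps before `needle` is
// reached, or size_t.max if it is not found.
size_t stepsTo(T, E)(T[] haystack, E needle)
{
    size_t steps = 0;
    auto r = haystack;
    while (!r.empty)
    {
        if (r.front == needle)
            return steps;
        r.popFront();
        ++steps;
    }
    return size_t.max;
}

void main()
{
    // For int[], a range step count is also a valid index.
    int[] nums = [10, 20, 30];
    assert(nums[stepsTo(nums, 30)] == 30);

    // For char[], the range decodes U+00E9 as one element, but the
    // array stores it as two code units, so the two views disagree.
    char[] text = "h\u00E9llo".dup;
    auto j = stepsTo(text, 'l');
    assert(j == 2);          // two code points precede the first 'l'
    assert(text[j] != 'l');  // text[2] is the second byte of U+00E9
    assert(text[3] == 'l');  // the 'l' actually sits at index 3
}
```

The same function body is correct for every other T[]; only the
character arrays need a special case.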


-- 
Rainer Deyke - rain...@eldwood.com
