On 5/31/11 11:16 AM, eles wrote:
>> I agree that 1-based indexing works pretty well with closed-right
>> intervals; forgot to mention that in my first response. Even with
>> such an approach, certain things are less obvious, e.g. the number
>> of elements in an interval a..b is b-a+1, not b-a.
>
> Yes, but in C too, when going from a[0] to a[N-1], people know there
> are... (N-1)-0+1 elements (so, b-a+1). It is the same.

> Now, why:
>
> for(iterator from a[0] to a[N-1]){ //etc. }
> //let's use the above notation for for(i=0; i<=N-1; i++)
>
> is acceptable, but suddenly it is no longer acceptable to write
>
> a[for(iterator from 0 to N-1)]
>
> and one must use
>
> a[for(iterator from 0 to N)]
>
> in order to achieve exactly the same thing?
>
> The last two expressions are just mental placeholders for a[0..N-1]
> and a[0..N] respectively.


>> All in all D doesn't attempt to break new ground with open-right
>> intervals. It would be gratuitously and jarringly different from all
>> of its major influencers. Though I understand the choice looks odd
>> coming from Matlab, the same could be said about a lot of other
>> languages.
>
> I don't see that ground. Maybe I simply lack information. Can you
> help?

As I mentioned, the issues involved are of increasing subtlety. As you wrote, C programmers iterate upward like this:

for (int i = 0; i < N; i++)

However, if N is the length of an array, it has the unsigned type size_t, and therefore i should follow suit:

for (size_t i = 0; i < N; i++)

This is _not_ equivalent to:

for (size_t i = 0; i <= N - 1; i++)

because at N == 0, N - 1 wraps around to size_t's maximum value, so iteration will take a very long time (and go to all kinds of seedy places).
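
To make the wraparound concrete, here is a minimal sketch; the exact value printed assumes a 64-bit size_t:

#include <cstddef>
#include <cstdio>

int main()
{
    std::size_t N = 0;            // e.g. the length of an empty array
    // N - 1 wraps around to SIZE_MAX, so "i <= N - 1" holds for every
    // possible i and the closed loop would run ~2^64 iterations.
    std::printf("%zu\n", N - 1);  // prints 18446744073709551615
    // The open-right form correctly executes zero times:
    for (std::size_t i = 0; i < N; i++)
        std::printf("never reached\n");
    return 0;
}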

But often C programmers iterate with pointers between given limits:

for (int* p = a; p < a + N; p++)

It would appear that this takes care of the N == 0 case, so the loop should be equivalent to:

for (int* p = a; p <= a + N - 1; p++)

Alas, it's not. There is a SPECIAL RULE in the C programming language that says pointers EXACTLY ONE past the end of an array are ALWAYS comparable for (in)equality and ordering with pointers inside the array. There is no such rule for pointers one BEFORE the array.

Therefore, C programmers who use pointers and lengths or pairs of pointers ALWAYS need to use open-right intervals because otherwise they would be unable to iterate with well-defined code. Essentially C legalizes open-right and only open-right intervals inside the language. C++ carried that rule through.
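
A minimal sketch of how that special rule plays out; the before-the-array pointer appears only in a comment, because even computing it is undefined:

#include <cstdio>

int main()
{
    int a[4] = {1, 2, 3, 4};
    int N = 4;
    // a + N is exactly one past the end: legal to form and to compare
    // (though not to dereference), so this loop is well-defined even
    // when N == 0, where the test degenerates to "a < a" and the body
    // runs zero times.
    for (int* p = a; p < a + N; p++)
        std::printf("%d\n", *p);
    // With N == 0, the closed form "p <= a + N - 1" would compute
    // a - 1, a pointer one BEFORE the array: undefined behavior even
    // without dereferencing it.
    return 0;
}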

C++ programmers who use iterators actually write loops a bit differently. To go from a to b they write:

for (SomeIterator it = a; it != b; ++it)

They use "!=" instead of "<" because there are iterators that compare for inequality but not for ordering. Furthermore, they use ++it instead of it++ because the latter may be more expensive. A conscientious C++ programmer wants to write code that makes the weakest assumptions about the types involved.
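
For instance, std::list's bidirectional iterators support equality comparison but no ordering, so only the open-right "!=" form compiles. A minimal sketch:

#include <cstdio>
#include <list>

int main()
{
    std::list<int> xs;
    xs.push_back(1);
    xs.push_back(2);
    xs.push_back(3);
    // Bidirectional iterators provide == and != but no operator<,
    // so "it < xs.end()" would not compile here.
    for (std::list<int>::iterator it = xs.begin(); it != xs.end(); ++it)
        std::printf("%d\n", *it);
    return 0;
}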

This setup has as a direct consequence the fact that ranges expressed as pairs of iterators MUST use the right-open convention. Otherwise it would be impossible to express an empty range, and people would have a very hard time iterating even a non-empty one as they'd have to write:

// Assume non-empty closed range
for (SomeIterator it = a; ; ++it)
{
    ... code ...
    if (it == b) break;
}

or something similar.

The STL also carefully defines the behavior of iterators one past the valid range, a move with many nice consequences including consistency with pointers and the ease of expressing an empty range as one with a == b. Really, if there was any doubt that right-open is a good choice, the STL blew it off its hinges.
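
For example, an empty container's begin() and end() coincide, and the canonical open-right loop needs no special casing. A minimal sketch:

#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v; // empty
    // a == b encodes the empty range...
    std::printf("%d\n", (int) (v.begin() == v.end())); // prints 1
    // ...and the canonical open-right loop runs zero times over it.
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        std::printf("%d\n", *it);
    return 0;
}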

That's not to say right-open ranges are always best. One example where a closed range is necessary is expressing total ranges, i.e. ranges that cover a type's entire domain. Consider e.g. a random function:

uint random(uint min, uint max);

In this case it's perfectly normal for random to return a number in [min, max]. If random were defined as:

uint random(uint min, uint one_after_max);

it would be impossible (or highly unintuitive) to generate a random number that has uint.max as its largest value: one_after_max would have to be uint.max + 1, which overflows to 0.
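
For comparison, C++'s std::uniform_int_distribution made the same choice: its constructor takes a CLOSED interval [a, b], which keeps the full range of the type expressible. A minimal sketch:

#include <cstdint>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    // The bounds are closed: [0, UINT32_MAX]. A right-open API would
    // need a bound of UINT32_MAX + 1, which doesn't fit in uint32_t.
    std::uniform_int_distribution<std::uint32_t> dist(0, UINT32_MAX);
    std::printf("%u\n", (unsigned) dist(gen));
    return 0;
}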

It could be said that closed intervals make it difficult to express empty ranges, whereas right-open intervals make it difficult to express total ranges. It turns out the former are a common case whereas the latter is a rarely-encountered one. That's why we use closed intervals in std.random and case statements but open intervals everywhere else.

At this point, right-open is so embedded in the ethos of D that it would be pretty much impossible to introduce an exception for slices without confusing a lot of people -- all for hardly any good reason.



Andrei
