On 5/31/11 11:16 AM, eles wrote:
>> I agree that 1-based indexing works pretty well with closed-right
>> intervals; forgot to mention that in my first response. Even with
>> such an approach, certain things are less obvious, e.g. the number
>> of elements in an interval a..b is b-a+1, not b-a.
>
> Yes, but in C too, when going from a[0] to a[N-1], people know there
> are... (N-1)-0+1 elements (so, b-a+1). It is the same.

> Now, why:
>
> for(iterator from a[0] to a[N-1]){ //etc. }
> //let's use the above notation for for(i=0; i<=N-1; i++)
>
> is acceptable, but suddenly it is no longer acceptable to write
>
> a[for(iterator from 0 to N-1)]
>
> and one must use
>
> a[for(iterator from 0 to N)]
>
> in order to achieve exactly the same thing?
>
> The last two expressions are just mental placeholders for a[0..N-1]
> and a[0..N] respectively.


>> All in all D doesn't attempt to break new ground with open-right
>> intervals. It would be gratuitously and jarringly different from all
>> of its major influencers. Though I understand the choice looks odd
>> coming from Matlab, the same could be said about a lot of other
>> languages.
>
> I don't see that ground. Maybe I simply lack information. Can you
> help?

As I mentioned, the issues involved are of increasing subtlety. As you wrote, C programmers iterate upward like this:

for (int i = 0; i < N; i++)

However, if N is the length of an array, it has the unsigned type size_t, and therefore i should follow suit:

for (size_t i = 0; i < N; i++)

This is _not_ equivalent to:

for (size_t i = 0; i <= N - 1; i++)

because at N == 0, N - 1 wraps around to size_t's maximum value, so iteration will take a very long time (and go to all kinds of seedy places).
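
To make the wraparound concrete, here is a minimal sketch; the exact value printed assumes a 64-bit size_t:

#include <cstddef>
#include <cstdio>

int main()
{
    std::size_t N = 0;            // e.g. the length of an empty array
    // N - 1 wraps around to SIZE_MAX, so "i <= N - 1" holds for every
    // possible i and the closed loop would run ~2^64 iterations.
    std::printf("%zu\n", N - 1);  // prints 18446744073709551615
    // The open-right form correctly executes zero times:
    for (std::size_t i = 0; i < N; i++)
        std::printf("never reached\n");
    return 0;
}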

But often C programmers iterate with pointers between given limits:

for (int* p = a; p < a + N; p++)

It would appear that this takes care of the N == 0 case, so the loop should be equivalent to:

for (int* p = a; p <= a + N - 1; p++)

Alas, it's not. There is a SPECIAL RULE in the C programming language that says pointers EXACTLY ONE past the end of an array are ALWAYS comparable for (in)equality and ordering with pointers inside the array. There is no such rule for pointers one BEFORE the array.

Therefore, C programmers who use pointers and lengths or pairs of pointers ALWAYS need to use open-right intervals because otherwise they would be unable to iterate with well-defined code. Essentially C legalizes open-right and only open-right intervals inside the language. C++ carried that rule through.
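
A minimal sketch of how that special rule plays out; the before-the-array pointer appears only in a comment, because even computing it is undefined:

#include <cstdio>

int main()
{
    int a[4] = {1, 2, 3, 4};
    int N = 4;
    // a + N is exactly one past the end: legal to form and to compare
    // (though not to dereference), so this loop is well-defined even
    // when N == 0, where the test degenerates to "a < a" and the body
    // runs zero times.
    for (int* p = a; p < a + N; p++)
        std::printf("%d\n", *p);
    // With N == 0, the closed form "p <= a + N - 1" would compute
    // a - 1, a pointer one BEFORE the array: undefined behavior even
    // without dereferencing it.
    return 0;
}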

C++ programmers who use iterators actually write loops a bit differently. To go from a to b they write:

for (SomeIterator it = a; it != b; ++it)

They use "!=" instead of "<" because there are iterators that compare for inequality but not for ordering. Furthermore, they use ++it instead of it++ because the latter may be more expensive. A conscientious C++ programmer wants to write code that makes the weakest assumptions about the types involved.
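
For instance, std::list's bidirectional iterators support equality comparison but no ordering, so only the open-right "!=" form compiles. A minimal sketch:

#include <cstdio>
#include <list>

int main()
{
    std::list<int> xs;
    xs.push_back(1);
    xs.push_back(2);
    xs.push_back(3);
    // Bidirectional iterators provide == and != but no operator<,
    // so "it < xs.end()" would not compile here.
    for (std::list<int>::iterator it = xs.begin(); it != xs.end(); ++it)
        std::printf("%d\n", *it);
    return 0;
}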

This setup has as a direct consequence the fact that ranges expressed as pairs of iterators MUST use the right-open convention. Otherwise it would be impossible to express an empty range, and people would have a very hard time iterating even a non-empty one as they'd have to write:

// Assume non-empty closed range
for (SomeIterator it = a; ; ++it)
{
    ... code ...
    if (it == b) break;
}

or something similar.

The STL also carefully defines the behavior of iterators one past the valid range, a move with many nice consequences including consistency with pointers and the ease of expressing an empty range as one with a == b. Really, if there was any doubt that right-open is a good choice, the STL blew it off its hinges.
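
For example, an empty container's begin() and end() coincide, and the canonical open-right loop needs no special casing. A minimal sketch:

#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v; // empty
    // a == b encodes the empty range...
    std::printf("%d\n", (int) (v.begin() == v.end())); // prints 1
    // ...and the canonical open-right loop runs zero times over it.
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        std::printf("%d\n", *it);
    return 0;
}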

That's not to say right-open ranges are always best. One example where a closed range is necessary is expressing total ranges, i.e. ranges that cover a type's entire domain. Consider e.g. a random function:

uint random(uint min, uint max);

In this case it's perfectly normal for random to return a number in [min, max]. If random were defined as:

uint random(uint min, uint one_after_max);

it would be impossible (or highly unintuitive) to generate a random number that has uint.max as its largest value: one_after_max would have to be uint.max + 1, which overflows to 0.
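
For comparison, C++'s std::uniform_int_distribution made the same choice: its constructor takes a CLOSED interval [a, b], which keeps the full range of the type expressible. A minimal sketch:

#include <cstdint>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(42);
    // The bounds are closed: [0, UINT32_MAX]. A right-open API would
    // need a bound of UINT32_MAX + 1, which doesn't fit in uint32_t.
    std::uniform_int_distribution<std::uint32_t> dist(0, UINT32_MAX);
    std::printf("%u\n", (unsigned) dist(gen));
    return 0;
}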

It could be said that closed intervals make it difficult to express empty ranges, whereas right-open intervals make it difficult to express total ranges. It turns out the former are a common case whereas the latter is a rarely-encountered one. That's why we use closed intervals in std.random and case statements but open intervals everywhere else.

At this point, right-open is so embedded in the ethos of D that it would be pretty much impossible to introduce an exception for slices without confusing a lot of people -- all for hardly any good reason.



Andrei
