On 12/30/10 6:41 AM, spir wrote:
Hello,


In the course of a project (1) 2 partner D programmers and myself are
currently implementing, we faced 2 issues which prevented us using a
range interface as planned. We initially intended to do it for better
compliance with D's coming new style, and nice inter-relation with
Phobos modules like std.algorithm. Instead, we use opApply to
implement traversal; which does the job. This post's purpose is to
expose these problems to help and making range interfaces better
usable, or simply usable, in various practical cases. (Opinions are
mine --I don't known my partners' exact position, except that they
indeed agree these topics plainly prevent us using ranges.)
[snip]

Thanks very much for taking the time to document your experience. This is very useful.

-1- textual output
Functions like writeln silently use the formatValue set of templates
in std.format, to get a textual form for the element to be output.
Template constraints determine which version is used for a given kind of
element. As implemented now:
* There are formatValue variants for elements that implement a range
interface (which produce an array-like form).
* Constraint selection for a range interface has high precedence;
especially it shortcuts any toString function explicitely implemented by
the programmer.

This prevents a user to define any adequate output format for the
elements of a custom type, if the type also happens to implement a range
interface. A critical bug, and an absolute blocker for us.

Seeing "absolute blocker" here makes me think: is there /absolutely no way/ to get that particular task done in the D programming language? I might be off here because I don't have details, but my knee-jerk reaction is:

struct MyRange { ... }
void print(MyRange r) { ... }

Of course it would be great if you could use the predefined write/writeln functions, and we should improve them, but I have a hard time considering this an absolute blocker. You write a function once and then you call it. I do that in my C++ code all the time. Iostreams are terrible (to be euphemistic), but they don't prevent me and others from using C++.

Actually, this cannot even work in numerous cases. In our project, the range's 
ElementType (as returned by opindex&  front) would happen to be identical to 
the range type itself. In this case, the formatting algorithm via a range runs into 
an infinite loop.

Ha, glad you mention that. When I first designed ranges I had this fear in a couple of places: if a range is its own element type, then there's a problem. I deferred my solution about that to later. Apparently that later has come now.

It is instead necessary to use the programmer-defined format in toString (or 
later writeTo).
In the example below, trying to write() an element yields an infinite series of 
"[[[...", in trying to print it like an array.

Yep, yep, great feedback. This is a fixed point in the iteration typeof(r) -> typeof(r.front), which is not difficult to detect. One question would be - what to do when we _do_ detect that?

(One thing that should be kept in mind is that ranges are printed the way they are simply because I had to choose _something_. There is definitely a lot of room for improvement there in many aspects.)

Note that it is a common situation: for instance, (textual) string types like 
ones in high-level languages do not expose a separate element type (character); 
instead, a character is just a singleton of the type. Even more commonly, list, 
tree, graph nodes usually are of the same type as the container (or the 
root/container is of a sub-type implementing global methods).

This is interesting, but I find it difficult to imagine concrete cases because in all my code the iteration strategy is encoded in the range's type. For example, splitter() returns a range that iterates a string by words. Arguably the input and output have the same type, but the type of the range is _not_ string, it's Splitter!string or whatever. Similarly, for a self-referential data structure, the range type would be e.g. InOrder!TreeNode, PostOrder!TreeNode etc. I guess there are legitimate cases in which the TreeNode has front() that returns itself and so on, so we need to address this.

Another problem is that if the type is a class (as opposed to a struct), then 
defining a range interface introduces a template-selection conflict with the 
standard class case: as of now, they are not mutually exclusive, leading to 
compile-time error.

I don't understand this, could you please give a bit of detail?

Anyway, what is the need for a range-specific output format? And why should it 
exactly look like an array (misleading)?

Because I needed to put something there and couldn't think of something better...

Structs and classes also implementing ranges can define toString (later 
writeTo) for an accurate format; and in case this would be the right choice, 
reproducing an array-like format is a few lines of code.

Agreed. writeTo() with a delegate taking const(char)[] has been discussed and I think is a good solution.

-2- indexed iteration

It seems there is no way to directly define indexed iteration using ranges, 
like commonly needed by:
     foreach(index,element ; collection) {...}

Aha! The time of reckoning has come :o).

A possible but inadequate workaround would be to define a tool type for this:
     struct TraversalPair (Element) {uint index ; Element element;}
Then define the range's element output routines (opIndex, front, back) to 
return pairs instead of elements; and use this like:
     foreach (pair ; collection) {actWith(pair.element, pair.index);}
But this requires a client programmer to know this particularity; and anyway 
does not fit D common style and practice, I guess.

Clearly that's an annoying limitation of ranges when compared against opApply, and we need to fix that. That being said, I see your example uses the index as a simple counter 0, 1, 2, ... so again putting my "could this or could this not be done in the D programming language?" I can't stop thinking about this:

size_t index = 0;
foreach (e; collection) {
    ...
    ++index;
}

You need to mind the presence of "continue" and the larger scope of "index", but these are simple matters.

Annoying? Definitely. Makes ranges unusable? Perhaps not as much.

How to solve this practically? I would be happy with the above workaround if it 
became a commonly used solution, supported by the stdlib and if necessary by 
the core language. This may scale by defining the tool type as a superclass 
instead, allowing various variants, possibly with more elements.
Maybe an alternative is to use tuples: allow variants of opIndex, front, and 
back, to return (index,element) tuples and let the compiler use these overloads 
when the client code requests 2 (or more) returned values.
The first solution is more explicite, and possibly general; the second may fit 
common practice better if/when tuples become widely used (a rather far&  
hypothetical future ;-) ?).

I'd think tuples are the simplest solution and the basis of the language change that fixes this insufficiency.

First, if all you want is an index you can write this:

foreach (e; zip(iota(r.length), r))
{
    // e[0] is 0, 1, 2, ...
    // e[1] is the current range element
}

If the range doesn't have a length you can use iota with the largest number - iteration will stop at the shortest:

foreach (e; zip(iota(size_t.max), r))
{
    // e[0] is 0, 1, 2, ...
    // e[1] is the current range element
}

If such uses of iota become common we could make iota() with no arguments just go 0, 1, 2,..., size_t.max, 0, 1, 2,...

Second, if you want to do some cleverer stuff than just 0, 1, 2,... you could always have r.front yield a tuple:

struct MyRange
{
    @property Tuple!(string, double) front() { ... }
    ...
}

foreach (e; r)
{
    // e[0] is the current string
    // e[1] is the current double
}

This is also the basis of a language solution. Currently the language lowers foreach for ranges down to a loop that uses front(), popFront(), and empty(). We discussed in this group a while ago that foreach with multiple iteration elements (e.g. a, b, c) could be lowered by binding a to r.front[0], b to r.front[1], and c to r.front[2 .. $]. This is compatible with the current lowering and allows convenient usage with ranges that have either tuples or arrays as their front.

Thanks again for this report. We definitely need to work on all of the above. Let me ask you this - did the project actually need to use algorithms in std.algorithm? Based on my experience, my assumption is that you didn't need the more complex algorithms (e.g. sort or bringToFront) at all, and that you didn't have intensive use for the simpler algorithms (map, reduce) either. This is because if you did, ranges would have had enough strategic advantage for you, and consequently you would have been much more inclined to work around the issues mentioned above. As such, it's possible that you started with ranges because they sounded nice to have but didn't quite find a solid need for their advantages over e.g. opApply. Please let me know - thanks.


Andrei

Reply via email to