Re: range practicle use

Andrei Alexandrescu Thu, 30 Dec 2010 09:20:23 -0800

On 12/30/10 6:41 AM, spir wrote:

Hello,



In the course of a project (1) 2 partner D programmers and myself are
currently implementing, we faced 2 issues which prevented us using a
range interface as planned. We initially intended to do it for better
compliance with D's coming new style, and nice inter-relation with
Phobos modules like std.algorithm. Instead, we use opApply to
implement traversal; which does the job. This post's purpose is to
expose these problems to help and making range interfaces better
usable, or simply usable, in various practical cases. (Opinions are
mine --I don't known my partners' exact position, except that they
indeed agree these topics plainly prevent us using ranges.)

[snip]

Thanks very much for taking the time to document your experience. Thisis very useful.

-1- textual output
Functions like writeln silently use the formatValue set of templates

in std.format, to get a textual form for the element to be output.
Template constraints determine which version is used for a given kind of
element. As implemented now:

* There are formatValue variants for elements that implement a range

interface (which produce an array-like form).

* Constraint selection for a range interface has high precedence;

especially it shortcuts any toString function explicitely implemented by
the programmer.


This prevents a user to define any adequate output format for the

elements of a custom type, if the type also happens to implement a range
interface. A critical bug, and an absolute blocker for us.

Seeing "absolute blocker" here makes me think: is there /absolutely noway/ to get that particular task done in the D programming language? Imight be off here because I don't have details, but my knee-jerkreaction is:


struct MyRange { ... }
void print(MyRange r) { ... }

Of course it would be great if you could use the predefinedwrite/writeln functions, and we should improve them, but I have a hardtime considering this an absolute blocker. You write a function once andthen you call it. I do that in my C++ code all the time. Iostreams areterrible (to be euphemistic), but they don't prevent me and others fromusing C++.

Actually, this cannot even work in numerous cases. In our project, the range's 
ElementType (as returned by opindex&  front) would happen to be identical to 
the range type itself. In this case, the formatting algorithm via a range runs into 
an infinite loop.

Ha, glad you mention that. When I first designed ranges I had this fearin a couple of places: if a range is its own element type, then there'sa problem. I deferred my solution about that to later. Apparently thatlater has come now.

It is instead necessary to use the programmer-defined format in toString (or 
later writeTo).
In the example below, trying to write() an element yields an infinite series of 
"[[[...", in trying to print it like an array.

Yep, yep, great feedback. This is a fixed point in the iterationtypeof(r) -> typeof(r.front), which is not difficult to detect. Onequestion would be - what to do when we _do_ detect that?

(One thing that should be kept in mind is that ranges are printed theway they are simply because I had to choose _something_. There isdefinitely a lot of room for improvement there in many aspects.)

Note that it is a common situation: for instance, (textual) string types like 
ones in high-level languages do not expose a separate element type (character); 
instead, a character is just a singleton of the type. Even more commonly, list, 
tree, graph nodes usually are of the same type as the container (or the 
root/container is of a sub-type implementing global methods).

This is interesting, but I find it difficult to imagine concrete casesbecause in all my code the iteration strategy is encoded in the range'stype. For example, splitter() returns a range that iterates a string bywords. Arguably the input and output have the same type, but the type ofthe range is _not_ string, it's Splitter!string or whatever. Similarly,for a self-referential data structure, the range type would be e.g.InOrder!TreeNode, PostOrder!TreeNode etc. I guess there are legitimatecases in which the TreeNode has front() that returns itself and so on,so we need to address this.

Another problem is that if the type is a class (as opposed to a struct), then 
defining a range interface introduces a template-selection conflict with the 
standard class case: as of now, they are not mutually exclusive, leading to 
compile-time error.


I don't understand this, could you please give a bit of detail?

Anyway, what is the need for a range-specific output format? And why should it 
exactly look like an array (misleading)?

Because I needed to put something there and couldn't think of somethingbetter...

Structs and classes also implementing ranges can define toString (later 
writeTo) for an accurate format; and in case this would be the right choice, 
reproducing an array-like format is a few lines of code.

Agreed. writeTo() with a delegate taking const(char)[] has beendiscussed and I think is a good solution.

-2- indexed iteration

It seems there is no way to directly define indexed iteration using ranges, 
like commonly needed by:
     foreach(index,element ; collection) {...}


Aha! The time of reckoning has come :o).

A possible but inadequate workaround would be to define a tool type for this:
     struct TraversalPair (Element) {uint index ; Element element;}
Then define the range's element output routines (opIndex, front, back) to 
return pairs instead of elements; and use this like:
     foreach (pair ; collection) {actWith(pair.element, pair.index);}
But this requires a client programmer to know this particularity; and anyway 
does not fit D common style and practice, I guess.

Clearly that's an annoying limitation of ranges when compared againstopApply, and we need to fix that. That being said, I see your exampleuses the index as a simple counter 0, 1, 2, ... so again putting my"could this or could this not be done in the D programming language?" Ican't stop thinking about this:


size_t index = 0;
foreach (e; collection) {
    ...
    ++index;
}

You need to mind the presence of "continue" and the larger scope of"index", but these are simple matters.


Annoying? Definitely. Makes ranges unusable? Perhaps not as much.

How to solve this practically? I would be happy with the above workaround if it 
became a commonly used solution, supported by the stdlib and if necessary by 
the core language. This may scale by defining the tool type as a superclass 
instead, allowing various variants, possibly with more elements.
Maybe an alternative is to use tuples: allow variants of opIndex, front, and 
back, to return (index,element) tuples and let the compiler use these overloads 
when the client code requests 2 (or more) returned values.
The first solution is more explicite, and possibly general; the second may fit 
common practice better if/when tuples become widely used (a rather far&  
hypothetical future ;-) ?).

I'd think tuples are the simplest solution and the basis of the languagechange that fixes this insufficiency.


First, if all you want is an index you can write this:

foreach (e; zip(iota(r.length), r))
{
    // e[0] is 0, 1, 2, ...
    // e[1] is the current range element
}

If the range doesn't have a length you can use iota with the largestnumber - iteration will stop at the shortest:


foreach (e; zip(iota(size_t.max), r))
{
    // e[0] is 0, 1, 2, ...
    // e[1] is the current range element
}

If such uses of iota become common we could make iota() with noarguments just go 0, 1, 2,..., size_t.max, 0, 1, 2,...

Second, if you want to do some cleverer stuff than just 0, 1, 2,... youcould always have r.front yield a tuple:


struct MyRange
{
    @property Tuple!(string, double) front() { ... }
    ...
}

foreach (e; r)
{
    // e[0] is the current string
    // e[1] is the current double
}

This is also the basis of a language solution. Currently the languagelowers foreach for ranges down to a loop that uses front(), popFront(),and empty(). We discussed in this group a while ago that foreach withmultiple iteration elements (e.g. a, b, c) could be lowered by binding ato r.front[0], b to r.front[1], and c to r.front[2 .. $]. This iscompatible with the current lowering and allows convenient usage withranges that have either tuples or arrays as their front.

Thanks again for this report. We definitely need to work on all of theabove. Let me ask you this - did the project actually need to usealgorithms in std.algorithm? Based on my experience, my assumption isthat you didn't need the more complex algorithms (e.g. sort orbringToFront) at all, and that you didn't have intensive use for thesimpler algorithms (map, reduce) either. This is because if you did,ranges would have had enough strategic advantage for you, andconsequently you would have been much more inclined to work around theissues mentioned above. As such, it's possible that you started withranges because they sounded nice to have but didn't quite find a solidneed for their advantages over e.g. opApply. Please let me know - thanks.



Andrei

Re: range practicle use

Reply via email to