On 12/30/10 6:41 AM, spir wrote:
Hello,
In the course of a project (1) 2 partner D programmers and myself are
currently implementing, we faced 2 issues which prevented us using a
range interface as planned. We initially intended to do it for better
compliance with D's coming new style, and nice inter-relation with
Phobos modules like std.algorithm. Instead, we use opApply to
implement traversal; which does the job. This post's purpose is to
expose these problems to help and making range interfaces better
usable, or simply usable, in various practical cases. (Opinions are
mine --I don't known my partners' exact position, except that they
indeed agree these topics plainly prevent us using ranges.)
[snip]
Thanks very much for taking the time to document your experience. This
is very useful.
-1- textual output
Functions like writeln silently use the formatValue set of templates
in std.format, to get a textual form for the element to be output.
Template constraints determine which version is used for a given kind of
element. As implemented now:
* There are formatValue variants for elements that implement a range
interface (which produce an array-like form).
* Constraint selection for a range interface has high precedence;
especially it shortcuts any toString function explicitely implemented by
the programmer.
This prevents a user to define any adequate output format for the
elements of a custom type, if the type also happens to implement a range
interface. A critical bug, and an absolute blocker for us.
Seeing "absolute blocker" here makes me think: is there /absolutely no
way/ to get that particular task done in the D programming language? I
might be off here because I don't have details, but my knee-jerk
reaction is:
struct MyRange { ... }
void print(MyRange r) { ... }
Of course it would be great if you could use the predefined
write/writeln functions, and we should improve them, but I have a hard
time considering this an absolute blocker. You write a function once and
then you call it. I do that in my C++ code all the time. Iostreams are
terrible (to be euphemistic), but they don't prevent me and others from
using C++.
Actually, this cannot even work in numerous cases. In our project, the range's
ElementType (as returned by opindex& front) would happen to be identical to
the range type itself. In this case, the formatting algorithm via a range runs into
an infinite loop.
Ha, glad you mention that. When I first designed ranges I had this fear
in a couple of places: if a range is its own element type, then there's
a problem. I deferred my solution about that to later. Apparently that
later has come now.
It is instead necessary to use the programmer-defined format in toString (or
later writeTo).
In the example below, trying to write() an element yields an infinite series of
"[[[...", in trying to print it like an array.
Yep, yep, great feedback. This is a fixed point in the iteration
typeof(r) -> typeof(r.front), which is not difficult to detect. One
question would be - what to do when we _do_ detect that?
(One thing that should be kept in mind is that ranges are printed the
way they are simply because I had to choose _something_. There is
definitely a lot of room for improvement there in many aspects.)
Note that it is a common situation: for instance, (textual) string types like
ones in high-level languages do not expose a separate element type (character);
instead, a character is just a singleton of the type. Even more commonly, list,
tree, graph nodes usually are of the same type as the container (or the
root/container is of a sub-type implementing global methods).
This is interesting, but I find it difficult to imagine concrete cases
because in all my code the iteration strategy is encoded in the range's
type. For example, splitter() returns a range that iterates a string by
words. Arguably the input and output have the same type, but the type of
the range is _not_ string, it's Splitter!string or whatever. Similarly,
for a self-referential data structure, the range type would be e.g.
InOrder!TreeNode, PostOrder!TreeNode etc. I guess there are legitimate
cases in which the TreeNode has front() that returns itself and so on,
so we need to address this.
Another problem is that if the type is a class (as opposed to a struct), then
defining a range interface introduces a template-selection conflict with the
standard class case: as of now, they are not mutually exclusive, leading to
compile-time error.
I don't understand this, could you please give a bit of detail?
Anyway, what is the need for a range-specific output format? And why should it
exactly look like an array (misleading)?
Because I needed to put something there and couldn't think of something
better...
Structs and classes also implementing ranges can define toString (later
writeTo) for an accurate format; and in case this would be the right choice,
reproducing an array-like format is a few lines of code.
Agreed. writeTo() with a delegate taking const(char)[] has been
discussed and I think is a good solution.
-2- indexed iteration
It seems there is no way to directly define indexed iteration using ranges,
like commonly needed by:
foreach(index,element ; collection) {...}
Aha! The time of reckoning has come :o).
A possible but inadequate workaround would be to define a tool type for this:
struct TraversalPair (Element) {uint index ; Element element;}
Then define the range's element output routines (opIndex, front, back) to
return pairs instead of elements; and use this like:
foreach (pair ; collection) {actWith(pair.element, pair.index);}
But this requires a client programmer to know this particularity; and anyway
does not fit D common style and practice, I guess.
Clearly that's an annoying limitation of ranges when compared against
opApply, and we need to fix that. That being said, I see your example
uses the index as a simple counter 0, 1, 2, ... so again putting my
"could this or could this not be done in the D programming language?" I
can't stop thinking about this:
size_t index = 0;
foreach (e; collection) {
...
++index;
}
You need to mind the presence of "continue" and the larger scope of
"index", but these are simple matters.
Annoying? Definitely. Makes ranges unusable? Perhaps not as much.
How to solve this practically? I would be happy with the above workaround if it
became a commonly used solution, supported by the stdlib and if necessary by
the core language. This may scale by defining the tool type as a superclass
instead, allowing various variants, possibly with more elements.
Maybe an alternative is to use tuples: allow variants of opIndex, front, and
back, to return (index,element) tuples and let the compiler use these overloads
when the client code requests 2 (or more) returned values.
The first solution is more explicite, and possibly general; the second may fit
common practice better if/when tuples become widely used (a rather far&
hypothetical future ;-) ?).
I'd think tuples are the simplest solution and the basis of the language
change that fixes this insufficiency.
First, if all you want is an index you can write this:
foreach (e; zip(iota(r.length), r))
{
// e[0] is 0, 1, 2, ...
// e[1] is the current range element
}
If the range doesn't have a length you can use iota with the largest
number - iteration will stop at the shortest:
foreach (e; zip(iota(size_t.max), r))
{
// e[0] is 0, 1, 2, ...
// e[1] is the current range element
}
If such uses of iota become common we could make iota() with no
arguments just go 0, 1, 2,..., size_t.max, 0, 1, 2,...
Second, if you want to do some cleverer stuff than just 0, 1, 2,... you
could always have r.front yield a tuple:
struct MyRange
{
@property Tuple!(string, double) front() { ... }
...
}
foreach (e; r)
{
// e[0] is the current string
// e[1] is the current double
}
This is also the basis of a language solution. Currently the language
lowers foreach for ranges down to a loop that uses front(), popFront(),
and empty(). We discussed in this group a while ago that foreach with
multiple iteration elements (e.g. a, b, c) could be lowered by binding a
to r.front[0], b to r.front[1], and c to r.front[2 .. $]. This is
compatible with the current lowering and allows convenient usage with
ranges that have either tuples or arrays as their front.
Thanks again for this report. We definitely need to work on all of the
above. Let me ask you this - did the project actually need to use
algorithms in std.algorithm? Based on my experience, my assumption is
that you didn't need the more complex algorithms (e.g. sort or
bringToFront) at all, and that you didn't have intensive use for the
simpler algorithms (map, reduce) either. This is because if you did,
ranges would have had enough strategic advantage for you, and
consequently you would have been much more inclined to work around the
issues mentioned above. As such, it's possible that you started with
ranges because they sounded nice to have but didn't quite find a solid
need for their advantages over e.g. opApply. Please let me know - thanks.
Andrei