https://issues.dlang.org/show_bug.cgi?id=13936
--- Comment #5 from Andrei Alexandrescu <and...@erdani.com> --- (In reply to hsteoh from comment #4) > @andrei: Thanks for answering. OK, let's see. > The current implementation of groupBy only traverses the range once if it's > an input range (non-forward). Yes - couldn't even do otherwise. Input non-forward ranges move along all together (that's somewhat implied but let's assume it's always the case). So groupBy relies on the fact that if r1 and r2 are copies of the same input/non-forward range, calling popFront in one also pops the front in the other. That's efficient and all. The challenge here is to make sure the same goes for forward ranges. The simplest way to achieve that is by using a pointer, i.e. instead of using `Range`, use `Range*`. Then copying the pointer around will refer to the same actual range. Calling popFront through one will also pop the other. But that produces garbage so a RefCounted!Range comes to mind. That would probably be a nice solution. In my opinion what we have right now penalizes forward ranges too much to be workable. > The difference between equivalence and non-equivalence relations is actually > important; it's not a "fussy" issue. A non-equivalence relation requires > that you always evaluate the predicate on two adjacent elements, because you > cannot assume transitivity: the fact that pred(r[0], r[1]) and pred(r[1], > r[2]) holds, does not imply pred(r[0], r[2]) also holds, so you have to > .save every previous position of the range instead of just once per subrange > (determining the end of the subrange simply by evaluating the predicate with > the head element of that subrange), when you know that pred is an > equivalence relation. We must go with efficiency for the common classic relational case. I think a distinct place we do not want to be in is: "Well for cool relational stuff you can use groupBy, it's very general and actually goes outside what relational algebra can do. However, if you want speed don't use it; you gotta use manual loops." We must handle the typical relational algebra treatment, and handle it well enough that handmade approaches are unnecessary. If someone has a weird predicate and a weird requirement they can use adjacentFind or handmade loops etc. We don't need to cater for them. Pushing two ranges along is inefficient and therefore unacceptable. > Your fast/slow range idea is flawed, because the outer range's .front must > return an independent instance of the subrange. This is an artifact of the > range API. There is no way for the outer range to keep track of how far the > subrange has moved ahead without introducing some kind of reference > semantics to it, which becomes messy if the caller invokes .front multiple > times. So there is no way, or there's a messy way? Big difference. I say there is a way. > For example, if the user calls .front twice and .save's both > subranges, then randomly iterates one or the other, what should happen? > Should the outer range's 'fast' range track the first subrange, the second, > or both, or neither (because they are .save'd copies)? Does advancing the > first copy of the subrange also advance the second? (Which, btw, breaks the > semantics of .save)? > > The only way for your fast/slow range idea to work is to make subranges > non-forward, so that there is only one copy of the 'fast' range that gets > advanced no matter which copy of it you call popFront on. But this means > users can no longer call .save on subranges, which sux if they handed you a > forward range precisely for that purpose. This can be handled, it's indeed not simple. There is an implementation in https://github.com/D-Programming-Language/phobos/pull/1186 that is working but a bit oddly and is probably more complicated than it needs to. Anyhow, the larger point is we want groupBy to work as close to native speed for the common case of going once through all groups and with an equivalence relation. For more complex uses, Just Fine(tm) at a small extra cost should be okay. Is this reasonable? --