https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #9 from hst...@quickfur.ath.cx ---
The nice thing about not assuming an equivalence relation by default is that
it's the most general approach: it produces correct behaviour in all use cases
with no further work required from the user. My initial implementation of
groupBy assumed an equivalence relation by default, and that led to buggy
behaviour like non-terminating empty ranges or "wrong" groupings whenever
transitivity or reflexivity failed to hold while the code assumed they did.

My initial reaction was to define it away -- state that only equivalence
relations are supported and close the bug. But upon further thought, I decided
that supporting non-equivalence relations opens up more possibilities: you can
now do things like grouping by maximum adjacent difference, which can be
useful as a kind of edge detector in a stream of data, even though maximum
adjacent difference is not an equivalence relation (it isn't transitive). That
makes groupBy useful beyond the usual mundane equivalence-relation-based
predicates.
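
For concreteness, here is a minimal sketch of that kind of predicate. In
current Phobos the function is called chunkBy; the data and threshold below
are purely illustrative:

    import std.algorithm : chunkBy; // "groupBy" in the version discussed here
    import std.math : abs;
    import std.stdio : writeln;

    void main()
    {
        auto data = [1, 2, 3, 10, 11, 30, 31, 32];

        // "Adjacent difference at most 2" is reflexive and symmetric but not
        // transitive (1~3 and 3~5 can hold without 1~5 holding), so it is
        // not an equivalence relation.
        auto groups = data.chunkBy!((a, b) => abs(a - b) <= 2);

        foreach (g; groups)
            writeln(g);   // [1, 2, 3]  [10, 11]  [30, 31, 32]
    }

With the adjacent-element comparison argued for here, a slowly drifting run
such as [1, 2, 3, 4, 5] would also stay in a single group even though its
endpoints differ by more than the threshold -- exactly the edge-detector
behaviour that an equivalence-relation-only groupBy cannot express.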

In any case, this is a (mostly) orthogonal issue to performance. The problem I
see with your "call home" proposal is that it introduces a lot of hidden
complexity that may either offset the performance gain or introduce subtle
semantic corner cases that are ugly to deal with. First of all, it would
require the subranges to retain a reference to the outer range. There are
several problems with this:

(1) The outer range can no longer be a by-value type: if it's a struct, it is
liable to be moved and copied around, invalidating any references to it (see
the sketch just after this list).

(2) So we have to allocate the outer range on the heap, which is already a
minus given the current push to reduce Phobos allocations.

(3) That in turn means GC load, unless we introduce reference counting, which
adds further complication (plus the overhead of maintaining the ref count).

(4) The subranges may need extra care to implement -- if they are a by-value
struct type, the struct may get copied around, leaving stray references to the
outer range, so either you need the GC to clean up, or you have to tread into
the dangerous territory of struct dtors (an area with many compiler bugs
and/or limitations and fragile behaviour -- a lot of Phobos code doesn't work
well with structs that have dtors, because such structs don't have nice
semantics for things like "T tmp; return tmp;" to get an empty range of the
same type, for example).

(5) This *still* doesn't fully solve the performance issue -- if I have a
1000-element subrange, iterate over 900 of its elements, and then give up and
move on to the next subrange, the parent range still has to do all 1000
popFronts from the beginning of the subrange (re-doing the 900 I already did),
because the subrange never called home.

(6) Your call-home trick falls flat on its face if somebody calls .save on the
outer range.  Which copy of the outer range should the subranges call home to?
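
To make (1) concrete, here is a tiny sketch of what goes wrong with interior
references into a by-value outer range (the names are purely illustrative,
not actual Phobos code):

    struct Outer { int position; }   // stand-in for the outer range
    struct Sub   { Outer* parent; }  // subrange with a "call home" link

    void main()
    {
        auto outer = Outer(0);
        auto sub = Sub(&outer);

        // Range pipelines copy and move their ranges around freely:
        auto copy = outer;

        sub.parent.position = 42;        // "calling home"

        assert(outer.position == 42);    // only the original hears about it
        assert(copy.position == 0);      // the copy is silently out of date
    }

And if outer had instead been a stack temporary that was moved or destroyed,
sub.parent would simply be dangling.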

The single-subrange idea I had does solve (5), but it results in a poorer API,
because the subranges can no longer be forward ranges. (In retrospect this is
not surprising, since we *are* trying to optimize for the single-pass use
case, so input-only ranges would be the natural solution. It's wanting *both*
single-pass performance *and* the full semantics of a range-of-ranges that
makes things complicated.) One idea that occurs to me is to store the subrange
as a struct inside the parent range (so it never needs to call home; the
parent range just picks up where it left off and continues scanning for the
next subrange), and have .save just return an independent copy of the
subrange. However, this isn't a foolproof solution either, since a common
idiom with forward ranges is to always work with a .save'd copy of a range, so
the user never actually touches the embedded copy of the subrange, and any
popFronts called on the .save'd copy don't benefit the outer range.

One possible solution I can think of, which still requires the extra
complexity of heap-allocating the outer range but does solve (5): one of the
possibly multiple copies of a subrange is designated the "furthest subrange"
and gets the special privilege of being referenced from the outer range. Each
copy of the subrange "competes" for this position by keeping a pop-counter
that is bumped every time popFront is called. If its pop-counter exceeds the
pop-counter of the current "furthest subrange", the copy that went further
installs itself as the new "furthest subrange" in the outer range, replacing
the old champion. (The old champion can snatch the position back if more
popFronts are later called on it.) When popFront on the outer range is called,
it .save's the current furthest subrange and scans onward from there for the
next subrange boundary. It also bumps a serial number stored in the outer
range; each subrange copy remembers the serial number it was created under, so
the bump tells all previous subranges that they've been demoted and can no
longer compete for the furthest-subrange trophy. This ensures that the outer
range always does the minimum necessary work to find the next subrange.
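
Here is a toy model of just that bookkeeping -- not a full groupBy; all names
are made up for illustration, and int[] slices stand in for the source range:

    import std.stdio : writeln;

    class FurthestState
    {
        int[] furthest;       // .save of the copy that has advanced the most
        size_t furthestPops;  // its pop-counter
        size_t serial;        // bumped when the outer range starts a new group
    }

    struct SubRange
    {
        FurthestState state;  // shared, heap-allocated
        int[] cursor;         // this copy's own position within the group
        size_t pops;          // how many popFronts this copy has done
        size_t serial;        // the group generation this copy belongs to

        @property bool empty() { return cursor.length == 0; }
        @property int front()  { return cursor[0]; }
        @property SubRange save() { return this; }  // copies share `state`

        void popFront()
        {
            cursor = cursor[1 .. $];
            ++pops;
            // Compete for the "furthest subrange" slot, unless the outer
            // range has already moved on to a later group.
            if (serial == state.serial && pops > state.furthestPops)
            {
                state.furthest = cursor;   // a slice is its own cheap .save
                state.furthestPops = pops;
            }
        }
    }

    void main()
    {
        auto st = new FurthestState;
        auto a = SubRange(st, [1, 2, 3, 4, 5]);
        auto b = a.save;

        a.popFront();                  // a has popped 1 element
        b.popFront(); b.popFront();    // b has popped 2 -- the new champion

        writeln(st.furthestPops);      // 2
        writeln(st.furthest);          // [3, 4, 5]: where the outer range's
                                       // popFront would resume scanning
    }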

However, this requires extra overhead in popFront for each subrange (a pointer
dereference, a comparison, and potentially one or more pointer assignments).
At the end of the day, this may end up making performance worse, because the
optimizer is hampered by the aliasing caused by all these references and
probably won't be able to generate optimal code or elide the extra work that
isn't needed in the single-pass use case. So sure, we improve the big-O
characteristics (O(n), where n = length of the range, instead of O(n*r), where
r = number of subranges), but it may not actually *run* faster in practice
than a plain old for-loop. And (6) is still not solved, which I surmise is a
pretty major showstopper.

I still think that if you want maximal performance for the single-pass case,
you should just return an input range -- and make this something the user
chooses. We can have groupBy() return a fully-general forward range, and have
groupByFast() (or groupBy!(Yes.singlePass)) implement a truly optimal
single-pass algorithm where .save is not implemented, thus side-stepping the
complexity and reducing to what is basically a manually-written for-loop with
a range API. Well, actually, you *can* implement .save for the outer range if
you use the embedded-subrange idea I proposed: the same struct holds both the
outer range and the inner range, so .save can just copy the struct, and now
you have sane semantics on both copies. You just can't .save the inner range,
that's all. (Well, actually, you *can*... by .save'ing the outer range, since
the inner range has reference semantics, so .front will resume where the inner
range left off. You just can't save the inner range without also saving the
outer range.)
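
As a rough illustration of the single-pass variant, here is a sketch of what a
hypothetical groupByFast could look like -- one possible realization, using a
single heap-allocated state object to get the reference semantics described
above; none of this is actual Phobos code:

    import std.range.primitives : ElementType, isInputRange;

    auto groupByFast(alias pred, R)(R r)
        if (isInputRange!R)
    {
        // All state lives in one heap object, so the outer range and the
        // current inner range observe the same cursor (reference semantics).
        static class State
        {
            R source;
            bool inGroup;   // are we positioned inside an unfinished group?
            this(R s) { source = s; inGroup = !source.empty; }
        }

        // Input-only inner range: no .save, by design.
        static struct Inner
        {
            State st;
            ElementType!R prev;   // last element handed out

            @property bool empty() { return !st.inGroup; }
            @property auto front() { return st.source.front; }

            void popFront()
            {
                prev = st.source.front;
                st.source.popFront();
                if (st.source.empty || !pred(prev, st.source.front))
                    st.inGroup = false;   // current group ends here
            }
        }

        // Input-only outer range; .save is deliberately not implemented.
        static struct Outer
        {
            State st;

            @property bool empty() { return st.source.empty && !st.inGroup; }
            @property Inner front() { return Inner(st); }

            void popFront()
            {
                // Drain whatever the user left of the current group -- the
                // source is never re-scanned from the group's beginning --
                // then the next group starts at the new source.front.
                auto g = Inner(st);
                while (!g.empty) g.popFront();
                st.inGroup = !st.source.empty;
            }
        }

        return Outer(new State(r));
    }

    // usage:
    //   foreach (g; [1, 1, 2, 2, 3].groupByFast!((a, b) => a == b))
    //       foreach (e; g) { /* each source element is visited exactly once */ }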

In retrospect, this seems to confirm my observation that the current state of
things is an artifact of the subranges having the full (forward) range API. If
the outer range only exposed a "current subrange element"
cursor/iterator/what-have-you, we wouldn't have this problem: it would be
clear that there is only a single outer range coupled with a single subrange
at any time, so you don't get the many-to-many problem where somebody .save's
both outer and inner ranges and now you don't know how to have them talk to
each other in a sane way. By making the inner iteration a full-fledged range,
all of this baggage gets inherited. Even in my proposed single-subrange idea,
if you restrict the inner range to be input-only, the problem is still
somewhat tractable. But once you allow .save on the inner range, all hell
breaks loose. The thing is, if we knew ahead of time how the user was going to
traverse the range, we could optimize for that; but since we can't know that
ahead of time, either the user has to tell us ("I want full .save semantics on
everything, so give me GC-allocated outer/inner ranges", or "I want
single-pass only, so make everything a non-copyable input range, I don't
care"), or we have to cater to the general case, which unfortunately means
poor performance.
