[Issue 13936] groupBy must be redone

2015-08-20 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

hst...@quickfur.ath.cx changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #50 from hst...@quickfur.ath.cx ---
Since nobody objects, I'm going to resolve this now.

--


[Issue 13936] groupBy must be redone

2015-06-30 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #49 from hst...@quickfur.ath.cx ---
Sounds like this is done now. Should we resolve this bug?

--


[Issue 13936] groupBy must be redone

2015-03-21 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #48 from Andrei Alexandrescu and...@erdani.com ---
Thanks everybody for their work! Is this done now?

--


[Issue 13936] groupBy must be redone

2015-01-16 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

hst...@quickfur.ath.cx changed:

   What|Removed |Added

   Keywords||pull

--- Comment #47 from hst...@quickfur.ath.cx ---
https://github.com/D-Programming-Language/phobos/pull/2878

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #40 from Andrei Alexandrescu and...@erdani.com ---
RefCounted can be made observably pure (by means of casts) but making it safe
must wait for the scoped work. Could you please file a bug against RefCounted?

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #39 from hst...@quickfur.ath.cx ---
I've run into some roadblocks: RefCounted is @system which makes groupBy
@system. This can be mostly swept under the carpet by making the GroupBy ctor
@trusted, but there's a further problem: RefCounted is also (inherently)
impure, which makes this design unusable in pure code. :-(

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #46 from bearophile_h...@eml.cc ---
(In reply to Andrei Alexandrescu from comment #40)
 RefCounted can be made observably pure (by means of casts)

Sounds like a trusted pure could be useful.

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #45 from Andrei Alexandrescu and...@erdani.com ---
Guess that counts for a new bug or an addition to the one you added. Thanks!

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #44 from hst...@quickfur.ath.cx ---
I didn't look further into it, but the current groupBy code was inferred
nothrow, but the new code was inferred as throwing, so the nothrow unittests
refused to compile. The compiler error indicated that it was the RefCounted
ctor/dtor that broke nothrow.

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #43 from Andrei Alexandrescu and...@erdani.com ---
Wouldn't nothrow be deduced?

--


[Issue 13936] groupBy must be redone

2015-01-14 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #42 from hst...@quickfur.ath.cx ---
Filed: https://issues.dlang.org/show_bug.cgi?id=13983

--


[Issue 13936] groupBy must be redone

2015-01-08 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #9 from hst...@quickfur.ath.cx ---
The nice thing about assuming non-equivalence relation by default is that it's
the most general: it will produce correct behaviour in all use cases with no
further work required from the user. In my initial implementation of groupBy,
it assumed equivalence relation by default, and led to buggy behaviour like
non-terminating empty ranges or wrong groupings because either transitivity
or reflexivity failed to hold, but the code assumed they did.

My initial reaction was to define it away -- state that only equivalence
relations are supported and close the bug. But upon further thought, I decided
that supporting non-equivalence relations opens up more possibilities: you can
now do things like grouping by maximum adjacent difference, for example, which
can be useful as a kind of edge detector in a stream of data, but maximum
adjacent difference is not an equivalence relation due to non-transitivity. It
makes groupBy useful beyond the usual mundane equivalence relation based
predicates.

In any case, this is a (mostly) orthogonal issue to performance. The problem I
see with your call home proposal is that it introduces a lot of hidden
complexities that either may offset the performance gain, or introduces subtle
semantic corner cases that are ugly to deal with. First of all, this would
require the subranges to retain a reference to the outer range. There are
several problems with this:

(1) the outer range can no longer be a by-value type, because if it's a struct,
structs are liable to being moved around, invalidating references to it;

(2) so we have to allocate on the heap, which is already a minus given the
current push for reducing Phobos allocations.

(3) Which in turn means GC load, unless we introduce reference counting, which
adds additional complication (plus overhead to maintain the ref count).

(4) The subranges may need extra care to implement -- if they are a by-value
struct type, the struct may get copied around, leaving stray references to the
outer range, so either you need GC to clean up, or you need to tread into
dangerous territory of struct dtors (which leads into areas with many compiler
bugs and/or limitations, and fragile behaviour -- a lot of Phobos code doesn't
work well with structs that have dtors because they don't have nice semantics
for things like T tmp; return tmp; to get an empty range of the same type,
for example).

(5) This *still* doesn't fully solve the performance issue -- if I have a
1000-element subrange and I iterate over 900 of them and then give up and
decide to move on to the next subrange, the parent range will still repeat the
1000 popFront's from the beginning of the subrange because the subrange never
called home.

(6) Your call-home trick falls flat on its face if somebody calls .save on the
outer range.  Which copy of the outer range should the subranges call home to?

Using the single-subrange idea I had, does solve (5), but then it introduces
poor API because the subranges can no longer be forward ranges. (In retrospect
this is not surprising, since we *are* trying to optimize for the single-pass
use case, so input-only ranges would be the natural solution. It's wanting
*both* single-pass performance *plus* the full semantics of range-of-ranges
that makes things complicated.) One idea that occurs to me is to store the
subrange as a struct inside the parent range (so it never needs to call home,
the parent range just picks up on where it left off and continues scanning for
the next subrange), and have .save just return an independent copy of the
subrange. However, this isn't a foolproof solution either, since a common idiom
with forward ranges is to always work with a .save'd copy of a range, so then
the user never actually uses the embedded copy of the subrange and any
popFront()'s called on the .save'd copy doesn't benefit the outer range.

One possible solution I can think of, that still requires the extra complexity
of heap-allocating the outer range, but does solve (5): one of the possibly
multiple copies of the subranges is designated as the furthest subrange, and
gets the special privilege of having a reference to it from the outer range.
Each copy of the subrange competes for this position by keeping a pop-counter
that is bumped every time popFront is called. If the pop-counter exceeds the
pop-counter of the current furthest subrange, the range that went further
installs itself as the new furthest subrange in the outer range, replacing
the old champion. (The old champion can snatch the position back if it later on
has more popFront's called.) When popFront on the outer range is called, it
.save's the current furthest subrange and scans for the next subrange boundary.
It also bumps the serial number, which indicates to all previous subranges that
they've all been demoted and can no longer compete for the furthest subrange
trophy. This ensures that the outer 

[Issue 13936] groupBy must be redone

2015-01-08 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #12 from hst...@quickfur.ath.cx ---
I don't see what's the big problem with supporting non-equivalence relations,
all it means is that you have to evaluate the predicate only between adjacent
elements rather than between an element and the head of its subrange (which was
what was done in the initial implementation). But whatever, I don't actually
have any current code that relies on groupBy supporting non-equivalence
relations, so I don't really care. I just don't understand why this has
suddenly become such a major issue.

By it's complicated I do not mean it's complicated to write code for --
that's a non-issue. What I mean is it introduces complicated semantics, which
inevitably results in obscure corner cases that have unexpected/pathological
behaviour. This includes effects of similar calibre as transient ranges (they
technically conform 100% to the range API but have unusual behaviour that
cannot be easily dealt with). One such effect is the semantics of .save.

The compiler bugs bit, I admit, is a bit a of red herring, though. :-P

Don't forget that random-access ranges are still forward ranges, so you still
have to deal with the semantics of .save.

And I wasn't trying to find reasons it can't be done; I was just pointing out
obvious flaws in your proposed solution that would introduce more problems than
it solves.

--