I'm working on https://github.com/D-Programming-Language/phobos/pull/1186, which is somewhat important; "group by" is a powerful operation. Along the way, I stumbled upon an interesting issue in which I wanted to consult the community.

Usually groupBy is used to find runs of equivalent elements in a range. For example:

[1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6].groupBy()

yields the range [[1, 1, 1], [2, 2], [3, 3, 3], [4], [5, 5], [6]].

As is usual with Phobos algorithms, groupBy accepts a predicate. The default (as illustrated above) is "a = b", i.e. all elements in a group are equal to one another.

Equality is transitive and commutative. But there are useful cases in which predicates are not commutative. Consider we want to find strictly monotonic subranges in the range. We'd write:

auto r = [1, 3, 2, 4, 5, 1].groupBy!"a < b";

That should produce [[1, 3], [2, 4, 5], [1]]. For non-strict monotonic runs, the predicate would be "a <= b" etc. All that is pretty awesome.

However, that makes life a bit tougher for the algorithm - it must only compare adjacent elements only. In the case of "a = b", it suffices to save the first element in a group and compare it against every other element in the group.

Meanwhile, a very similar pull request (https://github.com/D-Programming-Language/phobos/pull/1453) uses unary predicates, i.e. an optional transformation function that is then used in conjunction with "==" to decide which elements belong in the same group.

Unary predicates make life simpler for the algorithm (save the transform of the first element, then compare it against the transform of the next etc) and are often easier to write by the end user, too (e.g. just write "a.length" instead of "a.length == b.length" to group by length).

So I was thinking to allow both cases, with the understanding that grouping by unary predicates uses "==" for comparison whereas grouping by binary predicates looks at adjacent elements to figure out group membership. That approach would, however, preclude the use of string lambdas (because deducing arity for string lambdas is possible, but quite unwieldy).

What do you think?


Andrei

Reply via email to