The Scrunch version of combine accepts a function Iterable[V] => V. This
causes a lot of unexpected behaviour because the wrapped iterable is
actually a SingleUseIterable, and many of Scala's collection methods
access the underlying iterator more than once when they believe it is
safe to do so. As a result, you often have to write code like this:
...
.groupByKey()
.combine { _.iterator reduce { _ + _ } }
This is a silly example of course, because there's an Aggregator for
summation, but if your reduce function is more complex you have to do this
indirection via iterator in order to get correct behaviour.
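To make the failure mode concrete, here is a minimal sketch in plain Scala. OneShot is a hypothetical stand-in for a single-use iterable, not Crunch's actual SingleUseIterable; it assumes the second request for an iterator fails. Which collection methods touch the iterator more than once depends on the Scala version, so the example uses an isEmpty check followed by a fold, which does so on common versions.

```scala
// Hypothetical stand-in for a single-use iterable; not Crunch's actual class.
// Assumption: asking for the iterator a second time is an error.
final class OneShot[A](underlying: Iterator[A]) extends Iterable[A] {
  private var consumed = false
  def iterator: Iterator[A] = {
    if (consumed) throw new IllegalStateException("iterator already requested")
    consumed = true
    underlying
  }
}

object SingleUseDemo {
  // Touches the iterable twice: isEmpty asks for an iterator, then foldLeft asks again.
  def maxOrZero(values: Iterable[Int]): Int =
    if (values.isEmpty) 0 else values.foldLeft(Int.MinValue)(_ max _)

  // Asks for the iterator once and does all the work through it.
  def maxOrZeroSafe(values: Iterable[Int]): Int = {
    val it = values.iterator
    if (!it.hasNext) 0 else it.foldLeft(Int.MinValue)(_ max _)
  }
}
```

Both functions look equivalent on an ordinary List, which is why the problem is easy to miss until the function is handed a single-use iterable.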
Possible fixes:
a) Change combine to accept a function TraversableOnce[V] => V or
Iterator[V] => V, better reflecting the single-use nature of the underlying
Iterable
b) Given that most custom combines will in fact be folds over monoids, we
could promote the notion of reduce or fold up to the PGroupedTable itself,
so you can do .groupByKey().foldValues(_+_)
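
A rough sketch of what (a) and (b) could look like, using a toy in-memory type rather than the real Scrunch classes. Grouped and its method signatures are hypothetical and only illustrate the shape of the API; foldValues here assumes every group has at least one value.

```scala
// Toy in-memory stand-in for a grouped table; not the Scrunch PGroupedTable API.
final case class Grouped[K, V](groups: Map[K, List[V]]) {
  // Option (a): the function receives an Iterator, so the single-use
  // nature of the values is visible in the type.
  def combine(f: Iterator[V] => V): Map[K, V] =
    groups.map { case (k, vs) => k -> f(vs.iterator) }

  // Option (b): take the binary operation directly and reduce each group
  // in a single pass. Assumes every group is non-empty.
  def foldValues(op: (V, V) => V): Map[K, V] =
    combine(_.reduce(op))
}

object GroupedDemo {
  val grouped = Grouped(Map("a" -> List(1, 2, 3), "b" -> List(10)))

  // Option (a): the caller works with the iterator directly.
  val viaCombine: Map[String, Int] = grouped.combine(_.sum)

  // Option (b): the caller only supplies the operation.
  val viaFold: Map[String, Int] = grouped.foldValues(_ + _)
}
```

With (b) the caller never sees the iterable at all, so there is no way to traverse it twice by accident.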