On Friday, 2012-11-02, Josh Wills wrote: > On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <[email protected]> wrote: [...] >> One thing I don't understand: Is there really a need for >> AggregatorFactory? Am I overlooking something? It's not like we have >> to create Aggregator instances for each key -- that's what >> Aggregator.reset() is for. > Yeah, the reason for AggregatorFactory (which I think is evident from the > tests) is when you want to do aggregations over tuples, e.g., PTable<K, > Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both > values of the Pair instance. Hmm, I'm not sure if that's something we need to prevent actively. Users will probably write it like this:
PGroupedTable<String, Pair<Integer, Integer>> table = /* ... */; table.combineValues( pairAggregator(SUM_INTS(), SUM_INTS()) ); And that's how I would document it on the Aggregators javadoc. Having the factory in there makes the API a bit harder to use if you want to write your own Aggregator. My guess is people will use this correctly, since it's more work to do it wrong. Anybody who really wants to shoot himself in the foot can always return the same instance for each factory call. But maybe they even have a good reason for it :) Regards, Matthias
