+brock explicitly Brock, do you remember why we did AggregatorFactory? It was like a year ago, and I can't remember if there was some additional reason for it or if I just did the impl without thinking the problem through.
J On Fri, Nov 2, 2012 at 9:53 AM, Matthias Friedrich <[email protected]> wrote: > On Friday, 2012-11-02, Josh Wills wrote: > > On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <[email protected]> wrote: > [...] > >> One thing I don't understand: Is there really a need for > >> AggregatorFactory? Am I overlooking something? It's not like we have > >> to create Aggregator instances for each key -- that's what > >> Aggregator.reset() is for. > > > Yeah, the reason for AggregatorFactory (which I think is evident from the > > tests) is when you want to do aggregations over tuples, e.g., PTable<K, > > Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both > > values of the Pair instance. > > Hmm, I'm not sure if that's something we need to prevent actively. > Users will probably write it like this: > > PGroupedTable<String, Pair<Integer, Integer>> table = /* ... */; > table.combineValues( pairAggregator(SUM_INTS(), SUM_INTS()) ); > > And that's how I would document it on the Aggregators javadoc. Having > the factory in there makes the API a bit harder to use if you want to > write your own Aggregator. My guess is people will use this correctly, > since it's more work to do it wrong. > > Anybody who really wants to shoot himself in the foot can always > return the same instance for each factory call. But maybe they even > have a good reason for it :) > > Regards, > Matthias > -- Director of Data Science Cloudera <http://www.cloudera.com> Twitter: @josh_wills <http://twitter.com/josh_wills>
