I am looking into implementing a new Aggregator in the datasketches extension using the ItemsSketch in the frequencies package:
https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies

I've started on a partial implementation here (still a WIP, lots of TODOs):
https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1

From everything I've seen, it's critical that there is an efficient implementation of BufferAggregator. The existing aggregators take advantage of other sketch types providing "Direct" implementations that operate directly against a ByteBuffer, which makes their BufferAggregator implementations fairly transparent. ItemsSketch can serialize itself and can be instantiated from a wrapped ByteBuffer, but the actual interactions all happen on heap (the core of the implementation is https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java).

Can anyone confirm that it is critical (i.e. the Aggregator will not function without it) to have an implementation of BufferAggregator? Assuming it is, we can begin talking with the datasketches team about the possibility of a Direct implementation.

I am also thinking of finishing the implementation by explicitly serializing the entire sketch on each update, but this would only be for experimentation, as I doubt this is the intended behavior for implementations of BufferAggregator.
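For illustration, the serialize-on-every-update fallback might look roughly like the self-contained Java sketch below. It is hypothetical and makes several assumptions: a plain HashMap<String, Long> counter stands in for ItemsSketch (it never purges, so it is not a frequent-items sketch), and the class and method names only loosely mirror the shape of Druid's BufferAggregator (init/aggregate/get at a buffer position). They are not the real Druid or datasketches APIs. Each call to aggregate deserializes the whole slot, updates it on heap, and writes everything back.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "serialize the entire sketch on each update" approach.
 * A HashMap counter stands in for ItemsSketch; method names loosely mirror
 * Druid's BufferAggregator but this is NOT that interface.
 */
final class SerializingCounterBufferAggregator {
  // Bytes reserved for each aggregation slot in the shared buffer (hypothetical).
  private final int maxIntermediateSize;

  SerializingCounterBufferAggregator(int maxIntermediateSize) {
    this.maxIntermediateSize = maxIntermediateSize;
  }

  /** Writes an empty counter into the slot that starts at position. */
  void init(ByteBuffer buf, int position) {
    write(buf, position, new HashMap<>());
  }

  /** Full round trip per row: deserialize, update on heap, serialize back. */
  void aggregate(ByteBuffer buf, int position, String item) {
    Map<String, Long> counts = read(buf, position);
    counts.merge(item, 1L, Long::sum);
    write(buf, position, counts);
  }

  /** Returns an on-heap copy of the slot's contents. */
  Map<String, Long> get(ByteBuffer buf, int position) {
    return read(buf, position);
  }

  // Slot layout: [int entryCount], then per entry [int keyLength][UTF-8 key][long count].
  private void write(ByteBuffer buf, int position, Map<String, Long> counts) {
    int size = Integer.BYTES;
    for (String key : counts.keySet()) {
      size += Integer.BYTES + key.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
    }
    // Refuse to spill into the neighbouring slot.
    if (size > maxIntermediateSize) {
      throw new IllegalStateException("serialized counter (" + size + " bytes) exceeds slot size");
    }
    ByteBuffer slot = buf.duplicate(); // independent position, shared content
    slot.position(position);
    slot.putInt(counts.size());
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
      slot.putInt(key.length);
      slot.put(key);
      slot.putLong(e.getValue());
    }
  }

  private Map<String, Long> read(ByteBuffer buf, int position) {
    ByteBuffer slot = buf.duplicate();
    slot.position(position);
    int n = slot.getInt();
    Map<String, Long> counts = new HashMap<>();
    for (int i = 0; i < n; i++) {
      byte[] key = new byte[slot.getInt()];
      slot.get(key);
      counts.put(new String(key, StandardCharsets.UTF_8), slot.getLong());
    }
    return counts;
  }
}
```

The per-row cost here grows with the number of tracked items, since every update copies the whole structure in and out of the buffer, and the slot must be sized up front (loosely analogous to what AggregatorFactory#getMaxIntermediateSize() reports in Druid). That overhead is why a Direct implementation that updates the buffer in place would be preferable beyond experimentation.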