I am looking into implementing a new Aggregator in the datasketches extension
using the ItemsSketch in the frequencies package:

https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies

I've started on a partial implementation here (still a WIP, lots of TODOs):
https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1

From everything I've seen, it's critical that there is an efficient
implementation of BufferAggregator. The existing aggregators take advantage of
other sketch types providing "Direct" implementations that operate directly
against a ByteBuffer, which makes the BufferAggregator implementation fairly
transparent. ItemsSketch is able to serialize itself and to wrap a ByteBuffer
for instantiation, but the actual interactions are all on heap (the core of
the implementation is
https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java).

Can anyone confirm that an implementation of BufferAggregator is critical
(i.e. the Aggregator will not function without one)? Assuming it is, we can
begin talking with the datasketches team about the possibility of a Direct
implementation. I am also thinking of finishing the implementation by
explicitly serializing the entire sketch on each update, but this would only
be for experimentation, as I doubt this is the intended behavior for
implementations of BufferAggregator.
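
To make the "serialize on each update" idea concrete, here is a rough,
self-contained sketch of what I mean. It is only an illustration under
assumptions: HeapCounts is a hypothetical stand-in for the on-heap sketch
(exact counts, no purging, not the datasketches API), MAX_BYTES is a made-up
fixed slot size, and SerializeOnUpdateAggregator only mimics the
init/aggregate/get shape of BufferAggregator rather than implementing the real
Druid interface (which reads its input from a selector, not a method argument).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an on-heap sketch: exact counts, no purging. */
final class HeapCounts {
  private final Map<String, Long> counts = new HashMap<>();

  void update(String item) {
    counts.merge(item, 1L, Long::sum);
  }

  long estimate(String item) {
    return counts.getOrDefault(item, 0L);
  }

  /** Layout: int entryCount, then per entry: int keyLength, key bytes, long count. */
  byte[] toBytes() {
    int size = Integer.BYTES;
    for (String key : counts.keySet()) {
      size += Integer.BYTES + key.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
    }
    ByteBuffer out = ByteBuffer.allocate(size);
    out.putInt(counts.size());
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
      out.putInt(key.length).put(key).putLong(e.getValue());
    }
    return out.array();
  }

  /** Reads one serialized instance starting at the buffer's current position. */
  static HeapCounts fromBytes(ByteBuffer in) {
    HeapCounts result = new HeapCounts();
    int entries = in.getInt();
    for (int i = 0; i < entries; i++) {
      byte[] key = new byte[in.getInt()];
      in.get(key);
      result.counts.put(new String(key, StandardCharsets.UTF_8), in.getLong());
    }
    return result;
  }
}

/** Mimics the shape of a buffer-backed aggregator; not the real Druid interface. */
final class SerializeOnUpdateAggregator {
  /** Hypothetical fixed number of bytes reserved per aggregation slot. */
  static final int MAX_BYTES = 1024;

  void init(ByteBuffer buf, int position) {
    write(buf, position, new HeapCounts());
  }

  /** Deserialize the whole state, update it on heap, then serialize it all back. */
  void aggregate(ByteBuffer buf, int position, String item) {
    HeapCounts state = get(buf, position);
    state.update(item);
    write(buf, position, state);
  }

  HeapCounts get(ByteBuffer buf, int position) {
    // duplicate() shares the bytes but keeps an independent position.
    ByteBuffer view = buf.duplicate();
    view.position(position);
    return HeapCounts.fromBytes(view);
  }

  private static void write(ByteBuffer buf, int position, HeapCounts state) {
    byte[] bytes = state.toBytes();
    if (bytes.length > MAX_BYTES) {
      throw new IllegalStateException("serialized state exceeds the slot size");
    }
    ByteBuffer view = buf.duplicate();
    view.position(position);
    view.put(bytes);
  }
}
```

Every call to aggregate pays for a full deserialize and reserialize of the
slot, which is why I would only treat this as an experiment and not as a
substitute for a Direct implementation.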
