Team, this is an important thread to be aware of. :)

Lee.

---------- Forwarded message ---------
From: Michael Schiff <[email protected]>
Date: Fri, Jul 23, 2021 at 12:18 PM
Subject: ItemsSketch Aggregator in druid-datasketches extension
To: <[email protected]>


I am looking into implementing a new Aggregator in the datasketches
extension using the ItemsSketch in the frequencies package:

https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies

I've started on a partial implementation here (still a WIP, lots of TODOs):
https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1

From everything I've seen, it's critical that there is an efficient
implementation of BufferAggregator. The existing aggregators take advantage
of other sketch types providing "Direct" implementations that are
implemented directly against a ByteBuffer.  This leads to a fairly
transparent implementation of BufferAggregator.  ItemsSketch is able to
serialize itself and to wrap ByteBuffer for instantiation, but the actual
interactions are all on heap (core of the implementation is
https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java
).

Can anyone confirm that it is critical (i.e. Aggregator will not function)
to have an implementation of BufferAggregator? Assuming it is, we can begin
talking with the datasketches team about the possibility of a Direct
implementation.  I am also thinking of finishing the implementation by
explicitly serializing the entire sketch on each update, but this would
only be for experimentation as I doubt this is the intended behavior for
implementations of BufferAggregator.
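
To make the serialize-on-every-update idea concrete, here is a minimal Java
sketch. It is only an illustration under stated assumptions: ToySketch is a
hypothetical on-heap stand-in (exact counts in a HashMap), not ItemsSketch,
and the init/aggregate/get methods only loosely mirror the shape of a
buffer-backed aggregator; they do not implement Druid's interface, and
passing the item directly to aggregate is a simplification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class SerializeOnUpdateDemo {

    /** Hypothetical on-heap stand-in for a sketch with no Direct implementation. */
    static final class ToySketch {
        private final Map<String, Long> counts = new HashMap<>();

        void update(String item) {
            counts.merge(item, 1L, Long::sum);
        }

        Map<String, Long> counts() {
            return counts;
        }

        // Layout: int entryCount, then per entry: int keyLength, UTF-8 key bytes, long count.
        // No bounds checking here; a real aggregator would need to stay inside its slot.
        void writeTo(ByteBuffer buf, int position) {
            ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
            view.position(position);
            view.putInt(counts.size());
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
                view.putInt(key.length).put(key).putLong(e.getValue());
            }
        }

        static ToySketch readFrom(ByteBuffer buf, int position) {
            ByteBuffer view = buf.duplicate();
            view.position(position);
            ToySketch sketch = new ToySketch();
            int entries = view.getInt();
            for (int i = 0; i < entries; i++) {
                byte[] key = new byte[view.getInt()];
                view.get(key);
                sketch.counts.put(new String(key, StandardCharsets.UTF_8), view.getLong());
            }
            return sketch;
        }
    }

    /** Write an empty sketch into the slot starting at position. */
    static void init(ByteBuffer buf, int position) {
        new ToySketch().writeTo(buf, position);
    }

    /** Deserialize the whole sketch, update it on heap, serialize it all back. */
    static void aggregate(ByteBuffer buf, int position, String item) {
        ToySketch sketch = ToySketch.readFrom(buf, position);
        sketch.update(item);
        sketch.writeTo(buf, position);
    }

    static Map<String, Long> get(ByteBuffer buf, int position) {
        return ToySketch.readFrom(buf, position).counts();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        int slot = 16; // arbitrary offset, chosen for illustration
        init(buf, slot);
        aggregate(buf, slot, "a");
        aggregate(buf, slot, "b");
        aggregate(buf, slot, "a");
        System.out.println(get(buf, slot).get("a")); // 2
        System.out.println(get(buf, slot).get("b")); // 1
    }
}
```

Every call to aggregate pays for a full deserialize and reserialize of the
sketch, so the cost per row grows with the sketch size. That is the reason
this approach seems suitable only for experimentation.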
