It might be worth pointing out to them that we don't have a direct
implementation of a sketch storing a variable-length item.

Unless I'm missing something, the direct implementations we have are for
theta and the original quantiles sketch of doubles. Maybe they want to
measure the speed of re-serializing after each update?
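
For reference, here is a rough sketch of what "re-serialize after each
update" could look like against a ByteBuffer slot. It is only an
illustration under stated assumptions: ToyCounts is a hypothetical stand-in
for an on-heap sketch of variable-length items (it is not the datasketches
API), and ReserializingAggregator with its init/aggregate methods is a
made-up name that only loosely mirrors the shape of a BufferAggregator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical on-heap state holding variable-length items (NOT the datasketches API).
final class ToyCounts {
    final Map<String, Long> counts = new HashMap<>();

    void update(String item) {
        counts.merge(item, 1L, Long::sum);
    }

    // Layout: int entryCount, then per entry: int keyLength, UTF-8 key bytes, long count.
    byte[] toByteArray() {
        int size = Integer.BYTES;
        for (String k : counts.keySet()) {
            size += Integer.BYTES + k.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(counts.size());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.putInt(key.length).put(key).putLong(e.getValue());
        }
        return out.array();
    }

    // Rebuilds the full on-heap state from the bytes starting at the given position.
    static ToyCounts fromBuffer(ByteBuffer buf, int position) {
        ByteBuffer in = buf.duplicate();
        in.position(position);
        ToyCounts sketch = new ToyCounts();
        int n = in.getInt();
        for (int i = 0; i < n; i++) {
            byte[] key = new byte[in.getInt()];
            in.get(key);
            sketch.counts.put(new String(key, StandardCharsets.UTF_8), in.getLong());
        }
        return sketch;
    }
}

// Hypothetical aggregator step: every update pays for a full read and a full write.
final class ReserializingAggregator {
    static void init(ByteBuffer buf, int position) {
        buf.putInt(position, 0); // an entry count of zero marks an empty state
    }

    static void aggregate(ByteBuffer buf, int position, String item) {
        ToyCounts sketch = ToyCounts.fromBuffer(buf, position); // deserialize everything
        sketch.update(item);                                    // apply one update on heap
        ByteBuffer out = buf.duplicate();
        out.position(position);
        out.put(sketch.toByteArray()); // throws BufferOverflowException if the state outgrew the buffer
    }
}
```

Each call does work proportional to the size of the whole state rather than
to the single update, which is the cost such an experiment would be
measuring. The toy state also grows without bound, so a fixed-size slot
would eventually overflow; a real sketch with a bounded map size would not
have that particular problem.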

  jon

On Sun, Jul 25, 2021, 10:33 PM leerho <[email protected]> wrote:

> Team, this is an important thread to be aware of. :)
>
> Lee.
>
> ---------- Forwarded message ---------
> From: Michael Schiff <[email protected]>
> Date: Fri, Jul 23, 2021 at 12:18 PM
> Subject: ItemsSketch Aggregator in druid-datasketches extension
> To: <[email protected]>
>
>
> I am looking into implementing a new Aggregator in the datasketches
> extension using the ItemsSketch in the frequencies package:
>
> https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
>
> https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies
>
> I've started on a partial implementation here (still a WIP, lots of TODOs):
>
> https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1
>
> From everything I've seen, it's critical that there is an efficient
> implementation of BufferAggregator. The existing aggregators take advantage
> of other sketch types providing "Direct" implementations that are
> implemented directly against a ByteBuffer.  This leads to a fairly
> transparent implementation of BufferAggregator.  ItemsSketch is able to
> serialize itself and to wrap a ByteBuffer for instantiation, but the actual
> interactions are all on heap (core of the implementation is
> https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java
> ).
>
> Can anyone confirm that it is critical (i.e. Aggregator will not function)
> to have an implementation of BufferAggregator? Assuming it is, we can begin
> talking with the datasketches team about the possibility of a Direct
> implementation.  I am also thinking of finishing the implementation by
> explicitly serializing the entire sketch on each update, but this would
> only be for experimentation as I doubt this is the intended behavior for
> implementations of BufferAggregator.
