It might be worth pointing out to them that we don't have a Direct implementation of any sketch that stores variable-length items.
Unless I'm missing something, we have Theta and the original quantiles sketch of doubles. Maybe they want to check the speed when re-serializing after each update?

jon

On Sun, Jul 25, 2021, 10:33 PM leerho <[email protected]> wrote:

> Team, this is an important thread to be aware of. :)
>
> Lee.
>
> ---------- Forwarded message ---------
> From: Michael Schiff <[email protected]>
> Date: Fri, Jul 23, 2021 at 12:18 PM
> Subject: ItemsSketch Aggregator in druid-datasketches extension
> To: <[email protected]>
>
>
> I am looking into implementing a new Aggregator in the datasketches
> extension using the ItemsSketch in the frequencies package:
>
> https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
>
> https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies
>
> I've started on a partial implementation here (still a WIP, lots of TODOs):
>
> https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1
>
> From everything I've seen, it's critical that there is an efficient
> implementation of BufferAggregator. The existing aggregators take advantage
> of other sketch types providing "Direct" implementations that are
> implemented directly against a ByteBuffer. This leads to a fairly
> transparent implementation of BufferAggregator. ItemsSketch is able to
> serialize itself and to wrap a ByteBuffer for instantiation, but the actual
> interactions are all on heap (the core of the implementation is
> https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java
> ).
>
> Can anyone confirm that it is critical (i.e., the Aggregator will not function)
> to have an implementation of BufferAggregator? Assuming it is, we can begin
> talking with the datasketches team about the possibility of a Direct
> implementation.
>
> I am also thinking of finishing the implementation by
> explicitly serializing the entire sketch on each update, but this would
> only be for experimentation, as I doubt this is the intended behavior for
> implementations of BufferAggregator.
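For reference, the "serialize the entire sketch on each update" idea can be sketched roughly as below. This is a hypothetical illustration, not Druid's BufferAggregator interface or the datasketches ItemsSketch API. The class name SerializeOnUpdateAggregator and its methods are made up. The item is passed to aggregate() directly instead of being read from a column selector, and a plain exact-count HashMap stands in for ItemsSketch so the example runs without either library. Each slot in the ByteBuffer holds the full serialized state, and every update deserializes it, applies one increment, and writes it back.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "serialize on every update". The ByteBuffer slot is the
 * source of truth; the heap Map exists only for the duration of one call.
 * A plain exact-count map stands in for ItemsSketch (an assumption, not the real API).
 */
class SerializeOnUpdateAggregator {
    private final int maxSize; // bytes reserved per slot, fixed up front

    SerializeOnUpdateAggregator(int maxSize) {
        this.maxSize = maxSize;
    }

    // Write an empty state into the slot starting at `position`.
    void init(ByteBuffer buf, int position) {
        write(buf, position, new HashMap<>());
    }

    // Deserialize the whole state, apply one update, serialize it all back.
    void aggregate(ByteBuffer buf, int position, String item) {
        Map<String, Long> counts = read(buf, position);
        counts.merge(item, 1L, Long::sum);
        write(buf, position, counts);
    }

    Map<String, Long> get(ByteBuffer buf, int position) {
        return read(buf, position);
    }

    // Layout: int entryCount, then per entry: int keyLength, UTF-8 key bytes, long count.
    private Map<String, Long> read(ByteBuffer buf, int position) {
        ByteBuffer view = buf.duplicate(); // independent position; callers' buffer untouched
        view.position(position);
        int entries = view.getInt();
        Map<String, Long> counts = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            byte[] key = new byte[view.getInt()];
            view.get(key);
            counts.put(new String(key, StandardCharsets.UTF_8), view.getLong());
        }
        return counts;
    }

    private void write(ByteBuffer buf, int position, Map<String, Long> counts) {
        // Compute the serialized size first so a too-large state never spills
        // into the neighbouring slot; the old state stays intact on failure.
        int size = Integer.BYTES;
        for (String k : counts.keySet()) {
            size += Integer.BYTES + k.getBytes(StandardCharsets.UTF_8).length + Long.BYTES;
        }
        if (size > maxSize) {
            throw new IllegalStateException("state (" + size + " bytes) exceeds slot size " + maxSize);
        }
        ByteBuffer view = buf.duplicate();
        view.position(position);
        view.putInt(counts.size());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            view.putInt(key.length);
            view.put(key);
            view.putLong(e.getValue());
        }
    }
}
```

Two costs show up even in this toy. Every aggregate() call does work proportional to the whole state rather than to one item, which is what timing re-serialization after each update would measure. And because the items are variable-length, each slot has to be sized for a worst case up front (the role AggregatorFactory#getMaxIntermediateSize() plays in Druid), with some policy needed for when the state outgrows it.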
