I have not. The original HLL paper does discuss bias corrections for small cardinalities, but I am not sure whether those are implemented in Druid's HLL implementation.
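
For reference, here is a rough Python sketch of the small-range correction as I understand it from the original HyperLogLog paper (Flajolet et al., 2007): when the raw estimate is at or below 2.5 * m, the estimator falls back to linear counting on the number of empty registers. This is not Druid's code. The function names, the SHA-1 based 64-bit hash, and the choice of p = 11 (m = 2048 registers) are assumptions made for illustration only. The paper's large-range correction for 32-bit hashes is left out because the sketch uses a 64-bit hash.

```python
import hashlib
import math


def _hash64(item: str) -> int:
    # Assumption for this sketch: take the first 8 bytes of SHA-1 as a 64-bit hash.
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def hll_registers(items, p=11):
    """Build m = 2**p HLL registers from an iterable of strings."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - p)                # top p bits select the register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - p) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    return registers


def hll_estimate(registers, small_range_correction=True):
    """Raw HLL estimate, optionally with the paper's small-range correction."""
    m = len(registers)
    alpha = 0.7213 / (1.0 + 1.079 / m)     # constant from the paper, for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    if small_range_correction and raw <= 2.5 * m:
        zeros = registers.count(0)
        if zeros:
            # Linear counting on the empty registers
            return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    regs = hll_registers(f"user-{i}" for i in range(100))
    print("raw:      ", hll_estimate(regs, small_range_correction=False))
    print("corrected:", hll_estimate(regs))
```

With m = 2048 the switch between the two estimators happens at a raw estimate of 5120. Whether that threshold has anything to do with the bias reported around ~5k uniques is something I have not checked.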

On Mon, Sep 24, 2018 at 8:49 AM Charles Allen <charles.al...@snap.com.invalid> wrote:

> https://github.com/apache/incubator-druid/pull/5712 adds some great
> functionality to the Datasketches hooks in Druid.
>
> One thing noted in
> https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
> is the severe bias the druid HLL implementation shows at ~5k uniques being
> fed in. This is something we've seen in a severe way internally, where a
> bias of a few percent makes a big difference in results. As such, I'm
> curious if anyone has done any research into simple bias correction to
> attempt to minimize the error seen on the outputs around the error state?
>
> Cheers,
> Charles Allen