https://github.com/apache/incubator-druid/pull/5712 adds some great
functionality to the DataSketches hooks in Druid.

One thing noted in
https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
is the severe bias the Druid HLL implementation shows when roughly 5k
uniques are fed in. We have seen this bias clearly in our own internal
usage, where an error of a few percent makes a big difference in results.
As such, I'm curious whether anyone has researched a simple bias
correction that would minimize the output error around that cardinality
range?

Cheers,
Charles Allen
