https://github.com/apache/incubator-druid/pull/5712 adds some great functionality to the DataSketches hooks in Druid.
One thing noted in https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html is the severe bias the Druid HLL implementation shows once roughly 5k unique values have been fed in. We have been hit hard by this internally, where a bias of a few percent makes a big difference in results. As such, I'm curious whether anyone has researched a simple bias correction that would minimize the error in the estimates around that cardinality range.

Cheers,
Charles Allen