I have not. The original HLL paper does discuss bias corrections for
small cardinalities, and I am not sure whether those are implemented in
Druid's HLL implementation.
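
For reference, here is a rough sketch of the small-range correction as I
understand it from the original HyperLogLog paper (Flajolet et al., 2007):
when the raw estimate is at most 2.5 * m and some registers are still
empty, the estimator falls back to linear counting. This is only an
illustration and not Druid's code. The function name `hll_estimate` and the
plain-list register layout are made up for the example, and the paper's
large-range correction is left out.

```python
import math


def hll_estimate(registers):
    """Estimate cardinality from a list of HLL register values.

    Hypothetical helper for illustration; not taken from Druid.
    registers[j] holds the maximum rank seen for bucket j (0 = empty).
    """
    m = len(registers)

    # Bias constant alpha_m from the paper. The closed form is stated
    # there for m >= 128; using it for any other m is an assumption.
    if m == 16:
        alpha = 0.673
    elif m == 32:
        alpha = 0.697
    elif m == 64:
        alpha = 0.709
    else:
        alpha = 0.7213 / (1 + 1.079 / m)

    # Raw estimate: normalized harmonic mean of 2^register.
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: switch to linear counting while the raw
    # estimate is below 2.5 * m and at least one register is empty.
    if raw <= 2.5 * m:
        empty = registers.count(0)
        if empty:
            return m * math.log(m / empty)

    return raw
```

The switch between the two estimators happens at 2.5 * m, so an
implementation would need to be checked for how it behaves around that
threshold.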

On Mon, Sep 24, 2018 at 8:49 AM Charles Allen
<charles.al...@snap.com.invalid> wrote:

> https://github.com/apache/incubator-druid/pull/5712 adds some great
> functionality to the Datasketches hooks in Druid.
>
> One thing noted in
>
> https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
> is the severe bias the Druid HLL implementation shows when ~5k uniques are
> fed in. This is something we've seen in a severe way internally, where a
> bias of a few percent makes a big difference in results. As such, I'm
> curious whether anyone has done any research into simple bias correction to
> minimize the error seen in the outputs around that range?
>
> Cheers,
> Charles Allen
>
