Re: Unique Sketch aggregations and bias correction

Gian Merlino Mon, 24 Sep 2018 14:20:01 -0700

I have not. The original HyperLogLog (HLL) paper does describe bias
corrections for small cardinalities, and I am not sure whether those are
implemented in Druid's HLL implementation.
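
For reference, here is a minimal sketch of the small-range correction as the
original HLL paper (Flajolet et al., 2007) describes it: when the raw estimate
is at most 2.5 * m and some registers are still zero, the estimator falls back
to linear counting. This is not Druid's code. The function names, the SHA-1
based hash, and the register count (p=11) are assumptions chosen only for
illustration, and the paper's large-range correction is left out.

```python
import hashlib
import math


def build_registers(items, p=11):
    """Fill m = 2**p registers from items (hypothetical helper, not Druid code)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Assumed hash: the first 64 bits of SHA-1 over the item's string form.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                    # top p bits select the register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def hll_estimate(registers):
    """Raw HLL estimate with the paper's small-range correction."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)           # constant from the paper, m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    if raw <= 2.5 * m:
        zeros = registers.count(0)
        if zeros:
            # Small-range correction: linear counting over the empty registers.
            return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    print(hll_estimate(build_registers(range(100))))
```

The switch between the two estimators happens at 2.5 * m, so any residual bias
would be expected to show up around that boundary rather than at very small
cardinalities.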


On Mon, Sep 24, 2018 at 8:49 AM Charles Allen
<[email protected]> wrote:

> https://github.com/apache/incubator-druid/pull/5712 adds some great
> functionality to the Datasketches hooks in Druid.
>
> One thing noted in
>
> https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
> is the severe bias the Druid HLL implementation shows at ~5k uniques being
> fed in. This is something we've seen in a severe way internally, where a
> bias of a few percent makes a big difference in results. As such, I'm
> curious whether anyone has done any research into simple bias correction to
> minimize the error seen in the outputs around that cardinality?
>
> Cheers,
> Charles Allen
>
