Re: [DISCUSS] Sketch Libraries

Casey Stella Wed, 22 Feb 2017 06:12:19 -0800

So looking at it, it seems to fit the bill, with a couple of comments:

   - The quantiles stuff provides CDF and PMF functions, which is
   sufficient for our purposes.  I haven't seen any real comparison between
   t-digest and their approach.  A cursory glance at the source code leads me
   to believe that it's not tree-based, so I'd have to dig into it a bit more
   to understand the tradeoffs of their approach vs. a tree-based approach
   like the one in t-digest.
   - The HLL stuff seems to be pure HLL, rather than HLL+, which is what we
   support.  HLL+ has better accuracy characteristics for small sets, as I
   recall.  I'll defer to Mike Miklavcic on that as I haven't read the paper
   in a while.
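
To make the first point concrete, here is a minimal sketch of what a CDF and
PMF over a set of split points mean. It computes them exactly from a sorted
list rather than from a sketch, so it illustrates the outputs only and says
nothing about how either library works internally. The function names `cdf`
and `pmf`, and the convention that a split point is an exclusive upper bound,
are assumptions made for this illustration and are not taken from either
library's API.

```python
from bisect import bisect_left


def cdf(values, split_points):
    """Fraction of values strictly below each split point, then a final 1.0.

    Assumes `values` is non-empty and `split_points` is sorted ascending.
    """
    ordered = sorted(values)
    n = len(ordered)
    # bisect_left counts the elements strictly smaller than the split point.
    ranks = [bisect_left(ordered, s) / n for s in split_points]
    return ranks + [1.0]


def pmf(values, split_points):
    """Fraction of values in each interval delimited by the split points."""
    cumulative = cdf(values, split_points)
    # Each interval's mass is the difference of adjacent CDF values.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
```

A sketch library would return approximations of these same fractions from a
bounded-size summary instead of from the full sorted data.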


On the whole, I'd love to integrate with it and maybe swap out the t-digest
approach for this since it has an active community around it.

Anyway, thanks for bringing it to our attention and if anyone wants to take
that on, I'd be on board with a +1 ;)

Casey

On Tue, Feb 21, 2017 at 10:22 PM, Matt Foley <ma...@apache.org> wrote:

> Looks interesting.  Any indication whether it supports MAD (median
> absolute deviation) for outlier detection?
>
>
> On 2/21/17, 8:08 AM, "Nick Allen" <n...@nickallen.org> wrote:
>
>     We currently use the tdunning/t-digest
>     <https://github.com/tdunning/t-digest> library for generating our
>     STATS_* sketches and then a separate library, addthis/stream-lib
>     <https://github.com/addthis/stream-lib>, for doing the HLL distinct
>     count.
>
>     I ran across another library originating from Yahoo that looks quite
>     featureful, well documented and quite active.  On the surface it
>     *seems* to be able to do what we need for both the STATS_* sketches
>     and HLL.
>
>     https://datasketches.github.io/
>
>
>     Has anyone evaluated this library before?  Are there deficiencies as
>     compared to the libraries that we currently use?
>
>
>
>
