[GitHub] [datasketches-java] jmalkin commented on issue #446: Sketches for Histogram and NDV

via GitHub Tue, 30 May 2023 11:33:30 -0700


jmalkin commented on issue #446:
URL: 
https://github.com/apache/datasketches-java/issues/446#issuecomment-1568890200


   Sketches are great for answering specific questions, but their size and 
speed benefits generally come at the cost of less flexibility to answer other 
questions.
   
   KLL relies only on a comparator between elements. There's no consideration 
given to duplicates. The short answer is what @AlexanderSaydakov mentioned 
above.
   
   One possible idea -- with a huge caveat that we can't say much about error 
bounds -- would be to have a tuple sketch containing the raw values in the 
tuple summary. You could query the KLL sketch to get the values associated with 
the rank range boundaries and then filter the tuple summaries to those whose 
values fall within that range. The number of retained values divided by theta 
would be _an_ estimate of distinct values within the range. But, again, any 
error bounds produced by the sketch would be misleading: this is an 
approximation of an approximation, and we don't know what the true bounds 
would be.
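
   A minimal sketch of that idea in plain Python, to make the arithmetic 
concrete. It does not use the DataSketches API: `sample_distinct` is a 
hypothetical stand-in for a tuple sketch that keeps the raw value in its 
summary, `rank_boundary` is an exact lookup standing in for a KLL quantile 
query, and treating the range as inclusive on both ends is an assumption. All 
function names are made up for illustration.

```python
import hashlib

MAX_HASH = 2 ** 64


def hash64(value):
    """Map a value to a roughly uniform 64-bit integer (stand-in for the sketch hash)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_distinct(values, theta):
    """Hypothetical stand-in for a tuple sketch holding raw values:
    keep each distinct value whose hash falls below theta * 2^64."""
    threshold = int(theta * MAX_HASH)
    return {v for v in values if hash64(v) < threshold}


def rank_boundary(sorted_values, rank):
    """Exact quantile lookup, standing in for a KLL quantile query."""
    idx = min(int(rank * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def estimate_distinct_in_rank_range(values, retained, theta, lo_rank, hi_rank):
    """Estimate the number of distinct values between two normalized ranks."""
    ordered = sorted(values)
    lo = rank_boundary(ordered, lo_rank)
    hi = rank_boundary(ordered, hi_rank)
    # Filter the retained raw values to the (assumed inclusive) value range.
    in_range = [v for v in retained if lo <= v <= hi]
    # Retained count divided by theta is the estimate; no error bound is implied.
    return len(in_range) / theta


values = [1, 1, 2, 2, 3, 4, 5, 5, 5, 6]
retained = sample_distinct(values, theta=1.0)  # theta = 1.0 keeps every distinct value
estimate = estimate_distinct_in_rank_range(values, retained, 1.0, 0.0, 0.5)
```

   With theta = 1.0 nothing is sampled away, so the result is just the exact 
distinct count within the range. With theta < 1 and real KLL boundaries, both 
the boundaries and the retained set are approximate, which is why no 
meaningful error bound comes out of this.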


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


