jmalkin commented on issue #446: URL: https://github.com/apache/datasketches-java/issues/446#issuecomment-1568890200
Sketches are great for answering specific questions, but their size and speed benefits generally come at the cost of less flexibility to answer other questions. KLL relies only on a comparator between elements; it gives no consideration to duplicates. The short answer is what @AlexanderSaydakov mentioned above.

One possible idea, with a huge caveat that we can't say much about error bounds, would be to have a tuple sketch containing the raw values in the tuple summary. You could query the KLL sketch to get the values associated with the rank range boundaries and then filter the tuple summaries to be within the range. The number of retained values that pass the filter, divided by theta, would be _an_ estimate of distinct values within the range. But, again, any error bounds produced by the sketch would be misleading, since we don't know what they'd be when using an approximation of an approximation.
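To make the arithmetic of the tuple-sketch idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the DataSketches API: every class and method name below is hypothetical, a fixed theta threshold on a hash stands in for the tuple sketch's adaptive theta, and an exact sort stands in for the KLL quantile query. It only illustrates "filter the retained raw values to the range, then divide the count by theta", and it says nothing about error bounds.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from DataSketches.
class RangeDistinctEstimate {

    // Map a value to a pseudo-uniform point in [0, 1) using the SplitMix64 mixer.
    // Equal values always map to the same point, so duplicates collapse.
    static double hashToUnit(long value) {
        long z = value + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (z >>> 11) * 0x1.0p-53; // top 53 bits scaled into [0, 1)
    }

    // Stand-in for a tuple sketch with raw values in the summary:
    // keep the raw value whenever its hash falls below theta.
    // Assumption: theta is fixed up front rather than adapting to a size budget.
    static TreeSet<Long> sample(long[] data, double theta) {
        TreeSet<Long> retained = new TreeSet<>();
        for (long v : data) {
            if (hashToUnit(v) < theta) {
                retained.add(v);
            }
        }
        return retained;
    }

    // Stand-in for the KLL quantile query: exact value at a normalized rank in [0, 1].
    static long valueAtRank(long[] data, double rank) {
        long[] sorted = data.clone();
        Arrays.sort(sorted);
        int idx = Math.min(sorted.length - 1, (int) Math.floor(rank * sorted.length));
        return sorted[idx];
    }

    // Retained values inside [lo, hi], divided by theta.
    static double estimateDistinct(TreeSet<Long> retained, double theta, long lo, long hi) {
        return retained.subSet(lo, true, hi, true).size() / theta;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = rng.nextInt(20_000); // plenty of duplicates
        }
        double theta = 0.1;
        TreeSet<Long> retained = sample(data, theta);

        // Rank-range boundaries, e.g. the middle 50% of the stream.
        long lo = valueAtRank(data, 0.25);
        long hi = valueAtRank(data, 0.75);

        double estimate = estimateDistinct(retained, theta, lo, hi);
        long exact = Arrays.stream(data).filter(v -> v >= lo && v <= hi).distinct().count();
        System.out.printf("range [%d, %d]: estimate=%.0f exact=%d%n", lo, hi, estimate, exact);
    }
}
```

In a real setup the boundaries `lo` and `hi` would themselves be approximate, because they come from a KLL sketch. That is exactly why the comment warns that any bounds reported for the final number would be misleading.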
