[GitHub] [datasketches-java] leerho commented on issue #446: Sketches for Histogram and NDV

via GitHub Wed, 07 Jun 2023 18:54:49 -0700


leerho commented on issue #446:
URL: 
https://github.com/apache/datasketches-java/issues/446#issuecomment-1581773089


   Here is another solution, though a very crude one.  If you just want a very 
rough idea of what the NDVs are per bin, you could do this:  
   From the histogram information produced by the KLL sketch, you can compute 
the fractional density of each bin (the fraction of all values, including 
duplicates, that fall in that bin). Then, with a parallel HLL sketch counting 
the NDV of the entire stream, you can compute the fraction of the stream that 
consists of duplicates.  Finally, with the huge assumption that the duplicates 
are roughly uniformly distributed across the ranks, you can guesstimate the 
NDV in each bin by scaling each bin's count by the stream-wide ratio of 
distinct to total values.  
   
   (I put this in not just for its humor value, but this is almost exactly what 
political pollsters do!) 
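
   A minimal sketch of the arithmetic described above, in plain Python. It 
does not call the DataSketches library: `bin_fractions` stands in for the 
per-bin densities a KLL sketch histogram would give, `total_count` for the 
stream length, and `total_ndv` for the HLL estimate. The function name and 
the sample numbers are hypothetical, chosen only for illustration.

```python
def estimate_ndv_per_bin(bin_fractions, total_count, total_ndv):
    """Rough per-bin NDV guess, assuming duplicates are spread
    uniformly across the ranks (a very strong assumption)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")

    # Fraction of the stream that is duplicates, from the HLL estimate.
    duplicate_fraction = 1.0 - total_ndv / total_count
    # Under the uniformity assumption, every bin has this distinct ratio.
    distinct_ratio = 1.0 - duplicate_fraction

    # Bin count (fraction * total) scaled by the stream-wide distinct ratio.
    return [frac * total_count * distinct_ratio for frac in bin_fractions]


# Made-up inputs: three bins, 1000 values in total, about 200 distinct.
bin_fractions = [0.1, 0.4, 0.5]
per_bin = estimate_ndv_per_bin(bin_fractions, total_count=1000, total_ndv=200)
print(per_bin)
```

   The per-bin guesses always sum to the stream-wide NDV estimate, so any 
error comes from how far the real data departs from the uniformity 
assumption, on top of the error of the two sketches themselves.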


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

