Proposal to get the NDV of a Range query through KMV

yiifeger wu Wed, 19 May 2021 09:38:56 -0700

Hi all,
     I recently learned about the DataSketches project, which is brilliant,
but some questions came up when I prepared to use it.
     I want to get the number of distinct values (NDV) for a range query in
my project. After studying the KMV algorithm as introduced by the
DataSketches project, we propose an adjusted KMV algorithm to solve it.
      The original KMV stores only the k minimum hash values and then
computes the NDV from their average density. So what if we also store the
original value for each hash value that is among the k minimum hash values?
Then we can estimate the number of distinct values in the range query through


    ndv_in_the_range = (ndv_in_range_for_k_minimum / k) * total_ndv
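
A minimal, self-contained Java sketch of this idea follows. It does not use the DataSketches API: the class name RangeKmvSketch, its methods, and the SplitMix64-style hash are all hypothetical stand-ins chosen for illustration. It also assumes the common (k - 1) / theta estimator for total_ndv, where theta is the k-th smallest hash mapped into [0, 1); the "average density" estimate mentioned above may differ slightly.

```java
import java.util.TreeMap;

/** Hypothetical KMV variant that keeps the original value next to each retained hash. */
class RangeKmvSketch {
    private final int k;
    // hash (compared as unsigned 64-bit) -> original value, smallest hash first
    private final TreeMap<Long, Long> retained = new TreeMap<>(Long::compareUnsigned);

    RangeKmvSketch(int k) {
        if (k < 2) throw new IllegalArgumentException("k must be at least 2");
        this.k = k;
    }

    // SplitMix64 finalizer, used only as a stand-in 64-bit hash (an assumption).
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    void update(long value) {
        long h = hash(value);
        if (retained.containsKey(h)) return;          // value already retained
        if (retained.size() < k) {
            retained.put(h, value);
        } else if (Long.compareUnsigned(h, retained.lastKey()) < 0) {
            retained.pollLastEntry();                 // evict the largest retained hash
            retained.put(h, value);
        }
    }

    /** Estimate of total_ndv: exact below k entries, (k - 1) / theta otherwise. */
    double estimateTotal() {
        if (retained.size() < k) return retained.size();
        // theta: the k-th smallest hash mapped into [0, 1) using its top 53 bits
        double theta = (retained.lastKey() >>> 11) * 0x1.0p-53;
        return (k - 1) / theta;
    }

    /** (retained values in [lo, hi] / retained entries) * total_ndv, as in the formula above. */
    double estimateRange(long lo, long hi) {
        if (retained.isEmpty()) return 0.0;
        long inRange = retained.values().stream()
                .filter(v -> v >= lo && v <= hi)
                .count();
        return inRange * estimateTotal() / retained.size();
    }
}
```

For example, after feeding values with update(v), estimateRange(lo, hi) returns the estimate for the closed range [lo, hi]. While fewer than k distinct values have been seen, every value is retained and the result is exact. Keeping the values in a map ordered by hash makes the range count a linear scan over at most k entries, which keeps the example short at the cost of speed.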


    If the idea works and the sketch does not already implement it, could
you give some advice on how to implement it in this project (P.S. I prefer
the Java version)?
     Thanks in advance for your help!
