In their 1985 paper "Probabilistic counting algorithms for database
applications," Flajolet and Martin, proposed the following algorithm:

Suppose the elements in the set {a_1, ..., a_n}
are from [m] = {1, 2, ..., m}.
Pick a random hash function
h : [m] -> [0, 1], apply h to all elements of the set and 
compute v = min_{j=1}^n h(a_j). Then the estimation of number of
distinct elements is F = 1/v.

Justification is the following:
if there are F0 independent and uniform values in [0, 1], then their
expected minimum is around 1/F0.


My question is: 
How would you prove this justification? This should probably be fairly
easy to compute expected minimum, but I am missing it.

Thanks.
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================

Reply via email to