Shape is not intended to "Perform the standard computations using some of
n, m, k, p to produce optimal values for the other values of n, m, k, p:"
that is left to the developer to determine possibly with the help of
https://hur.st/bloomfilter/ as referenced in the class javadoc. However,
writing the constructors ends up performing the standard computations.

On Sat, Mar 14, 2020 at 8:54 AM Alex Herbert <alex.d.herb...@gmail.com>
wrote:

>
>
> > On 10 Mar 2020, at 01:36, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >
> > I was going to modify the BloomFilter interface as discussed on this
> thread.
> >
> > However before that I have made some changes to the current code to tidy
> it up. PR is [1]. The changes target the following:
>
> >
> > [1] https://github.com/apache/commons-collections/pull/140 <
> https://github.com/apache/commons-collections/pull/140>
> >
>
> This PR has updated the validations that the Shape does during
> construction. Number of bits has changed from >= 8 to > 0. But working on
> this it seems to me that the Shape is a composite class and should be
> refactored.
>
> I think that the Shape is combining two functionalities and should be
> separated:
>
> 1. Perform the standard computations using some of n, m, k, p to produce
> optimal values for the other values of n, m, k, p:
>
> n = number of items
> m = number of bits
> k = number of hash functions
> p = Probability of false positives (when storing n items)
>
> n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
> p = pow(1 - exp(-k / (m / n)), k)
> m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))
> k = round((m / n) * ln(2))
>
> 2. Provide the functional shape for a Bloom filter
>
> This only requires storing m and k. It also requires storing the hash
> function identity.
>
> I suggest:
>
> - Moving all the computations to a ShapeCalculator
>
> - Only storing k and m in the Shape.
>
> - If you want to know n given p or p given n then you just call the method
> in the ShapeCalculator.
>
> - Change constructors in the Shape to allow construction with:
>
> n, m; where n < m; compute k
> n, m, k; where n < m; validate p
> n, p; compute k and m
> m, k, p; compute n
> k, m;
>
> This adds the constructor (k, m) which is all you actually need for a
> working Bloom filter. Possible we should validate k < m.
>
> The constructor using n, m and k only serves to validate p is less than 1
> when the filter is saturated since n is only used to compute p. Perhaps
> this constructor should be removed.
>
> My argument is that a Bloom filter does not require anything to function
> other than the number of bits (m) and a hash functions (k). Typical use
> case is to construct using (n, p) if you know your number of items and
> filter efficiency, or a constructor using m if you know your memory
> limitations.
>
> Do we need any other constructors, e.g.?
>
> m, p; compute k
>
> Note that bundling the HashFunctionIdentity with the Shape ties the Shape
> to a functional Bloom filter. It could be renamed to BloomFilterShape to
> make it clear that it is directly linked to the Bloom filter. But that is
> over verbose.
>
> However the name Shape is generic and does not incorporate the fact that
> it also has a HashFunctionIdentity. There should be something in the Shape
> javadoc to make it clear the Shape is not just a store for values
> associated with any Bloom filter (m, k) but is specific to a working Bloom
> filter as it also defines the hash function to be used with a Bloom filter.
>
> Alex
>
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to