Shape is not intended to "Perform the standard computations using some of n, m, k, p to produce optimal values for the other values of n, m, k, p:" that is left to the developer to determine possibly with the help of https://hur.st/bloomfilter/ as referenced in the class javadoc. However, writing the constructors ends up performing the standard computations.
On Sat, Mar 14, 2020 at 8:54 AM Alex Herbert <alex.d.herb...@gmail.com> wrote: > > > > On 10 Mar 2020, at 01:36, Alex Herbert <alex.d.herb...@gmail.com> wrote: > > > > I was going to modify the BloomFilter interface as discussed on this > thread. > > > > However before that I have made some changes to the current code to tidy > it up. PR is [1]. The changes target the following: > > > > > [1] https://github.com/apache/commons-collections/pull/140 < > https://github.com/apache/commons-collections/pull/140> > > > > This PR has updated the validations that the Shape does during > construction. Number of bits has changed from >= 8 to > 0. But working on > this it seems to me that the Shape is a composite class and should be > refactored. > > I think that the Shape is combining two functionalities and should be > separated: > > 1. Perform the standard computations using some of n, m, k, p to produce > optimal values for the other values of n, m, k, p: > > n = number of items > m = number of bits > k = number of hash functions > p = Probability of false positives (when storing n items) > > n = ceil(m / (-k / ln(1 - exp(ln(p) / k)))) > p = pow(1 - exp(-k / (m / n)), k) > m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2)))) > k = round((m / n) * ln(2)) > > 2. Provide the functional shape for a Bloom filter > > This only requires storing m and k. It also requires storing the hash > function identity. > > I suggest: > > - Moving all the computations to a ShapeCalculator > > - Only storing k and m in the Shape. > > - If you want to know n given p or p given n then you just call the method > in the ShapeCalculator. > > - Change constructors in the Shape to allow construction with: > > n, m; where n < m; compute k > n, m, k; where n < m; validate p > n, p; compute k and m > m, k, p; compute n > k, m; > > This adds the constructor (k, m) which is all you actually need for a > working Bloom filter. Possible we should validate k < m. > > The constructor using n, m and k only serves to validate p is less than 1 > when the filter is saturated since n is only used to compute p. Perhaps > this constructor should be removed. > > My argument is that a Bloom filter does not require anything to function > other than the number of bits (m) and a hash functions (k). Typical use > case is to construct using (n, p) if you know your number of items and > filter efficiency, or a constructor using m if you know your memory > limitations. > > Do we need any other constructors, e.g.? > > m, p; compute k > > Note that bundling the HashFunctionIdentity with the Shape ties the Shape > to a functional Bloom filter. It could be renamed to BloomFilterShape to > make it clear that it is directly linked to the Bloom filter. But that is > over verbose. > > However the name Shape is generic and does not incorporate the fact that > it also has a HashFunctionIdentity. There should be something in the Shape > javadoc to make it clear the Shape is not just a store for values > associated with any Bloom filter (m, k) but is specific to a working Bloom > filter as it also defines the hash function to be used with a Bloom filter. > > Alex > > > -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren