Re: [collections] Bloom filters

Claude Warren Sun, 08 Mar 2020 03:11:24 -0700

With the upcoming change, the StaticHash usage model has changed. It was
serving two purposes:

   1. as a mechanism to preserve the list of integers from the BloomFilter
   as well as the shape.
   2. as a way to construct a Hasher from a collection of integers and a
   shape so that they could be merged into a Bloom filter without the overhead
   of constructing a temporary Bloom filter to use in the merge.

The first purpose goes away with the removal of getHasher() and the addition
of iterator() to the BloomFilter interface.
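As a sketch of what iterator() has to provide, assume the filter stores its bits in the long[] returned by getBits(), with bit i in word i / 64 at position i % 64 (an assumed layout). The enabled indexes can then be produced lazily by scanning each word with Long.numberOfTrailingZeros. The class name BitIndexIterator is hypothetical and not part of the proposed API.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical sketch: iterates the enabled bit indexes of a long[] bitmap in ascending order.
// Assumes bit i is stored in word i / 64 at position i % 64.
final class BitIndexIterator implements PrimitiveIterator.OfInt {
    private final long[] bits;
    private int wordIndex;   // index of the word currently being scanned
    private long word;       // bits of that word not yet returned

    BitIndexIterator(long[] bits) {
        this.bits = bits;
        this.wordIndex = 0;
        this.word = bits.length == 0 ? 0L : bits[0];
    }

    @Override
    public boolean hasNext() {
        // Advance past words that have no remaining set bits.
        while (word == 0 && wordIndex + 1 < bits.length) {
            word = bits[++wordIndex];
        }
        return word != 0;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int index = wordIndex * Long.SIZE + Long.numberOfTrailingZeros(word);
        word &= word - 1; // clear the lowest set bit
        return index;
    }
}
```

For example, the words {0b1010L, 1L << 63, 0L, 1L} yield the indexes 1, 3, 127 and 192. The inherited default next() boxes nextInt(), so the class satisfies the full PrimitiveIterator.OfInt contract.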

I think we need a Hasher that accepts a shape and either a collection of
integers or a function that produces an iterator. Something like:

{code:java}
import java.util.Collection;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

// Static nested class inside an enclosing class. Hasher, Shape and
// HashFunctionIdentity are the types from the bloomfilter package under discussion.
public static class CollectionHasher implements Hasher {

    private final Shape shape;
    // Produces a fresh iterator of bit indexes on every call, so the hasher can be reused.
    private final Supplier<PrimitiveIterator.OfInt> func;

    CollectionHasher(Supplier<PrimitiveIterator.OfInt> func, Shape shape) {
        this.shape = shape;
        this.func = func;
    }

    CollectionHasher(Collection<Integer> collection, Shape shape) {
        // Unbox each Integer lazily; a null element throws NullPointerException when iterated.
        this(() -> collection.stream().mapToInt(Integer::intValue).iterator(), shape);
    }

    @Override
    public PrimitiveIterator.OfInt getBits(Shape shape) {
        if (!this.shape.equals(shape)) {
            throw new IllegalArgumentException(String.format(
                "Hasher shape (%s) is not the same as shape (%s)", this.shape, shape));
        }
        return func.get();
    }

    @Override
    public HashFunctionIdentity getHashFunctionIdentity() {
        return shape.getHashFunctionIdentity();
    }

    @Override
    public boolean isEmpty() {
        // Creates a new iterator, so this does not consume the one returned by getBits().
        return !func.get().hasNext();
    }
}
{code}
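The second purpose, merging a collection of integers without building a temporary Bloom filter, can be sketched with plain arrays. Assuming the filter holds its bits in a long[] with bit i in word i / 64 at position i % 64, the hypothetical helpers below wrap a Collection<Integer> in a Supplier<PrimitiveIterator.OfInt> (the same idea as CollectionHasher) and set each index directly. The names IndexMerge, supplierOf and merge, and the bounds check, are illustrative assumptions rather than library API.

```java
import java.util.Collection;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

// Hypothetical helpers for illustration only; not part of the proposed BloomFilter API.
final class IndexMerge {
    private IndexMerge() {
    }

    // Wraps the collection so that every get() returns a fresh iterator over its values.
    static Supplier<PrimitiveIterator.OfInt> supplierOf(Collection<Integer> indexes) {
        return () -> indexes.stream().mapToInt(Integer::intValue).iterator();
    }

    // Sets each supplied index in a bitmap of numberOfBits bits (assumed layout:
    // bit i in word i / 64 at position i % 64). No temporary filter is created.
    static void merge(long[] bits, int numberOfBits, Supplier<PrimitiveIterator.OfInt> indexes) {
        PrimitiveIterator.OfInt it = indexes.get();
        while (it.hasNext()) {
            int i = it.nextInt();
            if (i < 0 || i >= numberOfBits) {
                throw new IndexOutOfBoundsException(
                    "Bit index " + i + " is outside the " + numberOfBits + "-bit shape");
            }
            // For long shifts Java uses only the low 6 bits of i, i.e. i % 64.
            bits[i >>> 6] |= 1L << i;
        }
    }
}
```

Because the supplier creates a new iterator on each get(), the same source can be merged into several filters, which is the property that lets one hasher be reused.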

On Sun, Mar 8, 2020 at 12:39 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:

>
> > On 6 Mar 2020, at 02:14, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >
> > The change to make the CountingBloomFilter an interface is in this PR [1].
>
> Claude has stated in a review of the PR on GitHub that the change to
> CountingBloomFilter as an interface is good.
>
> I will now progress to updating the BloomFilter interface as previously
> discussed and put that into a PR. Changes would be:
>
> - boolean return values from the merge operations.
> - remove getHasher() and switch to providing an iterator of enabled indexes
>
> As per below:
> > public interface BloomFilter {
> >
> >     int andCardinality(BloomFilter other);
> >
> >     int cardinality();
> >
> >     boolean contains(BloomFilter other);
> >
> >     boolean contains(Hasher hasher);
> >
> >     long[] getBits();
> >
> >     // Change
> >     PrimitiveIterator.OfInt iterator();
> >
> >     Shape getShape();
> >
> >     // Change
> >     boolean merge(BloomFilter other);
> >
> >     // Change
> >     boolean merge(Hasher hasher);
> >
> >     int orCardinality(BloomFilter other);
> >
> >     int xorCardinality(BloomFilter other);
> >
> > }
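The cardinality methods in this interface can be sketched over the long[] words from getBits(), assuming bit i is stored in word i / 64 at position i % 64. Long.bitCount counts the set bits of each combined word, and words missing from the shorter array are treated as zero. BitCardinality is a hypothetical helper, not the proposed implementation.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical helper; bit i is assumed to be stored in word i / 64 at position i % 64.
final class BitCardinality {
    private BitCardinality() {
    }

    static int cardinality(long[] bits) {
        int count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }

    static int andCardinality(long[] a, long[] b) {
        return combinedCardinality(a, b, (x, y) -> x & y);
    }

    static int orCardinality(long[] a, long[] b) {
        return combinedCardinality(a, b, (x, y) -> x | y);
    }

    static int xorCardinality(long[] a, long[] b) {
        return combinedCardinality(a, b, (x, y) -> x ^ y);
    }

    // Combines the arrays word by word and counts the set bits of each result.
    // A word missing from the shorter array is treated as all zeros.
    private static int combinedCardinality(long[] a, long[] b, LongBinaryOperator op) {
        int n = Math.max(a.length, b.length);
        int count = 0;
        for (int i = 0; i < n; i++) {
            long x = i < a.length ? a[i] : 0L;
            long y = i < b.length ? b[i] : 0L;
            count += Long.bitCount(op.applyAsLong(x, y));
        }
        return count;
    }
}
```

Working on whole words keeps each operation to one pass over the arrays, with no per-index iteration.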
>
> Given that CountingBloomFilter provides a forEach(BitCountConsumer) method,
> it may be useful to also have the following method to receive all the
> enabled indexes:
>
> forEach(IntConsumer)
>
> You could then use the iterator of indexes for fail-fast checking against
> each index, or use the forEach method when you know you want to process all
> the bit indexes. In many cases forEach can be implemented more efficiently
> than an iterator, and it avoids creating an iterator object.
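A minimal sketch of forEach(IntConsumer) over a long[] bitmap (assuming bit i is stored in word i / 64 at position i % 64) shows why it can be cheaper than an iterator: the loop keeps its state in local variables and allocates no iterator object. BitForEach is a hypothetical name, not part of the proposed API.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch of forEach(IntConsumer) for a long[] bitmap.
final class BitForEach {
    private BitForEach() {
    }

    // Passes every enabled bit index to the consumer in ascending order.
    static void forEach(long[] bits, IntConsumer consumer) {
        for (int w = 0; w < bits.length; w++) {
            long word = bits[w];
            while (word != 0) {
                consumer.accept(w * Long.SIZE + Long.numberOfTrailingZeros(word));
                word &= word - 1; // clear the lowest set bit
            }
        }
    }
}
```

A call such as `BitForEach.forEach(bits, i -> counts[i]++)` visits every index. An iterator remains the better fit when the caller wants to stop early, for example on the first index that is not set in another filter.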
>
>
> >
> >
> > [1] https://github.com/apache/commons-collections/pull/137
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
