With the upcoming change the StaticHasher usage model will change. Until now
it has served two purposes:
1. as a mechanism to preserve the list of integers from the BloomFilter
as well as the shape.
2. as a way to construct a Hasher from a collection of integers and a
shape so that they could be merged into a Bloom filter without the overhead
of constructing a temporary Bloom filter to use in the merge.

The first purpose goes away with the removal of getHasher() and the addition
of iterator() in the BloomFilter interface.

For the second purpose, I think we need a Hasher that accepts a shape and
either a collection of Integers or a function producing an iterator.
Something like:
{code:java}
// Imports needed by the enclosing file. Hasher, Shape and
// HashFunctionIdentity are the existing Bloom filter types.
import java.util.Collection;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

public static class CollectionHasher implements Hasher {
    private final Shape shape;
    // Produces a fresh iterator over the bit indexes on every call.
    private final Supplier<PrimitiveIterator.OfInt> func;

    CollectionHasher(Supplier<PrimitiveIterator.OfInt> func, Shape shape) {
        this.shape = shape;
        this.func = func;
    }

    CollectionHasher(Collection<Integer> collection, Shape shape) {
        // Unbox the collection lazily; each get() restarts the iteration.
        this(() -> collection.stream().mapToInt(Integer::intValue).iterator(),
            shape);
    }

    @Override
    public PrimitiveIterator.OfInt getBits(Shape shape) {
        // The indexes are only valid for the shape they were created with.
        if (!this.shape.equals(shape)) {
            throw new IllegalArgumentException(String.format(
                "Hasher shape (%s) is not the same as shape (%s)",
                this.shape, shape));
        }
        return func.get();
    }

    @Override
    public HashFunctionIdentity getHashFunctionIdentity() {
        return shape.getHashFunctionIdentity();
    }

    @Override
    public boolean isEmpty() {
        // Uses a throwaway iterator, so later getBits() calls are unaffected.
        return !func.get().hasNext();
    }
}
{code}
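
The reason for holding a Supplier rather than a single iterator is that
isEmpty() and every getBits() call each need their own pass over the indexes.
A minimal, self-contained sketch of just that part follows. It uses only
java.util, and the names IndexSupplierDemo, fromCollection and count are
made up for illustration; they are not part of the proposal.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

final class IndexSupplierDemo {

    // Hypothetical helper: wraps a collection so that every get() call
    // returns a new primitive iterator starting from the first element.
    static Supplier<PrimitiveIterator.OfInt> fromCollection(Collection<Integer> indexes) {
        return () -> indexes.stream().mapToInt(Integer::intValue).iterator();
    }

    // Hypothetical helper: drains an iterator and reports how many values it held.
    static int count(PrimitiveIterator.OfInt it) {
        int n = 0;
        while (it.hasNext()) {
            it.nextInt();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Supplier<PrimitiveIterator.OfInt> func = fromCollection(Arrays.asList(3, 17, 42));

        // The emptiness check consumes nothing that later callers need.
        boolean empty = !func.get().hasNext();
        System.out.println(empty);             // false

        // Two independent passes both see all three indexes.
        System.out.println(count(func.get())); // 3
        System.out.println(count(func.get())); // 3
    }
}
```

If the constructor took a bare iterator instead, the first merge would
exhaust it and the hasher could not be reused.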
On Sun, Mar 8, 2020 at 12:39 AM Alex Herbert <[email protected]>
wrote:
>
> > On 6 Mar 2020, at 02:14, Alex Herbert <[email protected]> wrote:
> >
> > The change to make the CountingBloomFilter an interface is in this PR [1].
>
> Claude has stated in a review of the PR on GitHub that the change to
> CountingBloomFilter as an interface is good.
>
> I will now progress to updating the BloomFilter interface as previously
> discussed and put that into a PR. Changes would be:
>
> - boolean return values from the merge operations.
> - remove getHasher() and switch to providing an iterator of enabled indexes
>
> As per below:
> > public interface BloomFilter {
> >
> >     int andCardinality(BloomFilter other);
> >
> >     int cardinality();
> >
> >     boolean contains(BloomFilter other);
> >
> >     boolean contains(Hasher hasher);
> >
> >     long[] getBits();
> >
> >     // Change
> >     PrimitiveIterator.OfInt iterator();
> >
> >     Shape getShape();
> >
> >     // Change
> >     boolean merge(BloomFilter other);
> >
> >     // Change
> >     boolean merge(Hasher hasher);
> >
> >     int orCardinality(BloomFilter other);
> >
> >     int xorCardinality(BloomFilter other);
> >
> > }
>
> Given the CountingBloomFilter provides a forEach(BitCountConsumer) method
> it may be useful to also have the following method to receive all the
> enabled indexes:
>
> forEach(IntConsumer)
>
> Thus you can use the iterator of indexes for fail-fast checking against
> each index, or use the forEach method when you know you want to process all
> the bit indexes. In many cases the forEach can be more efficiently
> implemented than an iterator and would avoid an iterator object creation.
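
To make the iterator versus forEach trade-off concrete, here is a rough
sketch over a plain long[] bit array. It is not the commons-collections
implementation: the class BitIndexes and its methods contains and allEnabled
are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.function.IntConsumer;

final class BitIndexes {
    private final long[] bits;

    BitIndexes(long[] bits) {
        this.bits = bits.clone();
    }

    boolean contains(int index) {
        int w = index >>> 6;
        // A long shift uses only the low 6 bits of the shift distance.
        return w < bits.length && (bits[w] & (1L << index)) != 0;
    }

    // Visits every enabled index in ascending order; no iterator object is created.
    void forEach(IntConsumer consumer) {
        for (int i = 0; i < bits.length; i++) {
            long word = bits[i];
            while (word != 0) {
                consumer.accept(i * Long.SIZE + Long.numberOfTrailingZeros(word));
                word &= word - 1; // clear the lowest set bit
            }
        }
    }

    // Same indexes, but the caller decides when to stop.
    PrimitiveIterator.OfInt iterator() {
        return new PrimitiveIterator.OfInt() {
            private int wordIndex;
            private long word = bits.length == 0 ? 0 : bits[0];

            @Override
            public boolean hasNext() {
                while (word == 0 && wordIndex < bits.length - 1) {
                    word = bits[++wordIndex];
                }
                return word != 0;
            }

            @Override
            public int nextInt() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int index = wordIndex * Long.SIZE + Long.numberOfTrailingZeros(word);
                word &= word - 1;
                return index;
            }
        };
    }

    // Fail-fast check: returns at the first candidate index missing from target.
    static boolean allEnabled(BitIndexes candidate, BitIndexes target) {
        PrimitiveIterator.OfInt it = candidate.iterator();
        while (it.hasNext()) {
            if (!target.contains(it.nextInt())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitIndexes a = new BitIndexes(new long[] {0b101L, 1L}); // indexes 0, 2, 64
        List<Integer> seen = new ArrayList<>();
        a.forEach(i -> seen.add(i));
        System.out.println(seen);                                               // [0, 2, 64]
        System.out.println(allEnabled(new BitIndexes(new long[] {1L}), a));     // true
        System.out.println(allEnabled(a, new BitIndexes(new long[] {1L})));     // false
    }
}
```

The forEach loop keeps its state in local variables, while the iterator has
to carry the same state in an allocated object, which is the overhead
mentioned above.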
>
>
> >
> >
> > [1] https://github.com/apache/commons-collections/pull/137
>
>
--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren