On 16/03/2020 07:57, Claude Warren wrote:
I made a quick pass at changing getHasher() to iterator().
A look at the feasibility or have you started work on this? If so then
I'll not start work on it as well.
I changed master to return a boolean for the merge operations in
BloomFilter. So the outstanding changes are to drop getHasher() from the
BloomFilter interface in favour of an iterator, spliterator and a
forEachBit method.
I think we can get rid of HasherBloomFilter as its purpose was really to
create a Bloom filter for temporary usage and it doesn't seem to be
required if we have a hasher that can be created from a Shape and a
function that creates an Iterator.
I agree.
One change that could be made is to clarify the contract between a
Hasher and a BloomFilter. At present the Hasher can operate without a
defined contract in this method:
PrimitiveIterator.OfInt getBits(Shape shape)
It should validate that it can generate indexes for the shape. But it
doesn't have to. It could return unlimited indexes and they could be
outside the number of bits of the BloomFilter.
There does not appear to be any control anywhere on the number of hash
functions generated by the Hasher. I would expect this test in the
AbstractBloomFilterTest to pass:
@Test
public void hasherMergeTest() {
int n = 1;
int m = 10;
HashFunctionIdentity h = new
HashFunctionIdentityImpl("provider", "name",
Signedness.SIGNED, ProcessType.CYCLIC, 0L);
Hasher hasher = new Hasher() {
@Override
public boolean isEmpty() {
return false;
}
@Override
public HashFunctionIdentity getHashFunctionIdentity() {
return h;
}
@Override
public OfInt getBits(Shape shape) {
// Do not respect the shape number of hash functions
but do respect
// the number of bits
return IntStream.range(0, m).iterator();
}
};
for (int k = 1; k < 5; k++) {
Shape shape = new Shape(h, n, m, k);
BloomFilter bf = createEmptyFilter(shape);
bf.merge(hasher);
assertEquals("incorrect cardinality", k, bf.cardinality());
}
}
It currently does not as all the BloomFilters will not respect the Shape
with which they were created, i.e. they disregard the number of hash
functions in the Shape. So does the Hasher.
I think some of the control should be returned to the BloomFilter. The
Hasher would be reduced to a simple generator of data for the
BloomFilter, for example:
PrimitiveIterator.OfInt getBits(int m);
PrimitiveIterator.OfInt getBits(int k, int m);
PrimitiveIterator.OfLong getBits();
The BloomFilter then accept responsibility for converting the primitives
to a suitable index and creating the correct number of hash functions
(i.e. indexes).
A merge operation with a BloomFilter then becomes:
- check the Hasher is using the correct hash function identity
- ask the Hasher for an iterator
- set k bits in the filter using the iterator, mapping each to the range
[0, m)
The BloomFilter has then encapsulated its state and respects the Shape.
The HashFuntion will convert byte[] to a long.
The Hasher exists to convert anything to a byte[] format.
This perhaps needs the Hasher.Builder to be revised to include more
methods that accept all the primitive data types. These are all
converted to a single byte[] representation. Thus the Hasher.Builder is
effectively a specification for a ByteBuffer. Once an object is
decomposed into the byte[] it can be fed through the HashFunction with
different seeds or using the cyclic method to create the iterator. The
BloomFilter consumes the raw long output from the stream produced by the
Hasher and sets k bits within the range m.
Alex
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org