Re: [BloomFilters] changes to BloomFilter

Alex Herbert Mon, 16 Mar 2020 11:34:28 -0700


On 16/03/2020 07:57, Claude Warren wrote:

I made a quick pass at changing getHasher() to iterator().

A look at the feasibility or have you started work on this? If so thenI'll not start work on it as well.

I changed master to return a boolean for the merge operations inBloomFilter. So the outstanding changes are to drop getHasher() from theBloomFilter interface in favour of an iterator, spliterator and aforEachBit method.

I think we can get rid of HasherBloomFilter as its purpose was really to
create a Bloom filter for temporary usage and it doesn't seem to be
required if we have a hasher that can be created from a Shape and a
function that creates an Iterator.


I agree.

One change that could be made is to clarify the contract between aHasher and a BloomFilter. At present the Hasher can operate without adefined contract in this method:


PrimitiveIterator.OfInt getBits(Shape shape)

It should validate that it can generate indexes for the shape. But itdoesn't have to. It could return unlimited indexes and they could beoutside the number of bits of the BloomFilter.

There does not appear to be any control anywhere on the number of hashfunctions generated by the Hasher. I would expect this test in theAbstractBloomFilterTest to pass:


    @Test
    public void hasherMergeTest() {
        int n = 1;
        int m = 10;

HashFunctionIdentity h = newHashFunctionIdentityImpl("provider", "name",

            Signedness.SIGNED, ProcessType.CYCLIC, 0L);
        Hasher hasher = new Hasher() {
            @Override
            public boolean isEmpty() {
                return false;
            }
            @Override
            public HashFunctionIdentity getHashFunctionIdentity() {
                return h;
            }
            @Override
            public OfInt getBits(Shape shape) {

// Do not respect the shape number of hash functionsbut do respect

                // the number of bits
                return IntStream.range(0, m).iterator();
            }
        };
        for (int k = 1; k < 5; k++) {
            Shape shape = new Shape(h, n, m, k);
            BloomFilter bf = createEmptyFilter(shape);
            bf.merge(hasher);
            assertEquals("incorrect cardinality", k, bf.cardinality());
        }
    }

It currently does not as all the BloomFilters will not respect the Shapewith which they were created, i.e. they disregard the number of hashfunctions in the Shape. So does the Hasher.

I think some of the control should be returned to the BloomFilter. TheHasher would be reduced to a simple generator of data for theBloomFilter, for example:


    PrimitiveIterator.OfInt getBits(int m);
    PrimitiveIterator.OfInt getBits(int k, int m);
    PrimitiveIterator.OfLong getBits();

The BloomFilter then accept responsibility for converting the primitivesto a suitable index and creating the correct number of hash functions(i.e. indexes).


A merge operation with a BloomFilter then becomes:

- check the Hasher is using the correct hash function identity
- ask the Hasher for an iterator

- set k bits in the filter using the iterator, mapping each to the range[0, m)


The BloomFilter has then encapsulated its state and respects the Shape.

The HashFuntion will convert byte[] to a long.

The Hasher exists to convert anything to a byte[] format.

This perhaps needs the Hasher.Builder to be revised to include moremethods that accept all the primitive data types. These are allconverted to a single byte[] representation. Thus the Hasher.Builder iseffectively a specification for a ByteBuffer. Once an object isdecomposed into the byte[] it can be fed through the HashFunction withdifferent seeds or using the cyclic method to create the iterator. TheBloomFilter consumes the raw long output from the stream produced by theHasher and sets k bits within the range m.


Alex



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [BloomFilters] changes to BloomFilter

Reply via email to