Greetings,

I am attempting to use the Bloom filter code in Kafka to manage PID
generation.  The requirement is to remove PID tracking after a period of
time.  This is possible with the LayeredBloomFilter, but it has an edge case
problem.

The LayeredBloomFilter uses the LayerManager to manage the filters that
comprise the layers of the LayeredBloomFilter.
The LayerManager uses a Consumer<LinkedList<BloomFilter>> called
filterCleanup to remove old layers.
The filterCleanup is only called when a new layer is added to the layered
filter.

This solution works well in the general case where data is flowing through
the layered filter.  However, if nothing is added to the filter,
filterCleanup is never called, so expired layers stay in place.
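
To make the edge case concrete, here is a minimal sketch.  It does not use
the real library classes: SketchLayerManager and IdleProducerDemo are
hypothetical stand-ins, a "layer" is reduced to its creation timestamp, and
the 1000 ms TTL is an arbitrary value chosen for the example.  The only
behaviour it is meant to mirror is that cleanup runs solely on the add path.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in, NOT the Commons Collections LayerManager.
class SketchLayerManager {
    // Each "layer" is represented only by its creation time in milliseconds.
    final Deque<Long> layers = new ArrayDeque<>();
    final Consumer<Deque<Long>> filterCleanup;

    SketchLayerManager(Consumer<Deque<Long>> filterCleanup) {
        this.filterCleanup = filterCleanup;
    }

    // Cleanup is triggered only here, when a new layer is added.
    void addLayer(long now) {
        filterCleanup.accept(layers);
        layers.addLast(now);
    }
}

class IdleProducerDemo {
    // Returns the number of layers still held after the producer goes idle.
    static int layersAfterIdle() {
        final long ttl = 1000L;       // assumed expiry period for the sketch
        final long[] clock = {0L};    // manually advanced clock
        SketchLayerManager mgr = new SketchLayerManager(
            dq -> dq.removeIf(created -> clock[0] - created >= ttl));

        mgr.addLayer(clock[0]);       // producer is active at t = 0
        clock[0] = 5000L;             // producer goes silent, TTL has passed
        return mgr.layers.size();     // nothing has triggered the cleanup
    }
}
```

In this sketch the expired layer is still present after the idle period,
because nothing called addLayer() again.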

In the Kafka case we have a LayeredBloomFilter for PIDs for each producer.
As long as a producer is producing PIDs the filter gets updated.

However, if a producer drops from the system or goes offline for a period
of time, then it will no longer be producing PIDs and its old expired
data will remain.

We want to remove the producer from the collection when there are no more
PIDs being tracked.

I think this can be solved by adding a clean() method to the LayerManager
that simply calls the existing filterCleanup.
It would be easier to access this method if the LayeredBloomFilter had a
method to return the LayerManager that was passed in the constructor.
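
A rough sketch of what I have in mind follows.  Again these are hypothetical
stand-ins rather than the real classes: CleanableLayerManager,
SketchLayeredFilter and ProducerSweep are names made up for the example, the
isEmpty() helper and the sweep() routine are assumptions about how the Kafka
side might use the new methods, and a layer is reduced to a timestamp.  The
real LayerManager may treat an empty layer list differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration; not the real library classes.
class CleanableLayerManager {
    final Deque<Long> layers = new ArrayDeque<>();
    private final Consumer<Deque<Long>> filterCleanup;

    CleanableLayerManager(Consumer<Deque<Long>> filterCleanup) {
        this.filterCleanup = filterCleanup;
    }

    // Existing behaviour in the sketch: cleanup runs when a layer is added.
    void addLayer(long now) {
        filterCleanup.accept(layers);
        layers.addLast(now);
    }

    // Proposed addition: run the existing cleanup on demand.
    void clean() {
        filterCleanup.accept(layers);
    }

    // Assumed helper for the sketch only.
    boolean isEmpty() {
        return layers.isEmpty();
    }
}

class SketchLayeredFilter {
    private final CleanableLayerManager layerManager;

    SketchLayeredFilter(CleanableLayerManager layerManager) {
        this.layerManager = layerManager;
    }

    // Proposed addition: expose the manager passed to the constructor.
    CleanableLayerManager getLayerManager() {
        return layerManager;
    }
}

class ProducerSweep {
    // Cleans every producer's filter and drops producers with no layers left.
    static void sweep(Map<String, SketchLayeredFilter> producers) {
        producers.values().removeIf(f -> {
            f.getLayerManager().clean();
            return f.getLayerManager().isEmpty();
        });
    }

    // Returns how many producers remain after one sweep.
    static int demo() {
        final long ttl = 1000L;     // assumed expiry period for the sketch
        final long[] clock = {0L};  // manually advanced clock
        Consumer<Deque<Long>> expire =
            dq -> dq.removeIf(created -> clock[0] - created >= ttl);

        CleanableLayerManager idle = new CleanableLayerManager(expire);
        CleanableLayerManager active = new CleanableLayerManager(expire);
        Map<String, SketchLayeredFilter> producers = new HashMap<>();
        producers.put("idle", new SketchLayeredFilter(idle));
        producers.put("active", new SketchLayeredFilter(active));

        idle.addLayer(0L);
        active.addLayer(0L);
        clock[0] = 5000L;
        active.addLayer(clock[0]);  // only the active producer keeps writing
        sweep(producers);           // periodic sweep, e.g. from a timer
        return producers.size();
    }
}
```

With something like this, a periodic task could call sweep() and the idle
producer would be dropped without needing any new data to arrive.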

Does anyone see any issues with this approach?  Are there other solutions
to be had?

Questions and comments welcomed.
-- 
LinkedIn: http://www.linkedin.com/in/claudewarren
