Greetings, I am attempting to use the Bloom filter code in Kafka to manage PID generation. The requirement is to remove PID tracking after a period of time. This is possible with the LayeredBloomFilter, but it has an edge-case problem.
The LayeredBloomFilter uses the LayerManager to manage the filters that make up its layers. The LayerManager uses a Consumer<LinkedList<BloomFilter>> called filterCleanup to remove old layers. filterCleanup is only called when a new layer is added to the layered filter.

This works well in the general case where data is flowing through the layered filter. However, if nothing is added to the filter, filterCleanup is never called. In the Kafka case we have a LayeredBloomFilter of PIDs for each producer. As long as a producer is producing PIDs, the filter gets updated. However, if a producer drops from the system or goes offline for a period of time, it no longer produces PIDs and its old, expired data remains. We want to remove the producer from the collection when there are no more PIDs being tracked.

I think this can be solved by adding a clean() method to the LayerManager that simply calls the existing filterCleanup. It would be easier to access this method if the LayeredBloomFilter had a method to return the LayerManager that was passed in the constructor.

Does anyone see any issues with this approach? Are there other solutions to be had?

Questions and comments welcomed.

--
LinkedIn: http://www.linkedin.com/in/claudewarren
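
P.S. To make the proposal concrete, here is a rough, self-contained sketch of the idea. It does not use the real classes: SketchLayer, SketchLayerManager, addLayer(), isEmpty() and the time-based cleanup are all hypothetical stand-ins I made up for illustration, and the order of cleanup versus adding the layer is an assumption. Only the clean() method mirrors what I am actually proposing.

```java
import java.util.LinkedList;
import java.util.function.Consumer;

/** Hypothetical stand-in for one Bloom filter layer; it only records its creation time. */
final class SketchLayer {
    final long createdAt;

    SketchLayer(long createdAt) {
        this.createdAt = createdAt;
    }
}

/** Hypothetical stand-in for LayerManager. This is NOT the real class. */
final class SketchLayerManager {
    private final LinkedList<SketchLayer> layers = new LinkedList<>();
    private final Consumer<LinkedList<SketchLayer>> filterCleanup;

    SketchLayerManager(Consumer<LinkedList<SketchLayer>> filterCleanup) {
        this.filterCleanup = filterCleanup;
    }

    /** Models the current behaviour: cleanup runs only as part of adding a layer. */
    void addLayer(long now) {
        filterCleanup.accept(layers);   // assumed order: clean first, then add
        layers.add(new SketchLayer(now));
    }

    /** Proposed addition: run the existing cleanup on demand, without adding a layer. */
    void clean() {
        filterCleanup.accept(layers);
    }

    boolean isEmpty() {
        return layers.isEmpty();
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        final long ttl = 10L;           // made-up expiry period
        final long[] clock = {0L};      // made-up manual clock

        // Cleanup drops every layer that is at least ttl old.
        SketchLayerManager manager = new SketchLayerManager(
            list -> list.removeIf(layer -> clock[0] - layer.createdAt >= ttl));

        manager.addLayer(clock[0]);     // producer is active at time 0
        clock[0] = 100L;                // producer goes quiet; time passes

        // Nothing was added, so the expired layer is still there.
        System.out.println("before clean: empty=" + manager.isEmpty());

        manager.clean();                // periodic sweep calls the proposed method
        System.out.println("after clean: empty=" + manager.isEmpty());
    }
}
```

In the Kafka case, a periodic task could call clean() for each producer's filter and drop the producer once its filter reports no remaining layers.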
