Alex, I like your solution. To answer your question: we create a BloomFilter that has a timestamp associated with it. When System.currentTimeMillis() exceeds the timestamp, the filter has expired and is removed. The custom cleanup calls Cleanup.removeEmptyTarget().andThen(<timestampCleanup>).
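[Editor's note] A minimal, self-contained sketch of that timestamped clean-up consumer. The TimestampedFilter class and its field names are illustrative stand-ins, not the Commons Collections API; in the real code the consumer below would be the "timestampCleanup" piece composed via Cleanup.removeEmptyTarget().andThen(...).

```java
import java.util.LinkedList;
import java.util.function.Consumer;

public class TimestampCleanupSketch {

    /** Illustrative stand-in for a Bloom filter layer that carries an expiry time. */
    static class TimestampedFilter {
        final long expiresMillis; // layer is stale once System.currentTimeMillis() passes this

        TimestampedFilter(long expiresMillis) {
            this.expiresMillis = expiresMillis;
        }
    }

    /**
     * The timestamp clean-up piece: drops any layer whose expiry has passed.
     * Takes "now" as a parameter so the behaviour is deterministic.
     */
    static Consumer<LinkedList<TimestampedFilter>> timestampCleanup(long nowMillis) {
        return layers -> layers.removeIf(layer -> layer.expiresMillis < nowMillis);
    }

    public static void main(String[] args) {
        LinkedList<TimestampedFilter> layers = new LinkedList<>();
        layers.add(new TimestampedFilter(1_000));  // already expired at now = 5_000
        layers.add(new TimestampedFilter(10_000)); // still live at now = 5_000
        timestampCleanup(5_000).accept(layers);
        System.out.println(layers.size()); // prints 1
    }
}
```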
I think that creating a cleanup() or clean() method on the LayeredBloomFilter is the appropriate solution, and that it should call cleanup() on the LayerManager (so two new methods, one exposed). The next() method is used when external circumstances dictate that a new layer should be created. I think a StableBloomFilter I implemented required it, but I do not have the code to hand at the moment.

Claude

On Tue, Apr 9, 2024 at 10:38 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:

> Hi Claude,
>
> Q. What is your current clean-up filter, i.e. the
> Consumer<LinkedList<BloomFilter>>? I assume you are using a custom one.
>
> The current collections code only has two functional implementations. One
> will remove the newest filter if it is empty; one will remove the oldest
> filters until the size is below a limit. Since neither of those will
> iterate the list and purge stale objects, I assume you are using a custom
> clean-up filter. So you had to have created the layer manager with your
> custom filter. Assuming this, there are at least two solutions for the
> current code:
>
> 1. The current implementation always calls the clean-up filter with the
> same LinkedList since it is final. So you can capture the list and do
> what you want with it:
>
>     @SuppressWarnings("rawtypes")
>     LinkedList[] captured = new LinkedList[1];
>     Consumer<LinkedList<BloomFilter>> cleanup = list -> {
>         captured[0] = list;
>         // ... do clean-up
>     };
>
>     // Once you have captured the list, you can clean it when you want:
>     // (unchecked conversion)
>     cleanup.accept(captured[0]);
>
> Obviously this is not ideal as you have to manage the captured list to
> call cleanup. But it delivers exactly what you require in terms of being
> able to call cleanup at any time.
>
> 2. The call to next() will clean the layers but also add a new layer. So
> your custom clean method could clean stale objects and also any empty
> filters not at the end of the list.
> This will avoid building up lots of empty filters when you frequently
> trigger next() to purge stale filters. You can call next() directly on
> the LayeredBloomFilter. I do not know what extend check you are using, so
> there is some management to be done with the other settings of the
> LayerManager to avoid removing any functional layers which are currently
> empty.
>
> --
>
> As to exposing the LayerManager and adding a clean() method to the
> LayerManager, I think this is not in keeping with the current design. The
> LayerManager is used during construction and then never used again. So
> functionality to act on the layers is public through the
> LayeredBloomFilter (e.g. calling next()). So perhaps the change to the
> API should be to add a cleanup() method to LayeredBloomFilter. This does
> the same as next(), but does not add a new layer.
>
> I cannot recall the use case for next() in the LayeredBloomFilter. Would
> the addition of cleanup() make the next() method redundant?
>
> --
>
> Note: The typing against LinkedList could be updated to java.util.Deque.
> The only issue with this is the method:
>
>     public final BloomFilter get(int depth)
>
> This is not supported by the Deque interface. However, the LinkedList
> implementation of get(int) will use the iterator from the start or end of
> the list (whichever is closer) to find the element. This can use the
> iterator/descendingIterator method of Deque for the same performance (but
> the code to do this has to be written).
>
> Alex
>
>
> On Tue, 9 Apr 2024 at 08:45, Claude Warren <cla...@xenei.com> wrote:
>
> > Greetings,
> >
> > I am attempting to use the BloomFilter code in Kafka to manage PID
> > generation. The requirement is to remove PID tracking after a period
> > of time. This is possible with the LayeredBloomFilter, but it has an
> > edge case problem.
> >
> > The LayeredBloomFilter uses the LayerManager to manage the filters
> > that comprise the layers of the LayeredBloomFilter.
> > The LayerManager uses a Consumer<LinkedList<BloomFilter>> called
> > filterCleanup to remove old layers. The filterCleanup is only called
> > when a new layer is added to the layered filter.
> >
> > This solution works well in the general case where data is flowing
> > through the layered filter. However, if nothing is added to the
> > filter, filterCleanup is not called.
> >
> > In the Kafka case we have a LayeredBloomFilter for PIDs for each
> > producer. As long as a producer is producing PIDs, the filter gets
> > updated.
> >
> > However, if a producer drops from the system or goes offline for a
> > period of time, then they will no longer be producing PIDs and their
> > old expired data will remain.
> >
> > We want to remove the producer from the collection when there are no
> > more PIDs being tracked.
> >
> > I think this can be solved by adding a clean() method to the
> > LayerManager that simply calls the existing filterCleanup. It would
> > be easier to access this method if the LayeredBloomFilter had a
> > method to return the LayerManager that was passed in the constructor.
> >
> > Does anyone see any issues with this approach? Are there other
> > solutions to be had?
> >
> > Questions and comments welcomed.
> > --
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>

--
LinkedIn: http://www.linkedin.com/in/claudewarren
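[Editor's note] The change the thread converges on (a cleanup() on LayeredBloomFilter that runs the LayerManager's existing filterCleanup without adding a layer, in contrast to next()) can be sketched with greatly simplified stand-in classes. These are not the real Commons Collections signatures; only the relationship between next(), cleanup(), and filterCleanup is meant to match the discussion.

```java
import java.util.LinkedList;
import java.util.function.Consumer;

public class LayeredCleanupSketch {

    /** Illustrative stand-in for a Bloom filter layer. */
    static class Filter {
        boolean empty = true;
    }

    /** Greatly simplified stand-in for the real LayerManager. */
    static class LayerManager {
        final LinkedList<Filter> layers = new LinkedList<>();
        final Consumer<LinkedList<Filter>> filterCleanup;

        LayerManager(Consumer<LinkedList<Filter>> filterCleanup) {
            this.filterCleanup = filterCleanup;
            layers.add(new Filter()); // start with one target layer
        }

        /** Existing behaviour: cleanup runs only as a side effect of adding a layer. */
        void next() {
            filterCleanup.accept(layers);
            layers.add(new Filter());
        }

        /** Proposed addition: run the existing cleanup without adding a layer. */
        void cleanup() {
            filterCleanup.accept(layers);
        }
    }

    /** Simplified stand-in for LayeredBloomFilter, exposing the proposed method. */
    static class LayeredBloomFilter {
        final LayerManager layerManager;

        LayeredBloomFilter(LayerManager layerManager) {
            this.layerManager = layerManager;
        }

        /** Same as next(), but does not add a new layer. */
        void cleanup() {
            layerManager.cleanup();
        }

        int getDepth() {
            return layerManager.layers.size();
        }
    }

    public static void main(String[] args) {
        // A toy cleanup that purges empty layers.
        LayerManager lm = new LayerManager(list -> list.removeIf(f -> f.empty));
        lm.layers.getFirst().empty = false;          // first layer is in use
        LayeredBloomFilter lbf = new LayeredBloomFilter(lm);
        lm.next();                                   // purge, then add a fresh (empty) layer
        lbf.cleanup();                               // purge only: the empty layer is removed
        System.out.println(lbf.getDepth()); // prints 1
    }
}
```

This illustrates why the idle-producer case in Claude's original mail goes stale: next() is the only path that triggers filterCleanup, so a filter that stops receiving additions never purges its expired layers until something calls the proposed cleanup().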