Alex,

I like your solution.  To answer your question: we create a BloomFilter
that has a timestamp associated with it.  When System.currentTimeMillis()
exceeds the timestamp, the filter is removed.  The custom cleanup
calls Cleanup.removeEmptyTarget().andThen(<timestampCleanup>)
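Roughly like this, using a stand-in class instead of the real BloomFilter (all names below are illustrative, not the actual Kafka code):

```java
import java.util.LinkedList;
import java.util.function.Consumer;

public class TimestampCleanup {

    // Stand-in for a BloomFilter decorated with an absolute expiry time.
    static final class TimestampedFilter {
        final long expires; // expiry time in millis
        TimestampedFilter(long expires) { this.expires = expires; }
    }

    // The <timestampCleanup> half of the composed consumer: drop every
    // layer whose expiry time has passed. The current time is a parameter
    // here only so the behaviour is easy to test.
    static Consumer<LinkedList<TimestampedFilter>> timestampCleanup(long now) {
        return layers -> layers.removeIf(f -> now > f.expires);
    }

    public static void main(String[] args) {
        LinkedList<TimestampedFilter> layers = new LinkedList<>();
        layers.add(new TimestampedFilter(100));    // already expired
        layers.add(new TimestampedFilter(10_000)); // still live
        timestampCleanup(5_000).accept(layers);
        System.out.println(layers.size()); // 1
    }
}
```

In the real filter the consumer would read System.currentTimeMillis() itself and be composed onto the empty-target cleanup with Consumer.andThen.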

I think that creating a cleanup() or clean() method on the
LayeredBloomFilter is the appropriate solution, and that it should call
cleanup() on the LayerManager (so two new methods, only one of them public).
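A sketch of the shape I have in mind (method placement only; the class bodies are stand-ins, not the real implementations):

```java
import java.util.LinkedList;
import java.util.function.Consumer;

public class CleanupSketch {

    static final class LayerManager<T> {
        private final LinkedList<T> layers = new LinkedList<>();
        private final Consumer<LinkedList<T>> filterCleanup;

        LayerManager(Consumer<LinkedList<T>> filterCleanup) {
            this.filterCleanup = filterCleanup;
        }

        void add(T layer) { layers.add(layer); }
        int size() { return layers.size(); }

        // New method, not exposed outside the package: run the existing
        // cleanup consumer without creating a new layer.
        void cleanup() { filterCleanup.accept(layers); }
    }

    static final class LayeredBloomFilter<T> {
        private final LayerManager<T> layerManager;

        LayeredBloomFilter(LayerManager<T> layerManager) {
            this.layerManager = layerManager;
        }

        // New public method: the single exposed entry point.
        public void cleanup() { layerManager.cleanup(); }
    }

    public static void main(String[] args) {
        // A cleanup that drops everything, just to show the call path.
        LayerManager<String> manager = new LayerManager<>(LinkedList::clear);
        manager.add("layer-1");
        new LayeredBloomFilter<>(manager).cleanup();
        System.out.println(manager.size()); // 0
    }
}
```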

The next() method is used when external circumstances dictate that a new
layer should be created.  I think a StableBloomFilter I implemented
required it, but I do not have the code to hand at the moment.

Claude


On Tue, Apr 9, 2024 at 10:38 AM Alex Herbert <alex.d.herb...@gmail.com>
wrote:

> Hi Claude,
>
> Q. What is your current clean-up filter, i.e.
> the Consumer<LinkedList<BloomFilter>>? I assume you are using a custom one.
>
> The current collections code only has 2 functional implementations. One
> will remove the newest filter if it is empty; one will remove the oldest
> filters until the size is below a limit. Since neither of those will
> iterate the list and purge stale objects, I assume you are using a
> custom clean-up filter. So you must have created the layer manager with
> your custom filter. Assuming this, there are at least two solutions for
> the current code:
>
> 1. The current implementation always calls the clean-up filter with the
> same LinkedList since it is final. So you can capture the list and do what
> you want with it:
>
>         @SuppressWarnings("rawtypes")
>         LinkedList[] captured = new LinkedList[1];
>         Consumer<LinkedList<BloomFilter>> cleanup = list -> {
>             captured[0] = list;
>             // ... do clean-up
>         };
>
>         // Once you have captured the list, you can clean it when you want:
>         // unchecked conversion
>         cleanup.accept(captured[0]);
>
> Obviously this is not ideal as you have to manage the captured list to call
> cleanup. But it delivers exactly what you require in terms of being able to
> call cleanup at any time.
>
> 2. The call to next() will clean the layers but also add a new layer. So
> your custom clean method could clean stale objects and also any empty
> filters not at the end of the list. This will avoid building up lots of
> empty filters when you frequently trigger next() to purge stale filters.
> You can call next() directly on the LayeredBloomFilter. I do not know
> which extend check you are using, so there is some management to be done
> with the other settings of the LayerManager to avoid removing any
> functional layers which are currently empty.
>
> --
>
> As to exposing the LayerManager and adding a clean() method to the
> LayerManager, I think this is not in keeping with the current design. The
> LayerManager is used during construction and then never used again, so
> functionality to act on the layers is made public through the
> LayeredBloomFilter (e.g. calling next()). Perhaps the change to the API
> should be to add a cleanup() method to LayeredBloomFilter. This does the
> same as next(), but does not add a new layer.
>
> I cannot recall the use case for next() in the LayeredBloomFilter. Would
> the addition of cleanup() make the next() method redundant?
>
> --
>
> Note: The typing against LinkedList could be updated to java.util.Deque.
> The only issue with this is the method:
> public final BloomFilter get(int depth)
>
> This is not supported by the Deque interface. However, the LinkedList
> implementation of get(int) will use the iterator from the start or end of
> the list (whichever is closer) to find the element. This can use the
> iterator/descendingIterator methods of Deque for the same performance (but
> the code to do this has to be written).
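That get(int) over a Deque could look like this (DequeGet.get is a hypothetical helper, not an existing API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class DequeGet {

    // Walk from whichever end of the deque is closer to the index,
    // mirroring what LinkedList.get(int) does internally.
    static <T> T get(Deque<T> deque, int index) {
        int size = deque.size();
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        Iterator<T> it;
        int steps;
        if (index < size / 2) {
            it = deque.iterator();           // walk forward from the head
            steps = index;
        } else {
            it = deque.descendingIterator(); // walk backward from the tail
            steps = size - 1 - index;
        }
        for (int i = 0; i < steps; i++) {
            it.next();
        }
        return it.next();
    }

    public static void main(String[] args) {
        Deque<String> d = new ArrayDeque<>(List.of("a", "b", "c", "d"));
        System.out.println(get(d, 0) + get(d, 3)); // ad
    }
}
```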
>
> Alex
>
>
> On Tue, 9 Apr 2024 at 08:45, Claude Warren <cla...@xenei.com> wrote:
>
> > Greetings,
> >
> > I am attempting to use the BloomFilter code in Kafka to manage PID
> > generation.  The requirement is to remove PID tracking after a period of
> > time.  This is possible with the LayeredBloomFilter, but it has an edge
> > case problem.
> >
> > The LayeredBloomFilter uses the LayerManager to manage the filters that
> > comprise the layers of the LayeredBloomFilter.
> > The LayerManager uses a Consumer<LinkedList<BloomFilter>> called
> > filterCleanup to remove old layers.
> > The filterCleanup is only called when a new layer is added to the layered
> > filter.
> >
> > This solution works well in the general case where data is flowing
> > through the layered filter.  However, if nothing is added to the filter,
> > filterCleanup is not called.
> >
> > In the Kafka case we have a LayeredBloomFilter for PIDs for each
> > producer.  As long as a producer is producing PIDs the filter gets
> > updated.
> >
> > However, if a producer drops from the system or goes offline for a period
> > of time, then they will no longer be producing PIDs and their old expired
> > data will remain.
> >
> > We want to remove the producer from the collection when there are no more
> > PIDs being tracked.
> >
> > I think this can be solved by adding a clean() method to the LayerManager
> > that simply calls the existing filterCleanup.
> > It would be easier to access this method if the LayeredBloomFilter had a
> > method to return the LayerManager that was passed in the constructor.
> >
> > Does anyone see any issues with this approach?  Are there other solutions
> > to be had?
> >
> > Questions and comments welcomed.
> > --
> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren
