I do not think concurrent write usage is a general use case for the
filter. So this seems like a specialisation. If implementing it would
harm performance then I would be against it for the base
implementation.

For specialisation I would prefer the JDK pattern used for the
synchronized collections, which use a wrapper class around the
underlying implementation, e.g.

BloomFilters:
public static BloomFilter synchronizedBloomFilter(BloomFilter filter)
public static CountingBloomFilter synchronizedCountingBloomFilter(CountingBloomFilter filter)
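
As a minimal sketch of what such a wrapper could look like (the
interface below is a cut-down stand-in for the real BloomFilter API,
with only three methods shown):

// illustration only: a cut-down stand-in for the real BloomFilter interface
interface BloomFilter {
    boolean merge(BloomFilter other);
    boolean contains(BloomFilter other);
    int cardinality();
}

// every method locks on the wrapper, as the JDK synchronized wrappers do;
// BloomFilters.synchronizedBloomFilter(bf) would just return one of these
final class SynchronizedBloomFilter implements BloomFilter {
    private final BloomFilter delegate;

    SynchronizedBloomFilter(BloomFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized boolean merge(BloomFilter other) {
        return delegate.merge(other);
    }

    @Override
    public synchronized boolean contains(BloomFilter other) {
        return delegate.contains(other);
    }

    @Override
    public synchronized int cardinality() {
        return delegate.cardinality();
    }
}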

You can then make any implementation thread-safe. If your suggested
solution is to use a lock wrapping the entire method body, then the
effect of a synchronized call wrapping the implementation is the
same. This would satisfy the use case of building a single filter
using multiple threads. Note however that the same can be achieved by
building multiple filters, one per thread, and then merging them all
(see the sketch below).
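
For example, a sketch of the build-then-merge approach (newFilter
supplies the per-thread filters; addItem is a placeholder for however
a single item is merged, e.g. local.merge(hasher)):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Supplier;

class ParallelBuild {
    static BloomFilter build(List<List<String>> chunks,
            Supplier<BloomFilter> newFilter)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        List<Future<BloomFilter>> parts = new ArrayList<>();
        for (List<String> chunk : chunks) {
            parts.add(pool.submit(() -> {
                BloomFilter local = newFilter.get(); // thread-confined: no locking
                for (String item : chunk) {
                    addItem(local, item);
                }
                return local;
            }));
        }
        BloomFilter result = newFilter.get();
        for (Future<BloomFilter> part : parts) {
            result.merge(part.get()); // all merges happen on this thread
        }
        pool.shutdown();
        return result;
    }

    // placeholder: in real code this would be filter.merge(hasherFor(item))
    static void addItem(BloomFilter filter, String item) {
    }
}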

Note that it is dangerous to think that only some methods require
synchronization. E.g. if a filter's 'contains' method does not alter
the internal state then it could be left unsynchronized. But if a
concurrent thread changes the state then the result of the method
call can change. Filters with complex internal state that may
reallocate memory will require synchronization around all methods,
e.g. SparseBloomFilter. Without this there may be concurrent
modification exceptions from internal data structures that are
traversed by one thread while being updated by another.
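
A stand-alone illustration of that failure mode, using a plain
TreeSet in place of the filter's internal index storage (this is not
the real class, it just reproduces the hazard):

import java.util.TreeSet;

public class ConcurrentReadDemo {
    public static void main(String[] args) throws InterruptedException {
        TreeSet<Integer> indices = new TreeSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            indices.add(i);
        }
        // reader: traverses the set, as an unsynchronized query would
        Thread reader = new Thread(() -> {
            long sum = 0;
            for (int i : indices) {
                sum += i;
            }
        });
        // writer: inserts new indices concurrently, as a merge would
        Thread writer = new Thread(() -> {
            for (int i = 1_000_000; i < 1_100_000; i++) {
                indices.add(i);
            }
        });
        reader.start();
        writer.start();
        reader.join();
        writer.join();
        // the reader typically dies with ConcurrentModificationException
    }
}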

For a SimpleBloomFilter there can be no structural modification after
construction (the internal backing array is final). But concurrent
writes may not be atomic, so data corruption (lost updates) can
occur. For the case of reads (i.e. calls to contains), the result is
probabilistic anyway, so some error due to a concurrent update while
reading may be acceptable.
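
To be concrete, the corruption is a word-level lost update; the field
and method names below are illustrative, not the real
SimpleBloomFilter internals:

class UnsafeBitMap {
    private final long[] bits = new long[1024]; // final reference, mutable words

    void set(int index) {
        // |= is a read-modify-write: load the word, OR in the bit,
        // store the word. Two threads can load the same stale word and
        // the second store silently erases the first thread's bit.
        bits[index >> 6] |= 1L << index;
    }
}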

What is your use case? It may be simpler to provide a static helper
class that wraps access to the methods you are using that can change
the filter (merge/cardinality). Your pool of threads can then call
'contains' without synchronization overhead, and any threads that
update the filter use the helper:

// operations synchronized on the filter that is modified
// applies only to filters with no object reallocation on modification
static class FilterHelper {
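    // lock on the destination filter so that concurrent writers
    // serialise; reader threads calling contains() directly take no lock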
    static boolean merge(BloomFilter bf1, BloomFilter bf2) {
        synchronized (bf1) {
            return bf1.merge(bf2);
        }
    }
}

Thus you effectively synchronize writes but do not restrict reads.
Some level of probabilistic error on read would have to be tolerated.
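
For illustration, usage would look something like this (sharedFilter,
candidate and update are hypothetical variables):

// reader threads: probabilistic answer, no lock taken
boolean maybePresent = sharedFilter.contains(candidate);

// writer threads: serialised on the filter being modified
FilterHelper.merge(sharedFilter, update);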

Alex

On Sun, 2 Jul 2023 at 07:40, Claude Warren <cla...@xenei.com> wrote:
>
> I have been thinking about what it would take to make SimpleBloomFilter
> thread safe.  I think that the answer is to use a copy on write strategy
> and a lock within all the merge methods.
>
> However, this leaves the problem of the cardinality calculation.  Currently
> it is lazily performed and cached.  I am thinking that there are 2
> solutions.
>
>    1. mark cardinality volatile and leave the calculation as it effectively
>    does a copy on write.
>    2. update the cardinality inside the merge.  This should be doable as we
>    can get the number of bits in each long after the merge and simply add them
>    up during processing.
>
>
> Does anyone see a problem with this solution?
>
> Claude
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
