[PR] Bloom Filter (datasketches-java)

via GitHub Mon, 26 Feb 2024 23:00:57 -0800


jmalkin opened a new pull request, #513:
URL: https://github.com/apache/datasketches-java/pull/513


   A Bloom filter isn't quite one of our normal sketches, but we've had 
requests for one over the years. Why do we need yet another implementation? 
Comparing against Spark's implementation (itself based somewhat on Guava's), 
I noticed a couple of things:
   1. The theoretical size can exceed 2^31-1 bits, but only 31 bits of the 
hash function are ever used. The index is always a non-negative 32-bit int, 
so bits beyond that range can never be addressed.
   2. Our library specializes in simple cross-language portability. While it 
may be a good idea to look at alternatives at some point, seamless data 
movement between languages is a known quantity for us. Once we port this to 
C++/Python we'll have that, whereas achieving it with other implementations 
is at least somewhat more complicated.
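
   To make point 1 concrete, here is a minimal sketch of the indexing 
limitation. It is not Spark's or Guava's actual code, and the class and 
method names are hypothetical. It only contrasts an index derived from 31 
usable hash bits with one derived from a full 64-bit hash, for a filter 
sized at 2^33 bits:

```java
// Illustration only: names are hypothetical, not taken from any library.
final class BitIndexRange {
    private BitIndexRange() {}

    // Index built from a 32-bit hash with the sign bit cleared.
    // Only 31 bits survive, so the result never exceeds 2^31-1,
    // no matter how large numBits is.
    public static long indexFromInt(int hash32, long numBits) {
        int nonNegative = hash32 & Integer.MAX_VALUE;
        return nonNegative % numBits;
    }

    // Index built from all 64 bits of the hash, treated as unsigned.
    // Every position in [0, numBits) is reachable.
    public static long indexFromLong(long hash64, long numBits) {
        return Long.remainderUnsigned(hash64, numBits);
    }

    public static void main(String[] args) {
        long numBits = 1L << 33; // larger than 2^31-1 bits
        System.out.println(indexFromInt(-1, numBits));   // highest index the int path can reach
        System.out.println(indexFromLong(-1L, numBits)); // highest index in the filter
    }
}
```

   With the int-based path, every bit above position 2^31-1 stays unset, so 
a filter sized beyond that point gains no accuracy from the extra space.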
   
   API change suggestions are quite welcome. I'm wondering if I should move the 
public constructors entirely to the builder class, for instance.
   
   Since we have a couple of newly donated membership filters, once we have 
the API nomenclature settled I plan to add a MembershipFilter abstract class 
or interface in the directory above so that we have a common usage API. That 
should make the filters particularly easy to use in distributed systems -- 
deserialize any of them and use it blindly, regardless of the specific 
underlying implementation. That will make it easier for people to experiment 
with and, with luck, adopt the newer filters once they're production ready.
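
   As a rough illustration of the common-API idea, the sketch below shows 
what such an interface might look like. Everything here is an assumption: the 
method names `update` and `query`, and the `ToyBloomFilter` class, are 
hypothetical stand-ins chosen for the example, not the API proposed in this 
PR. Serialization is omitted.

```java
import java.util.BitSet;

// Hypothetical common interface; the method names are assumptions.
interface MembershipFilter {
    void update(long item);

    // May return a false positive, never a false negative.
    boolean query(long item);
}

// Toy Bloom filter used only to exercise the interface above.
final class ToyBloomFilter implements MembershipFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // SplitMix64 finalizer, used here as a simple 64-bit mixing function.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Double hashing: the i-th index is (h1 + i * h2) mod numBits.
    private int index(long item, int i) {
        long h1 = mix(item);
        long h2 = mix(h1) | 1L; // force odd so the step is never zero
        return (int) Long.remainderUnsigned(h1 + i * h2, numBits);
    }

    @Override
    public void update(long item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    @Override
    public boolean query(long item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

   Calling code would hold only a `MembershipFilter` reference, for example 
`MembershipFilter f = new ToyBloomFilter(1024, 3);`, so a different filter 
type could be swapped in without changing the call sites.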


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


