very cool, thanks for sharing!
Am 19.12.25 um 18:13 schrieb Michael McCandless:
Hi Team, I thought this might be interesting to some of us :)
If anyone out there also struggles to understand / intuit how a bloom
filter's multiple hash functions (which Lucene's bloom filter
implements) improve the false positive error rate, here is an
interactive 3D surface/chart to see how hash function count (k), bits
per element (m/n), and false positive error rate relate:
https://githubsearch.mikemccandless.com/gemini_bloom10.html
The mouseover also shows the saturation (percent bits set) but it's
not rendered in the graph.
I created this by iterating with Gemini Pro 3 ... see the full
prompting/iterations here: https://gemini.google.com/share/46a219e0abaf
Net/net doing multiple hash functions into the same backing bit set is
surprisingly worthwhile even as the saturation grows, up to 50%.
It's entirely possible Gemini hallucinated something in the graph!!
But it seems sorta correct to me. And of course that false positive
error rate formula involves e!
Mike McCandless
http://blog.mikemccandless.com