very cool, thanks for sharing!

Am 19.12.25 um 18:13 schrieb Michael McCandless:
Hi Team, I thought this might be interesting to some of us :)

If anyone out there also struggles to understand / intuit how a bloom filter's multiple hash functions (which Lucene's bloom filter implements) improve the false positive error rate, here is an interactive 3D surface/chart to see how hash function count (k), bits per element (m/n), and false positive error rate relate:

https://githubsearch.mikemccandless.com/gemini_bloom10.html

The mouseover also shows the saturation (percent bits set) but it's not rendered in the graph.

I created this by iterating with Gemini Pro 3 ... see the full prompting/iterations here: https://gemini.google.com/share/46a219e0abaf

Net/net doing multiple hash functions into the same backing bit set is surprisingly worthwhile even as the saturation grows, up to 50%.

It's entirely possible Gemini hallucinated something in the graph!!  But it seems sorta correct to me.  And of course that false positive error rate formula involves e!

Mike McCandless

http://blog.mikemccandless.com

Reply via email to