On Tue, 13 Aug 2013 12:26:38 -0400 Jeffrey Johnson <n3...@me.com> wrote: > Yes. The Bloom filters been in use for about 3 years with no known > problems, including randomized installations for 5-10 linux distros > under a CI harness. > > THe performance/memory gains are significant. > > > Are there any other places in the code with algos like this? > > > > Yes. Bloom filters are quite convenient.
I am sure they are. For a _preliminary_ check. This can still a huge performance improvement. Let's say we have 1000 packages each 1000 files average and we are looking for 'a package containing given file'. And let's say the Bloom filter gives 1 in 1000 false positives on average and a cost of Bloom filter lookup is '2', while cost of file list scan comparision is '1'. 1. No filter: 1000 * 1000 = 1000000 comparisions, no false positives cost: 1000000 2. Bloom filter only: 1000 filter lookups, 2 false positives cost: 2000 500 times faster, but unreliable. 3. Bloom filter plus scan: 1000 filter lookups -> give us 2 'candidates' -> these require 2000 comparisions cost: 4000 Only twice slower than Bloom filter only and still 250 times faster than just scanning the file list. One could also estimate b-tree index lookup cost for this case. Greets, Jacek _______________________________________________ pld-devel-en mailing list pld-devel-en@lists.pld-linux.org http://lists.pld-linux.org/mailman/listinfo/pld-devel-en