Norman,

Basho also has a bloom filter implementation packaged as a separate project [1] that you might find useful. It's used in Bitcask.
Cheers,

Bob

[1] http://github.com/basho/ebloom

On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:

> Paul,
>
> yes, performance is actually much better (for some of our harder
> queries, so all docs over time with field X (two views), 10x faster),
> I am testing with docs that in total emit ~100K of keys (following the
> raindrop megaview).
>
> Some of the scalable bloom filter project contained EPL headers,
> others didn't, googling for the source code I had seen other projects
> add the EPL headers to bit array so I did the same. I will contact the
> author as he seems active on the erlang mailing lists and if not I
> will write a bloom filter from scratch, the theory is well documented,
> though I like his code!
>
> thanks for your help, let me know any suggestions you may have.
>
> thanks,
>
> Norman
>
> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.da...@gmail.com>
> wrote:
>> Norman,
>>
>> Just glanced through. Looks better. Any feeling for the performance
>> difference?
>>
>> Also, I glanced at the original files that you linked to. The bit
>> array files didn't have a license, but what you've got there does have
>> EPL headers. We need to make sure we have permission to do so. I would
>> assume as much, but we have to be careful about such things in the
>> ASF. You only need to get an email from the original author saying it's
>> ok.
>>
>> I'm a bit caught up with some other code at the moment, I'll give a
>> more thorough combing over tomorrow.
>>
>> Paul
>>
>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.bar...@gmail.com>
>> wrote:
>>> Hi,
>>>
>>> thanks to Paul's excellent suggestion I have rewritten the multiview
>>> to use bloom filters, I had a concern that a bloom filter per view
>>> would use too much memory but thanks in the main to the excellent
>>> implementation of bloom filters in erlang
>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>>> very space efficient.
>>>
>>> New code is here
>>>
>>> http://github.com/normanb/couchdb/
>>>
>>> The code is simple, all one process; once we have agreed the approach
>>> we can decide if there is any benefit in making the bloom filter
>>> generation occur in a separate process (using a gen_server).
>>>
>>> Comments as always appreciated, I will continue adding to the test suite.
>>>
>>> thanks for the help,
>>>
>>> Norman
>>>
>>
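The approach discussed in this thread, one bloom filter per view so a multiview query can test whether a document emitted by one view was also emitted by the others, can be illustrated with a small sketch. This is a hypothetical Python illustration, not the Erlang code in the linked branch and not the ebloom or scalable-bloom-filter API. The names `BloomFilter` and `multiview_candidates`, the input shape (view name mapped to the doc IDs it emitted), the "scan the smallest view, probe the other filters" strategy, SHA-256 double hashing, and the 1% default false-positive rate are all assumptions. It also uses a fixed-size filter rather than the scalable variant Norman mentions.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size bloom filter (hypothetical; not the scalable variant)."""

    def __init__(self, capacity, error_rate):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True if every bit is set: "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def multiview_candidates(view_results, error_rate=0.01):
    """Return doc IDs from the smallest view that pass every other view's filter.

    view_results: dict mapping a (hypothetical) view name to the doc IDs it emitted.
    """
    smallest = min(view_results, key=lambda name: len(view_results[name]))
    filters = []
    for name, doc_ids in view_results.items():
        if name == smallest:
            continue  # the smallest view is scanned directly, so it needs no filter
        bf = BloomFilter(max(1, len(doc_ids)), error_rate)
        for doc_id in doc_ids:
            bf.add(doc_id)
        filters.append(bf)
    return [d for d in view_results[smallest] if all(d in f for f in filters)]


if __name__ == "__main__":
    views = {"by_field_x": ["a", "b", "c", "d"], "by_date": ["b", "c", "e"]}
    print(multiview_candidates(views))  # contains "b" and "c"; "e" only on a false positive
```

Because a bloom filter can return false positives but never false negatives, the result may include a few doc IDs that are not in every view, so a real implementation would presumably confirm candidates against the view index. With the sizing formulas above, a filter for 100,000 keys at a 1% false-positive rate needs about 958,000 bits (roughly 117 KiB) and 7 hash functions, which gives a rough sense of why per-view filters stayed small at the ~100K keys mentioned in the thread.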