Although I don't believe theirs is growable. If it is, it might be interesting to test for speed. Otherwise, we could add the growable parts ourselves.
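
To make "growable" concrete, here is a minimal Python sketch of the idea. Python is used only for brevity; this is not ebloom's API and not the scalablebloomfilters code, and every class and parameter name below is made up for illustration. It assumes the usual approach: when the current filter reaches its capacity, start a new, larger one with a tighter error rate, and check all of them on lookup.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter sized for `capacity` items at error rate `error`."""

    def __init__(self, capacity, error):
        self.capacity = capacity
        self.error = error
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.nbits = max(8, math.ceil(-capacity * math.log(error) / math.log(2) ** 2))
        self.nhashes = max(1, round(self.nbits / capacity * math.log(2)))
        self.bits = bytearray((self.nbits + 7) // 8)
        self.count = 0

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class GrowableBloomFilter:
    """Hypothetical growable wrapper: appends a larger slice when the last one is full."""

    def __init__(self, initial_capacity=1000, error=0.01, growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # The first slice gets error * (1 - r); each later slice gets r times the
        # previous one, so the per-slice error rates sum to at most `error`.
        self.slices = [BloomFilter(initial_capacity, error * (1 - tightening))]

    def add(self, key):
        if key in self:
            return  # already reported as present, nothing to store
        current = self.slices[-1]
        if current.count >= current.capacity:
            current = BloomFilter(current.capacity * self.growth,
                                  current.error * self.tightening)
            self.slices.append(current)
        current.add(key)

    def __contains__(self, key):
        return any(key in s for s in self.slices)


if __name__ == "__main__":
    f = GrowableBloomFilter(initial_capacity=1000, error=0.01)
    for i in range(5000):
        f.add(f"key-{i}".encode())
    print(len(f.slices), b"key-42" in f)
```

The trade-off is that lookups have to consult every slice, which is the part that would be interesting to measure for speed.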

On Sat, Sep 25, 2010 at 5:44 AM, Robert Dionne <dio...@dionne-associates.com> wrote:
> Norman,
>
> Basho also has a bloom filter implementation, packaged as a separate
> project[1], that you might find useful. It's used in Bitcask.
>
> Cheers,
>
> Bob
>
> [1] http://github.com/basho/ebloom
>
> On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:
>
>> Paul,
>>
>> yes, performance is actually much better (for some of our harder
>> queries, so all docs over time with field X (two views), 10x faster).
>> I am testing with docs that in total emit ~100K of keys (following the
>> raindrop megaview).
>>
>> Some files in the scalable bloom filter project contained EPL headers,
>> others didn't. Googling for the source code, I had seen other projects
>> add the EPL headers to bit array, so I did the same. I will contact the
>> author, as he seems active on the Erlang mailing lists, and if not I
>> will write a bloom filter from scratch. The theory is well documented,
>> though I like his code!
>>
>> thanks for your help, let me know any suggestions you may have.
>>
>> thanks,
>>
>> Norman
>>
>> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.da...@gmail.com>
>> wrote:
>>> Norman,
>>>
>>> Just glanced through. Looks better. Any feeling for performance
>>> differences?
>>>
>>> Also, I glanced at the original files that you linked to. The bit
>>> array files didn't have a license, but what you've got there does have
>>> EPL headers. We need to make sure we have permission to do so. I would
>>> assume as much, but we have to be careful about such things in the
>>> ASF. You only need to get an email from the original author saying it's
>>> ok.
>>>
>>> I'm a bit caught up with some other code at the moment, I'll give it a
>>> more thorough combing over tomorrow.
>>>
>>> Paul
>>>
>>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.bar...@gmail.com>
>>> wrote:
>>>> Hi,
>>>>
>>>> thanks to Paul's excellent suggestion I have rewritten the multiview
>>>> to use bloom filters. I had a concern that a bloom filter per view
>>>> would use too much memory, but thanks in the main to the excellent
>>>> implementation of bloom filters in Erlang
>>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>>>> very space efficient.
>>>>
>>>> New code is here
>>>>
>>>> http://github.com/normanb/couchdb/
>>>>
>>>> The code is simple, all one process. Once we have agreed on the approach
>>>> we can decide if there is any benefit in making the bloom filter
>>>> generation occur in a separate process (using a gen_server).
>>>>
>>>> Comments as always appreciated, I will continue adding to the test suite.
>>>>
>>>> thanks for the help,
>>>>
>>>> Norman