I have added the formatting changes and contacted the author of scalable
bloom filters. It seems that bitarray (and the HiPE version) came from a
discussion on the Erlang mailing lists

http://groups.google.com/group/erlang-programming/browse_thread/thread/7c0191b1d709a5fe/ea5cf52b46d67d76?lnk=gst&q=bitarray#ea5cf52b46d67d76

but the author of bloom.erl should be able to confirm.

Any other comments? Has anyone had a chance to test it out?

thanks,

Norman

On Sat, Sep 25, 2010 at 8:45 AM, Paul Davis <[email protected]> wrote:
> Although, I don't believe theirs is growable. But if it is, that might
> be interesting to test for speed. Or we could add the growable parts.
>
> On Sat, Sep 25, 2010 at 5:44 AM, Robert Dionne
> <[email protected]> wrote:
>> Norman,
>>
>> Basho also has a bloom filter implementation packaged as a separate
>> project[1] that you might find useful. It's used in Bitcask.
>>
>> Cheers,
>>
>> Bob
>>
>> [1] http://github.com/basho/ebloom
>>
>> On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:
>>
>>> Paul,
>>>
>>> yes, performance is actually much better (for some of our harder
>>> queries, e.g. all docs over time with field X (two views), 10x
>>> faster). I am testing with docs that in total emit ~100K of keys
>>> (following the raindrop megaview).
>>>
>>> Some files in the scalable bloom filter project contained EPL
>>> headers, others didn't. Googling for the source code, I had seen
>>> other projects add the EPL headers to bit array, so I did the same.
>>> I will contact the author, as he seems active on the Erlang mailing
>>> lists, and if not I will write a bloom filter from scratch. The
>>> theory is well documented, though I like his code!
>>>
>>> thanks for your help, let me know any suggestions you may have.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <[email protected]>
>>> wrote:
>>>> Norman,
>>>>
>>>> Just glanced through. Looks better. Any feeling for performance
>>>> differences?
>>>>
>>>> Also, I glanced at the original files that you linked to. The bit
>>>> array files didn't have a license, but what you've got there does
>>>> have EPL headers. We need to make sure we have permission to do so.
>>>> I would assume as much, but we have to be careful about such things
>>>> in the ASF. You only need to get an email from the original author
>>>> saying it's ok.
>>>>
>>>> I'm a bit caught up with some other code at the moment, I'll give a
>>>> more thorough combing over tomorrow.
>>>>
>>>> Paul
>>>>
>>>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <[email protected]>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> thanks to Paul's excellent suggestion I have rewritten the
>>>>> multiview to use bloom filters. I had a concern that a bloom
>>>>> filter per view would use too much memory, but thanks in the main
>>>>> to the excellent implementation of bloom filters in Erlang
>>>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to
>>>>> be very space efficient.
>>>>>
>>>>> New code is here
>>>>>
>>>>> http://github.com/normanb/couchdb/
>>>>>
>>>>> The code is simple, all one process. Once we have agreed on the
>>>>> approach we can decide if there is any benefit in making the bloom
>>>>> filter generation occur in a separate process (using a gen_server).
>>>>>
>>>>> Comments as always appreciated, I will continue adding to the test
>>>>> suite.
>>>>>
>>>>> thanks for the help,
>>>>>
>>>>> Norman
