Norman,

Basho also has a bloom filter implementation, packaged as a separate
project [1], that you might find useful. It's used in Bitcask.

Cheers,

Bob



[1] http://github.com/basho/ebloom




On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:

> Paul,
> 
> yes, performance is actually much better: some of our harder queries
> (e.g. all docs over time with field X, which uses two views) are 10x
> faster. I am testing with docs that emit ~100K keys in total (following
> the raindrop megaview).
> 
> Some files in the scalable bloom filter project contained EPL headers
> and others didn't. Googling for the source code, I saw that other
> projects had added the EPL headers to the bit array files, so I did the
> same. I will contact the author, as he seems active on the Erlang
> mailing lists; if not, I will write a bloom filter from scratch. The
> theory is well documented, though I like his code!
> 
> thanks for your help, let me know any suggestions you may have.
> 
> thanks,
> 
> Norman
> 
> 
> 
> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.da...@gmail.com> 
> wrote:
>> Norman,
>> 
>> Just glanced through. Looks better. Any feeling for the performance
>> differences?
>> 
>> Also, I glanced at the original files that you linked to. The bit
>> array files didn't have a license, but what you've got there does have
>> EPL headers. We need to make sure we have permission to do so. I would
>> assume as much, but we have to be careful about such things in the
>> ASF. You only need to get an email from the original author saying
>> it's OK.
>> 
>> I'm a bit caught up with some other code at the moment. I'll give it a
>> more thorough combing over tomorrow.
>> 
>> Paul
>> 
>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.bar...@gmail.com> 
>> wrote:
>>> Hi,
>>> 
>>> thanks to Paul's excellent suggestion I have rewritten the multiview
>>> to use bloom filters. I had a concern that a bloom filter per view
>>> would use too much memory, but thanks mainly to the excellent
>>> implementation of bloom filters in Erlang
>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>>> very space efficient.
>>> 
>>> New code is here
>>> 
>>> http://github.com/normanb/couchdb/
>>> 
>>> The code is simple, all in one process. Once we have agreed on the
>>> approach, we can decide if there is any benefit in moving the bloom
>>> filter generation into a separate process (using a gen_server).
>>> 
>>> Comments as always appreciated, I will continue adding to the test suite.
>>> 
>>> thanks for the help,
>>> 
>>> Norman
>>> 
>> 
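
As a rough illustration of why a bloom filter per view can stay small, here
is a minimal fixed-size sketch in Python. It is not the Erlang code from
either project mentioned in the thread; the class name, the SHA-256 based
double hashing, and the 1% error rate are all assumptions chosen for the
example.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter sketch (hypothetical, for illustration only)."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits,
        # and k = (m / n) * ln 2 hash functions.
        self.num_bits = max(
            1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single SHA-256 digest of the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Sized for ~100K keys, the figure mentioned earlier in the thread.
view_filter = BloomFilter(capacity=100_000, error_rate=0.01)
view_filter.add("doc-42")
print("doc-42" in view_filter)   # an added key is always reported as present
print(len(view_filter.bits))     # size of the bit array in bytes
```

With these assumed parameters the bit array for 100K keys comes to roughly
117 KiB, independent of how long the keys themselves are, at the cost of
occasional false positives.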
