[ 
https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934132#comment-13934132
 ] 

Benedict commented on CASSANDRA-6633:
-------------------------------------

Well, (1) and (2) should actually be pretty easy, and maybe very beneficial. 
Probably worth doing for 3.0, as it should permit scaling more gracefully to 
larger datasets.

Probably worth splitting out investigating Golomb encoding and other 
optimisations for Later though.

> Investigate Bloom Filter Improvements
> -------------------------------------
>
>                 Key: CASSANDRA-6633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>
> Investigate various possible improvements to our bloom filters:
> 1) Dynamic resizing would be useful. There are a few ways this could be 
> achieved: with some modification, downsampling could be supported; 
> partitioning the hash functions so that we may select the number of 
> hashes/bits dynamically, by loading/unloading a given partition; and there 
> are some related data structures, such as Quotient Filters that support 
> resizing and merging.
> 2) Faster loading: should be possible to mmap the bloom filter representation 
> on disk
> 3) Most ambitious of all, would be to try to reduce the memory requirement of 
> bloom filters. Golomb Coded Sets 
> [1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
>  are a possibility, as are other compressed hash structures 
> [2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to