[ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027658#comment-13027658
 ] 

Chris Burroughs commented on CASSANDRA-1278:
--------------------------------------------

bq. bq.  There is literature on scalable or dynamic BloomFilters to do this in 
a mathematically sound way.

bq. For example?

{noformat}
References
[1] P. Almeida, C. Baquero, N. Preguica, and D. Hutchison. Scalable bloom
    filters. Information Processing Letters, 101(6):255–261, March 2007.

[2] Deke Guo, Jie Wu, Honghui Chen, Ye Yuan, and Xueshan Luo. The
    dynamic bloom filters. IEEE Transactions on Knowledge and Data
    Engineering, 22(1):120–133, January 2010.
{noformat}

And also "Dynamic Bloom Filters: Analysis and usability", which does not appear 
to be in a journal but does cast some doubt on the practicality of dynamic 
Bloom filters.  Google Scholar can find PDFs for all of these.
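
For anyone who wants the gist of [1] without reading the paper, here is a rough 
sketch of the scalable Bloom filter idea as I understand it: keep a list of 
ordinary Bloom filters, and when the newest one fills up, add a larger one with 
a tighter false-positive target, so the sum of the per-filter error rates stays 
under the overall target.  Everything below is an assumption for illustration: 
the class name, the FNV-style hashing, and the count-based "full" test (the 
paper uses fill ratio and a partitioned filter layout).  It is not Cassandra's 
BloomFilter class and not a proposed patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of a scalable Bloom filter; names are illustrative only. */
final class ScalableBloomFilterSketch {

    /** One fixed-size Bloom filter sized for a given capacity and error rate. */
    private static final class Slice {
        final BitSet bits;
        final int numBits;
        final int numHashes;
        final int capacity;
        int count;

        Slice(int capacity, double fpRate) {
            this.capacity = capacity;
            double ln2 = Math.log(2);
            // Standard sizing: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
            this.numBits = Math.max(8, (int) Math.ceil(-capacity * Math.log(fpRate) / (ln2 * ln2)));
            this.numHashes = Math.max(1, (int) Math.round((double) numBits / capacity * ln2));
            this.bits = new BitSet(numBits);
        }

        void add(long h1, long h2) {
            for (int i = 0; i < numHashes; i++) bits.set(index(h1, h2, i));
            count++;
        }

        boolean mightContain(long h1, long h2) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(h1, h2, i))) return false;
            }
            return true;
        }

        // Double hashing: the i-th probe position is (h1 + i * h2) mod m.
        private int index(long h1, long h2, int i) {
            return (int) Math.floorMod(h1 + i * h2, (long) numBits);
        }
    }

    private final List<Slice> slices = new ArrayList<>();
    private final double tightening; // r: each new slice gets r times the previous error rate
    private final int growth;        // s: each new slice gets s times the previous capacity
    private double nextFpRate;
    private int nextCapacity;

    ScalableBloomFilterSketch(int initialCapacity, double targetFpRate, double tightening, int growth) {
        this.tightening = tightening;
        this.growth = growth;
        // Start at P0 = P * (1 - r) so that P0 + P0*r + P0*r^2 + ... <= P.
        this.nextFpRate = targetFpRate * (1 - tightening);
        this.nextCapacity = initialCapacity;
        addSlice();
    }

    private void addSlice() {
        slices.add(new Slice(nextCapacity, nextFpRate));
        nextCapacity *= growth;
        nextFpRate *= tightening;
    }

    void add(String key) {
        Slice last = slices.get(slices.size() - 1);
        // Simplification: "full" means the insert count reached the design capacity.
        if (last.count >= last.capacity) {
            addSlice();
            last = slices.get(slices.size() - 1);
        }
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        last.add(hash(data, 0L), hash(data, 0x9e3779b97f4a7c15L) | 1L);
    }

    /** True if any slice reports the key; false means the key was never added. */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        long h1 = hash(data, 0L);
        long h2 = hash(data, 0x9e3779b97f4a7c15L) | 1L;
        for (Slice s : slices) {
            if (s.mightContain(h1, h2)) return true;
        }
        return false;
    }

    int sliceCount() {
        return slices.size();
    }

    // Seeded FNV-1a style hash, chosen only to keep the sketch dependency-free.
    private static long hash(byte[] data, long seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

With an initial capacity of 100, r = 0.5 and s = 2, inserting 1000 keys grows 
the structure through slices of capacity 100, 200, 400 and 800.  Lookups have 
to probe every slice, which is part of the cost the usability paper questions.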

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 
> 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with thrift or a higher level client, or 
> they have to explore the contrib/bmt example - 
> http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem.  Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible - which will hopefully be addressed in 
> CASSANDRA-685
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the Core to make 
> it more pluggable for external clients of different types.
> It is just that this is something that many that are new to Cassandra need to 
> do - bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
