[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

Matthew F. Dennis (JIRA) Thu, 28 Apr 2011 21:42:51 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026840#comment-13026840 ]


Matthew F. Dennis commented on CASSANDRA-1278:
----------------------------------------------

The above numbers are correct, but they are for RF=1 (I mistyped it in IM).

At both RF=1 and RF=3 there were 5 M1.XL C* nodes and 20 M1.XL proxy nodes, 
with each proxy doing 10M inserts.

At RF=1 the C* nodes hit max CPU while the proxies are running, due to 
building indexes/filters and compacting. The nodes each sustain ~150Mb/s of 
incoming traffic. All the proxies finished between 810 and 825 seconds. With 20 
proxies * 10M inserts/proxy * RF=1, that is 200M inserts across 4 * 20 cores on 
the proxies, or across 4 * 5 cores when measured by cluster cores. This works 
out to a bit over 3K inserts/sec/core on the proxies and a bit over 12K 
"effective inserts"/sec/core on the cluster.
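The per-core arithmetic above can be reproduced in a few lines of Python. This is only a sketch that re-derives the quoted figures from the numbers in the comment; the names are illustrative, and the 4-cores-per-node figure is taken from the comment's own "4 * 20" and "4 * 5" rather than from any EC2 specification.

```python
# Back-of-the-envelope check of the RF=1 per-core figures.
# Core counts follow the comment's "4 * 20" (proxies) and "4 * 5" (cluster).
PROXIES = 20
CLUSTER_NODES = 5
CORES_PER_NODE = 4
INSERTS_PER_PROXY = 10_000_000

PROXY_CORES = PROXIES * CORES_PER_NODE          # 80 cores doing the inserting
CLUSTER_CORES = CLUSTER_NODES * CORES_PER_NODE  # 20 cores in the Cassandra cluster


def per_core_rates(elapsed_s: float) -> tuple[float, float]:
    """Return (inserts/sec/core on proxies, effective inserts/sec/core on cluster).

    At RF=1 every insert is written exactly once, so the proxies and the
    cluster both handle the same 200M writes.
    """
    total_inserts = PROXIES * INSERTS_PER_PROXY
    per_sec = total_inserts / elapsed_s
    return per_sec / PROXY_CORES, per_sec / CLUSTER_CORES


# Bracket the observed 810-825 s finish times.
for seconds in (810, 825):
    proxy_rate, cluster_rate = per_core_rates(seconds)
    print(f"{seconds} s: {proxy_rate:,.0f} inserts/s/core (proxies), "
          f"{cluster_rate:,.0f} effective inserts/s/core (cluster)")
```

Both ends of the range land a bit over 3K and a bit over 12K respectively, matching the comment; the cluster figure is exactly 4x the proxy figure because there are 4x as many proxy cores as cluster cores.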

At RF=3 the results are as expected. Since 20 proxies appeared to max out the 
5 nodes at RF=1, one would expect RF=3 to take roughly 3 times as long (about 
2460 seconds); it took about 2560 seconds to finish, so about 100 seconds 
longer than expected. This is just shy of 3K inserts/sec/core on the proxies 
and a little under 12K "effective inserts"/sec/core on the cluster. Network 
traffic was more variable at RF=3, though, bouncing between 80 and 200 Mb/s.
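The RF=3 reasoning can be checked the same way. This sketch is self-contained and hedged: the 820 s RF=1 baseline is an assumed representative value from the reported 810-825 s range, and treating the proxy figure as counting replica writes is an inference made so the numbers line up with the comment, not something the comment states.

```python
# Sanity check of the RF=3 run against the RF=1 baseline.
PROXIES = 20
INSERTS_PER_PROXY = 10_000_000
CORES_PER_NODE = 4
PROXY_CORES = PROXIES * CORES_PER_NODE   # 80
CLUSTER_CORES = 5 * CORES_PER_NODE       # 20

RF = 3
RF1_SECONDS = 820    # assumption: a representative value from the 810-825 s RF=1 range
RF3_SECONDS = 2560   # observed RF=3 completion time

# If 20 proxies already saturated the cluster at RF=1, tripling the replica
# writes should roughly triple the run time.
expected_rf3_seconds = RF * RF1_SECONDS
overrun_seconds = RF3_SECONDS - expected_rf3_seconds

# Assumption (inferred, not stated in the comment): both per-core figures
# count every replica write, i.e. logical inserts * RF.
effective_per_sec = PROXIES * INSERTS_PER_PROXY * RF / RF3_SECONDS
proxy_rate = effective_per_sec / PROXY_CORES
cluster_rate = effective_per_sec / CLUSTER_CORES

print(f"expected ~{expected_rf3_seconds} s, observed {RF3_SECONDS} s "
      f"(+{overrun_seconds} s)")
print(f"{proxy_rate:,.0f} inserts/s/core (proxies), "
      f"{cluster_rate:,.0f} effective inserts/s/core (cluster)")
```

Without counting replica writes on the proxies, the proxy figure would be under 1K inserts/sec/core (200M / 2560 s / 80 cores is about 977), which is why the sketch assumes the "just shy of 3K" number includes them.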

There were no timeouts in either case.


> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art. People are either 
> directed to just do it responsibly with Thrift or a higher-level client, or 
> they have to explore the contrib/bmt example 
> (http://wiki.apache.org/cassandra/BinaryMemtable). That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem. Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible, which will hopefully be addressed in 
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading. Perhaps it could include code in the core to make 
> it more pluggable for different types of external clients.
> Bulk loading their data is simply something that many people new to Cassandra 
> need to do.

