[ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026840#comment-13026840 ]
Matthew F. Dennis commented on CASSANDRA-1278:
----------------------------------------------

The above numbers are correct, but they are for RF=1 (I mistyped it in IM). At both RF=1 and RF=3 there were 5 M1.XL C* nodes and 20 M1.XL proxy nodes, with each proxy doing 10M inserts.

At RF=1 the C* nodes bump up against max CPU while the proxies are running, due to building indexes/filters and compacting. The nodes sustain ~150Mb/s of incoming traffic each. All the proxies finished between 810 and 825 seconds. With 20 proxies * 10M inserts/proxy * RF=1 that is 200M inserts across 4 * 20 cores on the proxies, or 4 * 5 cores when measured by cluster cores, resulting in a bit over 3K inserts/sec/core on the proxies and a bit over 12K "effective inserts"/sec/core on the cluster.

At RF=3 the results are as expected. Since it looked like 20 proxies maxed out 5 nodes at RF=1, one would expect RF=3 to take roughly 3 times as long; the run took about 2560 seconds to finish (so about 100 seconds longer than expected when increasing from RF=1). This is just shy of 3K inserts/sec/core on the proxies and a little under 12K "effective inserts"/sec/core on the cluster. Network traffic was more variable at RF=3, though, bouncing between 80 and 200 Mb/s.

There were no timeouts in either case.

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
> Key: CASSANDRA-1278
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Jeremy Hanna
> Assignee: Matthew F. Dennis
> Fix For: 0.8.1
>
> Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
> Original Estimate: 40h
> Time Spent: 40h 40m
> Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.
> People are either directed to just do it responsibly with thrift or a higher level client, or they have to explore the contrib/bmt example - http://wiki.apache.org/cassandra/BinaryMemtable
> That contrib module requires delving into the code to find out how it works and then applying it to the given problem. Using either method, the user also needs to keep in mind that overloading the cluster is possible - which will hopefully be addressed in CASSANDRA-685
> This improvement would be to create a contrib module or set of documents dealing with bulk loading. Perhaps it could include code in the Core to make it more pluggable for external clients of different types.
> It is just that this is something that many that are new to Cassandra need to do - bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
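
The per-core figures in the comment can be re-derived with a few lines of Python. This is only a sanity-check sketch, not part of the original benchmark. It assumes the RF=1 runtime is the midpoint of the reported 810 to 825 second range, and that each replica write counts as one insert, which is the reading under which both the RF=1 and RF=3 figures come out as quoted. The function and constant names are made up for illustration.

```python
# Sanity check of the per-core rates quoted in the comment.
# Assumptions (not from the original): the RF=1 runtime is taken as the
# midpoint of the reported 810-825 s range, and every replica write is
# counted as one insert.

CORES_PER_NODE = 4              # implied by "4 * 20 cores" in the comment
PROXY_NODES = 20
CLUSTER_NODES = 5
INSERTS_PER_PROXY = 10_000_000


def per_core_rate(rf, seconds, nodes):
    """Replica-weighted inserts per second per core across `nodes` machines."""
    total = PROXY_NODES * INSERTS_PER_PROXY * rf
    return total / seconds / (CORES_PER_NODE * nodes)


rf1_seconds = (810 + 825) / 2   # assumed midpoint of the reported range
rf3_seconds = 2560              # reported runtime at RF=3

for rf, secs in ((1, rf1_seconds), (3, rf3_seconds)):
    print(f"RF={rf}: "
          f"{per_core_rate(rf, secs, PROXY_NODES):,.0f}/sec/core on proxies, "
          f"{per_core_rate(rf, secs, CLUSTER_NODES):,.0f}/sec/core on cluster")

# Extra time at RF=3 compared with naively tripling the RF=1 runtime.
overrun = rf3_seconds - 3 * rf1_seconds
print(f"RF=3 overrun vs. 3x RF=1: about {overrun:.0f} s")
```

Under those assumptions the rates land slightly above 3K and 12K per core at RF=1 and slightly below them at RF=3, and the RF=3 run is a little over 100 seconds slower than three times the RF=1 runtime, which matches the figures in the comment.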