[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080429#comment-14080429 ]
Russell Alexander Spitzer commented on CASSANDRA-7631:
------------------------------------------------------

Back on topic, I've been running a series of experiments to see how much faster (if at all) going through CQLSSTableWriter would be than just using the native client. Here are some quick numbers, run on my MacBook against C* also running on my MacBook (for native protocol):

{code}
NOOP    = Just generate a row, don't do anything with it (I know this may be optimized out)
Native  = Run using -mode native cql3
SSTable = Run passing rows to a queue which is consumed by a single thread running CQLSSTableWriter

n=1M using the example user profile
user n=1000000 no_warmup profile=cqlstress-example.yaml ops(insert=1) -rate threads=N -mode (sstable|native cql3)

Partitions Per Second
Threads   NOOP    Native  SSTable
1         22765   10165   20917
2         38333   17247   38659
4         58089   26920   33956
8         72434   33507   29354
16        87837   34195   29354
{code}

So while a single SSTable writer can keep up with the generator at 1 and 2 threads, throughput drops from 4 threads onward, and it looks like contention over the ArrayBlockingQueue is what caps performance. I'm going to look into getting a thread-safe version of the SSTableWriter tomorrow (there is at the very least contention on file naming); hopefully we'll be able to just tie a different SSTableWriter to each generator. If all else fails, we can have them write to different directories and then rename the sstables when we have finished.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it
> takes to initially load data. For machines with a large amount of RAM this
> becomes especially onerous because a very large amount of data needs to be
> placed on the machine before page-cache can be circumvented.
> To remedy this I suggest we add a top level flag to Cassandra-Stress which
> would cause the tool to write directly to sstables rather than actually
> performing CQL inserts. Internally this would use CQLSSTableWriter to write
> directly to sstables while skipping any keys which are not owned by the node
> stress is running on. The same stress command run on each node in the cluster
> would then write unique sstables only containing data which that node is
> responsible for. Following this no further network IO would be required to
> distribute data as it would all already be correctly in place.
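
For reference, the key-skipping idea in the description could be sketched roughly as below. This is only a minimal Python illustration, not the stress tool's code: `token_for`, `Ring`, and `keys_for_node` are hypothetical names, the MD5-based hash is a stand-in for the cluster's real partitioner, and replication is ignored (only the primary owner of each token is considered).

```python
import bisect
import hashlib


def token_for(key):
    """Map a partition key (bytes) to a signed 64-bit token.

    ASSUMPTION: MD5 is a stand-in hash for illustration only; it is not
    the partitioner a real cluster would use.
    """
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)


class Ring:
    """Hypothetical token ring: each node token owns (previous token, token]."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the list of tokens assigned to it.
        pairs = sorted((t, n) for n, ts in node_tokens.items() for t in ts)
        self._tokens = [t for t, _ in pairs]
        self._owners = [n for _, n in pairs]

    def owner(self, token):
        # First ring token >= the given token; wrap to the lowest ring token
        # when the given token is larger than every token on the ring.
        i = bisect.bisect_left(self._tokens, token)
        return self._owners[i % len(self._owners)]


def keys_for_node(keys, ring, node):
    """Yield only the keys whose primary owner is `node`; skip the rest."""
    for key in keys:
        if ring.owner(token_for(key)) == node:
            yield key


if __name__ == "__main__":
    ring = Ring({"node1": [-(2 ** 62)], "node2": [0], "node3": [2 ** 62]})
    keys = [("user%d" % i).encode() for i in range(10000)]
    # Every node runs the same generator but keeps a disjoint subset.
    for node in ("node1", "node2", "node3"):
        print(node, sum(1 for _ in keys_for_node(keys, ring, node)))
```

Because every node applies the same deterministic filter to the same generated key sequence, the subsets are disjoint and together cover all keys, which is the property the description relies on to avoid any network IO afterwards.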