[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

Russell Alexander Spitzer (JIRA) Wed, 30 Jul 2014 20:19:26 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080429#comment-14080429
 ]


Russell Alexander Spitzer commented on CASSANDRA-7631:
------------------------------------------------------

Back on topic, I've been running a series of experiments to see how much 
faster (if at all) going through CQLSSTableWriter would be than just using the 
native client.

Here are some quick numbers from a run on my MacBook, with C* also running on 
the same MacBook (for the native protocol runs):
{code}
     NOOP    = Just generate a row, don't do anything with it
               (I know this may be optimized out)
     Native  = Run using -mode native cql3
     SSTable = Run passing rows to a queue which is consumed by a single
               thread running CQLSSTableWriter

     n=1M using the example user profile
user n=1000000 no_warmup profile=cqlstress-example.yaml ops(insert=1) -rate threads=N -mode (sstable|native cql3)

            Partitions Per Second
Threads      NOOP   Native   SSTable
1           22765    10165     20917
2           38333    17247     38659
4           58089    26920     33956
8           72434    33507     29354
16          87837    34195     29354
{code}

So a single SSTable writer can keep up with one or two generator threads, but 
beyond that it looks like contention over the ArrayBlockingQueue caps 
throughput. I'm going to look into getting a thread-safe version of the 
SSTableWriter tomorrow (there is at the very least contention on file naming); 
hopefully we'll be able to just tie a different SSTableWriter to each generator.
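
For reference, the single-queue setup measured above can be sketched roughly 
as follows. This is a minimal sketch using only the JDK, not the actual stress 
code: the class name, the queue capacity of 1024, the Object[] row 
representation and the DONE sentinel are all assumptions made for 
illustration, and the consumer just counts rows where the real one would hand 
them to CQLSSTableWriter.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuePipelineSketch {
    // Hypothetical sentinel: each generator enqueues it once when it has finished.
    private static final Object[] DONE = new Object[0];

    /** Runs `producers` generator threads feeding one consumer thread; returns rows consumed. */
    public static long run(int producers, int rowsPerProducer) {
        // Assumed capacity; every generator contends on this one bounded queue.
        BlockingQueue<Object[]> queue = new ArrayBlockingQueue<>(1024);
        long[] consumed = new long[1];

        Thread consumer = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < producers) {
                    Object[] row = queue.take();
                    if (row == DONE) {
                        finished++;
                        continue;
                    }
                    // A real consumer would pass the row to the SSTable writer here.
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        Thread[] generators = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            generators[p] = new Thread(() -> {
                try {
                    for (int i = 0; i < rowsPerProducer; i++) {
                        queue.put(new Object[] { id, i }); // stand-in for a generated row
                    }
                    queue.put(DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            generators[p].start();
        }

        try {
            for (Thread g : generators) {
                g.join();
            }
            consumer.join(); // join makes the consumer's count visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

With one writer per generator, the shared queue (and the put/take contention 
on it) would go away entirely, which is the motivation for the thread-safe 
writer mentioned above.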

If all else fails, we can just have them write to different directories and 
then rename the sstables when we have finished.
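
The fallback could look roughly like the sketch below. It is only an 
illustration under stated assumptions: files are treated as opaque, the "w<N>-" 
prefix is a hypothetical naming scheme and not Cassandra's sstable file 
naming, and the file names in demo() are made up. A real version would have to 
rename every component file of an sstable consistently, which is why all files 
from one worker share the same prefix here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MergeDirsSketch {
    /** Moves every file from each worker directory into target; returns the number of files moved. */
    public static int mergeInto(Path target, List<Path> workerDirs) {
        int moved = 0;
        try {
            Files.createDirectories(target);
            for (int w = 0; w < workerDirs.size(); w++) {
                // Snapshot the listing first so we do not move files while iterating.
                List<Path> files = new ArrayList<>();
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(workerDirs.get(w))) {
                    for (Path file : stream) {
                        files.add(file);
                    }
                }
                for (Path file : files) {
                    // Hypothetical scheme: a per-worker prefix keeps names unique in target.
                    Path dest = target.resolve("w" + w + "-" + file.getFileName());
                    Files.move(file, dest);
                    moved++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return moved;
    }

    /** Builds two worker directories with clashing file names in a temp dir and merges them. */
    public static int demo() {
        try {
            Path root = Files.createTempDirectory("sstable-merge");
            List<Path> workers = new ArrayList<>();
            for (int w = 0; w < 2; w++) {
                Path dir = Files.createDirectories(root.resolve("worker-" + w));
                // Made-up file names; both workers deliberately produce the same ones.
                Files.write(dir.resolve("data-1.db"), new byte[] { 1 });
                Files.write(dir.resolve("index-1.db"), new byte[] { 2 });
                workers.add(dir);
            }
            return mergeInto(root.resolve("final"), workers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```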

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it 
> takes to initially load data. For machines with a large amount of RAM this 
> becomes especially onerous, because a very large amount of data needs to be 
> placed on the machine before the page cache can be circumvented. 
> To remedy this, I suggest we add a top-level flag to Cassandra-Stress which 
> would cause the tool to write directly to sstables rather than actually 
> performing CQL inserts. Internally this would use CQLSSTableWriter to write 
> directly to sstables while skipping any keys which are not owned by the node 
> stress is running on. The same stress command run on each node in the cluster 
> would then write unique sstables containing only data which that node is 
> responsible for. Following this, no further network IO would be required to 
> distribute data, as it would all already be correctly in place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
