[ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077020#comment-14077020 ]
Russell Alexander Spitzer edited comment on CASSANDRA-7631 at 7/28/14 10:32 PM:
--------------------------------------------------------------------------------

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java wraps SSTableSimpleUnsortedWriter, so I think we are OK there.

The main reason I would like this as part of stress is that we already have all the data generation code written in for arbitrary schemas. Thanks [~tjake]! This way we could much more quickly prepare a test that writes a large amount of data and then runs a mixed workload.


was (Author: rspitzer):
    https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java wraps SSTableSimpleUnsortedWriter so I think we are ok there.

The main reason I would like this as part of stress is that we already have all the data generation code backed in for arbitrary schemas, Thanks [~tjake]! This way we could prepare for a test that uses a large amount of data and a mixed workload much faster.

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it
> takes to initially load data. For machines with a large amount of RAM this
> becomes especially onerous because a very large amount of data needs to be
> placed on the machine before the page cache can be circumvented.
> To remedy this I suggest we add a top-level flag to cassandra-stress which
> would cause the tool to write directly to SSTables rather than actually
> performing CQL inserts.
> Internally this would use CQLSSTableWriter to write directly to SSTables,
> skipping any keys that are not owned by the node stress is running on. The
> same stress command run on each node in the cluster would then write unique
> SSTables containing only the data that node is responsible for. After this,
> no further network IO would be required to distribute the data, as it would
> all already be in the correct place.
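
To make the "skip keys not owned by this node" step concrete, here is a minimal, self-contained sketch of an ownership filter over a toy token ring. It is an illustration only, not the stress tool's code or Cassandra's API: the class name OwnershipFilter and its methods are hypothetical, CRC32 stands in for the real partitioner hash (Cassandra's default partitioner uses Murmur3), and replication is ignored, so each key has exactly one owner. In the proposed feature, the keys that pass the filter would be the ones handed to CQLSSTableWriter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical toy token ring used to decide which node owns a key. */
final class OwnershipFilter {
    // token -> node name; a node owns the range (previous token, its own token].
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node, long token) {
        ring.put(token, node);
    }

    // Stand-in hash for illustration only; NOT the hash Cassandra uses.
    static long token(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // The owner is the first node whose token is >= the key's token,
    // wrapping around to the lowest token when none is found.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(token(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Keep only the keys the given node owns; a generator running on that
    // node would write these keys and skip all the others.
    List<String> keysOwnedBy(String node, List<String> keys) {
        List<String> owned = new ArrayList<>();
        for (String key : keys) {
            if (ownerOf(key).equals(node)) {
                owned.add(key);
            }
        }
        return owned;
    }
}
```

Running the same generator with the same seed on every node and filtering with keysOwnedBy(localNode, keys) gives each node a disjoint slice of the data set, which is the property the description relies on to avoid any network IO after the load.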