[ https://issues.apache.org/jira/browse/CASSANDRA-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879453#action_12879453 ]
Johan Oskarsson commented on CASSANDRA-875:
-------------------------------------------

I should have replied here earlier; I sent my thoughts to the mailing list instead a while back. I am working on parts of a Cassandra performance testing solution when I have time. This is my current thinking:

* The benchmarking would be done by YCSB as suggested. It is known to work, and Brian has been very helpful in accepting changes. I have added support for exporting the results into a JSON file so that they can be read programmatically.

* Since we need to get hold of servers to run this on, I suggest starting out with one of the cloud services, hopefully Rackspace. While there are a few script bundles out there for deploying Cassandra clusters on these, I have started adding Cassandra support to Whirr. That project intends to support many of the popular Apache projects (Hadoop, HBase, ZooKeeper, Cassandra, etc.). As an Apache project with developers from Cloudera, Yahoo!, HP, etc., Whirr will hopefully see continued support. The inclusion of HBase in particular would make it an interesting choice: sharing a similar test setup with that project would benefit both. http://incubator.apache.org/whirr/

* To make sense of the YCSB output, I have forked a Hudson plugin and ported it to read the YCSB output files. It is still very much a rough first version, but it would eventually provide reports, graphs, and alerts if performance drops unexpectedly. http://github.com/johanoskarsson/hudson-ycsb

Thoughts on this approach?
> Performance regression tests, take 2
> ------------------------------------
>
> Key: CASSANDRA-875
> URL: https://issues.apache.org/jira/browse/CASSANDRA-875
> Project: Cassandra
> Issue Type: Test
> Components: Tools
> Reporter: Jonathan Ellis
>
> We have a stress test in contrib/py_stress, and Paul has a tool using
> libcloud to automate running it against an ephemeral cluster of rackspace
> cloud servers, but to really qualify as "performance regression tests" we
> need to
> - test a wide variety of data types (skinny rows, wide rows, different
>   comparator types, different value byte[] sizes, etc)
> - produce pretty graphs. seriously.
> - archive historical data somewhere for comparison (rackspace can provide a
>   VM to host a db for this, if the ASF doesn't have something in place for
>   this kind of thing already)
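The "wide variety of data types" requirement quoted above amounts to enumerating a grid of test configurations. A minimal sketch, assuming illustrative parameter names (these are not actual py_stress or YCSB flags):

```python
from itertools import product

# Hypothetical parameter grid; values are illustrative examples of the
# row shapes, comparator types, and value byte[] sizes the issue lists.
ROW_SHAPES = ["skinny", "wide"]
COMPARATORS = ["BytesType", "UTF8Type", "LongType"]
VALUE_SIZES = [64, 1024, 65536]  # value byte[] sizes in bytes

def workload_matrix():
    """Enumerate one stress-test configuration per combination, so each
    run covers the full cross-product of data-type variations."""
    return [
        {"rows": r, "comparator": c, "value_bytes": v}
        for r, c, v in product(ROW_SHAPES, COMPARATORS, VALUE_SIZES)
    ]
```

Each dict would then be translated into the flags of whatever stress tool (py_stress, YCSB) drives the run, and the results archived per-configuration for historical comparison.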