[ 
https://issues.apache.org/jira/browse/CASSANDRA-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879453#action_12879453
 ] 

Johan Oskarsson commented on CASSANDRA-875:
-------------------------------------------

I should have replied here earlier; instead I sent my thoughts in an email to 
the mailing list a while back.

I am working on parts of a Cassandra performance testing solution when I have 
time. This is my current thinking:

* The benchmarking would be done by YCSB as suggested. It is known to work and 
Brian has been very helpful in accepting changes. I have added support for 
exporting the results into a JSON file so that it can be read programmatically.
* Since getting hold of servers to run this on is a problem, I suggest 
starting out on one of the cloud services, hopefully Rackspace. While there 
are a few script bundles out there to deploy Cassandra clusters on these, I 
have started adding Cassandra support to Whirr. That project intends to 
support many of the popular Apache projects (Hadoop, HBase, ZooKeeper, 
Cassandra, etc.). As an Apache project with developers from Cloudera, Yahoo!, 
HP, etc., Whirr will hopefully see continued support. Especially the 
inclusion of HBase would make it an interesting choice: sharing a similar 
test setup with that project would benefit both. 
http://incubator.apache.org/whirr/
* In order to make sense of the YCSB output I have forked a Hudson plugin and 
ported it to read the YCSB output files. It is still very much a rough first 
version, but it would eventually provide reports, graphs, and alerts if 
performance drops unexpectedly. http://github.com/johanoskarsson/hudson-ycsb

Thoughts on this approach?
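For context on the Whirr side: clusters there are described with small 
property files. Assuming the Cassandra service lands, a recipe might look 
roughly like the following; the property names are illustrative, modeled on 
Whirr's existing Hadoop examples, not a confirmed Cassandra API.

```properties
# Hypothetical Whirr recipe for a small Cassandra test cluster on Rackspace.
whirr.cluster-name=cassandra-perf
whirr.instance-templates=3 cassandra
whirr.provider=cloudservers
whirr.identity=${env:RACKSPACE_USER}
whirr.credential=${env:RACKSPACE_KEY}
```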

> Performance regression tests, take 2
> ------------------------------------
>
>                 Key: CASSANDRA-875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-875
>             Project: Cassandra
>          Issue Type: Test
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> We have a stress test in contrib/py_stress, and Paul has a tool using 
> libcloud to automate running it against an ephemeral cluster of rackspace 
> cloud servers, but to really qualify as "performance regression tests" we 
> need to
>  - test a wide variety of data types (skinny rows, wide rows, different 
> comparator types, different value byte[] sizes, etc)
>  - produce pretty graphs.  seriously.
>  - archive historical data somewhere for comparison (rackspace can provide a 
> VM to host a db for this, if the ASF doesn't have something in place for this 
> kind of thing already)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
