[ https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350897#comment-14350897 ]
Jonathan Shook commented on CASSANDRA-8929:
-------------------------------------------

The ability to build testing tools around particular workloads is something we have needed for a long time. I don't understand why the implementation would be complex. It is arguably much simpler than something like tracing, possibly even just a subset of tracing. All that has to be done is to support probabilistic sampling of statements, either at the coordinator or at the replica level. It's not complicated.

Capturing the data in sample form is just the first step. The ability to look at a set of captured data and build a reasonably accurate test profile is something that we can't yet do automatically. However, having the samples is what makes it possible. Still, I'd consider analysis of samples a separate scope, and not the thrust of this request.

Consuming sstables offline as a way to generate stress profiles really avoids the whole idea of sampling. You might be able to use CDC for that eventually (CASSANDRA-8844). Capturing meaningful samples at a reasonable cost, and with operational simplicity, means that we have to treat this as an operational feature worth pursuing.

There are other reasons to want sampling besides just feeding stress. Other testing tools might use the data to help with full-stack testing. I can easily see someone wanting to use samples for operational monitoring as well.

> Workload sampling
> -----------------
>
>                 Key: CASSANDRA-8929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> Workload *recording* looks to be unworkable (CASSANDRA-6572). We could build
> something almost as useful by sampling the requests sent to a node and
> building a synthetic workload with the same characteristics using the same
> (or anonymized) schema.
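To make the "probabilistic sampling of statements" idea from the comment concrete, here is a minimal sketch in Java. It is not Cassandra code: the class `StatementSampler`, its methods `offer` and `operationMix`, the keyword-based `classify` helper, and the bounded buffer are all hypothetical names and choices made for illustration. It assumes a hook that would be called once per statement on the request path (at the coordinator or at a replica) and keeps each statement independently with probability `rate` (Bernoulli sampling). The `operationMix` method hints at the separate "analysis" step by estimating the read/write mix from the kept samples.

```java
import java.util.ArrayDeque;
import java.util.Locale;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of per-statement Bernoulli sampling; not a Cassandra API.
final class StatementSampler {
    // One captured sample: a rough statement kind plus the statement text.
    static final class Sample {
        final String kind; // e.g. "SELECT", "INSERT" (derived by classify below)
        final String cql;  // statement text, assumed anonymized upstream if needed
        Sample(String kind, String cql) { this.kind = kind; this.cql = cql; }
    }

    private final double rate;     // probability of keeping any one statement
    private final int capacity;    // upper bound on retained samples
    private final Random random;   // seeded so runs are reproducible
    private final ArrayDeque<Sample> buffer = new ArrayDeque<>();
    private long seen = 0;

    StatementSampler(double rate, int capacity, long seed) {
        if (rate < 0.0 || rate > 1.0) throw new IllegalArgumentException("rate must be in [0, 1]");
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.rate = rate;
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    // Called once per statement; O(1) work. Returns true if the statement was kept.
    synchronized boolean offer(String cql) {
        seen++;
        if (random.nextDouble() >= rate) return false; // nextDouble() is in [0, 1)
        if (buffer.size() == capacity) buffer.pollFirst(); // drop oldest to bound memory
        buffer.addLast(new Sample(classify(cql), cql));
        return true;
    }

    // Very rough classification by the statement's leading keyword.
    static String classify(String cql) {
        String trimmed = cql.trim();
        int space = trimmed.indexOf(' ');
        String first = space < 0 ? trimmed : trimmed.substring(0, space);
        return first.toUpperCase(Locale.ROOT);
    }

    // Fraction of retained samples per statement kind: one input to a test profile.
    synchronized Map<String, Double> operationMix() {
        Map<String, Double> mix = new TreeMap<>();
        if (buffer.isEmpty()) return mix;
        for (Sample s : buffer) mix.merge(s.kind, 1.0, Double::sum);
        int n = buffer.size();
        mix.replaceAll((kind, count) -> count / n);
        return mix;
    }

    synchronized int sampledCount() { return buffer.size(); }
    synchronized long seenCount() { return seen; }
}
```

An independent coin flip per statement keeps the request-path cost constant and makes the sampled proportions unbiased estimates of the live operation mix; dropping the oldest sample when the buffer is full biases the window toward recent traffic, which suits monitoring but may not suit every profiling use.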
-- This message was sent by Atlassian JIRA (v6.3.4#6332)