[ 
https://issues.apache.org/jira/browse/CASSANDRA-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350897#comment-14350897
 ] 

Jonathan Shook commented on CASSANDRA-8929:
-------------------------------------------

The ability to build testing tools around particular workloads is something we 
have been needing for a long time. I don't understand why the implementation 
would be complex. It is arguably much simpler than something like tracing, 
possibly even just a subset of tracing. All that has to be done is to support 
probabilistic sampling of statements, either at the coordinator or at the 
replica level. It's not complicated.
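
As a rough illustration of the probabilistic sampling described above, here is 
a minimal sketch. It is not Cassandra code: the class StatementSampler, its 
methods, and the capacity bound are all hypothetical names and assumptions made 
up for this example, and a real implementation would hook into the coordinator 
or replica request path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of per-statement probabilistic sampling (not a Cassandra API). */
final class StatementSampler {
    private final double probability;   // chance that any one statement is kept
    private final int capacity;         // assumed upper bound on retained samples
    private final Random random;
    private final List<String> samples = new ArrayList<>();

    StatementSampler(double probability, int capacity, long seed) {
        if (probability < 0.0 || probability > 1.0)
            throw new IllegalArgumentException("probability must be in [0, 1]");
        if (capacity < 0)
            throw new IllegalArgumentException("capacity must be non-negative");
        this.probability = probability;
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    /** Call once per statement; returns true if the statement was recorded. */
    synchronized boolean maybeSample(String statement) {
        if (samples.size() >= capacity)
            return false;                       // buffer full: drop to bound the cost
        if (random.nextDouble() >= probability)
            return false;                       // not selected this time
        samples.add(statement);
        return true;
    }

    /** Returns a copy of the samples captured so far. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(samples);
    }
}
```

With a probability of 0.01, roughly one statement in a hundred would be kept 
until the buffer fills, which is the same kind of knob that probabilistic 
tracing exposes.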

Capturing the data in sample form is just the first step. The ability to look 
at a set of captured data and build a reasonably accurate test profile is 
something that we can't yet do automatically. However, it is something that can 
be made possible by having the samples. Still, I'd consider analysis of samples 
as a separate scope, and not the thrust of this request.

Consuming sstables offline as a way to generate stress profiles is really 
avoiding the whole idea of sampling. You might be able to use CDC for that 
eventually (CASSANDRA-8844). Capturing meaningful samples at a reasonable cost 
and level of operational simplicity means that we have to treat this as an 
operational feature worth pursuing. There are other reasons to want 
sampling besides just feeding stress. There are other testing tools which might 
make use of the data to help with full-stack testing. I can easily see someone 
wanting to use samples in an operational monitoring sense as well.


> Workload sampling
> -----------------
>
>                 Key: CASSANDRA-8929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>
> Workload *recording* looks to be unworkable (CASSANDRA-6572).  We could build 
> something almost as useful by sampling the requests sent to a node and 
> building a synthetic workload with the same characteristics using the same 
> (or anonymized) schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
