[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561963#comment-15561963
 ] 

Benedict commented on CASSANDRA-12490:
--------------------------------------

Well, the cqlstress-seq-example.yaml attached to the ticket shows this in use 
by the partition key specification, which is broken either way you cut it.  If 
we reset the seed each time, we will only ever generate one partition, no 
matter how hard we try (making it a fairly terrible load test).  If we do not, 
we can never query the data (meaningfully, reliably; perhaps by chance).
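
To make the two failure modes concrete, here is a minimal sketch in Python. It is not the code from the attached patch: SeqDistribution, set_seed, next and partition_key are hypothetical names, and the model assumes a sequence distribution is nothing more than a counter that wraps.

```python
class SeqDistribution:
    """Hypothetical sequence distribution: hands out start..end in order, then wraps."""

    def __init__(self, start, end, reset_on_seed):
        self.start = start
        self.end = end
        self.reset_on_seed = reset_on_seed
        self.next_value = start

    def set_seed(self, seed):
        # The seed identifies the partition, but a plain sequence has nothing
        # to derive from it; the only choice is whether to rewind the counter.
        if self.reset_on_seed:
            self.next_value = self.start

    def next(self):
        value = self.next_value
        self.next_value = self.start if value == self.end else value + 1
        return value


def partition_key(dist, seed):
    """Generate the partition key for one partition seed."""
    dist.set_seed(seed)
    return dist.next()


# Resetting on every seed: all partition seeds map to the same key.
resetting = SeqDistribution(1, 1000, reset_on_seed=True)
keys_with_reset = [partition_key(resetting, seed) for seed in range(10)]

# Not resetting: keys are distinct, but depend on call order, not on the seed.
running = SeqDistribution(1, 1000, reset_on_seed=False)
written = {seed: partition_key(running, seed) for seed in (5, 6, 7)}
read_back = partition_key(running, 5)  # a later read for the same seed
```

In this model keys_with_reset holds ten copies of the same key, while read_back differs from written[5], so a read for seed 5 would look up a partition that was never written under that seed.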

If all you want is some way to populate all possible values for a clustering 
column, that's a very specific problem and I'm not sure abusing distribution is 
the right way to achieve that.  Perhaps the distribution parameter could take 
an ALL value that tells stress it should generate every value.  However, it 
already does this (iirc) if the number of unique values it can generate is <= 
the number of values it is being asked to generate for a partition.

For generating specific value distributions within a partition, my view is that 
we should really be supporting nashorn function definitions in the json.  These 
could accept the partition and clustering row seeds (and perhaps, optionally, an 
index array, i.e. with three clustering columns, the index within each column we 
are generating, so the first row would be [0,0,0], the second [0,0,1] or 
[0,1,0] or [1,0,0]) as their parameters, and return a value for the column.  
This would allow you to reliably produce whatever distribution of values you 
wanted.  
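
As a rough illustration of the shape such a function might take, here is a sketch written in Python rather than nashorn JavaScript, purely for brevity. Everything in it is an assumption: column_value and generate_rows are hypothetical names, the row seed derivation (partition seed * 31 + row ordinal) is invented for the example, and the index order shown is only one of the orderings mentioned above.

```python
from itertools import product


def column_value(partition_seed, row_seed, index):
    """Hypothetical user-supplied function: seeds and index array in, column value out."""
    return f"p{partition_seed}-r{row_seed}-" + ".".join(str(i) for i in index)


def generate_rows(partition_seed, sizes, fn):
    """Walk every index array for the given per-column sizes and call fn for each row."""
    rows = []
    # product() varies the last clustering column fastest: (0,0,0), (0,0,1), ...
    for ordinal, index in enumerate(product(*(range(n) for n in sizes))):
        row_seed = partition_seed * 31 + ordinal  # assumed derivation, for illustration
        rows.append((index, fn(partition_seed, row_seed, index)))
    return rows


# Three clustering columns with two values each gives eight rows per partition.
rows = generate_rows(7, (2, 2, 2), column_value)
```

Because the value depends only on the seeds and the index array, calling generate_rows again with the same partition seed reproduces the same rows, which is what makes the data queryable afterwards.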

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds. 
> This ensures generated values don't overlap (unless the sequence wraps), 
> providing a more predictable number of inserted records (and generating a base 
> set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. 
> I think it would be useful to have this for doing the initial load of data for 
> testing.



