[ https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561963#comment-15561963 ]
Benedict commented on CASSANDRA-12490:
--------------------------------------

Well, the cqlstress-seq-example.yaml attached to the ticket shows this in use by the partition key specification, which is broken either way you cut it. If we reset the seed each time, we will only ever generate one partition, no matter how hard we try (making it a fairly terrible load test). If we do not, we can never query the data (meaningfully, reliably; perhaps by chance).

If all you want is some way to populate all possible values for a clustering column, that's a very specific problem and I'm not sure abusing distribution is the right way to achieve that. Perhaps the distribution parameter could take an ALL value that tells stress it should generate every value. However, it does this already (iirc) if the number of unique values it can generate is <= the number of values it is being asked to generate for a partition.

For generating specific value distributions within a partition, my view is that we should really be supporting Nashorn function definitions in the json. These could accept the partition and clustering row seeds as their parameters (and perhaps, optionally, an index array: with three clustering columns, the index within each column we are generating, so the first row would be [0,0,0], the second [0,0,1] or [0,1,0] or [1,0,0]), and return a value for the column. This would allow you to reliably produce whatever distribution of values you wanted.

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds.
> This ensures generated values don't overlap (unless the sequence wraps),
> providing a more predictable number of inserted records (and generating a base
> set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available.
> I think it would be useful to have this for doing the initial load of data for
> testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
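
The two points in Benedict's comment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not cassandra-stress code: the function names (keys_with_seed_reset, row_indices, column_value) are hypothetical, Python's random module stands in for the stress generators, and the Python function stands in for the proposed Nashorn function. The row ordering shown is only one of the orderings the comment allows for the second row.

```python
import itertools
import random


def keys_with_seed_reset(seed, count):
    """Reseed before every draw: each draw restarts the same sequence."""
    return [random.Random(seed).randrange(10**6) for _ in range(count)]


def row_indices(values_per_column):
    """Index array of every row in a partition, last column varying fastest."""
    ranges = [range(n) for n in values_per_column]
    return [list(ix) for ix in itertools.product(*ranges)]


def column_value(partition_seed, row_seed, index):
    """Hypothetical user-defined generator: a pure function of its inputs."""
    rng = random.Random(f"{partition_seed}:{row_seed}:{index}")
    return rng.randrange(1000)


# Resetting the seed each time yields a single distinct partition key.
print(len(set(keys_with_seed_reset(42, 1000))))  # 1

# Three clustering columns with two values each: first two index arrays.
print(row_indices([2, 2, 2])[:2])  # [[0, 0, 0], [0, 0, 1]]

# A reader can recompute exactly the value a writer produced for a row.
print(column_value(7, 3, [0, 0, 1]) == column_value(7, 3, [0, 0, 1]))  # True
```

Because column_value depends only on the seeds and the index array, a read workload can regenerate the same values without replaying generator state, which is the property the comment asks for.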