[ https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561852#comment-15561852 ]
Ben Slater commented on CASSANDRA-12490:
----------------------------------------

Hi Benedict,

I must be missing something here because, as far as I can tell from testing a few different scenarios, setting -pop seq=1..N doesn't have any impact on the set of data generated when used with a YAML file.

That aside, the intent is that you use the SEQ distribution for doing an initial load of background data before running, say, a read test or a mixed read/write test, so that you are running with a representative volume of data on disk (and you probably wouldn't use SEQ for these later tests). In that case you wouldn't expect or care whether the set of data generated initially lines up in the same order as what is generated by later runs (although you would expect them to be from the same overall populations of values, which I believe does hold). I believe the sequence of data generation would have to change similarly if you changed between existing distribution types between runs?

Looking again at the code, I can see how the current implementation of SEQ is an issue for implementing future data validation, as it doesn't "reset" as you visit each partition. I think the other distributions effectively reset due to the call to setSeed(). However, I think this can fairly easily be rectified by having the setSeed() implementation of DistributionSequence reset the next value to 0?

Cheers
Ben

> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds.
> This ensures generated values don't overlap (unless the sequence wraps),
> providing a more predictable number of inserted records (and generating a base
> set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available.
> I think it would be useful to have this for doing initial load of data for
> testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
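
The reset proposed in the comment above can be illustrated with a minimal Java sketch. It is not the code from the attached patch: the class name SeqDistributionSketch, its fields, and its constructor are hypothetical, and it does not extend the real cassandra-stress Distribution class. It only assumes a sequence over an inclusive range that wraps at the end, plus a setSeed() that ignores the seed value and restarts the sequence.

```java
// Hypothetical sketch; NOT the DistributionSequence class from the patch.
final class SeqDistributionSketch {
    private final long start;  // inclusive lower bound, e.g. 1 in seq(1..N)
    private final long end;    // inclusive upper bound, e.g. N in seq(1..N)
    private long offset = 0;   // distance of the next value from start

    SeqDistributionSketch(long start, long end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        this.start = start;
        this.end = end;
    }

    // Returns the next value in the range, wrapping back to start after end.
    long next() {
        long value = start + offset;
        offset = (offset + 1) % (end - start + 1);
        return value;
    }

    // Assumed reset behaviour: the seed value is ignored and the sequence
    // restarts, so each newly seeded partition sees the same values.
    void setSeed(long seed) {
        offset = 0;
    }

    public static void main(String[] args) {
        SeqDistributionSketch seq = new SeqDistributionSketch(1, 3);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            out.append(seq.next()).append(' ');  // wraps after the third value
        }
        seq.setSeed(42L);                        // restart the sequence
        out.append("| ").append(seq.next());
        System.out.println(out);                 // prints "1 2 3 1 | 1"
    }
}
```

Without the reset in setSeed(), the value after the separator would be 2 rather than 1, so the values seen for a partition would depend on how many values had been drawn before it.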