[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561852#comment-15561852
 ] 

Ben Slater commented on CASSANDRA-12490:
----------------------------------------

Hi Benedict,

I must be missing something here because as far as I can tell from testing a 
few different scenarios, setting -pop seq=1..N doesn't have any impact on the 
set of data generated when used with a YAML file.

That aside, the intent is that you use the SEQ distribution for doing an 
initial load of background data before running, say, a read test or a mixed 
read/write test, so that you are running with a representative volume of data 
on disk (and you probably wouldn't use SEQ for these later tests). In that 
case you wouldn't expect, or care, whether the set of data generated initially 
lines up in the same order as what is generated by later runs (although you 
would expect them to be from the same overall populations of values, which I 
believe does hold). I believe the sequence of data generation would change 
similarly if you switched between existing distribution types between runs?

Looking again at the code, I can see how the current implementation of SEQ is 
an issue for implementing future data validation, as it doesn't "reset" as 
you visit each partition. I think the other distributions effectively reset 
due to the call to setSeed(). However, I think this can fairly easily be 
rectified by having the setSeed() implementation of DistributionSequence reset 
the next value to 0?
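
To make the reset idea concrete, here is a minimal, self-contained sketch. It 
is not the code from the patch: SeqResetSketch, SequenceSketch and 
visitPartition are hypothetical names made up for illustration, and the 
wrap-around behaviour is an assumption. It only shows that if setSeed() 
rewinds the counter, every partition visit replays the same values.

```java
import java.util.ArrayList;
import java.util.List;

public class SeqResetSketch {

    /** Hypothetical stand-in for a sequence distribution; not the real stress class. */
    static final class SequenceSketch {
        private final long min;
        private final long range; // number of distinct values in min..max
        private long next = 0;    // offset of the next value to hand out

        SequenceSketch(long min, long max) {
            if (max < min) {
                throw new IllegalArgumentException("max must be >= min");
            }
            this.min = min;
            this.range = max - min + 1;
        }

        /** Returns the next value in the sequence, wrapping at max (assumed behaviour). */
        long next() {
            long value = min + (next % range);
            next++;
            return value;
        }

        /** The proposed change: seeding rewinds the sequence instead of being a no-op. */
        void setSeed(long seed) {
            next = 0;
        }
    }

    /** Simulates visiting one partition: seed first, then draw a value per row. */
    static List<Long> visitPartition(SequenceSketch seq, long seed, int rows) {
        seq.setSeed(seed);
        List<Long> values = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            values.add(seq.next());
        }
        return values;
    }

    public static void main(String[] args) {
        SequenceSketch seq = new SequenceSketch(1, 3);
        // Both visits print the same values because setSeed() rewinds the counter.
        System.out.println(visitPartition(seq, 42L, 4));
        System.out.println(visitPartition(seq, 42L, 4));
    }
}
```

Without the rewind in setSeed(), the second visit would carry on from wherever 
the first one stopped, which is the behaviour that would get in the way of 
validation.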

Cheers
Ben



> Add sequence distribution type to cassandra stress
> --------------------------------------------------
>
>                 Key: CASSANDRA-12490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>             Fix For: 3.10
>
>         Attachments: 12490-trunk.patch, 12490.yaml, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds. 
> This ensures generated values don't overlap (unless the sequence wraps), 
> providing a more predictable number of inserted records (and generating a 
> base set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. 
> I think it would be useful to have this for doing an initial load of data 
> for testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
