[ https://issues.apache.org/jira/browse/CASSANDRA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108219#comment-14108219 ]
T Jake Luciani commented on CASSANDRA-7519:
-------------------------------------------

Ran some tests and tweaked the schema from the blog post, and things look better.

I do have some further questions/suggestions besides the better names.

  - What is the point of batchcount? The point of a batch is to group the inserts into a single statement for the server, so why would you send multiple of these sequentially? Even though it's possible, I can't think of a realistic workload that would use it.

  - I think it would be helpful to output some information on the partition sizes and batch sizes for inserts, to give people a sense of what their selected values will do, like:

{code}
Global:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
Per Batch:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
{code}

> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-7519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.1
>
>
> We generally believe that the most common workload is for reads to
> exponentially prefer most recently written data. However, as stress currently
> behaves we have two id generation modes: sequential and random (although
> random can be distributed). I propose introducing a new mode which is
> somewhat like sequential, except we essentially 'look back' from the current
> id by some amount defined by a distribution. I may also make the position
> only increment when it is first written to, so that this mode can be run
> from a clean slate with a mixed workload. This should allow us to generate
> workloads that are more representative.
> At the same time, I will introduce a timestamp value generator for primary
> key columns that is strictly ascending, i.e. has some random component but is
> based off of the actual system time (or some shared monotonically increasing
> state) so that we can again generate a more realistic workload. This may be
> challenging to tie in with the new procedurally generated partitions, but I'm
> sure it can be done without too much difficulty.
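
For illustration only, here is a rough Python sketch of the two ideas in the description: a 'look back' id mode and a strictly ascending timestamp generator. This is not the stress tool's code. The class names, the choice of an exponential distribution, and the parameters (mean_lookback, max_jitter_us) are all assumptions made for the example.

```python
import random
import time


class LookBackIdGenerator:
    """Hypothetical sketch: sequential writes, reads that look back from the newest id."""

    def __init__(self, mean_lookback=100.0, seed=None):
        self._rng = random.Random(seed)
        self._mean_lookback = mean_lookback  # assumed parameter: mean look-back distance
        self._position = 0                   # number of ids written so far

    def next_write_id(self):
        # The position only advances when an id is first written,
        # so a mixed workload can start from a clean slate.
        new_id = self._position
        self._position += 1
        return new_id

    def next_read_id(self):
        if self._position == 0:
            raise ValueError("nothing has been written yet")
        # Exponentially distributed distance back from the newest id
        # (an assumed distribution), clamped at the first id.
        distance = int(self._rng.expovariate(1.0 / self._mean_lookback))
        newest = self._position - 1
        return max(0, newest - distance)


class AscendingTimestampGenerator:
    """Hypothetical sketch: strictly ascending microsecond timestamps with random jitter."""

    def __init__(self, max_jitter_us=1000, seed=None):
        self._rng = random.Random(seed)
        self._max_jitter_us = max_jitter_us  # assumed parameter: upper bound on jitter
        self._last = 0

    def next(self):
        now_us = int(time.time() * 1_000_000)
        candidate = now_us + self._rng.randint(0, self._max_jitter_us)
        # Enforce strict monotonicity even if the clock or the jitter steps backwards.
        self._last = max(candidate, self._last + 1)
        return self._last
```

With a small mean_lookback, most reads land on the most recently written ids, which is the skew the description is after. The timestamp generator keeps its last value as the shared monotonically increasing state, so a real multi-threaded version would need to guard that state with a lock or an atomic.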