[jira] [Commented] (CASSANDRA-7519) Further stress improvements to generate more realistic workloads

T Jake Luciani (JIRA) Sat, 23 Aug 2014 18:36:12 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108219#comment-14108219 ]


T Jake Luciani commented on CASSANDRA-7519:
-------------------------------------------

I ran some tests and tweaked the schema from the blog post, and things look 
better. I do have some further questions and suggestions besides the better 
names.

- What is the point of batchcount?  The point of a batch is to group the 
inserts into a single statement for the server, so why would you send several 
batches sequentially? Even though it's possible, I can't think of a realistic 
workload that would use it.

- I think it would be helpful to output some information on the partition sizes 
and batch sizes for inserts to give people a sense of what their selected 
values will do, like:

{code}
Global:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y

Per Batch:
  Partitions: Min of X, Max of Y
  Rows per partition: Min of X, Max of Y
{code}
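
To make the suggestion concrete, here is a minimal sketch of how such a summary could be rendered. It is not code from stress: `format_summary`, `min_max_line`, and the layout of the `samples` dictionary are hypothetical names chosen for illustration, and the numbers in `example` are made up. It assumes the tool can sample (or otherwise bound) each of the four quantities before the run starts.

```python
def min_max_line(label, values):
    """Render one 'Min of X, Max of Y' line for a list of sampled sizes."""
    return f"  {label}: Min of {min(values)}, Max of {max(values)}"


def format_summary(samples):
    """Build the report text.

    samples maps a scope name ("Global", "Per Batch") to a dict that maps a
    metric label to a list of sampled values. This structure is an assumption
    made for the sketch, not something stress exposes.
    """
    lines = []
    for scope, metrics in samples.items():
        lines.append(f"{scope}:")
        for label, values in metrics.items():
            lines.append(min_max_line(label, values))
        lines.append("")  # blank line between scopes
    return "\n".join(lines).rstrip()


# Made-up sample values, only to show the shape of the output.
example = {
    "Global": {"Partitions": [1000, 5000], "Rows per partition": [1, 200]},
    "Per Batch": {"Partitions": [1, 3], "Rows per partition": [1, 50]},
}
print(format_summary(example))
```

Printing this before the run would let a user check that the chosen distributions produce the partition and batch sizes they intended.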



> Further stress improvements to generate more realistic workloads
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-7519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.1
>
>
> We generally believe that the most common workload is for reads to 
> exponentially prefer the most recently written data. However, as stress 
> currently behaves, we have two id generation modes: sequential and random 
> (although random can be distributed). I propose introducing a new mode which 
> is somewhat like sequential, except we essentially 'look back' from the 
> current id by some amount defined by a distribution. I may also make the 
> position increment only when an id is first written, so that this mode can 
> be run from a clean slate with a mixed workload. This should allow us to 
> generate workloads that are more representative.
> At the same time, I will introduce a timestamp value generator for primary 
> key columns that is strictly ascending, i.e. has some random component but is 
> based off of the actual system time (or some shared monotonically increasing 
> state) so that we can again generate a more realistic workload. This may be 
> challenging to tie in with the new procedurally generated partitions, but I'm 
> sure it can be done without too much difficulty.
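
The two generators described in the issue can be sketched roughly as follows. This is an illustration of the idea only, not the implementation that went into stress: the class names, the exponential look-back distribution, the `mean_lookback` and `max_jitter_micros` parameters, and the microsecond resolution are all assumptions made for the example.

```python
import random
import time


class LookbackIdGenerator:
    """Writes advance a position; reads pick an id some distance behind it."""

    def __init__(self, mean_lookback, rng=None):
        self.position = 0  # number of ids written so far
        self.mean_lookback = mean_lookback  # assumed > 0
        self.rng = rng if rng is not None else random.Random()

    def next_write_id(self):
        # The position only moves when a new id is written, so a mixed
        # workload can start from an empty table.
        written = self.position
        self.position += 1
        return written

    def next_read_id(self):
        if self.position == 0:
            raise ValueError("nothing has been written yet")
        # Exponentially distributed distance: short look-backs are the most
        # likely, so reads favour recently written ids.
        distance = int(self.rng.expovariate(1.0 / self.mean_lookback))
        newest = self.position - 1
        return max(0, newest - distance)


class AscendingTimestampGenerator:
    """System time plus a random component, forced to be strictly ascending."""

    def __init__(self, clock=time.time, rng=None, max_jitter_micros=1000):
        self.clock = clock
        self.rng = rng if rng is not None else random.Random()
        self.max_jitter_micros = max_jitter_micros  # assumed >= 1
        self.last = 0

    def next(self):
        now_micros = int(self.clock() * 1_000_000)
        candidate = now_micros + self.rng.randrange(self.max_jitter_micros)
        # Never return a value at or below the previous one.
        self.last = max(self.last + 1, candidate)
        return self.last
```

A real version would need the position and the last timestamp to be shared safely between client threads, which this single-threaded sketch ignores.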



--
This message was sent by Atlassian JIRA
(v6.2#6252)
