[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902
 ] 

Benedict edited comment on CASSANDRA-6146 at 7/3/14 9:09 PM:
-------------------------------------------------------------

bq. You can reproduce by changing the default clustering distribution to 
uniform(1..1024) 

Well, since there are 6 clustering components, a uniform(1..1024) default 
distribution would yield an _average_ of roughly 512^6 (=(2^9)^6 = 2^54) rows 
per partition. Unsurprisingly, this causes an overflow in the calculations. It 
is probably worth detecting this and telling people the size is absurdly large 
when it happens, and also worth using double instead of float everywhere we 
calculate a probability.
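
The arithmetic above can be checked with a short sketch. This is illustrative 
Python, not code from the stress tool: the 32-bit wraparound is only an 
assumption about how such an overflow could arise, and the helper names 
(wrap_int32, to_float32) are hypothetical.

```python
import struct

COMPONENTS = 6   # clustering components, per the comment above
AVERAGE = 512    # approximate mean of uniform(1..1024)


def wrap_int32(value):
    """Emulate two's-complement 32-bit wraparound (assumed failure mode)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def to_float32(value):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


expected_rows = AVERAGE ** COMPONENTS        # exact integer, 2**54
wrapped_rows = wrap_int32(expected_rows)     # what a 32-bit product would hold

# Chance of NOT selecting one particular row among 512**3 rows.
p_miss = 1.0 - 1.0 / AVERAGE ** 3
p_miss_single = to_float32(p_miss)           # single precision collapses to 1.0

print(expected_rows == 2 ** 54, wrapped_rows)
print(p_miss < 1.0, p_miss_single == 1.0)
```

With only three components the single-precision value is already 
indistinguishable from 1.0, while the double still is not, which is the case 
for preferring double in probability calculations.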

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering 
component so it always gets the same value for all rows in a partition, since 
the seeds are cached.

-Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level 
clustering component) are the same? Well spotted, this is an off-by-1 bug, and 
I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they 
are the same for the whole partition.- Ah, nuts, the off-by-1 would cause it to 
always generate the same seeds. Whoops.

bq. I'm concerned we won't be able to explain how to use this to Joe User, but 
perhaps if we come up with better terminology and some visual examples it 
will make more sense. For example, is the clustering distribution used to define 
the possible values in a single partition? If you have a population of 
uniform(1..1000) and clustering of fixed(1) you only see one value per partition

We may need to bikeshed the nomenclature. I don't think clustering is that 
tough though: it is the number of instances of that component for each instance 
of its parent (i.e. for C components with average clustering N, there will be 
on average N^C rows per partition). The only complex bits IMO are updateratio 
and useratio; perhaps we could relabel these to 'rowspervisit' and 
'rowsperbatch' and indicate in the description that they are ratios.
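
As a rough illustration of the N^C relationship, the following Python sketch 
enumerates the rows of one partition from a per-component clustering count. It 
is not code from the stress tool; the function names and the list-of-counts 
representation are assumptions made for the example.

```python
from itertools import product
from math import prod


def rows_per_partition(clustering):
    """Rows in one partition, given the instance count for each component."""
    return prod(clustering)


def enumerate_rows(clustering):
    """Yield one tuple of component indices per row, parent levels first."""
    return product(*(range(n) for n in clustering))


three_by_four = [4, 4, 4]                  # C = 3 components, N = 4 each
rows = list(enumerate_rows(three_by_four))
print(len(rows), rows_per_partition(three_by_four))   # both equal 4 ** 3

single = [1, 1, 1]                         # clustering of fixed(1) at every level
print(rows_per_partition(single))
```

A clustering of fixed(1) at every level yields a single row, which matches the 
"one value per partition" observation in the quoted question.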


was (Author: benedict):
bq. You can reproduce by changing the default clustering distribution to 
uniform(1..1024) 

Well, since there are 6 clustering components, a uniform(1..1024) default 
distribution would yield 512^6 (=(2^9)^6 = 2^54) _average_ number of rows per 
partition. Not surprisingly this causes an overflow in calculations. Probably 
worth spotting and letting people know this is an absurdly large size if it 
happens, and also worth using double instead of float everywhere we calculate a 
probability.

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering 
component so it always gets the same value for all rows in a partition, since 
the seeds are cached.

Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level 
clustering component) are the same? Well spotted, this is an off-by-1 bug, and 
I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they 
are the same for the whole partition.

bq. I'm concerned we won't be able to explain how to use this to Joe User, but 
perhaps if we come up with better terminology and some visual examples it 
will make more sense. For example, is the clustering distribution used to define 
the possible values in a single partition? If you have a population of 
uniform(1..1000) and clustering of fixed(1) you only see one value per partition

We may need to bikeshed the nomenclature. I don't think clustering is that 
tough though: it is the number of instances of that component for each instance 
of its parent (i.e. for C components with average N clustering, there will be 
N^C rows). The only complex bit IMO is the updateratio and useratio; perhaps we 
could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the 
description that they are ratios.

> CQL-native stress
> -----------------
>
>                 Key: CASSANDRA-6146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.1
>
>         Attachments: 6146-v2.txt, 6146.txt, 6164-v3.txt
>
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
