[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-07-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902
 ] 

Benedict edited comment on CASSANDRA-6146 at 7/3/14 9:09 PM:
-

bq. You can reproduce by changing the default clustering distribution to 
uniform(1..1024) 

Well, since there are 6 clustering components, a uniform(1..1024) default 
distribution would yield 512^6 (=(2^9)^6 = 2^54) _average_ number of rows per 
partition. Not surprisingly this causes an overflow in calculations. Probably 
worth spotting and letting people know this is an absurdly large size if it 
happens, and also worth using double instead of float everywhere we calculate a 
probability.
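Both hazards can be sketched in a few lines of Java (my own illustration, not stress code): the naive row-count product silently overflows int arithmetic, and a tiny probability that double retains is lost entirely in float.

```java
// A sketch (not stress code) of the two numeric hazards described above.
public class RowEstimateOverflow {
    public static void main(String[] args) {
        // Six clustering components averaging 512 values each: 512^6 = 2^54 rows.
        int intRows = 1;
        long longRows = 1;
        for (int i = 0; i < 6; i++) {
            intRows *= 512;   // silently wraps past 2^31
            longRows *= 512;  // 2^54 still fits comfortably in a long
        }
        System.out.println(intRows);   // 0 -- the int product overflowed to nothing
        System.out.println(longRows);  // 18014398509481984, i.e. 2^54

        // float carries only ~7 significant digits, so a small probability
        // vanishes next to 1.0f while double keeps it.
        System.out.println(1.0f - 1e-9f == 1.0f); // true: probability lost
        System.out.println(1.0 - 1e-9 == 1.0);    // false: double keeps it
    }
}
```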

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering 
component so it always gets the same value for all rows in a partition, since 
the seeds are cached.

-Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level 
clustering component) are the same? Well spotted, this is an off-by-1 bug, and 
I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they 
are the same for the whole partition.- Ah, nuts, the off-by-1 would cause it to 
always generate the same seeds. Whoops.

bq. I'm concerned we won't be able to explain how to use this to joe user but 
perhaps if we come up with better terminology and some visual examples it 
will make more sense. For example the clustering distribution is used to define 
the possible values in a single partition? If you have a population of 
uniform(1..1000) and clustering of fixed(1) you only see one value per partition.

We may need to bikeshed the nomenclature. I don't think clustering is that 
tough though: it is the number of instances of that component for each instance 
of its parent (i.e. for C components with average N clustering, there will be 
N^C rows). The only complex bit IMO is the updateratio and useratio; perhaps we 
could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the 
description that they are ratios.
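The parent/child rule above can be shown with a toy cartesian expansion (my own illustration, not stress code): each component's instances nest under every instance of its parent, so the row count is the product of the per-component counts, and clustering fixed(1) collapses a partition to a single row.

```java
// Toy illustration of the nesting rule: C components with N instances
// each yield N^C rows per partition.
import java.util.ArrayList;
import java.util.List;

public class ClusteringNesting {
    static List<String> rows(int[] instancesPerComponent) {
        List<String> rows = new ArrayList<>();
        rows.add(""); // the partition key itself
        for (int count : instancesPerComponent) {
            List<String> next = new ArrayList<>();
            for (String parent : rows)          // under every parent instance...
                for (int i = 0; i < count; i++) // ...nest this component's instances
                    next.add(parent + "/c" + i);
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        // three components with 4 instances each: 4^3 = 64 rows
        System.out.println(rows(new int[]{4, 4, 4}).size()); // 64
        // clustering fixed(1) everywhere: a single row per partition
        System.out.println(rows(new int[]{1, 1, 1}).size()); // 1
    }
}
```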



> CQL-native stress
> -
>
> Key: CASSANDRA-6146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: T Jake Luciani
> Fix For: 2.1.1
>
> Attachments: 6146-v2.txt, 6146.txt, 6164-v3.txt
>
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-07-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902
 ] 

Benedict edited comment on CASSANDRA-6146 at 7/3/14 8:58 PM:
-

bq. You can reproduce by changing the default clustering distribution to 
uniform(1..1024) 

Well, since there are 6 clustering components, a uniform(1..1024) default 
distribution would yield 512^6 (=(2^9)^6 = 2^54) _average_ number of rows per 
partition. Not surprisingly this causes an overflow in calculations. Probably 
worth spotting and letting people know this is an absurdly large size if it 
happens, and also worth using double instead of float everywhere we calculate a 
probability.

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering 
component so it always gets the same value for all rows in a partition, since 
the seeds are cached.

Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level 
clustering component) are the same? Well spotted, this is an off-by-1 bug, and 
I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they 
are the same for the whole partition.

bq. I'm concerned we won't be able to explain how to use this to joe user but 
perhaps if we come up with better terminology and some visual examples it 
will make more sense. For example the clustering distribution is used to define 
the possible values in a single partition? If you have a population of 
uniform(1..1000) and clustering of fixed(1) you only see one value per partition.

We may need to bikeshed the nomenclature. I don't think clustering is that 
tough though: it is the number of instances of that component for each instance 
of its parent (i.e. for C components with average N clustering, there will be 
N^C rows). The only complex bit IMO is the updateratio and useratio; perhaps we 
could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the 
description that they are ratios.





[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-07-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051865#comment-14051865
 ] 

T Jake Luciani edited comment on CASSANDRA-6146 at 7/3/14 8:11 PM:
---

Some things I've discovered while reviewing:

 - no_warmup option doesn't work
 - The value component generator uses the seed of the last clustering component 
so it always gets the same value for all rows in a partition, since the seeds 
are cached.
 - If the clustering distribution is too large you end up with no rows being 
written. (You can reproduce by changing the default clustering distribution to 
uniform(1..1024) and running cqlstress-example.yaml)  
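A hypothetical reduction of the second item (names and structure are mine, not the actual generator): if the value column is keyed off the cached seed one clustering level up instead of the row's own seed, every row in the partition produces the identical value.

```java
// Minimal reconstruction of the reported seed off-by-one.
import java.util.SplittableRandom;

public class SeedOffByOne {
    // deterministic child seed: same (seed, child) always gives the same result
    static long mix(long seed, long child) {
        return new SplittableRandom(seed + child).nextLong();
    }

    public static void main(String[] args) {
        long partitionSeed = 42;
        long[] rowSeed = new long[3]; // cached per-row seeds at the leaf level
        for (int row = 0; row < 3; row++)
            rowSeed[row] = mix(partitionSeed, row);

        // buggy indexing: values keyed off the shared parent seed, so every
        // row in the partition gets the identical value
        long buggyRow0 = mix(partitionSeed, 0);
        long buggyRow1 = mix(partitionSeed, 0);
        System.out.println(buggyRow0 == buggyRow1); // true: all rows collide

        // corrected: each row's value keyed off its own cached seed
        long fixedRow0 = mix(rowSeed[0], 0);
        long fixedRow1 = mix(rowSeed[1], 0);
        System.out.println(fixedRow0 == fixedRow1); // false: rows no longer collide
    }
}
```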

I'm concerned we won't be able to explain how to use this to joe user, but 
perhaps if we come up with better terminology and some visual examples it 
will make more sense. For example the clustering distribution is used to define 
the possible values in a single partition? If you have a population of 
uniform(1..1000) and clustering of fixed(1) you only see one value per partition.





[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-07-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048994#comment-14048994
 ] 

Benedict edited comment on CASSANDRA-6146 at 7/1/14 4:03 PM:
-

bq. It sounds like writing to an entire partition at once is a step backwards 
from the original patch, since you can't test writing incrementally to a wide 
row. all clustering columns are written at once (unless I'm misunderstanding). 
Previously the population distribution of a column was not within a partition 
so you could make it very large.

The problem with the prior approach was that you could not control the size of 
partition you created, nor whether or not you were actually querying any data 
for the non-insert operations. The only control you had was the size of your 
population for each field, so the only way to perform incremental inserts to a 
partition was to constrain your partition key domain to a fraction of the 
domain of the clustering columns. This did not give you much capacity to 
control or reason about how much data was being inserted to a given partition, 
nor how this was distributed, nor, importantly, how many distinct partitions 
were updated for a single batch statement, and it meant that we would likely 
benchmark queries that returned (and even operated over) no data, with no way 
of knowing if this was correct or not. 

The new approach lets us validate the data we get back, be certain we are 
operating over data that should exist (so does real work), and even knows how 
much data it's operating over to report accurate statistics. It also lets us 
control how many cql rows we insert into a single partition in one batch. 
Modifying the current approach to write/generate only a portion of a partition 
at a time is relatively trivial; we can even support an extra "batch" option 
that supports splitting an insert for a single partition into multiple distinct 
batch statements so we can control very specifically how incrementally the data 
is written. I only left it out to put some kind of cap on the number of changes 
introduced in this ticket, but don't mind including it this round.
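The validation property this paragraph relies on can be sketched as follows (a minimal stand-in of my own, not the real PartitionGenerator): because every cell is a pure function of the partition seed and row index, a reader can independently recompute what a queried row should contain.

```java
// Deterministic generation makes query results verifiable: the "reader"
// regenerates the expected value from the same seed and row index.
import java.util.SplittableRandom;

public class DeterministicPartition {
    static long valueFor(long partitionSeed, long rowIndex) {
        // same (seed, index) always regenerates the same cell value
        return new SplittableRandom(partitionSeed ^ rowIndex * 0x9E3779B97F4A7C15L).nextLong();
    }

    public static void main(String[] args) {
        long seed = 42;
        long written = valueFor(seed, 7);   // what the "writer" inserted
        long expected = valueFor(seed, 7);  // what the "reader" recomputes
        System.out.println(written == expected); // true: results are verifiable
        System.out.println(valueFor(seed, 8) == written); // false: rows differ
    }
}
```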

bq. I'm not sure how I feel about putting the batchsize and batchtype into the 
yaml. Those feel like command line args to me.

The problem with a command line option is it applies to all operations; whilst 
we don't currently support batching for anything other than inserts, it's quite 
likely we'll want to for, e.g., deletes and potentially also for queries with 
IN statements. But I'm not dead set against moving this out onto the command 
line.

bq. I think we should change the term identity to population as it seems 
clearer to me for the columnspec. and in the code identityDistribution to 
populationDistribution

Sure. We should comment that this is a unique seed population, and not the 
actual population, however.

bq. I'm trying to run with one of the yaml files and getting an error:

Whoops. Obviously I broke something in a final tweak somewhere :/





[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-07-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048601#comment-14048601
 ] 

Benedict edited comment on CASSANDRA-6146 at 7/1/14 7:43 AM:
-

It looks like I heavily overestimated how much of the changes were down to 
package moves, or underestimated how much I overhauled stress. I've rebased 
into 4 commits: jake's patch, package movements, deletion of old functionality, 
then the guts of the refactor. The last step is still a pretty significant 
chunk of changes (~2.5k +/-), and primarily revolves around the introduction of 
the concept of PartitionGenerator and SeedGenerator (and removal of the old 
RowGen/KeyGen), which subtly changes program flow pretty much everywhere. 
There's also the parallel introduction of OpDistribution which requires some 
annoying changes in the settings hierarchy, but simplifies the changes 
necessary outside to support mixed operations of both the old and new kind.
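A rough sketch of the idea behind an OpDistribution-style picker (the real class differs; this is just the weighted-draw mechanism behind ops(insert=1,query=10)):

```java
// Weighted operation selection: each op gets a cumulative weight band,
// and a uniform draw into [0, total) selects the band it lands in.
import java.util.NavigableMap;
import java.util.SplittableRandom;
import java.util.TreeMap;

public class OpPicker {
    private final NavigableMap<Double, String> cumulative = new TreeMap<>();
    private final SplittableRandom rnd = new SplittableRandom(1);
    private double total = 0;

    void add(String op, double weight) {
        total += weight;
        cumulative.put(total, op); // upper edge of this op's band
    }

    String next() {
        // smallest band edge strictly above the draw identifies the op
        return cumulative.higherEntry(rnd.nextDouble() * total).getValue();
    }

    public static void main(String[] args) {
        OpPicker picker = new OpPicker();
        picker.add("insert", 1);
        picker.add("query", 10);
        int inserts = 0, queries = 0;
        for (int i = 0; i < 110_000; i++) {
            if (picker.next().equals("insert")) inserts++; else queries++;
        }
        // with weights 1 and 10, roughly 1 draw in 11 is an insert
        System.out.println(inserts + " inserts, " + queries + " queries");
    }
}
```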
 
* ~0.8k +/- goes on in generate, which are pretty trivial changes;
* ~0.6k are refactoring the old operations to use the new generators, and is 
mostly straightforward; primarily it involves the introduction of a new 
PredefinedOperation class, and rewiring the old classes to use its slightly 
different methods
* ~0.2k are refactoring the new insert/read statements to share the same common 
superclass, and use the new partition generator;
* ~0.7k are in the settings classes, and are probably the most annoying changes 
to review, but also not super important
* the remainder are in the base classes Operation, StressAction and 
StressProfile

If this is too painful, I'll see what can be done to split the patch out 
further.

Branch can be found 
[here|https://github.com/belliottsmith/cassandra/commits/6146-cqlstress-inc]





[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-06-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047575#comment-14047575
 ] 

Benedict edited comment on CASSANDRA-6146 at 6/30/14 11:48 AM:
---

I've pushed a version of these changes 
[here|https://github.com/belliottsmith/cassandra/tree/6146-cqlstress]

I wanted to integrate the changes a bit more tightly with the old stress, so we 
didn't seem to simply have two different stresses only nominally related. At 
the same time I wanted to address a few things I felt were important to setup 
so that future improvements are easy to introduce:

# We now generate partitions predictably, so when we perform queries we can be 
sure we're using data that is relevant to the partition we're operating over
# We explicitly generate multi-row partitions, with configurable distribution 
of clustering components 
# We can support multiple queries / inserts simultaneously in the new path
# The new path is executed with a more standard syntax (it's executed with 
stress user, instead of stress write/read; it can perform e.g. inserts/queries 
with "stress user ops(insert=1,query=10)" for a 90/10 read/write workload)
# I've switched configs to all support the range of distributions we could 
previously (including for size, etc.)
# All old paths use the same partition generators as the new paths to keep 
maintenance and extension simpler
# I've moved a few more config parameters into the yaml
# We report partition and row statistics now

Some other implications:
# To simplify matters and maintenance, I've stripped from the old paths support 
for super columns, indexes and multi-gets, as we did not typically seem to 
exercise these paths and these are probably best encapsulated with the new ones
# The old path now generates a lot more garbage, because the new path has to, 
so it will be slightly higher overhead than it was previously. We also only 
generate random data on the old path, so we may again see a decline in 
performance

Some things still to do in near future; all of which reasonably easy but wanted 
to limit scope of refactor:
# Support deletes
# Support partial inserts/deletes (currently insert only supports writing the 
whole partition)
# Support query result validation

The diff is quite big, but I think a lot of the changes are due to package 
movements. The basic functionality of your patch is left intact, so hopefully 
it shouldn't be too tricky to figure out what's happening now.




[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-05-20 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004163#comment-14004163
 ] 

T Jake Luciani edited comment on CASSANDRA-6146 at 5/21/14 12:40 AM:
-

We can certainly support an existing schema without requiring it in a yaml 
file.  The thing I don't understand from your initial comment is how you would 
populate/query the table (especially tables with composite cells) without 
specifying a per-column distribution and range?  Your example only considers 
the key distribution, but in the case of wide rows you want a distribution of 
the 'wideness' of the rows.

I think it's totally reasonable to expect someone to create a stress profile; 
it really is simpler to take the sample yaml and extend it vs trying to add 
lots of new command line flags to an already complicated tool.  If you want to 
make the keyspace/table DDL optional, that's easy.



> CQL-native stress
> -
>
> Key: CASSANDRA-6146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: T Jake Luciani
> Fix For: 2.1 rc1
>
> Attachments: 6146.txt
>
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.





[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2014-03-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931630#comment-13931630
 ] 

Benedict edited comment on CASSANDRA-6146 at 3/12/14 10:51 AM:
---

Sorry for the glacial response on this [~mishail]. I've setup my JIRA search to 
include items awaiting my review so it shouldn't happen again.

First impressions:
# The installation instructions aren't as obvious as they could be. I tried a 
variety of fairly stupid things because it didn't make it clear (to me) exactly 
which releases I should be downloading/unpacking where. Would be good to state 
unambiguously that you should download the CqlJMeter binary release, and a 
JMeter binary release, and that tar -xzf \{CqlJMeter.tgz\} -C \{Unpacked JMeter 
Root Directory\} will get everything working.
# There is no support for prepared statements.
# If I simply tweak the example write plan, and have it run > 10k samples per 
thread, it pretty much immediately exhausts C* of its available file handles. 
Now, by default on my system the number of available file handles is not very 
many, but something is going wrong if I can exhaust them with only the default 
10 threads configured (cassandra-stress does not exhaust them with hundreds of 
threads).
# It would be good to have the examples unpack to one of JMeter's 
examples/templates directory. You see a mongodb template in there straight away 
(bundled), when you click "open", so it would be nice to have Cassandra there 
without having to go hunting somewhere else for it.

2 and 3 are pretty much show stoppers for me. Once they're sorted, it might be 
worth trying to package it with JMeter, since mongodb is.

Personally I also don't find JMeter to be an easy or intuitive way to get 
running quickly with stress testing, but that's a completely separate matter, 
and your bundled plan does make it much more straightforward than it would 
otherwise be. It looks like it is quite powerful and certainly very expressive, 
so for serious/complex stress testing and benchmarking it looks like it could 
be a great tool.



> CQL-native stress
> -
>
> Key: CASSANDRA-6146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
> Fix For: 2.1 beta2
>
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2013-11-03 Thread Mikhail Stepura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812543#comment-13812543
 ] 

Mikhail Stepura edited comment on CASSANDRA-6146 at 11/4/13 2:20 AM:
-

Is it worth trying to implement a CQL plugin for JMeter (via the DataStax Java 
Driver)? It would allow users to build their own test plans.

https://jmeter.apache.org/usermanual/build-db-test-plan.html
https://github.com/Netflix/CassJMeter



was (Author: mishail):
Is it worth trying to implement CQL  (via DataStax Java Driver) plugin for 
JMeter? It would allow users to build their own test plans?

https://jmeter.apache.org/usermanual/build-db-test-plan.html
https://github.com/Netflix/CassJMeter


> CQL-native stress
> -
>
> Key: CASSANDRA-6146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress

2013-10-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805709#comment-13805709
 ] 

Benedict edited comment on CASSANDRA-6146 at 10/25/13 9:20 PM:
---

Hi [~mishail],

You might want to take a look at my patch for 
[6199|https://issues.apache.org/jira/browse/CASSANDRA-6199], and make your 
changes there. Adding support for custom CQL operations like this should be as 
easy as copying the CqlReader class and, mostly, deleting a few unnecessary 
lines. The options parser may need a few minutes to figure out how to add 
another command. Take a look in Command, SettingsCommand and StressSettings.

The interesting bit will be inserts - automatic detection of table structure 
and selection of sensible generators for those columns should be fun, and a 
great feature. A couple of new data generators may be needed, but should be 
pretty easy.

If you want to get really exciting, support for interleaving multiple 
statements shouldn't be too difficult, as it's currently supported for other 
operations in mixed mode - this would require a little bit of rejigging, but 
probably not too much.
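The interleaving in mixed mode is essentially weighted random selection over a set of operations. A standalone sketch of just that selection logic (class and method names here are made up for illustration, not the actual stress API):

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch of ratio-based operation interleaving, in the spirit of
// stress's mixed mode. MixedRatio is a hypothetical name, not a stress class.
public class MixedRatio {
    // Maps cumulative weight -> operation name, so a uniform draw over
    // [0, total) lands in each operation's slice proportionally often.
    private final NavigableMap<Double, String> cumulative = new TreeMap<>();
    private double total = 0;

    public MixedRatio add(String op, double weight) {
        total += weight;
        cumulative.put(total, op);
        return this;
    }

    public String next() {
        double r = ThreadLocalRandom.current().nextDouble(total);
        return cumulative.higherEntry(r).getValue(); // first key strictly > r
    }

    public static void main(String[] args) {
        // e.g. the equivalent of mixed(ratio(insert=1,read=3))
        MixedRatio ratio = new MixedRatio().add("insert", 1).add("read", 3);
        int reads = 0, n = 100_000;
        for (int i = 0; i < n; i++)
            if (ratio.next().equals("read")) reads++;
        System.out.println("read fraction ~= " + (double) reads / n);
    }
}
```

With weights 1 and 3 the read fraction converges on 0.75; interleaving additional statement types is just more add() calls.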

Any questions, feel free to ping me here, by email, or on IRC.


was (Author: benedict):
Hi [~mishail],

You might want to take a look at my patch for 
[6199|https://issues.apache.org/jira/browse/CASSANDRA-6199], and make your 
changes there. Adding support for custom CQL operations like this should be as 
easy as copying the CqlReader class and, mostly, deleting a few unnecessary 
lines. The options parser may need a few minutes to figure out how to add 
another command. Take a look in Command, SettingsCommand and StressSettings.

The interesting bit will be inserts - automatic detection of table structure 
and selection of sensible generators for those columns should be fun, and a 
great feature. A couple of new data generators may be needed, but should be 
pretty easy.

If you want to get really exciting, support for interleaving multiple 
statements shouldn't be too difficult, as it's currently supported for other 
operations in mixed mode - this would require a little bit of rejigging, but 
probably not too much.

> CQL-native stress
> -
>
> Key: CASSANDRA-6146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>
> The existing CQL "support" in stress is not worth discussing.  We need to 
> start over, and we might as well kill two birds with one stone and move to 
> the native protocol while we're at it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)