[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051902#comment-14051902 ] Benedict edited comment on CASSANDRA-6146 at 7/3/14 9:09 PM:
-

bq. You can reproduce by changing the default clustering distribution to uniform(1..1024)

Well, since there are 6 clustering components, a uniform(1..1024) default distribution would yield 512^6 (= (2^9)^6 = 2^54) _average_ rows per partition. Not surprisingly this causes an overflow in calculations. It's probably worth spotting this and warning people that the size is absurdly large if it happens, and also worth using double instead of float everywhere we calculate a probability.

bq. no_warmup option doesn't work

Good spot. I didn't wire it up.

bq. The value component generator uses the seed of the last clustering component so it always gets the same value for all rows in a partition, since the seeds are cached.

-Ah, you mean all _leaf_ rows (i.e. those sharing the second-lowest level clustering component) are the same? Well spotted, this is an off-by-1 bug, and I wasn't using a clustering>1 for the leaf. It shouldn't be the case that they are the same for the whole partition.- Ah, nuts, the off-by-1 would cause it to always generate the same seeds. Whoops.

bq. I'm concerned we won't be able to explain how to use this to joe user but perhaps if we come up with better terminology and some visual examples it will make more sense. For example the clustering distribution is used to define the possible values in a single partition? if you have a population of uniform(1..1000) and clustering of fixed(1) you only see one value per partition

We may need to bikeshed the nomenclature. I don't think clustering is that tough, though: it is the number of instances of that component for each instance of its parent (i.e. for C components with average N clustering, there will be N^C rows).
The only complex bit IMO is the updateratio and useratio; perhaps we could relabel these to 'rowspervisit' and 'rowsperbatch' and indicate in the description that they are ratios.
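The overflow and float-vs-double point above can be sketched numerically. This is a standalone illustration, not stress code: with 6 clustering components averaging 512 rows each, average rows per partition is 512^6 = 2^54, and products of per-row probabilities that small fall out of float's range long before double's.

```java
public class ProbabilityUnderflowSketch {
    public static void main(String[] args) {
        // 6 clustering components, each uniform(1..1024) => mean ~512 rows
        // per parent: average rows per partition = 512^6 = 2^54 (~1.8e16).
        double avgRows = Math.pow(512, 6);
        double perRowProbability = 1.0 / avgRows;     // ~5.6e-17

        // A single per-row probability fits in a float, but multiplying a few
        // of them together drops below float's smallest subnormal (~1.4e-45)
        // and rounds to zero, while double (min subnormal ~4.9e-324) survives.
        float pf = (float) perRowProbability;
        double pd = perRowProbability;
        float productF = pf * pf * pf * pf;   // (2^-54)^4 = 2^-216 -> 0.0f
        double productD = pd * pd * pd * pd;  // still representable as a double

        System.out.println(productF == 0.0f); // true: float underflowed
        System.out.println(productD > 0.0);   // true: double did not
    }
}
```

The same products that silently collapse to zero in float remain well inside double's range, which is why probability arithmetic here wants double throughout.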
> CQL-native stress
> -----------------
>
>                 Key: CASSANDRA-6146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6146
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: T Jake Luciani
>             Fix For: 2.1.1
>
>         Attachments: 6146-v2.txt, 6146.txt, 6164-v3.txt
>
> The existing CQL "support" in stress is not worth discussing. We need to start over, and we might as well kill two birds with one stone and move to the native protocol while we're at it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051865#comment-14051865 ] T Jake Luciani edited comment on CASSANDRA-6146 at 7/3/14 8:11 PM:
---

Some things I've discovered while reviewing:

- no_warmup option doesn't work
- The value component generator uses the seed of the last clustering component, so it always gets the same value for all rows in a partition, since the seeds are cached.
- If the clustering distribution is too large you end up with no rows being written. (You can reproduce by changing the default clustering distribution to uniform(1..1024) and running cqlstress-example.yaml.)

I'm concerned we won't be able to explain how to use this to the average user, but perhaps if we come up with better terminology and some visual examples it will make more sense. For example, the clustering distribution is used to define the possible values in a single partition? If you have a population of uniform(1..1000) and clustering of fixed(1) you only see one value per partition.
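The population/clustering distinction above can be sketched as a columnspec fragment (illustrative only; the option names mirror this thread's discussion rather than a settled schema):

```yaml
columnspec:
  - name: val
    population: uniform(1..1000)  # 1000 distinct seed values across the whole run
    cluster: fixed(1)             # ...but only 1 instance per partition, so any
                                  # given partition ever shows a single value
```

With cluster: fixed(1) the per-partition view collapses to one value even though the overall population is 1000 wide; raising cluster to, say, uniform(1..100) would expose up to 100 of them within a single partition.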
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048994#comment-14048994 ] Benedict edited comment on CASSANDRA-6146 at 7/1/14 4:03 PM:
-

bq. It sounds like writing to an entire partition at once is a step backwards from the original patch, since you can't test writing incrementally to a wide row. all clustering columns are written at once (unless I'm misunderstanding). Previously the population distribution of a column was not within a partition so you could make it very large.

The problem with the prior approach was that you could not control the size of the partitions you created, nor whether you were actually querying any data for the non-insert operations. The only control you had was the size of your population for each field, so the only way to perform incremental inserts to a partition was to constrain your partition key domain to a fraction of the domain of the clustering columns. This did not give you much capacity to control or reason about how much data was being inserted into a given partition, nor how it was distributed, nor, importantly, how many distinct partitions were updated for a single batch statement; and it meant that we would likely benchmark queries that returned (and even operated over) no data, with no way of knowing whether this was correct. The new approach lets us validate the data we get back, be certain we are operating over data that should exist (so it does real work), and even know how much data it's operating over, so it can report accurate statistics. It also lets us control how many CQL rows we insert into a single partition in one batch.
Modifying the current approach to write/generate only a portion of a partition at a time is relatively trivial; we can even support an extra "batch" option that splits an insert for a single partition into multiple distinct batch statements, so we can control very specifically how incrementally the data is written. I only left it out to put some kind of cap on the number of changes introduced in this ticket, but I don't mind including it this round.

bq. I'm not sure how I feel about putting the batchsize and batchtype into the yaml. Those feel like command line args to me.

The problem with a command line option is that it applies to all operations; whilst we don't currently support batching for anything other than inserts, it's quite likely we'll want it for, e.g., deletes, and potentially also for queries with IN clauses. But I'm not dead set against moving this out onto the command line.

bq. I think we should change the term identity to population as it seems clearer to me for the columnspec. and in the code identityDistribution to populationDistribution

Sure. We should comment that this is a unique seed population, and not the actual population, however.

bq. I'm trying to run with one of the yaml files and getting an error:

Whoops. Obviously I broke something in a final tweak somewhere :/
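A sketch of how the batching knobs under debate might sit in the profile yaml (hypothetical layout: 'batchtype' and 'batchsize' are the names from the comment, and the per-partition split option is the proposed "batch" extension; none of it is final):

```yaml
insert:
  batchtype: UNLOGGED   # LOGGED / UNLOGGED, per the batchsize/batchtype discussion
  batchsize: fixed(50)  # rows bundled into one batch statement
  batch: fixed(4)       # proposed: split one partition's insert across 4 batches
```

Keeping these in the yaml lets each operation carry its own batching behaviour, which is the argument made above against a global command line flag.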
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048601#comment-14048601 ] Benedict edited comment on CASSANDRA-6146 at 7/1/14 7:43 AM:
-

It looks like I heavily overestimated how much of the changes were down to package moves, or underestimated how much I overhauled stress. I've rebased into 4 commits: Jake's patch, package movements, deletion of old functionality, then the guts of the refactor. The last step is still a pretty significant chunk of changes (~2.5k +/-), and primarily revolves around the introduction of PartitionGenerator and SeedGenerator (and removal of the old RowGen/KeyGen), which subtly changes program flow pretty much everywhere. There's also the parallel introduction of OpDistribution, which requires some annoying changes in the settings hierarchy but simplifies the changes needed elsewhere to support mixed operations of both the old and new kind.

* ~0.8k +/- goes on generate, which are pretty trivial changes;
* ~0.6k is refactoring the old operations to use the new generators, and is mostly straightforward; primarily it involves the introduction of a new PredefinedOperation class, and rewiring the old classes to use its slightly different methods;
* ~0.2k is refactoring the new insert/read statements to share the same common superclass and use the new partition generator;
* ~0.7k is in the settings classes, and is probably the most annoying to review, but also not super important;
* the remainder is in the base classes Operation, StressAction and StressProfile.

If this is too painful, I'll see what can be done to split the patch out further. Branch can be found [here|https://github.com/belliottsmith/cassandra/commits/6146-cqlstress-inc]
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047575#comment-14047575 ] Benedict edited comment on CASSANDRA-6146 at 6/30/14 11:48 AM:
---

I've pushed a version of these changes [here|https://github.com/belliottsmith/cassandra/tree/6146-cqlstress]

I wanted to integrate the changes a bit more tightly with the old stress, so we didn't seem to simply have two different stresses only nominally related. At the same time I wanted to address a few things I felt were important to set up so that future improvements are easy to introduce:
# We now generate partitions predictably, so when we perform queries we can be sure we're using data that is relevant to the partition we're operating over
# We explicitly generate multi-row partitions, with configurable distribution of clustering components
# We can support multiple queries/inserts simultaneously in the new path
# The new path is executed with a more standard syntax (it's executed with "stress user" instead of "stress write/read"; it can perform e.g. inserts/queries with "stress user ops(insert=1,query=10)" for a 90/10 read/write workload)
# I've switched configs to all support the range of distributions we could previously (including for size, etc.)
# All old paths use the same partition generators as the new paths, to keep maintenance and extension simpler
# I've moved a few more config parameters into the yaml
# We report partition and row statistics now

Some other implications:
# To simplify matters and maintenance, I've stripped from the old paths support for super columns, indexes and multi-gets, as we did not typically seem to exercise these paths and they are probably best encapsulated by the new ones
# The old path now generates a lot more garbage, because the new path has to, so it will have slightly higher overhead than it did previously.
We also only generate random data on the old path, so we may again see a decline in performance.

Some things still to do in the near future, all of which are reasonably easy, but I wanted to limit the scope of the refactor:
# Support deletes
# Support partial inserts/deletes (currently insert only supports writing the whole partition)
# Support query result validation

The diff is quite big, but I think a lot of the changes are due to package movements. The basic functionality of your patch is left intact, so hopefully it shouldn't be too tricky to figure out what's happening now.
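The new invocation style from point 4 would look roughly like this (illustrative command line; the profile filename and exact argument spelling are assumptions, not confirmed syntax from the patch):

```shell
# ~90/10 read/write mix driven by a user-defined profile
cassandra-stress user profile=cqlstress-example.yaml "ops(insert=1,query=10)" n=1000000
```

The ops weights are relative, so insert=1,query=10 means roughly ten queries are issued for every insert.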
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004163#comment-14004163 ] T Jake Luciani edited comment on CASSANDRA-6146 at 5/21/14 12:40 AM:
-

We can certainly support an existing schema without requiring it in a yaml file. The thing I don't understand from your initial comment is how you would populate/query the table (especially tables with composite cells) without specifying a per-column distribution and range? Your example only considers the key distribution, but in the case of wide rows you want a distribution of the 'wideness' of the rows. I think it's totally reasonable to expect someone to create a stress profile; it really is simpler to take the sample yaml and extend it than to add lots of new command line flags to an already complicated tool. If you want to make the keyspace/table DDL optional, that's easy.
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931630#comment-13931630 ] Benedict edited comment on CASSANDRA-6146 at 3/12/14 10:51 AM:
---

Sorry for the glacial response on this [~mishail]. I've set up my JIRA search to include items awaiting my review, so it shouldn't happen again. First impressions:
# The installation instructions aren't as obvious as they could be. I tried a variety of fairly stupid things because it wasn't clear (to me) exactly which releases I should be downloading/unpacking where. It would be good to state unambiguously that you should download the CqlJMeter binary release and a JMeter binary release, and that tar -xjf \{CqlJMeter.tgz\} -C \{Unpacked JMeter Root Directory\} will get everything working.
# There is no support for prepared statements.
# If I simply tweak the example write plan and have it run > 10k samples per thread, it pretty much immediately exhausts C* of its available file handles. Now, by default on my system the number of available file handles is not very many, but something is going wrong if I can exhaust them with only the default 10 threads configured (cassandra-stress does not exhaust them with hundreds of threads).
# It would be good to have the examples unpack to one of JMeter's examples/templates directories. You see a mongodb template in there straight away (bundled) when you click "open", so it would be nice to have Cassandra there without having to go hunting somewhere else for it.

2 and 3 are pretty much show stoppers for me. Once they're sorted, it might be worth trying to package it with JMeter, since mongodb is. Personally I also don't find JMeter an easy or intuitive way to get running quickly with stress testing, but that's a completely separate matter, and your bundled plan does make it much more straightforward than it would otherwise be.
It looks like it is quite powerful and certainly very expressive, so for serious/complex stress testing and benchmarking it looks like it could be a great tool.
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812543#comment-13812543 ] Mikhail Stepura edited comment on CASSANDRA-6146 at 11/4/13 2:20 AM:

Is it worth trying to implement a CQL plugin for JMeter (via the DataStax Java Driver)? It would allow users to build their own test plans.

https://jmeter.apache.org/usermanual/build-db-test-plan.html
https://github.com/Netflix/CassJMeter
[jira] [Comment Edited] (CASSANDRA-6146) CQL-native stress
[ https://issues.apache.org/jira/browse/CASSANDRA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805709#comment-13805709 ] Benedict edited comment on CASSANDRA-6146 at 10/25/13 9:20 PM:

Hi [~mishail],

You might want to take a look at my patch for [6199|https://issues.apache.org/jira/browse/CASSANDRA-6199] and make your changes there. Adding support for custom CQL operations like this should be as easy as copying the CqlReader class and, mostly, deleting a few unnecessary lines. The options parser may need a few minutes to figure out how to add another command; take a look at Command, SettingsCommand and StressSettings.

The interesting bit will be inserts: automatic detection of table structure and selection of sensible generators for those columns should be fun, and a great feature. A couple of new data generators may be needed, but they should be pretty easy.

If you want to get really exciting, support for interleaving multiple statements shouldn't be too difficult, as it's currently supported for other operations in mixed mode; this would require a little bit of rejigging, but probably not too much.

Any questions, feel free to ping me here, by email, or on IRC.