Re: High Compactions Pending
What's the output of 'nodetool compactionstats'? Is concurrent_compactors left unset in your cassandra.yaml? Are there any Exceptions or Errors in the system.log or output.log? --- Chris Lohfink On Sep 22, 2014, at 9:50 PM, Arun wrote: > It's been constant for 4 hours. The remaining nodes have around 10 compactions. We have 4 column families.
Re: High Compactions Pending
It's been constant for 4 hours. The remaining nodes have around 10 compactions. We have 4 column families. > On Sep 22, 2014, at 19:39, Chris Lohfink wrote: > > 35 isn't that high really in some scenarios (i.e., there are a lot of column families) -- is it continuing to climb, or does it drop back down shortly after? > > --- > Chris Lohfink
Re: High Compactions Pending
35 isn't that high really in some scenarios (i.e., there are a lot of column families) -- is it continuing to climb, or does it drop back down shortly after? --- Chris Lohfink On Sep 22, 2014, at 7:57 PM, arun sirimalla wrote: > I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice high pending compactions on one of the nodes, around 35. Compaction throughput is set to 64 MB/s and flush writers to 4. Any suggestion is much appreciated. > > -- > Arun > Senior Hadoop Engineer > Cloudwick
How to avoid column family duplication (when query requires multiple restrictions)
Hi, I have a column family storing very large blobs that I would not like to duplicate, if possible. Here's a simplified version: CREATE TABLE timeline ( key text, a int, b int, value blob, PRIMARY KEY (key, a, b) ); On this, I run exactly two types of query. Both of them must have a query range on 'a', and just one must have 'b' restricted. First query: cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3; This one runs fine. Second query: cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and b = 12; code=2200 [Invalid query] message="PRIMARY KEY column "b" cannot be restricted (preceding column "ColumnDefinition{name=a, type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)" This fails. Even if I create an index: CREATE INDEX timeline_b ON timeline (b); cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and b = 12; code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" I solved this problem by duplicating the column family (in "timeline_by_a" and "timeline_by_b" where a and b are in opposite order), but I'm wondering if there's a better solution, as this tends to grow pretty big. In particular, from the little understanding that I have of the Cassandra internals, it seems like even the second query should be fairly efficient since the clustering columns are stored in order on disk, thus I don't understand the ALLOW FILTERING requirement. Another alternative that I'm thinking is just keeping another column family that will serve as an "index" and I'll manually manage it in the application. Thanks.
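One way to avoid duplicating the blobs is the manual "index" table mentioned at the end: a second table that stores only the row coordinates (no value column), queried first to find the matching (a, b) pairs and then used to fetch each blob from the original table by full primary key. A sketch, assuming the extra table name is made up:

```
-- Hypothetical index table: clusters by b first so it can be restricted
-- by equality, followed by a range on a. No blob column, so it stays small.
CREATE TABLE timeline_b_index (
    key text,
    b int,
    a int,
    PRIMARY KEY (key, b, a)
);

-- Step 1: find coordinates (b EQ before the a range is a valid restriction here)
SELECT a, b FROM timeline_b_index
 WHERE key = 'event' AND b = 12 AND a >= 2 AND a <= 3;

-- Step 2: fetch each matching blob from the main table by full primary key
SELECT value FROM timeline
 WHERE key = 'event' AND a = 2 AND b = 12;
```

The application has to write to both tables on insert, but only the small index rows are duplicated, never the blobs.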
High Compactions Pending
I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice high pending compactions on one of the nodes, around 35. Compaction throughput is set to 64 MB/s and flush writers to 4. Any suggestion is much appreciated. -- Arun Senior Hadoop Engineer Cloudwick Champion of Big Data http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html
Re: CPU consumption of Cassandra
Eric, We have a new stress tool to help you share your schema for wider benchmarking. See http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema If you wouldn't mind creating a yaml for your schema, I would be happy to take a look. -Jake On Mon, Sep 22, 2014 at 12:39 PM, Leleu Eric wrote: > Hi, > > I'm currently testing Cassandra 2.0.9 (and since last week, 2.1) under some read-heavy load... > > I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8 cores. > I have around 93GB of data per node (one 300GB SAS disk at 10500 RPM). > > I have 300 active client threads and they query the C* nodes with a consistency level of ONE (I'm using the DataStax CQL driver). > > During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait / 20% idle). > C* nodes respond at around 5000 op/s (sometimes up to 6000 op/s). > > I tried to profile a node, and at first look 60% of the CPU is spent in the "sun.nio.ch" package (SelectorImpl.select or Channel.read). > > I know that benchmark results are highly dependent on the dataset and use cases, but from my point of view this CPU consumption is normal for the load. Can someone confirm that? > According to my hardware configuration, can I expect more than 6000 read op/s? > > Regards, > Eric > > -- > This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. -- http://twitter.com/tjake
Re: cassandra 2.1.0 unable to use cqlsh
Hi Adam, OK, thanks again for the tips there! I fell back to the stock configuration of Cassandra 2.1.0, set up my environment variables... and I was able to get cqlsh to work! [root@beta-new:~] #cqlsh Connected to mydomain Cluster at beta-new.mydomain.com:9042. [cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3] Use HELP for help. cqlsh> Thanks! Tim On Mon, Sep 22, 2014 at 11:05 AM, Adam Holmberg wrote: > cqlsh in Cassandra 2.1.0 uses the DataStax Python driver. The > "cassandra.metadata" module is provided by this package. By default it uses > the driver from an archive included in the Cassandra distribution > (.../lib/cassandra-driver-internal-only-2.1.0.zip). > > See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets > set up -- it's possible your wrapper or environment are not playing well > with that. > > Also note that "9160" no longer applies, since this driver uses the > native protocol (9042). > > Adam -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: CPU consumption of Cassandra
It's going to depend a lot on your data model, but 5-6k is on the low end of what I would expect. N=RF=2 is not really something I would recommend. That said, 93GB is not much data, so the bottleneck may exist more in your data model, queries, or client. What profiler are you using? The CPU on the select/read is marked as RUNNABLE, but it's really more of a wait state that may throw some profilers off; it may be a red herring. --- Chris Lohfink On Sep 22, 2014, at 11:39 AM, Leleu Eric wrote: > Hi, > > I'm currently testing Cassandra 2.0.9 (and since last week, 2.1) under some read-heavy load... > > I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8 cores. > I have around 93GB of data per node (one 300GB SAS disk at 10500 RPM). > > I have 300 active client threads and they query the C* nodes with a consistency level of ONE (I'm using the DataStax CQL driver). > > During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait / 20% idle). > C* nodes respond at around 5000 op/s (sometimes up to 6000 op/s). > > I tried to profile a node, and at first look 60% of the CPU is spent in the "sun.nio.ch" package (SelectorImpl.select or Channel.read). > > I know that benchmark results are highly dependent on the dataset and use cases, but from my point of view this CPU consumption is normal for the load. Can someone confirm that? > According to my hardware configuration, can I expect more than 6000 read op/s? > > Regards, > Eric
Re: Help with approach to remove RDBMS schema from code to move to C*?
Thanks everyone for the responses. One thing I'd forgotten about was the need to model the CFs with regard to the kinds of queries that are needed. Fortunately this is primarily a write-once/read-many type of application, so deletions are not currently a concern, but worth keeping in mind for the future. Les On Sat, Sep 20, 2014 at 6:45 AM, Brice Dutheil wrote: > I'm fairly new to Cassandra, but here's my input. > > Think of your column families as a projection of how the application needs them. Thinking with CQRS in mind helps. More CFs may require more space, as data may be written differently in different column families for different usage. For that reason you have to think about disk usage, considering the growth of the data and the space Cassandra needs to perform compaction and other work. > > Also on the modeling front, pay attention to growing wide rows: updating or deleting columns in such a row may add too many tombstones (tombstone_failure_threshold defaults to 100 000), which may cause Cassandra to abort queries on such rows (before compaction), because it has to load the partition in memory to output the actual data. > This is especially important for time series. We had to rework our model to bucket by period to avoid such cases. However, this requires some work in the business code to query such a column family. > > Avoid secondary indexes; modeling per usage largely removes the need for them. > > Cheers, > — Brice > > On Sat, Sep 20, 2014 at 6:55 AM, Jack Krupansky > wrote: > Start by asking how you intend to query the data. That should drive the >> data model. >> >> Is there existing app client code or an app layer that is already using >> the current schema, or are you intending to rewrite that as well? >> >> FWIW, you could place the numeric columns in a numeric map collection, >> and the string columns in a string map collection, but... 
it’s best to >> first step back and look at the big picture of what the data actually looks >> like as well as how you want to query it. >> >> -- Jack Krupansky >> >> *From:* Les Hartzman >> *Sent:* Friday, September 19, 2014 5:46 PM >> *To:* user@cassandra.apache.org >> *Subject:* Help with approach to remove RDBMS schema from code to move >> to C*? >> >> My company is using an RDBMS for storing time-series data. This >> application was developed before Cassandra and NoSQL. I'd like to move to >> C*, but ... >> >> The application supports data coming from multiple models of devices. >> Because there is enough variability in the data, the main table to hold the >> device data only has some core columns defined. The other columns are >> non-specific; a set of columns for numeric and a set for character. So for >> these non-specific columns, their use is defined in the code. The use of >> column 'numeric_1' might hold a millisecond time for one device and a fault >> code for another device. This appears to have been done to keep from >> modifying the schema whenever a new device was introduced. And they rolled >> their own db interface to support this mess. >> >> Now, we could just use C* like an RDBMS - defining CFs to mimic the >> tables. But this just pushes a bad design from one platform to another. >> >> Clearly there needs to be a code re-write. But what suggestions does >> anyone have on how to make this shift to C*? >> >> Would you just layout all of the columns represented by the different >> devices, naming them as they are used, and having jagged rows? Or is there >> some other way to approach this? >> >> Of course, the data miners already have scripts/methods for accessing the >> data from the RDBMS now in the user-unfriendly form it's in now. This would >> have to be addressed as well, but until I know how to store it, mining it >> gets ahead of things. >> >> Thanks. >> >> Les >> >> > >
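Brice's bucketing suggestion above could look something like the following for time-series data. This is only a sketch with illustrative names: adding a time bucket to the partition key caps partition growth, at the cost of the client enumerating buckets when it queries a period.

```
-- Hypothetical time-series table bucketed by day: each partition holds
-- at most one device-day of samples, so no partition grows without bound.
CREATE TABLE readings_by_day (
    device_id text,
    day text,            -- e.g. '2014-09-22', part of the partition key
    ts timestamp,
    value double,
    PRIMARY KEY ((device_id, day), ts)
);

-- Querying a period means the application iterates over the day buckets:
SELECT ts, value FROM readings_by_day
 WHERE device_id = 'sensor-1' AND day = '2014-09-22'
   AND ts >= '2014-09-22 00:00:00' AND ts < '2014-09-22 12:00:00';
```

The bucket granularity (day, hour, month) is a design choice driven by write rate: pick it so a single partition stays well below the wide-row thresholds Brice describes.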
Re: Named Parameters in Prepared Statement
Yes, you can bind parameters by name: ``` INSERT INTO songs (id, title, album, artist) VALUES (:id, :title, :album, :artist) ``` All DataStax drivers for Cassandra support this feature. In Java it looks like: // prepare only once PreparedStatement pstmt = session.prepare("INSERT INTO songs (id, title, album, artist) VALUES (:id, :title, :album, :artist)"); // later BoundStatement stmt = new BoundStatement(pstmt); stmt.setLong("id", 1234); stmt.setString("title", "Example title"); On Mon, Sep 22, 2014 at 4:41 AM, Timmy Turner wrote: > Looking through the CQL 3.1 grammar in Cassandra, I found a "':' ident" > alternative in the "value" rule (line 961). > > Is this for binding named parameters in prepared statements? Is this > currently supported by any of the drivers or in Cassandra (2.1) itself? > > Looking at the docs and the current Java driver it doesn't seem that way. -- :- a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
CPU consumption of Cassandra
Hi, I'm currently testing Cassandra 2.0.9 (and since last week, 2.1) under some read-heavy load... I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8 cores. I have around 93GB of data per node (one 300GB SAS disk at 10500 RPM). I have 300 active client threads and they query the C* nodes with a consistency level of ONE (I'm using the DataStax CQL driver). During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait / 20% idle). C* nodes respond at around 5000 op/s (sometimes up to 6000 op/s). I tried to profile a node, and at first look 60% of the CPU is spent in the "sun.nio.ch" package (SelectorImpl.select or Channel.read). I know that benchmark results are highly dependent on the dataset and use cases, but from my point of view this CPU consumption is normal for the load. Can someone confirm that? According to my hardware configuration, can I expect more than 6000 read op/s? Regards, Eric
Re: cassandra 2.1.0 unable to use cqlsh
> cqlsh in Cassandra 2.1.0 uses the DataStax Python driver. The > "cassandra.metadata" module is provided by this package. By default it uses > the driver from an archive included in the Cassandra distribution > (.../lib/cassandra-driver-internal-only-2.1.0.zip). OK, that's really good to know. > See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets > set up -- it's possible your wrapper or environment are not playing well > with that. > Also note that "9160" no longer applies, since this driver uses the > native protocol (9042). OK, yes, very possible. I'll try working with what's originally there and make any alterations as needed. Thanks! Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: cassandra 2.1.0 unable to use cqlsh
cqlsh in Cassandra 2.1.0 uses the DataStax Python driver. The "cassandra.metadata" module is provided by this package. By default it uses the driver from an archive included in the Cassandra distribution (.../lib/cassandra-driver-internal-only-2.1.0.zip). See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets set up -- it's possible your wrapper or environment are not playing well with that. Also note that "9160" no longer applies, since this driver uses the native protocol (9042). Adam On Sun, Sep 21, 2014 at 7:53 PM, Tim Dunphy wrote: > Hey all, > > I've just upgraded to the latest Cassandra on my site, version 2.1.0. > > But now when I run the command I am getting the following error: > > [root@beta-new:/usr/local] #cqlsh > Traceback (most recent call last): > File "/etc/alternatives/cassandrahome/bin/cqlsh-old", line 113, in > > from cqlshlib import cqlhandling, cql3handling, pylexotron > File > "/usr/local/apache-cassandra-2.1.0/bin/../pylib/cqlshlib/cql3handling.py", > line 18, in > from cassandra.metadata import maybe_escape_name > ImportError: No module named cassandra.metadata > > Just to clarify some of the above output: all my 'cqlsh' command does is automatically fill in some values I'd like to use as defaults and then invoke the real command, which I've named 'cqlsh-old'. Just a quirk of my setup that's always allowed cqlsh to be invoked without issue across multiple upgrades. > > [root@beta-new:/usr/local] #cat /etc/alternatives/cassandrahome/bin/cqlsh > #!/bin/sh > /etc/alternatives/cassandrahome/bin/cqlsh-old beta-new.mydomain.com 9160 > --cqlversion="3.0.0" > > I'd appreciate any advice you could spare on how to get around this error! > > Thanks > Tim > > -- > GPG me!! > > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Fwd: Cassandra cluster setup.
> Hi All > > I am trying to configure a Cassandra cluster with two nodes. I am new to Cassandra. > > I am using the DataStax distribution of Cassandra (Windows). I have installed it on both nodes and configured it; each works as a separate instance, but not as a cluster. > > The key changes I made in cassandra.yaml are as follows, as suggested by http://www.datastax.com/documentation/cassandra/1.2/cassandra/initialize/initializeSingleDS.html > > Configuration settings for 10.144.32.134 > > num_tokens: 256 > - seeds: "10.144.32.134,10.137.12.84" > listen_address: 10.144.32.134 > endpoint_snitch: RackInferringSnitch > rpc_address: 0.0.0.0 > > Configuration settings for 10.137.12.84 > > num_tokens: 256 > - seeds: "10.144.32.134,10.137.12.84" > listen_address: 10.137.12.84 > endpoint_snitch: RackInferringSnitch > rpc_address: 0.0.0.0 > > After this configuration I am able to start the services as usual and see the status as up, but each node only reports itself. > > Nodetool status from 134 (server) > > D:\Program Files\DataStax Community\apache-cassandra\bin>nodetool -h localhost status > Starting NodeTool > Note: Ownership information does not include topology; for complete information, specify a keyspace > Datacenter: 144 > > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID Rack > UN 10.144.32.134 40.03 MB 256 100.0% c791918a-8fec-4c5c-ab83-1a3525c51b70 32 > > Nodetool status from 84 > > C:\Program Files\DataStax Community\apache-cassandra\bin>nodetool.bat status > Starting NodeTool > Note: Ownership information does not include topology; for complete information, specify a keyspace > Datacenter: 137 > > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns Host ID Rack > UN 10.137.12.84 69.58 KB 2 100.0% f842ea74-8eef-4c82-80d4-3e06e7a00deb 12 > > Can you please suggest how to fix this issue? > > Regards > Muthukumar.S
Re: Cassandra Data Model design
Cassandra partitions data across the cluster based on the partition key, and is thus optimized for WHERE pk = ... queries. You are doing table scans, the opposite of what a distributed system is designed for. However, some users find Solr helps with queries like yours. To learn what C* is good at, read this: http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/ Thanks, James Briggs. -- Cassandra/MySQL DBA. Available in San Jose area or remote. cass_top: https://github.com/jamesbriggs/cassandra-top From: Check Peck To: user Sent: Wednesday, September 17, 2014 3:35 PM Subject: Re: Cassandra Data Model design It takes more than 50 seconds to return 500 records from the cqlsh command (not from code), so that's why I am saying it is pretty slow. On Wed, Sep 17, 2014 at 3:17 PM, Hao Cheng wrote: How slow is slow? Regardless of the data model question, in my experience 500 rows of relatively light content should be lightning fast. Looking at my performance results on a test cluster of 3x r3.large AWS instances, we reach an op rate on Cassandra's stress test of at least 1000 operations per second, and on average 7500 operations per second over the stress test data set. > > >More broadly, it seems like you would benefit from either deltas (only >retrieve new data) or something like paging (only retrieve currently relevant >data), although it's really difficult to say without more information. > > >On Wed, Sep 17, 2014 at 1:01 PM, Check Peck wrote: > >I have recently started working with Cassandra. We have a Cassandra cluster >which is using DSE 4.0 and has vnodes enabled. We have tables like this - >> >>Below is my first table - >> >>CREATE TABLE customers ( >> customer_id int PRIMARY KEY, >> last_modified_date timeuuid, >> customer_value text >>) >> >>The read query pattern on the above table is as follows, since we need to get everything from the table and load it into our application memory every x minutes. 
>> >>select customer_id, customer_value from datakeyspace.customers; >> >>We have a second table like this - >> >>CREATE TABLE client_data ( >> client_name text PRIMARY KEY, >> client_id text, >> creation_date timestamp, >> is_valid int, >> last_modified_date timestamp >>) >> >>Right now the above table has 500 records, and all of them have the "is_valid" column set to 1. The read query pattern on this table is the same, since we need to get everything from it and load it into our application memory every x minutes, so the below query returns all 500 records (everything has is_valid set to 1). >> >>select client_name, client_id from datakeyspace.client_data where >> is_valid=1; >> >>Since our cluster has vnodes enabled, my above query pattern is not efficient at all and it is taking a lot of time to get the data from Cassandra. We are reading from these tables with consistency level QUORUM. >> >>Is there any possibility of improving our data model? >> >>Any suggestions will be greatly appreciated. >> >
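As the replies suggest, the scan in the last query could be replaced by a table modeled for that exact read. A sketch -- the table name is made up, and a single is_valid partition only works while the valid set stays small, as with the 500 rows here:

```
-- Hypothetical query table: all currently-valid clients live in one
-- partition, so the periodic "load everything valid" read becomes a
-- single-partition read instead of a cluster-wide scan.
CREATE TABLE valid_clients (
    is_valid int,
    client_name text,
    client_id text,
    PRIMARY KEY (is_valid, client_name)
);

SELECT client_name, client_id FROM valid_clients WHERE is_valid = 1;
```

The application would maintain this table alongside client_data, inserting on creation and deleting the row when a client stops being valid.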
Named Parameters in Prepared Statement
Looking through the CQL 3.1 grammar in Cassandra, I found a "':' ident" alternative in the "value" rule (line 961). Is this for binding named parameters in prepared statements? Is this currently supported by any of the drivers or in Cassandra (2.1) itself? Looking at the docs and the current Java driver it doesn't seem that way.