Doubt
Dear All,

We have a requirement to store 'N' columns of an entity in a CF. Mostly this is write once and read many times. What is the best way to store the data?

1. Composite CF
2. Simple CF with the value as protobuf-serialized data

Both provide extendable columns, which is a requirement for our usage. But I want to know which one is efficient, assuming there are bound to be, say, 5% updates?

Regards,
Jagan
Re: Doubt
Generally I've seen it recommended to do a composite CF, since it gives you more flexibility and it's easier to debug. You can get some performance improvements by storing a serialized blob to represent your entity (a lot of data can be represented much smaller this way, by a factor of 10 or more if you're clever), but the complexity is rarely worth it. It is likely a premature optimization, though I have seen cases where it showed a good improvement.

In either case, the data will ultimately be read sequentially from disk per sstable (the normal bottleneck), so the only benefits you gain are:
- potentially disk space (if the serialization is efficient) and network bandwidth
- Cassandra won't have to deserialize as many columns, but I'm fairly certain this is utterly irrelevant
- if stored in a format you can deserialize efficiently (like protobufs), it can make a big difference on your app side

Keep in mind that if you serialize data, you will always have to maintain code that can read old versions; this can become very complex and lead to weird bugs.

---
Chris Lohfink

On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan wrote:

> Dear All,
>
> We have a requirement to store 'N' columns of an entity in a CF. Mostly this
> is write once and read many times. What is the best way to store the data?
> (...)
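The size trade-off described above (many named columns versus one serialized blob) can be sketched roughly. This is a toy illustration with made-up column names and values, using a fixed-width `struct` encoding as a stand-in for protobuf (protobuf is similar in spirit: small field tags instead of repeated column names); it is not Cassandra's actual on-disk format.

```python
import struct

# Hypothetical entity with 20 small integer columns (made-up names/values).
entity = {"col%02d" % i: i * 100 for i in range(20)}

# Option 1: individual columns. Each stored value carries its column name
# alongside the value (real Cassandra cells also carry a timestamp and
# other per-cell overhead, omitted here).
per_column_bytes = sum(len(name) + len(str(value))
                       for name, value in entity.items())

# Option 2: one packed blob under a single column, using a compact
# fixed-width encoding in place of protobuf.
blob = struct.pack("<20i", *entity.values())
blob_bytes = len(blob)

print(per_column_bytes, blob_bytes)  # the packed blob is noticeably smaller
```

As Chris notes, the packed form also needs versioning: once old blobs exist on disk, the reader must keep understanding every format it ever wrote.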
RE: Doubt regarding CQL
FYI, I am using version 1.0.7 on Ubuntu 11.10. Need help asap.

From: Rishabh Agrawal
Sent: Wednesday, February 22, 2012 11:49 AM
To: user@cassandra.apache.org
Subject: Doubt regarding CQL

Hello

I have installed the CQL drivers for Python. When I try to execute cqlsh I get the following error:

cql-1.0.3$ cqlsh localhost 9160
Traceback (most recent call last):
  File "/usr/local/bin/cqlsh", line 33, in <module>
    import cql
  File "/usr/local/lib/python2.7/dist-packages/cql/__init__.py", line 22, in <module>
    import connection
  File "/usr/local/lib/python2.7/dist-packages/cql/connection.py", line 18, in <module>
    from cursor import Cursor
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 24, in <module>
    from cql.cassandra.ttypes import (
  File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py", line 7, in <module>
    from thrift.Thrift import *
ImportError: No module named thrift.Thrift

Kindly help me with that asap.

Thanks and Regards
Rishabh Agrawal
Re: Doubt regarding CQL
On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I get
> following error
> cql-1.0.3$ cqlsh localhost 9160
> (...)
> File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> line 7, in from thrift.Thrift import *
> ImportError: No module named thrift.Thrift

It seems you do not have the Python thrift module installed. In my distro (PLD) it is:

Package: python-thrift-0.5.0-4.i686
/usr/lib/python2.7/site-packages: Thrift-0.1-py2.7.egg-info
/usr/lib/python2.7/site-packages/thrift: TSCons.pyc, TSCons.pyo, TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/protocol: TBinaryProtocol.pyc, TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo, TProtocol.pyc, TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
/usr/lib/python2.7/site-packages/thrift/server: THttpServer.pyc, THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo, TServer.pyc, TServer.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/transport: THttpClient.pyc, THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo, TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo

Regards,
--
Mateusz Korniak
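For what it's worth, the ImportError above simply means Python cannot find the thrift package on its import path. A quick way to confirm which modules are missing, as a generic diagnostic sketch (this is not part of the cql driver; the module names checked are just examples):

```python
import importlib.util

def module_available(name):
    """Return True if `name` can be imported from the current sys.path."""
    return importlib.util.find_spec(name) is not None

# A missing 'thrift' is exactly what produces
# "ImportError: No module named thrift.Thrift" when cqlsh starts.
for mod in ("cql", "thrift"):
    status = "ok" if module_available(mod) else "MISSING -- install it (e.g. pip install thrift)"
    print(mod, status)
```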
RE: Doubt regarding CQL
Thanks for the reply. I installed the 0.8.0 thrift package, but the problem still persists.

-----Original Message-----
From: Mateusz Korniak [mailto:mateusz-li...@ant.gliwice.pl]
Sent: Wednesday, February 22, 2012 1:47 PM
To: user@cassandra.apache.org
Subject: Re: Doubt regarding CQL

On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I
> get following error cql-1.0.3$ cqlsh localhost 9160
> (...)
> ImportError: No module named thrift.Thrift

Seems you do not have installed python thrift module. In my distro (PLD) it is:
Package: python-thrift-0.5.0-4.i686
(...)

Regards,
--
Mateusz Korniak
Re: Doubt regarding CQL
Rishabh-

It looks like you're not actually using the cqlsh that comes with Cassandra 1.0.7. Are you using an old version of the Python CQL driver? Old versions of the driver had cqlsh bundled with it, instead of with Cassandra. The 1.0.7 Debian/Ubuntu packages do not include cqlsh because of some packaging and distribution difficulties (resolved in 1.1).

One easy way to get cqlsh as part of a package is to use the free DataStax Community Edition: see http://www.datastax.com/products/community . Cqlsh is included in the "dsc" package. That package will also bring in thrift and any other dependencies you need.

p

On Wed, Feb 22, 2012 at 3:00 AM, Rishabh Agrawal <rishabh.agra...@impetus.co.in> wrote:
> Thanks for the reply
> I installed 0.8.0 drift package. But still problem persists.
> (...)
Good partition key doubt
Hello folks

I have been studying Cassandra for a short period of time and am now modeling a database for study purposes. During my modeling I have faced a doubt: what is a good partition key? Is the partition key directly related to my query performance? What are the best practices?

As a study case, let's suppose I have a column family into which all kinds of logs (http server, application server, application logs, etc.) from different servers are inserted. In this column family I have a server_id column (a unique identifier for each server), a log_type column (http server, application server, application log) and a log_info column. Is it a good idea to create a partition key using the server_id and log_type columns to store all log data of a specific type and server in a physical row? And what if I want a physical row for each day? Is it a good idea to add a third column with the date to the partition key? And if I want to query all logs in a period of time, how can I select a range of rows? Do I have to duplicate the date column (considering I have to use the = operator with the partition key)?

All the best
--
Att. José Guilherme Vanz
br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
<http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
"O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca, recordista da Antarctic Ice Marathon.
Cassandra Delete Query Doubt
Hi Team,

I have the table below and want to delete data from it:

DELETE FROM game.tournament USING TIMESTAMP 161692578000
WHERE tournament_id = 1 AND version_id = 1 AND partition_id = 1;

Cassandra internally manages a timestamp for each column, updated when data is written to that column. My question is: which column's timestamp does *USING TIMESTAMP 161692578000* refer to?

CREATE TABLE game.tournament (
    tournament_id bigint,
    version_id bigint,
    partition_id bigint,
    user_id bigint,
    created_at timestamp,
    rank bigint,
    score bigint,
    updated_at timestamp,
    PRIMARY KEY ((tournament_id, version_id, partition_id), user_id)
) WITH CLUSTERING ORDER BY (user_id ASC)

--
Raman Gugnani
multi-node cassandra config doubt
Hi All,

This is regarding a multi-node cluster configuration doubt. I have configured a 3-node cluster using Cassandra 0.8.4 and get an error when I run a Map/Reduce job which uploads records from HDFS to Cassandra.

Here is my cluster config (cassandra.yaml) for the 3 Cassandra nodes:

node01:
seeds: "node01,node02,node03"
auto_bootstrap: false
listen_address: 192.168.0.1
rpc_address: 192.168.0.1

node02:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.2
rpc_address: 192.168.0.2

node03:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

When I run the M/R program, I get the error below:

11/08/23 04:37:00 INFO mapred.JobClient: map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient: map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient: map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
    at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:90)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Is anything wrong in my cassandra.yaml file? I followed http://wiki.apache.org/cassandra/MultinodeCluster for the cluster configuration.

Regards,
Thamizhannal
Re: Good partition key doubt
"what is a good partition key? Is partition key direct related with my query performance? What is the best practices?" A good partition key is a partition key that will scale with your data. An example: if you have a business involving individuals, it is likely that your business will scale as soon as the number of users will grow. In this case user_id is a good partition key because all the users will be uniformly distributed over all the Cassandra nodes. For your log example, using only server_id for partition key is clearly not enough because what will scale is the log lines, not the number of server. >From the point of view of scalability (not taking about query-ability), adding the log_type will not scale either, because the number of different log types is likely to be a small set. For great scalability (not taking about query-ability), the couple (server_id,log_timestamp) is likely a good combination. Now for query, as you should know, it is not possible to have range query (using <, ≤, ≥, >) over partition key, you must always use equality (=) so you won't be able to leverage the log_timestamp component in the partition key for your query. Bucketing by date is a good idea though, and the date resolution will depends on the log generation rate. If logs are generated very often, maybe a bucket by hour. If the generation rate is smaller, maybe a day or a week bucket is fine. Talking about log_type, putting it into the partition key will help partitioning further, in addition of the date bucket. However it forces you to always provide a log_type whenever you want to query, be aware of this. An example of data model for your logs could be CREATE TABLE logs_by_server_and_type_and_date( server_id int, log_type text, date_bucket int, //Date bucket using format MMDD or MMDDHH or ... 
log_timestamp timeuuid, log_info text, PRIMARY KEY((server_id,log_type,date_bucket),log_timestamp) ); "And if I want to query all logs in a period of time how can I select I range o rows?" --> New query path = new table CREATE TABLE logs_by_date( date_bucket int, //Date bucket using format MMDD or MMDDHH or ... log_timestamp timeuuid, server_id int, log_type text, log_info text, PRIMARY KEY((date_bucket),log_timestamp) // you may add server_id or log_type as clustering column optionally ); For this table, the date_bucket should be chosen very carefully because for the same bucket, we're going to store logs of ALL servers and all types ... For the query, you should provide the date bucket as partition key, and then use (<, ≤, ≥, >) on the log_timestamp column On Thu, Dec 11, 2014 at 12:00 PM, José Guilherme Vanz < guilherme@gmail.com> wrote: > Hello folks > > I am studying Cassandra for a short a period of time and now I am modeling > a database for study purposes. During my modeling I have faced a doubt, > what is a good partition key? Is partition key direct related with my query > performance? What is the best practices? > > Just to study case, let's suppose I have a column family where is inserted > all kind of logs ( http server, application server, application logs, etc ) > data from different servers. In this column family I have server_id ( > unique identifier for each server ) column, log_type ( http server, > application server, application log ) column and log_info column. Is a good > ideia create a partition key using server_id and log_type columns to store > all logs data from a specific type and server in a physical row? And if do > I want a physical row for each day? Is a good idea add a third column with > the date in the partition key? And if I want to query all logs in a period > of time how can I select I range o rows? Do I have to duplicate date column > ( considering I have to use = operator with partition key ) ? > > All the best > -- > Att. 
José Guilherme Vanz > br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/ > <http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/> > "O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca, > recordista da Antarctic Ice Marathon. >
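The date bucketing described above can be sketched as a small helper that derives an integer bucket from a log's timestamp. The helper name and the exact day/hour bucket formats are illustrative assumptions, not part of the schema above:

```python
from datetime import datetime

def date_bucket(ts, resolution="day"):
    """Compute an integer date bucket for use in a partition key.

    resolution "day" yields e.g. 20141211, "hour" yields e.g. 2014121114
    (assumed formats; pick one resolution and keep it fixed per table).
    """
    fmt = {"hour": "%Y%m%d%H", "day": "%Y%m%d"}[resolution]
    return int(ts.strftime(fmt))

ts = datetime(2014, 12, 11, 14, 30)
print(date_bucket(ts, "day"))   # 20141211
print(date_bucket(ts, "hour"))  # 2014121114
```

At write time, the application computes the bucket from the log's timestamp and binds it as the date_bucket column; at read time it computes the bucket(s) covering the requested period and issues one equality-keyed query per bucket.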
Re: Good partition key doubt
Nice, I got it. =] If I have more questions I'll send other emails. xD Thank you

On Thu, Dec 11, 2014 at 12:17 PM, DuyHai Doan wrote:
> (...)

--
Att. José Guilherme Vanz
br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
<http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
"O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca, recordista da Antarctic Ice Marathon.
Re: Cassandra Delete Query Doubt
This type of delete - which doesn't supply a user_id, so it's deleting a range of rows - creates what is known as a range tombstone. It's not tied to any given cell; it covers a range of cells and supersedes/shadows them when merged (either in the read path or the compaction path).

On Wed, Nov 10, 2021 at 4:27 AM raman gugnani wrote:

> HI Team,
>
> I have one table below and want to delete data on this table.
>
> DELETE FROM game.tournament USING TIMESTAMP 161692578000 WHERE
> tournament_id = 1 AND version_id = 1 AND partition_id = 1;
>
> Cassandra internally manages the timestamp of each column when some data
> is updated on the same column.
>
> My Query is , *USING TIMESTAMP 161692578000* picks up a timestamp of
> which column ?
> (...)
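The shadowing rule can be illustrated with a toy model (a deliberate simplification, not Cassandra's actual read or compaction path): a cell survives the merge only if its write timestamp is newer than the covering tombstone's timestamp.

```python
# Toy partition: cells keyed by (user_id, column), each with a value and
# the per-cell write timestamp Cassandra tracks.
cells = {
    (1, "score"): {"value": 50, "ts": 100},
    (2, "score"): {"value": 70, "ts": 200},  # written after the delete below
}

# Range tombstone from DELETE ... USING TIMESTAMP: no user_id is supplied,
# so it covers every row in the partition and carries its own timestamp.
tombstone = {"ts": 150}

def merge(cells, tombstone):
    """Keep only cells written strictly after the tombstone's timestamp."""
    return {key: cell for key, cell in cells.items()
            if cell["ts"] > tombstone["ts"]}

live = merge(cells, tombstone)
print(sorted(live))  # only the (2, "score") cell survives
```

So the timestamp in USING TIMESTAMP belongs to the tombstone itself, not to any column of the table; it is compared against each covered cell's write timestamp at merge time.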
Re: Cassandra Delete Query Doubt
Thanks Jeff for the information.

On Wed, 10 Nov 2021 at 21:08, Jeff Jirsa wrote:
> (...)

--
Raman Gugnani
Re: multi-node cassandra config doubt
Did you get this sorted? At a guess I would say there are no nodes listed in the Hadoop JobConf.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 9:51 PM, Thamizh wrote:

> Hi All,
>
> This is regarding multi-node cluster configuration doubt.
>
> I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error
> when I ran Map/Reduce job which uploads records from HDFS to Cassandra.
> (...)
Re: multi-node cassandra config doubt
Hi Aaron, This is yet to be resolved. I have set-up Cassandra multi node clustering and facing issues in pushing HDFS data to Cassandra. When I ran "MapReduce" progrma I am getting UnknownHostException. In hadoop(0.20.1), I have configured node01-as master and node01, node02 & node03 as slaves. In Cassandra(0.8.4), the installation & configurations has been done. when I issue nodetool ring command I could see the ring and also the KEYSPACES & COLUMNFAMILYS have got distributed. o/p: nodetool $bin/nodetool -h node02 ring Address DC Rack Status State Load Owns Token 161930152162677484001961360738128229499 198.168.0.1 datacenter1 rack1 Up Normal 132.28 MB 12.48% 13027320554261208311902766005835168982 198.168.0.2 datacenter1 rack1 Up Normal 99.34 MB 75.07% 140745249930211229277235689500208693608 198.168.0.3 datacenter1 rack1 Up Normal 66.21 KB 12.45% 161930152162677484001961360738128229499 nutch@lab02:/code/apache-cassandra-0.8.4$ Here are the hadoop config. job4.setOutputFormatClass(ColumnFamilyOutputFormat.class); ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), KEYSPACE,COLUMN_FAMILY ); ConfigHelper.setRpcPort(job4.getConfiguration(), ""9160); ConfigHelper.setInitialAddress(job4.getConfiguration(), "node01"); ConfigHelper.setPartitioner(job4.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner"); Bleow is an exception message: Error: java.net.UnknownHostException: /198.168.0.3 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849) at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200) at java.net.InetAddress.getAllByName0(InetAddress.java:1153) at java.net.InetAddress.getAllByName(InetAddress.java:1083) at java.net.InetAddress.getAllByName(InetAddress.java:1019) at java.net.InetAddress.getByName(InetAddress.java:969) at org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93) at org.apache.cassandra.client.RingCache.(RingCache.java:67) 
at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98) at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92) at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132) at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.Child.main(Child.java:170) note: Same /etc/hosts file has been used across all the nodes. Kindly help me to resolve this issue? Regards, Thamizhannal P --- On Wed, 24/8/11, aaron morton wrote: From: aaron morton Subject: Re: multi-node cassandra config doubt To: user@cassandra.apache.org Date: Wednesday, 24 August, 2011, 2:40 PM Did you get this sorted ? At a guess I would say there are no nodes listed in the Hadoop JobConf. Cheers -Aaron MortonFreelance Cassandra Developer@aaronmortonhttp://www.thelastpickle.com On 23/08/2011, at 9:51 PM, Thamizh wrote: Hi All, This is regarding multi-node cluster configuration doubt. I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error when I ran Map/Reduce job which uploads records from HDFS to Cassandra. 
Here are my 3 nodes' cluster config files (cassandra.yaml):

node01:
seeds: "node01,node02,node03"
auto_bootstrap: false
listen_address: 192.168.0.1
rpc_address: 192.168.0.1

node02:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.2
rpc_address: 192.168.0.2

node03:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

When I ran the M/R program, I got the error below:

11/08/23 04:37:00 INFO mapred.JobClient: map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient: map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient: map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
 at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
 at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
 at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
 at org.apache.hadoop.mapreduce.TaskInputOutputContext.wr
Re: multi-node cassandra config doubt
Jump on the machine that raised the error and see if you can ssh to node01, or try using the IP addresses to see if they work.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2011, at 11:34 PM, Thamizh wrote:

> Hi Aaron,
>
> This is yet to be resolved.
>
> I have set up Cassandra multi-node clustering and am facing issues pushing HDFS data to Cassandra. When I run the "MapReduce" program I get an UnknownHostException.
>
> In Hadoop (0.20.1), I have configured node01 as master and node01, node02 & node03 as slaves.
>
> In Cassandra (0.8.4), the installation & configuration has been done. When I issue the nodetool ring command I can see the ring, and the keyspaces & column families have been distributed.
>
> o/p: nodetool
> $bin/nodetool -h node02 ring
> Address      DC           Rack   Status  State    Load       Owns     Token
>                                                                      161930152162677484001961360738128229499
> 198.168.0.1  datacenter1  rack1  Up      Normal   132.28 MB  12.48%   13027320554261208311902766005835168982
> 198.168.0.2  datacenter1  rack1  Up      Normal   99.34 MB   75.07%   140745249930211229277235689500208693608
> 198.168.0.3  datacenter1  rack1  Up      Normal   66.21 KB   12.45%   161930152162677484001961360738128229499
> nutch@lab02:/code/apache-cassandra-0.8.4$
>
> Here is the Hadoop config.
> job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
> ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
> ConfigHelper.setRpcPort(job4.getConfiguration(), "9160");
> ConfigHelper.setInitialAddress(job4.getConfiguration(), "node01");
> ConfigHelper.setPartitioner(job4.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
>
> Below is the exception message:
>
> Error: java.net.UnknownHostException: /198.168.0.3
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
> at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
> at java.net.InetAddress.getAllByName(InetAddress.java:1083)
> at java.net.InetAddress.getAllByName(InetAddress.java:1019)
> at java.net.InetAddress.getByName(InetAddress.java:969)
> at org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
> at org.apache.cassandra.client.RingCache.(RingCache.java:67)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
> at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
> at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
> at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> note: The same /etc/hosts file has been used across all the nodes.
>
> Kindly help me resolve this issue.
> > > Regards, > Thamizhannal P > > --- On Wed, 24/8/11, aaron morton wrote: > > From: aaron morton > Subject: Re: multi-node cassandra config doubt > To: user@cassandra.apache.org > Date: Wednesday, 24 August, 2011, 2:40 PM > > Did you get this sorted ? > > At a guess I would say there are no nodes listed in the Hadoop JobConf. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 23/08/2011, at 9:51 PM, Thamizh wrote: > >> Hi All, >> >> This is regarding multi-node cluster configuration doubt. >> >> I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error >> when I ran Map/Reduce job which uploads records from HDFS to Cassandra. >> >> Here are my 3 nodes cluster config file (cassandra.yaml) for Cassandra: >> >> node01: >> seeds: "node01,node02,node03" >> auto_bootstrap: false >> listen_address: 192.168.0.1 >> rpc_address: 192.168.0.1 >> >> >> node02: >> >> seeds: "node01,node02,node03" >&g
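A quick way to act on the suggestion above is a tiny standalone program that tries to resolve each configured node name from the failing machine. This is only a sketch; the node names are the ones from this thread and should be replaced with your own:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    public static void main(String[] args) {
        // Node names taken from this thread's config; substitute your own hosts.
        String[] nodes = {"node01", "node02", "node03"};
        for (String node : nodes) {
            try {
                InetAddress addr = InetAddress.getByName(node);
                System.out.println(node + " resolves to " + addr.getHostAddress());
            } catch (UnknownHostException e) {
                System.out.println(node + " did NOT resolve: " + e.getMessage());
            }
        }
    }
}
```

Run this on the machine that raised the error; any name that does not resolve there points at an /etc/hosts or DNS problem on that host.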
Re: multi-node cassandra config doubt
Hi Aaron,

Thanks a lot for your suggestions. I have got exhausted with the error below; it would be great if you could point out what went wrong with my approach.

I wanted to install Cassandra 0.8.4 on 3 nodes and to run a Map/Reduce job that uploads data from HDFS to Cassandra. I have installed Cassandra on 3 nodes, lab02 (199.168.0.2), lab03 (199.168.0.3) & lab04 (199.168.0.4), and can create a keyspace & column family, and they get distributed across the cluster. When I run my map/reduce program it ends with an "UnknownHostException". The same map/reduce program works well on a single-node cluster.

Here are the steps which I have followed.

1. cassandra.yaml details

lab02 (199.168.0.2): (seed node)
auto_bootstrap: false
seeds: "199.168.0.2"
listen_address: 199.168.0.2
rpc_address: 199.168.0.2

lab03 (199.168.0.3):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.3
rpc_address: 199.168.0.3

lab04 (199.168.0.4):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.4
rpc_address: 199.168.0.4

2.
O/P of bin/cassandra:
--
 INFO 11:59:40,602 Node /199.168.0.2 is now part of the cluster
 INFO 11:59:40,604 InetAddress /199.168.0.2 is now UP
 INFO 11:59:55,667 Node /199.168.0.4 is now part of the cluster
 INFO 11:59:55,669 InetAddress /199.168.0.4 is now UP
 INFO 12:01:08,389 Joining: getting bootstrap token
 INFO 12:01:08,410 New token will be 43083119672609054510947312506340649252 to assume load from /199.168.0.2
 INFO 12:01:08,412 Enqueuing flush of Memtable-LocationInfo@6824966(123/153 serialized/live bytes, 4 ops)
 INFO 12:01:08,413 Writing Memtable-LocationInfo@6824966(123/153 serialized/live bytes, 4 ops)
 INFO 12:01:08,461 Completed flushing /var/lib/cassandra/data/system/LocationInfo-g-2-Data.db (287 bytes)
 INFO 12:01:08,477 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,480 Enqueuing flush of Memtable-LocationInfo@10141941(53/66 serialized/live bytes, 2 ops)
 INFO 12:01:08,482 Writing Memtable-LocationInfo@10141941(53/66 serialized/live bytes, 2 ops)
 INFO 12:01:08,514 Completed flushing /var/lib/cassandra/data/system/LocationInfo-g-3-Data.db (163 bytes)
 INFO 12:01:08,527 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,652 mx4j successfuly loaded
HttpAdaptor version 3.0.1 started on port 8081

3.
When I run my map/reduce program it ends with an "UnknownHostException":

Error: java.net.UnknownHostException: /199.168.0.2
 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
 at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
 at java.net.InetAddress.getAllByName(InetAddress.java:1083)
 at java.net.InetAddress.getAllByName(InetAddress.java:1019)
 at java.net.InetAddress.getByName(InetAddress.java:969)
 at org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
 at org.apache.cassandra.client.RingCache.(RingCache.java:67)
 at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
 at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
 at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
 at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
 at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)

Here are the config lines for map/reduce:

job4.setReducerClass(TblUploadReducer.class);
job4.setOutputKeyClass(ByteBuffer.class);
job4.setOutputValueClass(List.class);
job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), args[1], args[3]);
ConfigHelper.setRpcPort(job4.getConfiguration(), args[7]); // 9160
ConfigHelper.setInitialAddress(job4.getConfiguration(), args[9]); // 199.168.0.2
ConfigHelper.setPartitioner(job4.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");

Steps which I have verified:

1. Passwordless ssh has been configured between lab02, lab03 & lab04.
All the nodes can ping each other without any issues.

2. When I ran "InetAddress.getLocalHost()" from a Java program on lab02 it printed "lab02/199.168.0.2".

3. When I looked over the output of bin/cassandra, it prints a couple of messages with only "/199.168.0.3" etc. under the InetAddress field; it does not print "hostname/IP". Is that the problem?

Kindly help me.

Regards,
Thamizhannal

--- On Thu, 25/8/11, aaron morton wrote:

From: aaron morton
Subject: Re: multi-node cass
Re: multi-node cassandra config doubt
Hi All,

It looks like it is a known issue with Cassandra 0.8.4, so either I have to wait till 0.8.5 is released, or switch to 0.7.8 if it has been resolved there.

Ref: https://issues.apache.org/jira/browse/CASSANDRA-3044

Regards,
Thamizhannal P

--- On Thu, 25/8/11, Thamizh wrote:

From: Thamizh
Subject: Re: multi-node cassandra config doubt
To: user@cassandra.apache.org
Date: Thursday, 25 August, 2011, 9:01 PM

Hi Aaron,

Thanks a lot for your suggestions. I have got exhausted with the error below; it would be great if you could point out what went wrong with my approach.

I wanted to install Cassandra 0.8.4 on 3 nodes and to run a Map/Reduce job that uploads data from HDFS to Cassandra. I have installed Cassandra on 3 nodes, lab02 (199.168.0.2), lab03 (199.168.0.3) & lab04 (199.168.0.4), and can create a keyspace & column family, and they get distributed across the cluster. When I run my map/reduce program it ends with an "UnknownHostException". The same map/reduce program works well on a single-node cluster.

Here are the steps which I have followed.

1. cassandra.yaml details

lab02 (199.168.0.2): (seed node)
auto_bootstrap: false
seeds: "199.168.0.2"
listen_address: 199.168.0.2
rpc_address: 199.168.0.2

lab03 (199.168.0.3):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.3
rpc_address: 199.168.0.3

lab04 (199.168.0.4):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.4
rpc_address: 199.168.0.4

2.
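The leading slash in "UnknownHostException: /199.168.0.2" hints at the failure mode behind that ticket: Java's InetAddress.toString() renders addresses as "hostname/ip", and if a string in that form is ever passed back to InetAddress.getByName(), the lookup fails. A minimal sketch of the JDK-level symptom (an illustration only, not Cassandra's actual code path):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SlashHostDemo {
    public static void main(String[] args) throws Exception {
        InetAddress addr = InetAddress.getByName("127.0.0.1");
        // toString() renders as "hostname/ip"; with no resolved hostname the
        // result starts with a bare slash, just like the error message above.
        System.out.println(addr);

        // Feeding such a string back into getByName() fails: "/127.0.0.1"
        // is neither a resolvable hostname nor a literal IP address.
        try {
            InetAddress.getByName("/127.0.0.1");
            System.out.println("unexpectedly resolved");
        } catch (UnknownHostException e) {
            System.out.println("UnknownHostException: " + e.getMessage());
        }
    }
}
```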
O/P of bin/cassandra : -- -- INFO 11:59:40,602 Node /199.168.0.2 is now part of the cluster INFO 11:59:40,604 InetAddress /199.168.0.2 is now UP INFO 11:59:55,667 Node /199.168.0.4 is now part of the cluster INFO 11:59:55,669 InetAddress /199.168.0.4 is now UP INFO 12:01:08,389 Joining: getting bootstrap token INFO 12:01:08,410 New token will be 43083119672609054510947312506340649252 to assume load from /199.168.0.2 INFO 12:01:08,412 Enqueuing flush of Memtable-LocationInfo@6824966(123/153 serialized/live bytes, 4 ops) INFO 12:01:08,413 Writing Memtable-LocationInfo@6824966(123/153 serialized/live bytes, 4 ops) INFO 12:01:08,461 Completed flushing /var/lib/cassandra/data/system/LocationInfo-g-2-Data.db (287 bytes) INFO 12:01:08,477 Node /199.168.0.3 state jump to normal INFO 12:01:08,480 Enqueuing flush of Memtable-LocationInfo@10141941(53/66 serialized/live bytes, 2 ops) INFO 12:01:08,482 Writing Memtable-LocationInfo@10141941(53/66 serialized/live bytes, 2 ops) INFO 12:01:08,514 Completed flushing /var/lib/cassandra/data/system/LocationInfo-g-3-Data.db (163 bytes) INFO 12:01:08,527 Node /199.168.0.3 state jump to normal INFO 12:01:08,652 mx4j successfuly loaded HttpAdaptor version 3.0.1 started on port 8081 3. 
When I run my map/reduce program it ended up with "UnknownHostException" Error: java.net.UnknownHostException: /199.168.0.2 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849) at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200) at java.net.InetAddress.getAllByName0(InetAddress.java:1153) at java.net.InetAddress.getAllByName(InetAddress.java:1083) at java.net.InetAddress.getAllByName(InetAddress.java:1019) at java.net.InetAddress.getByName(InetAddress.java:969) at org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93) at org.apache.cassandra.client.RingCache.(RingCache.java:67) at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98) at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92) at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132) at org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.Child.main(Child.java:170) Here are the config line for map/reduce. job4.setReducerClass(TblUploadReducer.class ); job4.setOutputKeyClass(ByteBuffer.class); job4.setOutputValueClass(List.class); job4.setOutputFormatClass(ColumnFamilyOutputFormat.class); ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), args[1],args[3] ); ConfigHelper.setRpcPort(job4.getConfiguration(), args[7]); // 9160 ConfigHelper.setInitialAddress(job4.getConfiguration(), args[9]); // 199.168.0.2 ConfigHelper.setPartitioner(job4.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner"); Steps which I have verified, 1. There is a passwordless ssh has been configured b/w lab02,lab03 &lab04. 
All the nodes can ping each other with out any issues. 2. When I ran "InetAdd
Doubt in Row key range scan
Dear all

I have stored my data in the Cassandra database in the format "tickerID_date". Now when I specify the row key range 1_2012/05/24 (start) to 1_2012/05/27 (end), it says that the end key's md5 value is less than the start key's md5 value. So I changed my start key to 1_2012/05/27 and end key to 1_2012/05/24; then I got all the keys, even ones not in my range, like 67_2012/05/23 and 54_2012/05/28. I am using the Thrift API.

Please help me, as I want only the columns of 1_2012/05/24, 1_2012/05/25, 1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com

This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
Re: Doubt in Row key range scan
Hi,

It's normal. Keys are mapped to replicas via a hash (md5) when using the random partitioner (which I guess you are using). You probably want to switch to the order-preserving partitioner, or tweak your data model to rely on a secondary index for such filtering.

- Pierre

-----Original Message-----
From: Prakrati Agrawal
Date: Mon, 28 May 2012 04:39:46
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Doubt in Row key range scan

Dear all

I have stored my data in the Cassandra database in the format "tickerID_date". Now when I specify the row key range 1_2012/05/24 (start) to 1_2012/05/27 (end), it says that the end key's md5 value is less than the start key's md5 value. So I changed my start key to 1_2012/05/27 and end key to 1_2012/05/24; then I got all the keys, even ones not in my range, like 67_2012/05/23 and 54_2012/05/28. I am using the Thrift API.

Please help me, as I want only the columns of 1_2012/05/24, 1_2012/05/25, 1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com
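The behaviour described above can be reproduced without a cluster: under the random partitioner, row keys are placed by their md5 hashes, so token order bears no relation to the lexical order of the keys. A rough sketch (the real token computation is Cassandra-internal; this only illustrates the hashing):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class TokenOrderDemo {
    // An md5-derived token for a row key. This mirrors only the idea behind
    // RandomPartitioner; the exact token arithmetic is Cassandra-internal.
    static BigInteger token(String rowKey) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(rowKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest); // non-negative 128-bit value
    }

    public static void main(String[] args) throws Exception {
        String start = "1_2012/05/24", end = "1_2012/05/27";
        System.out.println(start + " -> " + token(start));
        System.out.println(end + "   -> " + token(end));
        // Lexically start < end, but the hashed tokens land in unrelated
        // positions, so a row-key range over hashed keys is meaningless.
        System.out.println("lexical order: " + Integer.signum(start.compareTo(end)));
        System.out.println("token order:   " + Integer.signum(token(start).compareTo(token(end))));
    }
}
```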
Re: Doubt in Row key range scan
You are using the Random Partitioner. Using the RP is a good thing because you avoid hot spots, but it has its drawbacks too: you can't scan a slice of rows, because rows won't be ordered; all your keys are stored by their md5 values. You should review your data model and use columns to order your data.

Alain

2012/5/28 Prakrati Agrawal:
> Dear all
>
> I have stored my data in the Cassandra database in the format "tickerID_date". Now when I specify the row key range 1_2012/05/24 (start) to 1_2012/05/27 (end), it says that the end key's md5 value is less than the start key's md5 value. So I changed my start key to 1_2012/05/27 and end key to 1_2012/05/24; then I got all the keys, even ones not in my range, like 67_2012/05/23 and 54_2012/05/28. I am using the Thrift API.
>
> Please help me, as I want only the columns of 1_2012/05/24, 1_2012/05/25, 1_2012/05/26 and 1_2012/05/27.
>
> Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com
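Using columns to order the data works because Cassandra keeps column names sorted within a row regardless of the partitioner. A local sketch of the resulting access pattern, with the ticker as the row key and the date as the column name (a TreeMap stands in for a row's sorted columns; the helper and values are hypothetical):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class WideRowSketch {
    // Inclusive column slice over one row's sorted columns (hypothetical helper).
    static SortedMap<String, String> slice(TreeMap<String, String> row, String from, String to) {
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        // One row per ticker; column names (dates in yyyy/MM/dd form) stay
        // sorted, the way Cassandra sorts columns inside a row.
        TreeMap<String, String> ticker1 = new TreeMap<>();
        ticker1.put("2012/05/23", "close=10.1");
        ticker1.put("2012/05/24", "close=10.4");
        ticker1.put("2012/05/25", "close=10.2");
        ticker1.put("2012/05/26", "close=10.6");
        ticker1.put("2012/05/27", "close=10.9");
        ticker1.put("2012/05/28", "close=11.0");

        // The equivalent of a column slice from 2012/05/24 to 2012/05/27 inclusive.
        System.out.println(slice(ticker1, "2012/05/24", "2012/05/27").keySet());
        // [2012/05/24, 2012/05/25, 2012/05/26, 2012/05/27]
    }
}
```

With this layout, fetching ticker 1 between those dates becomes a single column slice on one row instead of a row-key range scan.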
RE: Doubt in Row key range scan
Please could you tell me how to tweak my data model to rely on a secondary index?

Thank you

Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com

From: Pierre Chalamet [mailto:pie...@chalamet.net]
Sent: Monday, May 28, 2012 3:31 PM
To: user@cassandra.apache.org
Subject: Re: Doubt in Row key range scan

Hi,

It's normal. Keys are mapped to replicas via a hash (md5) when using the random partitioner (which I guess you are using). You probably want to switch to the order-preserving partitioner, or tweak your data model to rely on a secondary index for such filtering.

- Pierre

From: Prakrati Agrawal
Date: Mon, 28 May 2012 04:39:46 -0500
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Doubt in Row key range scan

Dear all

I have stored my data in the Cassandra database in the format "tickerID_date". Now when I specify the row key range 1_2012/05/24 (start) to 1_2012/05/27 (end), it says that the end key's md5 value is less than the start key's md5 value. So I changed my start key to 1_2012/05/27 and end key to 1_2012/05/24; then I got all the keys, even ones not in my range, like 67_2012/05/23 and 54_2012/05/28. I am using the Thrift API.

Please help me, as I want only the columns of 1_2012/05/24, 1_2012/05/25, 1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com
Re: Doubt in Row key range scan
Check this out: http://www.anuff.com/2011/02/indexing-in-cassandra.html#more

Or just google for wide-row indexes.

On May 28, 2012, at 11:22 AM, Prakrati Agrawal wrote:

> Please could you tell me how to tweak my data model to rely on a secondary index?
> Thank you
>
> Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com
>
> From: Pierre Chalamet [mailto:pie...@chalamet.net]
> Sent: Monday, May 28, 2012 3:31 PM
> To: user@cassandra.apache.org
> Subject: Re: Doubt in Row key range scan
>
> Hi,
>
> It's normal. Keys are mapped to replicas via a hash (md5) when using the random partitioner (which I guess you are using). You probably want to switch to the order-preserving partitioner, or tweak your data model to rely on a secondary index for such filtering.
>
> - Pierre
>
> From: Prakrati Agrawal
> Date: Mon, 28 May 2012 04:39:46 -0500
> To: user@cassandra.apache.org
> Reply-To: user@cassandra.apache.org
> Subject: Doubt in Row key range scan
>
> Dear all
>
> I have stored my data in the Cassandra database in the format "tickerID_date". Now when I specify the row key range 1_2012/05/24 (start) to 1_2012/05/27 (end), it says that the end key's md5 value is less than the start key's md5 value. So I changed my start key to 1_2012/05/27 and end key to 1_2012/05/24; then I got all the keys, even ones not in my range, like 67_2012/05/23 and 54_2012/05/28. I am using the Thrift API.
> Please help me, as I want only the columns of 1_2012/05/24, 1_2012/05/25, 1_2012/05/26 and 1_2012/05/27.
>
> Prakrati Agrawal | Developer - Big Data (I&D) | 9731648376 | www.mu-sigma.com

Regards,
Luís Ferreira
Doubt regarding consistency-level in Cassandra-2.1.10
Hi All.

I have a 2*2 Network-Topology Replication setup, and I run my application via the DataStax driver.

I frequently get errors of this type:
*Cassandra timeout during write query at consistency SERIAL (3 replica were required but only 0 acknowledged the write)*

I have already tried passing write options with a LOCAL_QUORUM consistency level in all create/save statements, but I still get this error.

Does something else need to be changed in /etc/cassandra/cassandra.yaml too? Or maybe in some other place?

--
Regards,
Ajay
Re: Doubt regarding consistency-level in Cassandra-2.1.10
Serial consistency gets invoked at the protocol level when doing lightweight transactions such as CAS operations. If you're expecting that your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so there aren't enough nodes available to satisfy serial consistency. See http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg wrote: > Hi All. > > I have a 2*2 Network-Topology Replication setup, and I run my application > via DataStax-driver. > > I frequently get the errors of type :: > *Cassandra timeout during write query at consistency SERIAL (3 replica > were required but only 0 acknowledged the write)* > > I have already tried passing a "write-options with LOCAL_QUORUM > consistency-level" in all create/save statements, but I still get this > error. > > Does something else need to be changed in /etc/cassandra/cassandra.yaml > too? > Or may be some another place? > > > -- > Regards, > Ajay >
Re: Doubt regarding consistency-level in Cassandra-2.1.10
Hi Eric,

I am sorry, but I don't understand.

If there had been some issue in the configuration, then the consistency issue would be seen every time (I guess). As of now, the error is seen only sometimes (probably 30% of the time).

On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens wrote:

> Serial consistency gets invoked at the protocol level when doing lightweight transactions such as CAS operations. If you're expecting that your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so there aren't enough nodes available to satisfy serial consistency.
>
> See http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>
> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg wrote:
>
>> Hi All.
>>
>> I have a 2*2 Network-Topology Replication setup, and I run my application via DataStax-driver.
>>
>> I frequently get the errors of type ::
>> *Cassandra timeout during write query at consistency SERIAL (3 replica were required but only 0 acknowledged the write)*
>>
>> I have already tried passing a "write-options with LOCAL_QUORUM consistency-level" in all create/save statements, but I still get this error.
>>
>> Does something else need to be changed in /etc/cassandra/cassandra.yaml too?
>> Or may be some another place?
>>
>> --
>> Regards,
>> Ajay

--
Regards,
Ajay
Re: Doubt regarding consistency-level in Cassandra-2.1.10
What Eric means is that SERIAL consistency is a special type of consistency that is only invoked for a subset of operations: those that use CAS/lightweight transactions, for example "IF NOT EXISTS" queries. The differences between CAS operations and standard operations are significant and there are large repercussions for tunable consistency. The amount of time such an operation takes is greatly increased as well; you may need to increase your internal node-to-node timeouts . On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg wrote: > Hi Eric, > > I am sorry, but I don't understand. > > If there had been some issue in the configuration, then the > consistency-issue would be seen everytime (I guess). > As of now, the error is seen sometimes (probably 30% of times). > > On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens wrote: > >> Serial consistency gets invoked at the protocol level when doing >> lightweight transactions such as CAS operations. If you're expecting that >> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so >> there aren't enough nodes available to satisfy serial consistency. >> >> See >> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html >> >> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg wrote: >> >>> Hi All. >>> >>> I have a 2*2 Network-Topology Replication setup, and I run my >>> application via DataStax-driver. >>> >>> I frequently get the errors of type :: >>> *Cassandra timeout during write query at consistency SERIAL (3 replica >>> were required but only 0 acknowledged the write)* >>> >>> I have already tried passing a "write-options with LOCAL_QUORUM >>> consistency-level" in all create/save statements, but I still get this >>> error. >>> >>> Does something else need to be changed in /etc/cassandra/cassandra.yaml >>> too? >>> Or may be some another place? >>> >>> >>> -- >>> Regards, >>> Ajay >>> >> > > > -- > Regards, > Ajay >
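For intuition about what "IF NOT EXISTS" adds on top of a plain write, a single-JVM analogy: a compare-and-set that succeeds only when no value is present yet. This sketch illustrates the CAS semantics only; it says nothing about Cassandra's actual Paxos-based implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasSketch {
    public static void main(String[] args) {
        // An empty "cell"; null means no value has been written yet.
        AtomicReference<String> cell = new AtomicReference<>(null);

        // The first "INSERT ... IF NOT EXISTS" wins.
        boolean first = cell.compareAndSet(null, "ajay");
        // A second attempt sees an existing value and is rejected.
        boolean second = cell.compareAndSet(null, "eric");

        System.out.println(first + " " + second + " " + cell.get()); // true false ajay
    }
}
```

The read-before-write agreement this implies is why LWT operations are slower than plain writes and why they are governed by the separate SERIAL/LOCAL_SERIAL consistency setting.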
Re: Doubt regarding consistency-level in Cassandra-2.1.10
Hmm... ok.

Ideally, we require ::

a)
Intra-DC node syncing takes place at the statement/query level.

b)
Inter-DC node syncing takes place at the Cassandra level.

That way, we don't incur too much delay at the statement/query level.

For the so-called CAS/lightweight transactions, is the above impossible then?

On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng wrote:

> What Eric means is that SERIAL consistency is a special type of
> consistency that is only invoked for a subset of operations: those that use
> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>
> The differences between CAS operations and standard operations are
> significant and there are large repercussions for tunable consistency. The
> amount of time such an operation takes is greatly increased as well; you
> may need to increase your internal node-to-node timeouts.
>
> [...]

--
Regards,
Ajay
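Regarding keeping the coordination inside one datacenter: lightweight transactions do have a DC-local variant. LOCAL_SERIAL confines the Paxos round to replicas in the local datacenter, analogous to LOCAL_QUORUM for regular operations. A sketch in cqlsh (assuming a multi-DC keyspace; the statement is illustrative):

```
-- Regular reads/writes coordinate only within the local DC:
CONSISTENCY LOCAL_QUORUM;

-- Paxos rounds for IF NOT EXISTS / IF ... conditions also stay in-DC:
SERIAL CONSISTENCY LOCAL_SERIAL;

INSERT INTO users (id, email) VALUES (2, 'test@example.com') IF NOT EXISTS;
```

Note this only keeps the Paxos round local; it does not remove the extra round-trips that make lightweight transactions slower than plain writes. Cross-DC convergence then happens via Cassandra's normal replication.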
Re: Doubt regarding consistency-level in Cassandra-2.1.10
Hi All.

I think we got the root cause.

One of the fields in one of the classes was marked with the "@Version" annotation, which was causing the Cassandra Java driver to add "IF NOT EXISTS" to the insert query, thus invoking the SERIAL consistency level.

We removed the annotation (we didn't really need it), and we have not observed the error for about an hour or so.

Thanks Eric and Bryan for the help !!!

Thanks and Regards,
Ajay

On Wed, Nov 4, 2015 at 8:51 AM, Ajay Garg wrote:

> Hmm... ok.
>
> Ideally, we require ::
>
> a)
> The intra-DC-node-syncing takes place at the statement/query level.
>
> b)
> The inter-DC-node-syncing takes place at cassandra level.
>
> That way, we don't spend too much delay at the statement/query level.
>
> For the so-called CAS/lightweight transactions, the above are impossible
> then?
>
> [...]

--
Regards,
Ajay
Re: Doubt regarding consistency-level in Cassandra-2.1.10
Glad you got it figured out, but I'm confused about the @Version annotation. The DataStax Java Driver just handles statements; as far as I know, it's never going to modify statement text. It sounds like you're using an entity-mapping framework on top of the Java driver, which uses @Version for optimistic locking, and that upgraded the generated statement to a CAS operation.

On Wed, Nov 4, 2015 at 1:20 AM Ajay Garg wrote:

> Hi All.
>
> I think we got the root-cause.
>
> One of the fields in one of the class was marked with "@Version"
> annotation, which was causing the Cassandra-Java-Driver to insert "If Not
> Exists" in the insert query, thus invoking SERIAL consistency-level.
>
> We removed the annotation (didn't really need that), and we have not
> observed the error since about an hour or so.
>
> Thanks Eric and Bryan for the help !!!
>
> Thanks and Regards,
> Ajay
>
> [...]

--
Regards,
Ajay
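For readers wondering what such a mapping framework does under the hood: optimistic locking is typically implemented with conditional statements on a version column, which in Cassandra means lightweight transactions. A rough CQL sketch of the pattern (the table, columns, and mapper behavior here are illustrative assumptions, not taken from any specific framework):

```
-- First save: only create the row if it doesn't already exist.
INSERT INTO entity (id, payload, version) VALUES (?, ?, 0) IF NOT EXISTS;

-- Later saves: only apply the update if the version we last read is
-- still current, bumping it atomically so concurrent writers fail.
UPDATE entity SET payload = ?, version = 1 WHERE id = ? IF version = 0;
```

Both statements are CAS operations, so both run a Paxos round at the SERIAL (or LOCAL_SERIAL) consistency level, regardless of the regular consistency level set on the statement. That is why a single @Version field was enough to surface SERIAL timeout errors.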