Doubt

2014-04-21 Thread Jagan Ranganathan
Dear All,

We have a requirement to store 'N' columns of an entity in a CF. Mostly this is 
write once and read many times. What is the best way to store the data?
Composite CF
Simple CF with value as protobuf extracted data
Both provides extendable columns which is a requirement for our usage. 


But I want to know which one is efficient, assuming there is bound to be say 5% 
of updates?


Regards,

Jagan



Re: Doubt

2014-04-22 Thread Chris Lohfink
Generally Ive seen it recommended to do a composite CF since it gives you more 
flexibility and its easier to debug.  You can get some performance improvements 
by storing a serialized blob (a lot of data you can represent much smaller this 
way by factor of 10 or more if clever) to represent your entity but the 
complexity is rarely worth it.  It is likely a premature optimization but I 
have seen cases its shown a good improvement.

either case, the data will ultimately be read sequentially from disk per 
sstable (normal bottleneck) so the only benefit you gain is 
- potentially disk space (if serialization is efficient) and network bandwidth
- Cassandra won’t have to deserialize as many columns, but I’m fairly certain 
this is utterly irrelevant
- if stored in a mechanism that you can deserialize efficiently (like 
protobufs) it can make a big difference on your app side

keep in mind if serializing data though you will have to always maintain code 
that will be able to read old versions, it can become very complex and lead to 
weird bugs.

---
Chris Lohfink

On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan  wrote:

> Dear All,
> 
> We have a requirement to store 'N' columns of an entity in a CF. Mostly this 
> is write once and read many times. What is the best way to store the data?
> Composite CF
> Simple CF with value as protobuf extracted data
> Both provides extendable columns which is a requirement for our usage. 
> 
> But I want to know which one is efficient, assuming there is bound to be say 
> 5% of updates?
> 
> Regards,
> Jagan



RE: Doubt regarding CQL

2012-02-21 Thread Rishabh Agrawal
FYI .. I am using 1.0.7 version on Ubuntu 11.10
Need help asap

From: Rishabh Agrawal
Sent: Wednesday, February 22, 2012 11:49 AM
To: user@cassandra.apache.org
Subject: Doubt regarding CQL

Hello

I have installed CQL drivers for python. When I try execute cqlsh I get 
following error

cql-1.0.3$ cqlsh localhost 9160
Traceback (most recent call last):
  File "/usr/local/bin/cqlsh", line 33, in 
import cql
  File "/usr/local/lib/python2.7/dist-packages/cql/__init__.py", line 22, in 

import connection
  File "/usr/local/lib/python2.7/dist-packages/cql/connection.py", line 18, in 
< module>
from cursor import Cursor
  File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 24, in 

from cql.cassandra.ttypes import (
  File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py", line 
7, in 
from thrift.Thrift import *
ImportError: No module named thrift.Thrift

Kindly help me with that asap.

Thanks and Regards
Rishabh Agrawal




Impetus' Head of Innovation labs, Vineet Tyagi will be presenting on 'Big Data 
Big Costs?' at the Strata Conference, CA (Feb 28 - Mar 1) http://bit.ly/bSMWd7.

Listen to our webcast 'Hybrid Approach to Extend Web Apps to Tablets & 
Smartphones' available at http://bit.ly/yQC1oD.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.



Impetus' Head of Innovation labs, Vineet Tyagi will be presenting on 'Big Data 
Big Costs?' at the Strata Conference, CA (Feb 28 - Mar 1) http://bit.ly/bSMWd7.

Listen to our webcast 'Hybrid Approach to Extend Web Apps to Tablets & 
Smartphones' available at http://bit.ly/yQC1oD.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Re: Doubt regarding CQL

2012-02-22 Thread Mateusz Korniak
On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I get
> following error
> cql-1.0.3$ cqlsh localhost 9160
> (...)
>   File "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> line 7, in  from thrift.Thrift import *
> ImportError: No module named thrift.Thrift

Seems you do not have installed python thrift module.

In my distro (PLD) it is:
Package:python-thrift-0.5.0-4.i686
/usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
/usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo, 
TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo, __init__.pyc, 
__init__.pyo,
/usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc, 
TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo, 
TProtocol.pyc, TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
/usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc, 
THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo, TServer.pyc, 
TServer.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc, 
THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo, 
TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo


Regards,

-- 
Mateusz Korniak


RE: Doubt regarding CQL

2012-02-22 Thread Rishabh Agrawal
Thanks for the reply
I installed 0.8.0 drift package. But still problem persists.

-Original Message-
From: Mateusz Korniak [mailto:mateusz-li...@ant.gliwice.pl]
Sent: Wednesday, February 22, 2012 1:47 PM
To: user@cassandra.apache.org
Subject: Re: Doubt regarding CQL

On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> I have installed CQL drivers for python. When I try execute cqlsh I
> get following error cql-1.0.3$ cqlsh localhost 9160
> (...)
>   File
> "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> line 7, in  from thrift.Thrift import *
> ImportError: No module named thrift.Thrift

Seems you do not have installed python thrift module.

In my distro (PLD) it is:
Package:python-thrift-0.5.0-4.i686
/usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
/usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo, 
TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo, __init__.pyc, 
__init__.pyo,
/usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc, 
TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo, TProtocol.pyc, 
TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
/usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc, 
THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo, TServer.pyc, 
TServer.pyo, __init__.pyc, __init__.pyo
/usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc, 
THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo, 
TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo


Regards,

--
Mateusz Korniak



Impetus’ Head of Innovation labs, Vineet Tyagi will be presenting on ‘Big Data 
Big Costs?’ at the Strata Conference, CA (Feb 28 - Mar 1) http://bit.ly/bSMWd7.

Listen to our webcast ‘Hybrid Approach to Extend Web Apps to Tablets & 
Smartphones’ available at http://bit.ly/yQC1oD.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Re: Doubt regarding CQL

2012-02-22 Thread paul cannon
Rishabh-

It looks like you're not actually using the cqlsh that comes with Cassandra
1.0.7.  Are you using an old version of the Python CQL driver?  Old
versions of the driver had cqlsh bundled with it, instead of with Cassandra.

The 1.0.7 Debian/Ubuntu packages do not include cqlsh, because of some
packaging+distribution difficulties (resolved in 1.1).  One easy way to get
cqlsh as part of a package is to use the free DataStax Community Edition:
see http://www.datastax.com/products/community .  Cqlsh is included in the
"dsc" package.  That package will also bring in thrift and any other
dependencies you need.

p


On Wed, Feb 22, 2012 at 3:00 AM, Rishabh Agrawal <
rishabh.agra...@impetus.co.in> wrote:

> Thanks for the reply
> I installed 0.8.0 drift package. But still problem persists.
>
> -Original Message-
> From: Mateusz Korniak [mailto:mateusz-li...@ant.gliwice.pl]
> Sent: Wednesday, February 22, 2012 1:47 PM
> To: user@cassandra.apache.org
> Subject: Re: Doubt regarding CQL
>
> On Wednesday 22 of February 2012, Rishabh Agrawal wrote:
> > I have installed CQL drivers for python. When I try execute cqlsh I
> > get following error cql-1.0.3$ cqlsh localhost 9160
> > (...)
> >   File
> > "/usr/local/lib/python2.7/dist-packages/cql/cassandra/ttypes.py",
> > line 7, in  from thrift.Thrift import *
> > ImportError: No module named thrift.Thrift
>
> Seems you do not have installed python thrift module.
>
> In my distro (PLD) it is:
> Package:python-thrift-0.5.0-4.i686
> /usr/lib/python2.7/site-packages:  Thrift-0.1-py2.7.egg-info,
> /usr/lib/python2.7/site-packages/thrift:  TSCons.pyc, TSCons.pyo,
> TSerialization.pyc, TSerialization.pyo, Thrift.pyc, Thrift.pyo,
> __init__.pyc, __init__.pyo,
> /usr/lib/python2.7/site-packages/thrift/protocol:  TBinaryProtocol.pyc,
> TBinaryProtocol.pyo, TCompactProtocol.pyc, TCompactProtocol.pyo,
> TProtocol.pyc, TProtocol.pyo, __init__.pyc, __init__.pyo, fastbinary.so
> /usr/lib/python2.7/site-packages/thrift/server:  THttpServer.pyc,
> THttpServer.pyo, TNonblockingServer.pyc, TNonblockingServer.pyo,
> TServer.pyc, TServer.pyo, __init__.pyc, __init__.pyo
> /usr/lib/python2.7/site-packages/thrift/transport:  THttpClient.pyc,
> THttpClient.pyo, TSocket.pyc, TSocket.pyo, TTransport.pyc, TTransport.pyo,
> TTwisted.pyc, TTwisted.pyo, __init__.pyc, __init__.pyo
>
>
> Regards,
>
> --
> Mateusz Korniak
>
> 
>
> Impetus’ Head of Innovation labs, Vineet Tyagi will be presenting on ‘Big
> Data Big Costs?’ at the Strata Conference, CA (Feb 28 - Mar 1)
> http://bit.ly/bSMWd7.
>
> Listen to our webcast ‘Hybrid Approach to Extend Web Apps to Tablets &
> Smartphones’ available at http://bit.ly/yQC1oD.
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


Good partition key doubt

2014-12-11 Thread José Guilherme Vanz
Hello folks

I am studying Cassandra for a short a period of time and now I am modeling
a database for study purposes. During my modeling I have faced a doubt,
what is a good partition key? Is partition key direct related with my query
performance? What is the best practices?

Just to study case, let's suppose I have a column family where is inserted
all kind of logs ( http server, application server, application logs, etc )
data from different servers. In this column family I have server_id (
unique identifier for each server ) column, log_type ( http server,
application server, application log ) column and log_info column. Is a good
ideia create a partition key using server_id and log_type columns to store
all logs data from a specific type and server in a physical row? And if do
I want a physical row for each day? Is a good idea add a third column with
the date in the partition key? And if I want to query all logs in a period
of time how can I select I range o rows? Do I have to duplicate date column
( considering I have to use = operator with partition key ) ?

All the best
-- 
Att. José Guilherme Vanz
br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
<http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
"O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca,
recordista da Antarctic Ice Marathon.


Cassandra Delete Query Doubt

2021-11-10 Thread raman gugnani
HI Team,


I have one table below and want to delete data on this table.


DELETE  FROM game.tournament USING TIMESTAMP 161692578000 WHERE
tournament_id = 1 AND version_id = 1 AND partition_id = 1;


Cassandra internally manages the timestamp of each column when some data is
updated on the same column.


My Query is , *USING TIMESTAMP 161692578000* picks up a timestamp of
which column ?



CREATE TABLE game.tournament (

tournament_id bigint,

version_id bigint,

partition_id bigint,

user_id bigint,

created_at timestamp,

rank bigint,

score bigint,

updated_at timestamp,

PRIMARY KEY ((tournament_id, version_id, partition_id), user_id)

) WITH CLUSTERING ORDER BY (user_id ASC)







-- 
Raman Gugnani


multi-node cassandra config doubt

2011-08-23 Thread Thamizh
Hi All,

This is regarding multi-node cluster configuration doubt.

I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error 
when I ran Map/Reduce job which uploads records from HDFS to Cassandra.

Here are my 3 nodes cluster config file (cassandra.yaml) for Cassandra:

node01:
    seeds: "node01,node02,node03"
    auto_bootstrap: false
    listen_address: 192.168.0.1
    rpc_address: 192.168.0.1


node02:

seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.2
rpc_address: 192.168.0.2


node03:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

When I ran M/R program, I am getting below error
11/08/23 04:37:00 INFO mapred.JobClient:  map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient:  map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient:  map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : 
attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
    at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
    at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:90)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)


Is anything wrong on my cassandra.yaml file?

I followed http://wiki.apache.org/cassandra/MultinodeCluster for cluster 
configuration.

Regards,
Thamizhannal

Re: Good partition key doubt

2014-12-11 Thread DuyHai Doan
"what is a good partition key? Is partition key direct related with my
query performance? What is the best practices?"

A good partition key is a partition key that will scale with your data. An
example: if you have a business involving individuals, it is likely that
your business will scale as soon as the number of users will grow. In this
case user_id is a good partition key because all the users will
be uniformly distributed over all the Cassandra nodes.

For your log example, using only server_id for partition key is clearly not
enough because what will scale is the log lines, not the number of server.

>From the point of view of scalability (not taking about query-ability),
adding the log_type will not scale either, because the number of different
log types is likely to be a small set. For great scalability (not taking
about query-ability), the couple (server_id,log_timestamp) is likely a good
combination.

 Now for query, as you should know, it is not possible to have range query
(using <, ≤, ≥, >) over partition key, you must always use equality (=) so
you won't be able to leverage the log_timestamp component in the partition
key for your query.

Bucketing by date is a good idea though, and the date resolution will
depends on the log generation rate. If logs are generated very often, maybe
a bucket by hour. If the generation rate is smaller, maybe a day or a week
bucket is fine.

Talking about log_type, putting it into the partition key will help
partitioning further, in addition of the date bucket. However it forces you
to always provide a log_type whenever you want to query, be aware of this.

An example of data model for your logs could be

CREATE TABLE logs_by_server_and_type_and_date(
   server_id int,
   log_type text,
   date_bucket int, //Date bucket using format MMDD or MMDDHH or ...
   log_timestamp timeuuid,
   log_info text,
   PRIMARY KEY((server_id,log_type,date_bucket),log_timestamp)
);


"And if I want to query all logs in a period of time how can I select I
range o rows?" --> New query path = new table

CREATE TABLE logs_by_date(
   date_bucket int, //Date bucket using format MMDD or MMDDHH or ...
   log_timestamp timeuuid,
   server_id int,
   log_type text,
   log_info text,
   PRIMARY KEY((date_bucket),log_timestamp) // you may add server_id or
log_type as clustering column optionally
);

For this table, the date_bucket should be chosen very carefully because for
the same bucket, we're going to store logs of ALL servers and all types ...

For the query, you should provide the date bucket as partition key, and
then use (<, ≤, ≥, >) on the log_timestamp column


On Thu, Dec 11, 2014 at 12:00 PM, José Guilherme Vanz <
guilherme@gmail.com> wrote:

> Hello folks
>
> I am studying Cassandra for a short a period of time and now I am modeling
> a database for study purposes. During my modeling I have faced a doubt,
> what is a good partition key? Is partition key direct related with my query
> performance? What is the best practices?
>
> Just to study case, let's suppose I have a column family where is inserted
> all kind of logs ( http server, application server, application logs, etc )
> data from different servers. In this column family I have server_id (
> unique identifier for each server ) column, log_type ( http server,
> application server, application log ) column and log_info column. Is a good
> ideia create a partition key using server_id and log_type columns to store
> all logs data from a specific type and server in a physical row? And if do
> I want a physical row for each day? Is a good idea add a third column with
> the date in the partition key? And if I want to query all logs in a period
> of time how can I select I range o rows? Do I have to duplicate date column
> ( considering I have to use = operator with partition key ) ?
>
> All the best
> --
> Att. José Guilherme Vanz
> br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
> <http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
> "O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca,
> recordista da Antarctic Ice Marathon.
>


Re: Good partition key doubt

2014-12-15 Thread José Guilherme Vanz
Nice, I got it. =]
If I have more questions I'll send other emails. xD
Thank you

On Thu, Dec 11, 2014 at 12:17 PM, DuyHai Doan  wrote:
>
> "what is a good partition key? Is partition key direct related with my
> query performance? What is the best practices?"
>
> A good partition key is a partition key that will scale with your data. An
> example: if you have a business involving individuals, it is likely that
> your business will scale as soon as the number of users will grow. In this
> case user_id is a good partition key because all the users will
> be uniformly distributed over all the Cassandra nodes.
>
> For your log example, using only server_id for partition key is clearly
> not enough because what will scale is the log lines, not the number of
> server.
>
> From the point of view of scalability (not taking about query-ability),
> adding the log_type will not scale either, because the number of different
> log types is likely to be a small set. For great scalability (not taking
> about query-ability), the couple (server_id,log_timestamp) is likely a good
> combination.
>
>  Now for query, as you should know, it is not possible to have range query
> (using <, ≤, ≥, >) over partition key, you must always use equality (=) so
> you won't be able to leverage the log_timestamp component in the partition
> key for your query.
>
> Bucketing by date is a good idea though, and the date resolution will
> depends on the log generation rate. If logs are generated very often, maybe
> a bucket by hour. If the generation rate is smaller, maybe a day or a week
> bucket is fine.
>
> Talking about log_type, putting it into the partition key will help
> partitioning further, in addition of the date bucket. However it forces you
> to always provide a log_type whenever you want to query, be aware of this.
>
> An example of data model for your logs could be
>
> CREATE TABLE logs_by_server_and_type_and_date(
>server_id int,
>log_type text,
>date_bucket int, //Date bucket using format MMDD or MMDDHH or
> ...
>log_timestamp timeuuid,
>log_info text,
>PRIMARY KEY((server_id,log_type,date_bucket),log_timestamp)
> );
>
>
> "And if I want to query all logs in a period of time how can I select I
> range o rows?" --> New query path = new table
>
> CREATE TABLE logs_by_date(
>date_bucket int, //Date bucket using format MMDD or MMDDHH or
> ...
>log_timestamp timeuuid,
>server_id int,
>log_type text,
>log_info text,
>PRIMARY KEY((date_bucket),log_timestamp) // you may add server_id or
> log_type as clustering column optionally
> );
>
> For this table, the date_bucket should be chosen very carefully because
> for the same bucket, we're going to store logs of ALL servers and all types
> ...
>
> For the query, you should provide the date bucket as partition key, and
> then use (<, ≤, ≥, >) on the log_timestamp column
>
>
> On Thu, Dec 11, 2014 at 12:00 PM, José Guilherme Vanz <
> guilherme@gmail.com> wrote:
>
>> Hello folks
>>
>> I am studying Cassandra for a short a period of time and now I am
>> modeling a database for study purposes. During my modeling I have faced a
>> doubt, what is a good partition key? Is partition key direct related with
>> my query performance? What is the best practices?
>>
>> Just to study case, let's suppose I have a column family where is
>> inserted all kind of logs ( http server, application server, application
>> logs, etc ) data from different servers. In this column family I have
>> server_id ( unique identifier for each server ) column, log_type ( http
>> server,  application server, application log ) column and log_info column.
>> Is a good ideia create a partition key using server_id and log_type columns
>> to store all logs data from a specific type and server in a physical row?
>> And if do I want a physical row for each day? Is a good idea add a third
>> column with the date in the partition key? And if I want to query all logs
>> in a period of time how can I select I range o rows? Do I have to duplicate
>> date column ( considering I have to use = operator with partition key ) ?
>>
>> All the best
>> --
>> Att. José Guilherme Vanz
>> br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
>> <http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
>> "O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca,
>> recordista da Antarctic Ice Marathon.
>>
>
>

-- 
Att. José Guilherme Vanz
br.linkedin.com/pub/josé-guilherme-vanz/51/b27/58b/
<http://br.linkedin.com/pub/jos%C3%A9-guilherme-vanz/51/b27/58b/>
"O sofrimento é passageiro, desistir é para sempre" - Bernardo Fonseca,
recordista da Antarctic Ice Marathon.


Re: Cassandra Delete Query Doubt

2021-11-10 Thread Jeff Jirsa
This type of delete - which doesnt supply a user_id, so it's deleting a
range of rows - creates what is known as a range tombstone. It's not tied
to any given cell, as it covers a range of cells, and supersedes/shadows
them when merged (either in the read path or compaction path).



On Wed, Nov 10, 2021 at 4:27 AM raman gugnani 
wrote:

> HI Team,
>
>
> I have one table below and want to delete data on this table.
>
>
> DELETE  FROM game.tournament USING TIMESTAMP 161692578000 WHERE
> tournament_id = 1 AND version_id = 1 AND partition_id = 1;
>
>
> Cassandra internally manages the timestamp of each column when some data
> is updated on the same column.
>
>
> My Query is , *USING TIMESTAMP 161692578000* picks up a timestamp of
> which column ?
>
>
>
> CREATE TABLE game.tournament (
>
> tournament_id bigint,
>
> version_id bigint,
>
> partition_id bigint,
>
> user_id bigint,
>
> created_at timestamp,
>
> rank bigint,
>
> score bigint,
>
> updated_at timestamp,
>
> PRIMARY KEY ((tournament_id, version_id, partition_id), user_id)
>
> ) WITH CLUSTERING ORDER BY (user_id ASC)
>
>
>
>
>
>
>
> --
> Raman Gugnani
>


Re: Cassandra Delete Query Doubt

2021-11-10 Thread raman gugnani
Thanks Jeff for the information.

On Wed, 10 Nov 2021 at 21:08, Jeff Jirsa  wrote:

> This type of delete - which doesnt supply a user_id, so it's deleting a
> range of rows - creates what is known as a range tombstone. It's not tied
> to any given cell, as it covers a range of cells, and supersedes/shadows
> them when merged (either in the read path or compaction path).
>
>
>
> On Wed, Nov 10, 2021 at 4:27 AM raman gugnani 
> wrote:
>
>> HI Team,
>>
>>
>> I have one table below and want to delete data on this table.
>>
>>
>> DELETE  FROM game.tournament USING TIMESTAMP 161692578000 WHERE
>> tournament_id = 1 AND version_id = 1 AND partition_id = 1;
>>
>>
>> Cassandra internally manages the timestamp of each column when some data
>> is updated on the same column.
>>
>>
>> My Query is , *USING TIMESTAMP 161692578000* picks up a timestamp of
>> which column ?
>>
>>
>>
>> CREATE TABLE game.tournament (
>>
>> tournament_id bigint,
>>
>> version_id bigint,
>>
>> partition_id bigint,
>>
>> user_id bigint,
>>
>> created_at timestamp,
>>
>> rank bigint,
>>
>> score bigint,
>>
>> updated_at timestamp,
>>
>> PRIMARY KEY ((tournament_id, version_id, partition_id), user_id)
>>
>> ) WITH CLUSTERING ORDER BY (user_id ASC)
>>
>>
>>
>>
>>
>>
>>
>> --
>> Raman Gugnani
>>
>

-- 
Raman Gugnani


Re: multi-node cassandra config doubt

2011-08-24 Thread aaron morton
Did you get this sorted ? 

At a guess I would say there are no nodes listed in the Hadoop JobConf.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 9:51 PM, Thamizh wrote:

> Hi All,
> 
> This is regarding multi-node cluster configuration doubt.
> 
> I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error 
> when I ran Map/Reduce job which uploads records from HDFS to Cassandra.
> 
> Here are my 3 nodes cluster config file (cassandra.yaml) for Cassandra:
> 
> node01:
> seeds: "node01,node02,node03"
> auto_bootstrap: false
> listen_address: 192.168.0.1
> rpc_address: 192.168.0.1
> 
> 
> node02:
> 
> seeds: "node01,node02,node03"
> auto_bootstrap: true
> listen_address: 192.168.0.2
> rpc_address: 192.168.0.2
> 
> 
> node03:
> seeds: "node01,node02,node03"
> auto_bootstrap: true
> listen_address: 192.168.0.3
> rpc_address: 192.168.0.3
> 
> When I ran M/R program, I am getting below error
> 11/08/23 04:37:00 INFO mapred.JobClient:  map 100% reduce 11%
> 11/08/23 04:37:06 INFO mapred.JobClient:  map 100% reduce 22%
> 11/08/23 04:37:09 INFO mapred.JobClient:  map 100% reduce 33%
> 11/08/23 04:37:14 INFO mapred.JobClient: Task Id : 
> attempt_201104211044_0719_r_00_0, Status : FAILED
> java.lang.NullPointerException
> at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
> at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:90)
> at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:1)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> 
> Is anything wrong on my cassandra.yaml file?
> 
> I followed http://wiki.apache.org/cassandra/MultinodeCluster for cluster 
> configuration.
> 
> Regards,
> Thamizhannal



Re: multi-node cassandra config doubt

2011-08-24 Thread Thamizh
Hi Aaron,

This is yet to be resolved. 

I have set-up Cassandra multi node clustering and facing issues in pushing HDFS 
data to Cassandra. When I ran "MapReduce" progrma I am getting 
UnknownHostException.

In hadoop(0.20.1), I have configured node01-as master and node01, node02 & 
node03 as slaves.

In Cassandra(0.8.4), the installation & configurations has been done. when I 
issue nodetool ring command I could see the ring and also the KEYSPACES & 
COLUMNFAMILYS have got distributed.

o/p: nodetool
$bin/nodetool -h node02 ring
Address DC  Rack    Status State   Load    Owns    
Token   
   
161930152162677484001961360738128229499 
198.168.0.1 datacenter1 rack1   Up Normal  132.28 MB   12.48%  
13027320554261208311902766005835168982  
198.168.0.2 datacenter1 rack1   Up Normal  99.34 MB    75.07%  
140745249930211229277235689500208693608 
198.168.0.3 datacenter1 rack1   Up Normal  66.21 KB    12.45%  
161930152162677484001961360738128229499 
nutch@lab02:/code/apache-cassandra-0.8.4$ 


Here are the hadoop config.

        job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), 
KEYSPACE,COLUMN_FAMILY );
        ConfigHelper.setRpcPort(job4.getConfiguration(), ""9160);
        ConfigHelper.setInitialAddress(job4.getConfiguration(), "node01");
        ConfigHelper.setPartitioner(job4.getConfiguration(), 
"org.apache.cassandra.dht.RandomPartitioner");

Bleow is an exception message:

Error: java.net.UnknownHostException: /198.168.0.3
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
    at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
    at java.net.InetAddress.getAllByName(InetAddress.java:1083)
    at java.net.InetAddress.getAllByName(InetAddress.java:1019)
    at java.net.InetAddress.getByName(InetAddress.java:969)
    at 
org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
    at org.apache.cassandra.client.RingCache.(RingCache.java:67)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

note: Same /etc/hosts file has been used across all the nodes.

Kindly help me to resolve this issue?


Regards,

  Thamizhannal P

--- On Wed, 24/8/11, aaron morton  wrote:

From: aaron morton 
Subject: Re: multi-node cassandra config doubt
To: user@cassandra.apache.org
Date: Wednesday, 24 August, 2011, 2:40 PM

Did you get this sorted ? 
At a guess I would say there are no nodes listed in the Hadoop JobConf.
Cheers

-Aaron MortonFreelance Cassandra 
Developer@aaronmortonhttp://www.thelastpickle.com



On 23/08/2011, at 9:51 PM, Thamizh wrote:
Hi All,

This is regarding multi-node cluster configuration doubt.

I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error 
when I ran Map/Reduce job which uploads records from HDFS to Cassandra.

Here are my 3 nodes cluster config file (cassandra.yaml) for Cassandra:

node01:
    seeds: "node01,node02,node03"
    auto_bootstrap: false
    listen_address: 192.168.0.1
    rpc_address: 192.168.0.1


node02:

seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.2
rpc_address: 192.168.0.2


node03:
seeds: "node01,node02,node03"
auto_bootstrap: true
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

When I ran M/R program, I am getting below error
11/08/23 04:37:00 INFO
 mapred.JobClient:  map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient:  map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient:  map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : 
attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
    at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
    at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.wr

Re: multi-node cassandra config doubt

2011-08-24 Thread aaron morton
Jump on the machine that raised the error and see if you can ssh to node01. 

or try using ip address to see if they work. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2011, at 11:34 PM, Thamizh wrote:

> Hi Aaron,
> 
> This is yet to be resolved. 
> 
> I have set-up Cassandra multi node clustering and facing issues in pushing 
> HDFS data to Cassandra. When I ran "MapReduce" progrma I am getting 
> UnknownHostException.
> 
> In hadoop(0.20.1), I have configured node01-as master and node01, node02 & 
> node03 as slaves.
> 
> In Cassandra(0.8.4), the installation & configurations has been done. when I 
> issue nodetool ring command I could see the ring and also the KEYSPACES & 
> COLUMNFAMILYS have got distributed.
> 
> o/p: nodetool
> $bin/nodetool -h node02 ring
> Address DC  RackStatus State   LoadOwns   
>  Token   
>   
>  161930152162677484001961360738128229499 
> 198.168.0.1 datacenter1 rack1   Up Normal  132.28 MB   12.48% 
>  13027320554261208311902766005835168982  
> 198.168.0.2 datacenter1 rack1   Up Normal  99.34 MB75.07% 
>  140745249930211229277235689500208693608 
> 198.168.0.3 datacenter1 rack1   Up Normal  66.21 KB12.45% 
>  161930152162677484001961360738128229499 
> nutch@lab02:/code/apache-cassandra-0.8.4$ 
> 
> 
> Here are the hadoop config.
> 
> job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
> ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), 
> KEYSPACE,COLUMN_FAMILY );
> ConfigHelper.setRpcPort(job4.getConfiguration(), ""9160);
> ConfigHelper.setInitialAddress(job4.getConfiguration(), "node01");
> ConfigHelper.setPartitioner(job4.getConfiguration(), 
> "org.apache.cassandra.dht.RandomPartitioner");
> 
> Bleow is an exception message:
> 
> Error: java.net.UnknownHostException: /198.168.0.3
> at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
> at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
> at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
> at java.net.InetAddress.getAllByName(InetAddress.java:1083)
> at java.net.InetAddress.getAllByName(InetAddress.java:1019)
> at java.net.InetAddress.getByName(InetAddress.java:969)
> at 
> org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
> at org.apache.cassandra.client.RingCache.(RingCache.java:67)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
> at 
> org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> note: Same /etc/hosts file has been used across all the nodes.
> 
> Kindly help me to resolve this issue?
> 
> 
> Regards,
> Thamizhannal P
> 
> --- On Wed, 24/8/11, aaron morton  wrote:
> 
> From: aaron morton 
> Subject: Re: multi-node cassandra config doubt
> To: user@cassandra.apache.org
> Date: Wednesday, 24 August, 2011, 2:40 PM
> 
> Did you get this sorted ? 
> 
> At a guess I would say there are no nodes listed in the Hadoop JobConf.
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 23/08/2011, at 9:51 PM, Thamizh wrote:
> 
>> Hi All,
>> 
>> This is regarding multi-node cluster configuration doubt.
>> 
>> I have configured 3 nodes of cluster using Cassandra-0.8.4 and getting error 
>> when I ran Map/Reduce job which uploads records from HDFS to Cassandra.
>> 
>> Here are my 3 nodes cluster config file (cassandra.yaml) for Cassandra:
>> 
>> node01:
>> seeds: "node01,node02,node03"
>> auto_bootstrap: false
>> listen_address: 192.168.0.1
>> rpc_address: 192.168.0.1
>> 
>> 
>> node02:
>> 
>> seeds: "node01,node02,node03"
>&g

Re: multi-node cassandra config doubt

2011-08-25 Thread Thamizh
Hi Aaron,

Thanks a lot for your suggestions. I have got exhausted with below error. It 
would great if you point me what went wrong with my approach.

I wanted to install cassandra-0.8.4 on 3 nodes and to run Map/Reduce job that 
uploads data from HDFS to Cassandra.

I have installed Cassnadra on 3 nodes lab02(199.168.0.2),lab03(199.168.0.3) & 
lab04(199.168.0.4) respectively and can create a keyspace & column family and 
they got distributed across the cluster.

When I run my map/reduce program it ended up with "UnknownHostException". the 
same map/reduce program works well on single node cluster.


Here are the steps which I have followed.

1. cassandra.yaml details

lab02(199.168.0.2): (seed node)

auto_bootstrap: false
seeds: "199.168.0.2"
listen_address: 199.168.0.2
rpc_address: 199.168.0.2

lab03(199.168.0.3):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.3
rpc_address: 199.168.0.3

lab04(199.168.0.4):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.4
rpc_address: 199.168.0.4


2.
O/P of bin/cassandra :
    --
    --
 INFO 11:59:40,602 Node /199.168.0.2 is now part of the cluster
 INFO 11:59:40,604 InetAddress /199.168.0.2 is now UP
 INFO 11:59:55,667 Node /199.168.0.4 is now part of the cluster
 INFO 11:59:55,669 InetAddress /199.168.0.4 is now UP
 INFO 12:01:08,389 Joining: getting bootstrap token
 INFO 12:01:08,410 New token will be 43083119672609054510947312506340649252 to 
assume load from /199.168.0.2
 INFO 12:01:08,412 Enqueuing flush of Memtable-LocationInfo@6824966(123/153 
serialized/live bytes, 4 ops)
 INFO 12:01:08,413 Writing Memtable-LocationInfo@6824966(123/153 
serialized/live bytes, 4 ops)
 INFO 12:01:08,461 Completed flushing 
/var/lib/cassandra/data/system/LocationInfo-g-2-Data.db (287 bytes)
 INFO 12:01:08,477 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,480 Enqueuing flush of Memtable-LocationInfo@10141941(53/66 
serialized/live bytes, 2 ops)
 INFO 12:01:08,482 Writing Memtable-LocationInfo@10141941(53/66 serialized/live 
bytes, 2 ops)
 INFO 12:01:08,514 Completed flushing 
/var/lib/cassandra/data/system/LocationInfo-g-3-Data.db (163 bytes)
 INFO 12:01:08,527 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,652 mx4j successfuly loaded
HttpAdaptor version 3.0.1 started on port 8081

3.
When I run my map/reduce program it ended up with "UnknownHostException"

Error: java.net.UnknownHostException: /199.168.0.2
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
    at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
    at java.net.InetAddress.getAllByName(InetAddress.java:1083)
    at java.net.InetAddress.getAllByName(InetAddress.java:1019)
    at java.net.InetAddress.getByName(InetAddress.java:969)
    at 
org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
    at org.apache.cassandra.client.RingCache.(RingCache.java:67)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Here are the config line for map/reduce.

        job4.setReducerClass(TblUploadReducer.class );
        job4.setOutputKeyClass(ByteBuffer.class);
        job4.setOutputValueClass(List.class);
        job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), 
args[1],args[3] );
        ConfigHelper.setRpcPort(job4.getConfiguration(),  args[7]); // 9160
        ConfigHelper.setInitialAddress(job4.getConfiguration(), args[9]); // 
199.168.0.2
        ConfigHelper.setPartitioner(job4.getConfiguration(), 
"org.apache.cassandra.dht.RandomPartitioner");

Steps which I have verified,
1. There is a passwordless ssh has been configured b/w lab02,lab03 &lab04. All 
the nodes can ping each other with out any issues.
2. When I ran "InetAddress.getLocalHost()" from java program on lab02 it prints 
"lab02/199.168.0.2".
3. When I over looked "o/p" of bin/cassandra it prints couple of messages and 
under InetAddress field "/199.168.0.3" etc.
Here it does not print "hostname/IP". Is that problem?

Kindly help me.

Regards,
Thamizhannal 

--- On Thu, 25/8/11, aaron morton  wrote:

From: aaron morton 
Subject: Re: multi-node cass

Re: multi-node cassandra config doubt

2011-08-25 Thread Thamizh
Hi All,

It looks it is know issue with Cassandra-0.8.4. So either I have to wait till 
0.8.5 to be released or have to switch to 0.7.8 if this has been resolved in 
that.
Ref: https://issues.apache.org/jira/browse/CASSANDRA-3044

Regards,

  Thamizhannal P

--- On Thu, 25/8/11, Thamizh  wrote:

From: Thamizh 
Subject: Re: multi-node cassandra config doubt
To: user@cassandra.apache.org
Date: Thursday, 25 August, 2011, 9:01 PM

Hi Aaron,

Thanks a lot for your suggestions. I have got exhausted with below error. It 
would great if you point me what went wrong with my approach.

I wanted to install cassandra-0.8.4 on 3 nodes and to run Map/Reduce job that 
uploads data from HDFS to Cassandra.

I have installed Cassnadra on 3 nodes lab02(199.168.0.2),lab03(199.168.0.3) & 
lab04(199.168.0.4) respectively and can create a keyspace & column family and 
they got distributed across the cluster.

When I run my map/reduce program it ended up with "UnknownHostException". the 
same map/reduce program works well on single node cluster.


Here are the steps which I have followed.

1. cassandra.yaml details

lab02(199.168.0.2): (seed node)

auto_bootstrap: false
seeds: "199.168.0.2"
listen_address: 199.168.0.2
rpc_address:
 199.168.0.2

lab03(199.168.0.3):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.3
rpc_address: 199.168.0.3

lab04(199.168.0.4):
auto_bootstrap: true
seeds: "199.168.0.2"
listen_address: 199.168.0.4
rpc_address: 199.168.0.4


2.
O/P of bin/cassandra :
    --
    --
 INFO 11:59:40,602 Node /199.168.0.2 is now part of the cluster
 INFO 11:59:40,604 InetAddress /199.168.0.2 is now UP
 INFO 11:59:55,667 Node /199.168.0.4 is now part of the cluster
 INFO 11:59:55,669 InetAddress /199.168.0.4 is now UP
 INFO 12:01:08,389 Joining: getting bootstrap token
 INFO 12:01:08,410 New token will be 43083119672609054510947312506340649252 to 
assume load from /199.168.0.2
 INFO 12:01:08,412 Enqueuing flush of Memtable-LocationInfo@6824966(123/153 
serialized/live bytes, 4 ops)
 INFO 12:01:08,413
 Writing Memtable-LocationInfo@6824966(123/153 serialized/live bytes, 4 ops)
 INFO 12:01:08,461 Completed flushing 
/var/lib/cassandra/data/system/LocationInfo-g-2-Data.db (287 bytes)
 INFO 12:01:08,477 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,480 Enqueuing flush of Memtable-LocationInfo@10141941(53/66 
serialized/live bytes, 2 ops)
 INFO 12:01:08,482 Writing Memtable-LocationInfo@10141941(53/66 serialized/live 
bytes, 2 ops)
 INFO 12:01:08,514 Completed flushing 
/var/lib/cassandra/data/system/LocationInfo-g-3-Data.db (163 bytes)
 INFO 12:01:08,527 Node /199.168.0.3 state jump to normal
 INFO 12:01:08,652 mx4j successfuly loaded
HttpAdaptor version 3.0.1 started on port 8081

3.
When I run my map/reduce program it ended up with "UnknownHostException"

Error: java.net.UnknownHostException: /199.168.0.2
    at
 java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
    at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1200)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
    at java.net.InetAddress.getAllByName(InetAddress.java:1083)
    at java.net.InetAddress.getAllByName(InetAddress.java:1019)
    at java.net.InetAddress.getByName(InetAddress.java:969)
    at 
org.apache.cassandra.client.RingCache.refreshEndpointMap(RingCache.java:93)
    at org.apache.cassandra.client.RingCache.(RingCache.java:67)
    at 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:98)
    at
 
org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.(ColumnFamilyRecordWriter.java:92)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:132)
    at 
org.apache.cassandra.hadoop.ColumnFamilyOutputFormat.getRecordWriter(ColumnFamilyOutputFormat.java:62)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Here are the config line for map/reduce.

        job4.setReducerClass(TblUploadReducer.class );
        job4.setOutputKeyClass(ByteBuffer.class);
        job4.setOutputValueClass(List.class);
       
 job4.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(job4.getConfiguration(), 
args[1],args[3] );
        ConfigHelper.setRpcPort(job4.getConfiguration(),  args[7]); // 9160
        ConfigHelper.setInitialAddress(job4.getConfiguration(), args[9]); // 
199.168.0.2
        ConfigHelper.setPartitioner(job4.getConfiguration(), 
"org.apache.cassandra.dht.RandomPartitioner");

Steps which I have verified,
1. There is a passwordless ssh has been configured b/w lab02,lab03 &lab04. All 
the nodes can ping each other with out any issues.
2. When I ran "InetAdd

Doubt in Row key range scan

2012-05-28 Thread Prakrati Agrawal
Dear all

I have stored my data into Cassandra database in the format "tickerID_date". 
Now when I specify the row key range like 1_2012/05/24(start) to 
1_2012/05/27(end) it says that the end key md5 value is lesser than start key 
md5 value. So I changed my start key to  1_2012/05/27 and end key to 
1_2012/05/24, then I got all the keys even which are not in my range like 
67_2012/05/23 and 54_2012/05/28. I am  using Thrift API.
Please help me as I want only the columns of 1_2012/05/24, 1_2012/05/25 , 
1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com



This email message may contain proprietary, private and confidential 
information. The information transmitted is intended only for the person(s) or 
entities to which it is addressed. Any review, retransmission, dissemination or 
other use of, or taking of any action in reliance upon, this information by 
persons or entities other than the intended recipient is prohibited and may be 
illegal. If you received this in error, please contact the sender and delete 
the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic 
communications are free from viruses. However, given Internet accessibility, 
the Company cannot accept liability for any virus introduced by this e-mail or 
any attachment and you are advised to use up-to-date virus checking software.


Re: Doubt in Row key range scan

2012-05-28 Thread Pierre Chalamet
Hi,

It's normal.

Keys to replicas are determined with a hash (md5) when using the random 
partitionner (which you are using I guess).
 
You probably want to switch to the order preserving partionner or tweak your 
data model in order to rely on 2nd index for such filtering.

- Pierre

-Original Message-
From: Prakrati Agrawal 
Date: Mon, 28 May 2012 04:39:46 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Doubt in Row key range scan 

Dear all

I have stored my data into Cassandra database in the format "tickerID_date". 
Now when I specify the row key range like 1_2012/05/24(start) to 
1_2012/05/27(end) it says that the end key md5 value is lesser than start key 
md5 value. So I changed my start key to  1_2012/05/27 and end key to 
1_2012/05/24, then I got all the keys even which are not in my range like 
67_2012/05/23 and 54_2012/05/28. I am  using Thrift API.
Please help me as I want only the columns of 1_2012/05/24, 1_2012/05/25 , 
1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com



This email message may contain proprietary, private and confidential 
information. The information transmitted is intended only for the person(s) or 
entities to which it is addressed. Any review, retransmission, dissemination or 
other use of, or taking of any action in reliance upon, this information by 
persons or entities other than the intended recipient is prohibited and may be 
illegal. If you received this in error, please contact the sender and delete 
the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic 
communications are free from viruses. However, given Internet accessibility, 
the Company cannot accept liability for any virus introduced by this e-mail or 
any attachment and you are advised to use up-to-date virus checking software.



Re: Doubt in Row key range scan

2012-05-28 Thread Alain RODRIGUEZ
You are using the Random Partitioner.

Using the RP is a good thing because you avoid hot spots, but it has
its defaults too. You can't scan a slice of row, they won't be ordered
because all your keys are stored using their md5 values.

You should review your data model to use columns to order your data.

Alain

2012/5/28 Prakrati Agrawal :
> Dear all
>
>
>
> I have stored my data into Cassandra database in the format “tickerID_date”.
> Now when I specify the row key range like 1_2012/05/24(start) to
> 1_2012/05/27(end) it says that the end key md5 value is lesser than start
> key md5 value. So I changed my start key to  1_2012/05/27 and end key to
> 1_2012/05/24, then I got all the keys even which are not in my range like
> 67_2012/05/23 and 54_2012/05/28. I am  using Thrift API.
>
> Please help me as I want only the columns of 1_2012/05/24, 1_2012/05/25 ,
> 1_2012/05/26 and 1_2012/05/27.
>
>
>
> Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com
>
>
>
>
> 
> This email message may contain proprietary, private and confidential
> information. The information transmitted is intended only for the person(s)
> or entities to which it is addressed. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon,
> this information by persons or entities other than the intended recipient is
> prohibited and may be illegal. If you received this in error, please contact
> the sender and delete the message from your system.
>
> Mu Sigma takes all reasonable steps to ensure that its electronic
> communications are free from viruses. However, given Internet accessibility,
> the Company cannot accept liability for any virus introduced by this e-mail
> or any attachment and you are advised to use up-to-date virus checking
> software.


RE: Doubt in Row key range scan

2012-05-28 Thread Prakrati Agrawal
Please could you tell me how to tweak my data model to rely on 2nd index ?
Thank you


Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com

From: Pierre Chalamet [mailto:pie...@chalamet.net]
Sent: Monday, May 28, 2012 3:31 PM
To: user@cassandra.apache.org
Subject: Re: Doubt in Row key range scan

Hi,

It's normal.

Keys to replicas are determined with a hash (md5) when using the random 
partitionner (which you are using I guess).

You probably want to switch to the order preserving partionner or tweak your 
data model in order to rely on 2nd index for such filtering.
- Pierre

From: Prakrati Agrawal 
Date: Mon, 28 May 2012 04:39:46 -0500
To: user@cassandra.apache.org
ReplyTo: user@cassandra.apache.org
Subject: Doubt in Row key range scan

Dear all

I have stored my data into Cassandra database in the format "tickerID_date". 
Now when I specify the row key range like 1_2012/05/24(start) to 
1_2012/05/27(end) it says that the end key md5 value is lesser than start key 
md5 value. So I changed my start key to  1_2012/05/27 and end key to 
1_2012/05/24, then I got all the keys even which are not in my range like 
67_2012/05/23 and 54_2012/05/28. I am  using Thrift API.
Please help me as I want only the columns of 1_2012/05/24, 1_2012/05/25 , 
1_2012/05/26 and 1_2012/05/27.

Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com





Re: Doubt in Row key range scan

2012-05-28 Thread Luís Ferreira
Check this out: http://www.anuff.com/2011/02/indexing-in-cassandra.html#more

Or just google for wide row indexes.
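
For what it's worth, here is a minimal sketch of the wide-row index idea (same
hypothetical CQL 3 / DataStax Java driver 2.x setup and names as above, not
from the thread): one index partition per day, whose clustering entries are
the ticker IDs seen that day, maintained by the application on every write.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class WideRowIndexSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("market"); // hypothetical keyspace

        // One wide "index row" per day: the day is the partition key and the
        // ticker IDs are clustering columns inside that partition.
        session.execute("CREATE TABLE IF NOT EXISTS tickers_by_day ("
                + "trade_date text, ticker_id int, "
                + "PRIMARY KEY (trade_date, ticker_id))");

        // The application writes the index entry alongside each base-table write.
        session.execute("INSERT INTO tickers_by_day (trade_date, ticker_id) "
                + "VALUES ('2012/05/24', 1)");

        // All tickers that traded on a given day come from a single partition.
        for (Row row : session.execute(
                "SELECT ticker_id FROM tickers_by_day WHERE trade_date = '2012/05/24'")) {
            System.out.println(row.getInt("ticker_id"));
        }
        cluster.close();
    }
}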
On May 28, 2012, at 11:22 AM, Prakrati Agrawal wrote:

> Could you please tell me how to tweak my data model to rely on a secondary index?
> Thank you
>  
>  
> Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com
>  
> From: Pierre Chalamet [mailto:pie...@chalamet.net] 
> Sent: Monday, May 28, 2012 3:31 PM
> To: user@cassandra.apache.org
> Subject: Re: Doubt in Row key range scan
>  
> Hi,
> 
> It's normal.
> 
> Keys are mapped to replicas with a hash (MD5) when using the random 
> partitioner (which I guess you are using).
> 
> You probably want to switch to the order-preserving partitioner, or tweak your 
> data model to rely on a secondary index for such filtering.
> - Pierre
> From: Prakrati Agrawal 
> Date: Mon, 28 May 2012 04:39:46 -0500
> To: user@cassandra.apache.org
> ReplyTo: user@cassandra.apache.org
> Subject: Doubt in Row key range scan
>  
> Dear all
>  
> I have stored my data in Cassandra with row keys of the format "tickerID_date". 
> When I specify a row key range like 1_2012/05/24 (start) to 1_2012/05/27 (end), 
> it says that the end key's MD5 value is less than the start key's MD5 value. So 
> I changed my start key to 1_2012/05/27 and my end key to 1_2012/05/24, and then 
> I got back all the keys, even ones not in my range, like 67_2012/05/23 and 
> 54_2012/05/28. I am using the Thrift API.
> Please help me; I want only the columns of 1_2012/05/24, 1_2012/05/25, 
> 1_2012/05/26 and 1_2012/05/27.
>  
> Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com
>  
>  

Regards,
Luís Ferreira





Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi All.

I have a 2*2 NetworkTopology replication setup (two data centers, two nodes
each), and I run my application via the DataStax driver.

I frequently get errors of this type:
*Cassandra timeout during write query at consistency SERIAL (3 replica were
required but only 0 acknowledged the write)*

I have already tried passing a "write-options with LOCAL_QUORUM
consistency-level" in all create/save statements, but I still get this
error.

Does something else need to be changed in /etc/cassandra/cassandra.yaml too?
Or maybe somewhere else?

-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Eric Stevens
Serial consistency gets invoked at the protocol level when doing
lightweight transactions such as CAS operations.  If you're expecting that
your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
there aren't enough nodes available to satisfy serial consistency.

See
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
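
To make the distinction concrete, here is a minimal sketch (DataStax Java
driver 2.x; keyspace, table, and values are hypothetical) of how a conditional
write becomes a lightweight transaction with its own, separately configured
serial consistency level:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SerialConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo"); // hypothetical keyspace

        // The IF NOT EXISTS clause turns this into a CAS operation.
        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO users (id, name) VALUES (42, 'ajay') IF NOT EXISTS");

        // This level governs only the normal write path ...
        insert.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        // ... while the Paxos phase of the CAS uses the serial level. The
        // driver default is SERIAL, which spans all data centers;
        // LOCAL_SERIAL confines it to the local DC.
        insert.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);

        session.execute(insert);
        cluster.close();
    }
}

A LOCAL_QUORUM write option alone does not change the serial level, which is
why the error still reports consistency SERIAL.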

On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg  wrote:

> Hi All.
>
> I have a 2*2 NetworkTopology replication setup (two data centers, two nodes
> each), and I run my application via the DataStax driver.
>
> I frequently get errors of this type:
> *Cassandra timeout during write query at consistency SERIAL (3 replica
> were required but only 0 acknowledged the write)*
>
> I have already tried passing a "write-options with LOCAL_QUORUM
> consistency-level" in all create/save statements, but I still get this
> error.
>
> Does something else need to be changed in /etc/cassandra/cassandra.yaml
> too?
> Or maybe somewhere else?
>
>
> --
> Regards,
> Ajay
>


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi Eric,

I am sorry, but I don't understand.

If there were an issue in the configuration, the consistency error would
(I guess) be seen every time.
As it stands, the error is seen only sometimes (roughly 30% of the time).

On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens  wrote:

> Serial consistency gets invoked at the protocol level when doing
> lightweight transactions such as CAS operations.  If you're expecting that
> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
> there aren't enough nodes available to satisfy serial consistency.
>
> See
> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html


-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-03 Thread Bryan Cheng
What Eric means is that SERIAL consistency is a special type of consistency
that is only invoked for a subset of operations: those that use
CAS/lightweight transactions, for example "IF NOT EXISTS" queries.

The differences between CAS operations and standard operations are
significant and there are large repercussions for tunable consistency. The
amount of time such an operation takes is greatly increased as well; you
may need to increase your internal node-to-node timeouts.
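
For reference, the relevant knobs live in cassandra.yaml (the values shown are
the 2.1 defaults; raise them across the cluster if CAS operations keep timing
out):

# Window for acknowledging plain (non-CAS) writes.
write_request_timeout_in_ms: 2000
# Extra time allowed for the Paxos rounds of contended CAS operations.
cas_contention_timeout_in_ms: 1000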

On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg  wrote:

> Hi Eric,
>
> I am sorry, but I don't understand.
>
> If there were an issue in the configuration, the consistency error would
> (I guess) be seen every time.
> As it stands, the error is seen only sometimes (roughly 30% of the time).
>
> --
> Regards,
> Ajay
>


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-03 Thread Ajay Garg
Hmm... ok.

Ideally, we want:

a)
Intra-DC node syncing to happen at the statement/query level.

b)
Inter-DC node syncing to happen at the Cassandra level.


That way, we don't incur too much delay at the statement/query level.


For the so-called CAS/lightweight transactions, is the above impossible,
then?

On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng  wrote:

> What Eric means is that SERIAL consistency is a special type of
> consistency that is only invoked for a subset of operations: those that use
> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>
> The differences between CAS operations and standard operations are
> significant and there are large repercussions for tunable consistency. The
> amount of time such an operation takes is greatly increased as well; you
> may need to increase your internal node-to-node timeouts.


-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-04 Thread Ajay Garg
Hi All.

I think we got the root-cause.

One of the fields in one of the classes was marked with the "@Version"
annotation, which was causing the Cassandra Java driver to add "IF NOT
EXISTS" to the insert query, thus invoking the SERIAL consistency level.

We removed the annotation (we didn't really need it), and we have not
observed the error for about an hour or so.


Thanks Eric and Bryan for the help !!!


Thanks and Regards,
Ajay

On Wed, Nov 4, 2015 at 8:51 AM, Ajay Garg  wrote:

> Hmm... ok.
>
> Ideally, we want:
>
> a)
> Intra-DC node syncing to happen at the statement/query level.
>
> b)
> Inter-DC node syncing to happen at the Cassandra level.
>
> That way, we don't incur too much delay at the statement/query level.
>
> For the so-called CAS/lightweight transactions, is the above impossible,
> then?
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-04 Thread Eric Stevens
Glad you got it figured out, but I'm confused about the @Version
annotation.  The DataStax Java driver just executes statements; as far as I
know, it never modifies statement text.  It sounds like you're using an
entity mapping framework on top of the Java driver, which uses @Version for
optimistic locking, and that upgraded the generated statement to a CAS
operation.
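
For illustration only (the thread does not name the mapping framework), a
JPA-style entity with an @Version field is roughly what triggers this
behavior; the framework uses the field for optimistic locking and emits a
conditional statement instead of a plain insert:

import javax.persistence.Version;

public class Device {                // hypothetical mapped entity
    private String id;
    private String state;

    // With this field present, the mapper generates something like
    //   INSERT INTO device (id, state, version) VALUES (?, ?, ?) IF NOT EXISTS
    // instead of a plain INSERT, i.e. a lightweight transaction that is
    // subject to the SERIAL consistency level. Dropping the field, as the
    // thread describes, restores the plain, non-CAS write path.
    @Version
    private long version;
}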

On Wed, Nov 4, 2015 at 1:20 AM Ajay Garg  wrote:

> Hi All.
>
> I think we got the root-cause.
>
> One of the fields in one of the classes was marked with the "@Version"
> annotation, which was causing the Cassandra Java driver to add "IF NOT
> EXISTS" to the insert query, thus invoking the SERIAL consistency level.
>
> We removed the annotation (we didn't really need it), and we have not
> observed the error for about an hour or so.
>
>
> Thanks Eric and Bryan for the help !!!
>
>
> Thanks and Regards,
> Ajay