Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
What's the output of 'nodetool compactionstats'?  Is concurrent_compactors not
set in your cassandra.yaml?  Any exceptions or errors in the system.log or
output.log?

---
Chris Lohfink

On Sep 22, 2014, at 9:50 PM, Arun  wrote:

> It's been constant for 4 hours. The remaining nodes have around 10
> compactions. We have 4 column families.
> 
> 
> On Sep 22, 2014, at 19:39, Chris Lohfink  wrote:
> 
>> 35 isn't really that high in some scenarios (i.e., there are a lot of column
>> families). Is it continuing to climb, or does it drop back down shortly after?
>> 
>> ---
>> Chris Lohfink
>> 
>> On Sep 22, 2014, at 7:57 PM, arun sirimalla  wrote:
>> 
>>> I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice a high
>>> pending compaction count on one of the nodes, around 35.
>>> Compaction throughput is set to 64 MB/s and flush writers to 4. Any
>>> suggestion is much appreciated.
>>> 
>>> -- 
>>> Arun 
>>> Senior Hadoop Engineer
>>> Cloudwick
>>> 
>>> Champion of Big Data
>>> http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html
>> 
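
For reference, the settings discussed in this thread live in cassandra.yaml; a
minimal sketch using the values quoted above (taken from the thread, not
recommendations):

# cassandra.yaml (Cassandra 2.x option names)
compaction_throughput_mb_per_sec: 64   # "compaction throughput set to 64 MB"
memtable_flush_writers: 4              # "flush writes to 4"
# concurrent_compactors: 2             # commented out by default; when unset it
                                       # defaults to min(number of disks, cores)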



Re: High Compactions Pending

2014-09-22 Thread Arun
It's been constant for 4 hours. The remaining nodes have around 10 compactions.
We have 4 column families.


> On Sep 22, 2014, at 19:39, Chris Lohfink  wrote:
> 
> 35 isn't really that high in some scenarios (i.e., there are a lot of column
> families). Is it continuing to climb, or does it drop back down shortly after?
> 
> ---
> Chris Lohfink
> 
>> On Sep 22, 2014, at 7:57 PM, arun sirimalla  wrote:
>> 
>> I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice a high
>> pending compaction count on one of the nodes, around 35.
>> Compaction throughput is set to 64 MB/s and flush writers to 4. Any
>> suggestion is much appreciated.
>> 
>> -- 
>> Arun 
>> Senior Hadoop Engineer
>> Cloudwick
>> 
>> Champion of Big Data
>> http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html
> 


Re: High Compactions Pending

2014-09-22 Thread Chris Lohfink
35 isn't really that high in some scenarios (i.e., there are a lot of column
families). Is it continuing to climb, or does it drop back down shortly after?

---
Chris Lohfink

On Sep 22, 2014, at 7:57 PM, arun sirimalla  wrote:

> I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice a high
> pending compaction count on one of the nodes, around 35.
> Compaction throughput is set to 64 MB/s and flush writers to 4. Any suggestion
> is much appreciated.
> 
> -- 
> Arun 
> Senior Hadoop Engineer
> Cloudwick
> 
> Champion of Big Data
> http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html



How to avoid column family duplication (when query requires multiple restrictions)

2014-09-22 Thread Gianluca Borello
Hi,

I have a column family storing very large blobs that I would not like to
duplicate, if possible.
Here's a simplified version:

CREATE TABLE timeline (
   key text,
   a int,
   b int,
   value blob,
   PRIMARY KEY (key, a, b)
);

On this table, I run exactly two types of query. Both of them need a range
restriction on 'a', and just one of them also restricts 'b'.

First query:

cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3;

This one runs fine.

Second query:

cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and
b = 12;
code=2200 [Invalid query] message="PRIMARY KEY column "b" cannot be
restricted (preceding column "ColumnDefinition{name=a,
type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN,
componentIndex=0, indexName=null, indexType=null}" is either not restricted
or by a non-EQ relation)"

This fails. Even if I create an index:

CREATE INDEX timeline_b ON timeline (b);
cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and
b = 12;
code=2200 [Invalid query] message="Cannot execute this query as it might
involve data filtering and thus may have unpredictable performance. If you
want to execute this query despite the performance unpredictability, use
ALLOW FILTERING"

I solved this problem by duplicating the column family (into "timeline_by_a"
and "timeline_by_b", where a and b appear in the opposite clustering order),
but I'm wondering if there's a better solution, as this tends to grow pretty big.

In particular, from the little understanding that I have of the Cassandra
internals, it seems like even the second query should be fairly efficient
since the clustering columns are stored in order on disk, thus I don't
understand the ALLOW FILTERING requirement.

Another alternative I'm considering is keeping another column family that
serves as an "index" and managing it manually in the application.

Thanks.
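
For reference, the duplicated-table workaround described above might look like
this (a sketch; the names are taken from the post). The reason the second query
needs ALLOW FILTERING on the original table is that, with a range on 'a', the
rows matching b = 12 are not contiguous within the partition (there is one
slice per value of 'a'); reversing the clustering order makes them contiguous:

-- Serves the range query on 'a' (same layout as the original table)
CREATE TABLE timeline_by_a (
   key text,
   a int,
   b int,
   value blob,
   PRIMARY KEY (key, a, b)
);

-- Clustering reversed so 'b' can be restricted by EQ before the range on 'a'
CREATE TABLE timeline_by_b (
   key text,
   b int,
   a int,
   value blob,
   PRIMARY KEY (key, b, a)
);

-- The second query then runs without ALLOW FILTERING:
-- SELECT * FROM timeline_by_b WHERE key = 'event' AND b = 12 AND a >= 2 AND a <= 3;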


High Compactions Pending

2014-09-22 Thread arun sirimalla
I have a 6-node (i2.2xlarge) cluster on AWS running DSE 4.5. I notice a high
pending compaction count on one of the nodes, around 35.
Compaction throughput is set to 64 MB/s and flush writers to 4. Any suggestion
is much appreciated.

-- 
Arun
Senior Hadoop Engineer
Cloudwick

Champion of Big Data
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html


Re: CPU consumption of Cassandra

2014-09-22 Thread Jake Luciani
Eric,

We have a new stress tool to help you share your schema for wider
benchmarking. See
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
If you wouldn't mind creating a yaml for your schema, I would be happy to
take a look.

-Jake




On Mon, Sep 22, 2014 at 12:39 PM, Leleu Eric 
wrote:

>  Hi,
>
>
>
>
>
> I’m currently testing Cassandra 2.0.9 (and, since last week, 2.1) under
> some read-heavy load…
>
>
>
> I have 2 cassandra nodes (RF : 2) running under CentOS 6 with 16GB of RAM
> and 8 Cores.
>
> I have around 93GB of data per node (one Disk of 300GB with SAS interface
> and a Rotational Speed of 10500)
>
>
>
> I have 300 active client threads and they request the C* nodes with a
> Consistency level set to ONE (I’m using the CQL DataStax driver).
>
>
>
> During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4%
> iowait / 20% idle).
>
> C* nodes respond to around 5000 op/s (sometimes up to 6000 op/s).
>
>
>
> I tried to profile a node and at first look, 60% of the CPU time is spent in
> the “sun.nio.ch” package (SelectorImpl.select or Channel.read).
>
>
>
> I know that benchmark results are highly dependent on the dataset and use
> cases, but from my point of view this CPU consumption is normal
> given the load.
>
> Can someone confirm that point?
>
> According to my hardware configuration, can I expect to have more than
> 6000 read op/s?
>
>
>
>
>
> Regards,
>
> Eric
>
>
>
>
>
>
>
>
>



-- 
http://twitter.com/tjake
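
For reference, a 2.1 stress profile of the kind Jake asks for might look
roughly like this (a sketch with assumed identifiers; see the linked blog post
for the authoritative format):

# cqlstress-example.yaml -- all names below are illustrative
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
table: testtable
table_definition: |
  CREATE TABLE testtable (name text PRIMARY KEY, value blob);
columnspec:
  - name: name
    size: uniform(5..10)
  - name: value
    size: uniform(100..1000)
insert:
  partitions: fixed(1)
queries:
  read1:
    cql: SELECT * FROM testtable WHERE name = ?
    fields: samerow

# run with something like:
#   cassandra-stress user profile=./cqlstress-example.yaml ops(insert=1,read1=3)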


Re: cassandra 2.1.0 unable to use cqlsh

2014-09-22 Thread Tim Dunphy
Hi Adam,

 Ok, thanks again for the tips there! I fell back to the stock configuration
of Cassandra 2.1.0, set up my environment variables... and I was able to get
cqlsh to work!

[root@beta-new:~] #cqlsh

Connected to mydomain Cluster at beta-new.mydomain.com:9042.

[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]

Use HELP for help.

cqlsh>


Thanks!

Tim

On Mon, Sep 22, 2014 at 11:05 AM, Adam Holmberg 
wrote:

> cqlsh in Cassandra 2.1.0 uses the DataStax python driver. The
> "cassandra.metadata" module is provided by this package. By default it uses
> the driver from an archive included in the Cassandra distribution
> (.../lib/cassandra-driver-internal-only-2.1.0.zip).
>
> See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets
> setup -- it's possible your wrapper or environment are not playing well
> with that.
>
> Also note that "9160" will not apply anymore since this driver uses the
> native protocol (9042).
>
> Adam
>
> On Sun, Sep 21, 2014 at 7:53 PM, Tim Dunphy  wrote:
>
>> Hey all,
>>
>>  I've just upgraded to the latest cassandra on my site with version 2.1.0.
>>
>> But now when I run the command I am getting the following error:
>>
>> [root@beta-new:/usr/local] #cqlsh
>> Traceback (most recent call last):
>>   File "/etc/alternatives/cassandrahome/bin/cqlsh-old", line 113, in <module>
>>     from cqlshlib import cqlhandling, cql3handling, pylexotron
>>   File "/usr/local/apache-cassandra-2.1.0/bin/../pylib/cqlshlib/cql3handling.py", line 18, in <module>
>>     from cassandra.metadata import maybe_escape_name
>> ImportError: No module named cassandra.metadata
>>
>> Just to clarify some of the above output, all my 'cqlsh' command does is
>> automatically fill in some values I'd like to use as defaults and then
>> invoke the real command which I've named 'cqlsh-old'. Just a quirk of my
>> setup that's always allowed cqlsh to be invoked without issue across
>> multiple upgrades.
>>
>> [root@beta-new:/usr/local] #cat /etc/alternatives/cassandrahome/bin/cqlsh
>> #!/bin/sh
>> /etc/alternatives/cassandrahome/bin/cqlsh-old beta-new.mydomain.com 9160
>> --cqlversion="3.0.0"
>>
>> I'd appreciate any advice you  could spare on how to get around this
>> error!
>>
>> Thanks
>> Tim
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: CPU consumption of Cassandra

2014-09-22 Thread Chris Lohfink
It's going to depend a lot on your data model, but 5-6k op/s is on the low end
of what I would expect.  N=RF=2 is not really something I would recommend.  That
said, 93GB is not much data, so the bottleneck may exist more in your data
model, queries, or client.

What profiler are you using?  The CPU on the select/read is marked as RUNNABLE,
but it's really more of a wait state that may throw some profilers off; it may
be a red herring.

---
Chris Lohfink

On Sep 22, 2014, at 11:39 AM, Leleu Eric  wrote:

> Hi,
>  
>  
> I’m currently testing Cassandra 2.0.9 (and, since last week, 2.1) under
> some read-heavy load…
>  
> I have 2 cassandra nodes (RF : 2) running under CentOS 6 with 16GB of RAM and 
> 8 Cores.
> I have around 93GB of data per node (one Disk of 300GB with SAS interface and 
> a Rotational Speed of 10500)
>  
> I have 300 active client threads and they request the C* nodes with a
> Consistency level set to ONE (I’m using the CQL DataStax driver).
>  
> During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait
> / 20% idle).
> C* nodes respond to around 5000 op/s (sometimes up to 6000 op/s).
>  
> I tried to profile a node and at first look, 60% of the CPU time is spent in
> the “sun.nio.ch” package (SelectorImpl.select or Channel.read).
>  
> I know that benchmark results are highly dependent on the dataset and use
> cases, but from my point of view this CPU consumption is normal
> given the load.
> Can someone confirm that point?
> According to my hardware configuration, can I expect to have more than 6000
> read op/s?
>  
>  
> Regards,
> Eric
>  
>  
>  
>  



Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-22 Thread Les Hartzman
Thanks everyone for the responses. One thing I'd forgotten about was the
need to model the CFs with regard to the kind of queries that are needed.
Fortunately this is primarily a write-once/read-many type of application,
so deletions are not currently a concern, but worth keeping in mind for the
future.

Les

On Sat, Sep 20, 2014 at 6:45 AM, Brice Dutheil 
wrote:

> I’m fairly new to Cassandra, but here’s my input.
>
> Think of your column families as a projection of how the application needs
> them; thinking with CQRS in mind helps. This means more CFs, and possibly more
> space, as the same data may be written differently in different column
> families for different usages. For that reason you have to think about disk
> usage, considering the growth of the data and the space Cassandra needs to
> perform compaction and other housekeeping.
>
> Also on the modeling front, pay attention to growing wide rows: updating or
> deleting columns in such a row can add too many tombstones
> (tombstone_failure_threshold defaults to 100,000), which may cause
> Cassandra to abort queries on such rows (before compaction), because it has
> to load the partition into memory to produce the actual data.
> This is especially important for time series. We had to rework our model
> to bucket by period to avoid such cases. However, this requires some
> work in the business code to query such a column family.
>
> Avoid secondary indexes; modeling per usage, as above, generally removes the
> need for them.
>
> Cheers,
> — Brice
>
> On Sat, Sep 20, 2014 at 6:55 AM, Jack Krupansky 
> wrote:
>
>   Start by asking how you intend to query the data. That should drive the
>> data model.
>>
>> Is there existing app client code or an app layer that is already using
>> the current schema, or are you intending to rewrite that as well?
>>
>> FWIW, you could place the numeric columns in a numeric map collection,
>> and the string columns in a string map collection, but... it’s best to
>> first step back and look at the big picture of what the data actually looks
>> like as well as how you want to query it.
>>
>> -- Jack Krupansky
>>
>>  *From:* Les Hartzman 
>> *Sent:* Friday, September 19, 2014 5:46 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Help with approach to remove RDBMS schema from code to move
>> to C*?
>>
>>  My company is using an RDBMS for storing time-series data. This
>> application was developed before Cassandra and NoSQL. I'd like to move to
>> C*, but ...
>>
>> The application supports data coming from multiple models of devices.
>> Because there is enough variability in the data, the main table to hold the
>> device data only has some core columns defined. The other columns are
>> non-specific; a set of columns for numeric and a set for character. So for
>> these non-specific columns, their use is defined in the code. The use of
>> column 'numeric_1' might hold a millisecond time for one device and a fault
>> code for another device. This appears to have been done to keep from
>> modifying the schema whenever a new device was introduced. And they rolled
>> their own db interface to support this mess.
>>
>> Now, we could just use C* like an RDBMS - defining CFs to mimic the
>> tables. But this just pushes a bad design from one platform to another.
>>
>> Clearly there needs to be a code re-write. But what suggestions does
>> anyone have on how to make this shift to C*?
>>
>> Would you just lay out all of the columns represented by the different
>> devices, naming them as they are used, and having jagged rows? Or is there
>> some other way to approach this?
>>
>> Of course, the data miners already have scripts/methods for accessing the
>> data from the RDBMS in the user-unfriendly form it's in now. This would
>> have to be addressed as well, but until I know how to store it, mining it
>> gets ahead of things.
>>
>> Thanks.
>>
>> Les
>>
>>
> ​
>
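
For reference, Jack's map-collection suggestion above might look roughly like
this in CQL (a sketch; the table and column names are assumed, not from the
thread):

CREATE TABLE device_data (
   device_id text,
   event_time timestamp,
   num_values map<text, double>,   -- e.g. 'millisecond_time', 'fault_code'
   str_values map<text, text>,
   PRIMARY KEY (device_id, event_time)
);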


Re: Named Parameters in Prepared Statement

2014-09-22 Thread Alex Popescu
Yes, you can bind parameters by name:

```
INSERT INTO songs (id, title, album, artist) VALUES (:id, :title, :album,
:artist)
```

All DataStax drivers for Cassandra support this feature. In Java it looks
like this:

// prepare only once
PreparedStatement pstmt = session.prepare("INSERT INTO songs (id, title,
album, artist) VALUES (:id, :title, :album, :artist)");

// later
BoundStatement stmt = new BoundStatement(pstmt);
stmt.setLong("id", 1234);
stmt.setString("title", "Example title");
session.execute(stmt);

On Mon, Sep 22, 2014 at 4:41 AM, Timmy Turner  wrote:

> Looking through the CQL 3.1 grammar in Cassandra, I found a "':' ident"
> alternative in the "value" rule (line 961).
>
> Is this for binding named parameters in prepared statements? Is this
> currently supported by any of the drivers or in Cassandra (2.1) itself?
>
> Looking at the docs and the current Java driver it doesn't seem that way.
>



-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


CPU consumption of Cassandra

2014-09-22 Thread Leleu Eric
Hi,


I'm currently testing Cassandra 2.0.9 (and, since last week, 2.1) under some
read-heavy load...

I have 2 Cassandra nodes (RF: 2) running under CentOS 6 with 16GB of RAM and 8
cores.
I have around 93GB of data per node (one 300GB disk with a SAS interface and a
rotational speed of 10,500 RPM).

I have 300 active client threads and they request the C* nodes with a
Consistency level set to ONE (I'm using the CQL DataStax driver).

During my tests I saw a lot of CPU consumption (70% user / 6% sys / 4% iowait /
20% idle).
C* nodes respond to around 5000 op/s (sometimes up to 6000 op/s).

I tried to profile a node and at first look, 60% of the CPU time is spent in
the "sun.nio.ch" package (SelectorImpl.select or Channel.read).

I know that benchmark results are highly dependent on the dataset and use
cases, but from my point of view this CPU consumption is normal given the load.
Can someone confirm that point?
According to my hardware configuration, can I expect to have more than 6000
read op/s?


Regards,
Eric









Re: cassandra 2.1.0 unable to use cqlsh

2014-09-22 Thread Tim Dunphy
>
> cqlsh in Cassandra 2.1.0 uses the DataStax python driver. The
> "cassandra.metadata" module is provided by this package. By default it uses
> the driver from an archive included in the Cassandra distribution
> (.../lib/cassandra-driver-internal-only-2.1.0.zip).


Ok that's really good to know.

>
> See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets
> setup -- it's possible your wrapper or environment are not playing well
> with that.
> Also note that "9160" will not apply anymore since this driver uses the
> native protocol (9042).



OK, yes, very possible. I'll try working with what's originally there and, if
need be, make any alterations.

Thanks!
Tim

On Mon, Sep 22, 2014 at 11:05 AM, Adam Holmberg 
wrote:

> cqlsh in Cassandra 2.1.0 uses the DataStax python driver. The
> "cassandra.metadata" module is provided by this package. By default it uses
> the driver from an archive included in the Cassandra distribution
> (.../lib/cassandra-driver-internal-only-2.1.0.zip).
>
> See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets
> setup -- it's possible your wrapper or environment are not playing well
> with that.
>
> Also note that "9160" will not apply anymore since this driver uses the
> native protocol (9042).
>
> Adam
>
> On Sun, Sep 21, 2014 at 7:53 PM, Tim Dunphy  wrote:
>
>> Hey all,
>>
>>  I've just upgraded to the latest cassandra on my site with version 2.1.0.
>>
>> But now when I run the command I am getting the following error:
>>
>> [root@beta-new:/usr/local] #cqlsh
>> Traceback (most recent call last):
>>   File "/etc/alternatives/cassandrahome/bin/cqlsh-old", line 113, in <module>
>>     from cqlshlib import cqlhandling, cql3handling, pylexotron
>>   File "/usr/local/apache-cassandra-2.1.0/bin/../pylib/cqlshlib/cql3handling.py", line 18, in <module>
>>     from cassandra.metadata import maybe_escape_name
>> ImportError: No module named cassandra.metadata
>>
>> Just to clarify some of the above output, all my 'cqlsh' command does is
>> automatically fill in some values I'd like to use as defaults and then
>> invoke the real command which I've named 'cqlsh-old'. Just a quirk of my
>> setup that's always allowed cqlsh to be invoked without issue across
>> multiple upgrades.
>>
>> [root@beta-new:/usr/local] #cat /etc/alternatives/cassandrahome/bin/cqlsh
>> #!/bin/sh
>> /etc/alternatives/cassandrahome/bin/cqlsh-old beta-new.mydomain.com 9160
>> --cqlversion="3.0.0"
>>
>> I'd appreciate any advice you  could spare on how to get around this
>> error!
>>
>> Thanks
>> Tim
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: cassandra 2.1.0 unable to use cqlsh

2014-09-22 Thread Adam Holmberg
cqlsh in Cassandra 2.1.0 uses the DataStax python driver. The
"cassandra.metadata" module is provided by this package. By default it uses
the driver from an archive included in the Cassandra distribution
(.../lib/cassandra-driver-internal-only-2.1.0.zip).

See /usr/local/apache-cassandra-2.1.0/bin/cqlsh for how everything gets
setup -- it's possible your wrapper or environment are not playing well
with that.

Also note that "9160" will not apply anymore since this driver uses the
native protocol (9042).

Adam

On Sun, Sep 21, 2014 at 7:53 PM, Tim Dunphy  wrote:

> Hey all,
>
>  I've just upgraded to the latest cassandra on my site with version 2.1.0.
>
> But now when I run the command I am getting the following error:
>
> [root@beta-new:/usr/local] #cqlsh
> Traceback (most recent call last):
>   File "/etc/alternatives/cassandrahome/bin/cqlsh-old", line 113, in <module>
>     from cqlshlib import cqlhandling, cql3handling, pylexotron
>   File "/usr/local/apache-cassandra-2.1.0/bin/../pylib/cqlshlib/cql3handling.py", line 18, in <module>
>     from cassandra.metadata import maybe_escape_name
> ImportError: No module named cassandra.metadata
>
> Just to clarify some of the above output, all my 'cqlsh' command does is
> automatically fill in some values I'd like to use as defaults and then
> invoke the real command which I've named 'cqlsh-old'. Just a quirk of my
> setup that's always allowed cqlsh to be invoked without issue across
> multiple upgrades.
>
> [root@beta-new:/usr/local] #cat /etc/alternatives/cassandrahome/bin/cqlsh
> #!/bin/sh
> /etc/alternatives/cassandrahome/bin/cqlsh-old beta-new.mydomain.com 9160
> --cqlversion="3.0.0"
>
> I'd appreciate any advice you  could spare on how to get around this error!
>
> Thanks
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>
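
For reference, a minimal update to the wrapper quoted above, reflecting Adam's
note about the native protocol (a sketch, assuming default settings; the
--cqlversion="3.0.0" override is dropped because cqlsh in 2.1 speaks CQL 3.2):

#!/bin/sh
/etc/alternatives/cassandrahome/bin/cqlsh-old beta-new.mydomain.com 9042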


Fwd: Cassandra cluster setup.

2014-09-22 Thread Muthu Kumar
> Hi All
>
>
>
> I am trying to configure a Cassandra cluster with two nodes. I am new to
Cassandra.
>
>
>
> I am using the DataStax distribution of Cassandra (Windows). I have
installed it on both nodes, but each works as a separate instance rather than
as a cluster.
>
>
>
> The key changes I made in cassandra.yaml are as follows, as suggested by
http://www.datastax.com/documentation/cassandra/1.2/cassandra/initialize/initializeSingleDS.html
>
>
>
> Configuration setting for 10.144.32.134
>
>
>
> num_tokens: 256
>
>   - seeds: "10.144.32.134,10.137.12.84"
>
> listen_address: 10.144.32.134
>
> endpoint_snitch: RackInferringSnitch
>
> rpc_address: 0.0.0.0
>
>
>
> Configuration setting for 10.137.12.84
>
>
>
> num_tokens: 256
>
>   - seeds: "10.144.32.134,10.137.12.84"
>
> listen_address: 10.137.12.84
>
> endpoint_snitch: RackInferringSnitch
>
> rpc_address: 0.0.0.0
>
>
>
> After applying this configuration I am able to start the services as usual
and see the status as up.
>
>
>
> Nodetool Status from  134 ( server)
>
>
>
> D:\Program Files\DataStax Community\apache-cassandra\bin>nodetool -h
localhost status
>
> Starting NodeTool
>
> Note: Ownership information does not include topology; for complete
information, specify a keyspace
>
> Datacenter: 144
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address        Load      Tokens  Owns    Host ID                               Rack
> UN  10.144.32.134  40.03 MB  256     100.0%  c791918a-8fec-4c5c-ab83-1a3525c51b70  32
>
>
>
> Nodetool Status from 84
>
>
>
> C:\Program Files\DataStax Community\apache-cassandra\bin>nodetool.bat
status
>
> Starting NodeTool
>
> Note: Ownership information does not include topology; for complete
information, specify a keyspace
>
> Datacenter: 137
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address       Load      Tokens  Owns    Host ID                               Rack
> UN  10.137.12.84  69.58 KB  2       100.0%  f842ea74-8eef-4c82-80d4-3e06e7a00deb  12
>
> C:\Program Files\DataStax Community\apache-cassandra\bin>
>
>
>
>
>
> Can you please suggest how to fix this issue?
>
>
>
> Regards
>
> Muthukumar.S
>


Re: Cassandra Data Model design

2014-09-22 Thread James Briggs
Cassandra partitions data across the cluster based on PK,
thus is optimized for WHERE PK=...


You are doing table scans, the opposite of what a distributed
system is designed for.


However, some users find Solr helps with queries like yours.


To learn what C* is good at, read this:
http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/


Thanks, James Briggs. 
-- 
Cassandra/MySQL DBA. Available in San Jose area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top




 From: Check Peck 
To: user  
Sent: Wednesday, September 17, 2014 3:35 PM
Subject: Re: Cassandra Data Model design
 


It takes more than 50 seconds to return 500 records from the cqlsh command, not
from code, so that's why I am saying it is pretty slow.



On Wed, Sep 17, 2014 at 3:17 PM, Hao Cheng  wrote:

How slow is slow? Regardless of the data model question, in my experience 500
rows of relatively light content should be lightning fast. Looking at my
performance results on a test cluster of 3x r3.large AWS instances, we reach an
op rate on Cassandra's stress test of at least 1000 operations per second, and
on average 7500 operations per second over the stress test data set.
>
>
>More broadly, it seems like you would benefit from either deltas (only 
>retrieve new data) or something like paging (only retrieve currently relevant 
>data), although it's really difficult to say without more information.
>
>
>On Wed, Sep 17, 2014 at 1:01 PM, Check Peck  wrote:
>
>I have recently started working with Cassandra. We have a Cassandra cluster 
>which is using DSE 4.0 and has VNODES enabled. We have tables like 
>this - 
>>
>>Below is my first table -
>>
>>CREATE TABLE customers (
>>  customer_id int PRIMARY KEY,
>>  last_modified_date timeuuid,
>>  customer_value text
>>)
>>
>>The read query pattern on the above table is as follows, since we need to get 
>>everything from it and load it into our application memory every x 
>>minutes.
>>
>>select customer_id, customer_value from datakeyspace.customers;
>>
>>We have second table like this -
>>
>>CREATE TABLE client_data (
>>  client_name text PRIMARY KEY,
>>  client_id text,
>>  creation_date timestamp,
>>  is_valid int,
>>  last_modified_date timestamp
>>)
>>
>>Right now the above table has 500 records, and all of them have the 
>>"is_valid" column set to 1. The read query pattern is again to get everything 
>>from the table and load it into our application memory every x minutes, so 
>>the below query will return all 500 records, since everything has is_valid 
>>set to 1.
>>
>>select client_name, client_id from  datakeyspace.client_data where 
>> is_valid=1;
>>
>>Since our cluster is VNODES enabled, my above query pattern is not 
>>efficient at all, and it takes a lot of time to get the data from Cassandra. 
>>We are reading from these tables with consistency level QUORUM.
>>
>>Is there any possibility of improving our data model?
>>
>>Any suggestions will be greatly appreciated.
>>
>
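
For reference, the date-bucketing pattern described in the article James links
above looks roughly like this (a sketch; the weather-station names are
illustrative, not from this thread):

-- One partition per station per day keeps partitions bounded over time
CREATE TABLE temperature_by_day (
   weatherstation_id text,
   date text,                 -- the bucket, e.g. '2014-09-22'
   event_time timestamp,
   temperature text,
   PRIMARY KEY ((weatherstation_id, date), event_time)
);

-- Reads always name the partition instead of scanning:
-- SELECT * FROM temperature_by_day
--  WHERE weatherstation_id = '1234ABCD' AND date = '2014-09-22';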

Named Parameters in Prepared Statement

2014-09-22 Thread Timmy Turner
Looking through the CQL 3.1 grammar in Cassandra, I found a "':' ident"
alternative in the "value" rule (line 961).

Is this for binding named parameters in prepared statements? Is this
currently supported by any of the drivers or in Cassandra (2.1) itself?

Looking at the docs and the current Java driver it doesn't seem that way.