RE: Accessing Cassandra Data from Excel / Tableau / R

2015-02-17 Thread Ashic Mahtab
Thanks, Peter. 
Any pointers on getting that working? Also, don't think Tableau supports JDBC 
(it does support ODBC). I've been able to start beeline (i.e. thriftserver) and 
connect Tableau to it via the Simba ODBC connector for Spark. But no idea how 
to let "it" (hive or spark) know where Cassandra is and how to access it.
Any help is appreciated.
-Ashic.

Date: Tue, 17 Feb 2015 19:49:42 -0500
Subject: Re: Accessing Cassandra Data from Excel / Tableau / R
From: wool...@gmail.com
To: user@cassandra.apache.org


Hive can connect to Cassandra, so that means you can point Tableau to hive 
using JDBC.

As long as you map Hive to cassandra, you should be able to query data just 
like regular hive

On Tue, Feb 17, 2015 at 7:29 PM, Ashic Mahtab  wrote:



What's a good way to load some cassandra data (perhaps result of a cql query) 
into Excel / Tableau? I see DSE has support, but that's not always an option. 
Simba do an ODBC connector that currently doesn't support UDTs + collections 
properly (and it's expensive). Is there a way to use Spark to provide a gateway 
to Cassandra data to the traditional BI tools? Perhaps with the ThriftServer? 
RCassandra also seems stuck in the distant past... is there any news on that 
front?
The reason I ask is some non-programmers simply want to look at some data by 
themselves, with some cql. I'd like to be able to give them that without data 
exports.
Thanks,
Ashic.

  

Re: Adding new node to cluster

2015-02-17 Thread Robert Coli
On Tue, Feb 17, 2015 at 2:25 PM,  wrote:

>  SimpleSnitch is not rack aware. You would want to choose seed nodes and
> then not change them. Seed nodes apparently don’t bootstrap.
>

No one seems to know what a "seed node" actually *is*, but "seed nodes" can
in fact bootstrap. They just have to temporarily forget to tell themselves
that they are a seed node while bootstrapping, and then other nodes will
still gossip to it as a seed once it comes up, even though it doesn't
consider itself a seed.

https://issues.apache.org/jira/browse/CASSANDRA-5836?focusedCommentId=13727032&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13727032
"

Replacing a seed node is a very common operation, and this best practice is
confusing/poorly documented. There are regular contacts to
#cassandra/cassandra-user@ where people ask how to replace a seed node, and
are confused by the answer. The workaround also means that, if you do not
restart your node after bootstrapping it (and changing the conf file back
to indicate to itself that it is a seed) the node runs until next restart
without any understanding that it is a seed node.

Being a seed node appears to mean two things :

1) I have myself as an entry in my own seed list, so I know that I am a
seed.
2) Other nodes have me in their seed list, so they consider me a seed.

The current code checks for 1) and refuses to bootstrap. The workaround is
to remove the 1) state temporarily. But if it is unsafe to bootstrap a seed
node because of either 1) or 2), the workaround is unsafe.

Can you explicate the special cases here? I sincerely would like to
understand why the code tries to prevent "a seed" from bootstrapping when
one can clearly, and apparently safely, bootstrap "a seed".

"


Unfortunately, there has been no answer.
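
For what it's worth, the two senses of "seed" in that comment can be written down as a toy model. This is only a sketch of the reasoning, not Cassandra's code: the function names and the addresses are made up for illustration, and the only behaviour it encodes is what the comment states, namely that the bootstrap check looks at sense 1) alone.

```python
def considers_itself_seed(own_address, own_seed_list):
    """Sense 1: the node finds its own address in its own seed list."""
    return own_address in own_seed_list


def considered_seed_by_others(address, other_seed_lists):
    """Sense 2: at least one other node lists this address as a seed."""
    return any(address in seeds for seeds in other_seed_lists)


def will_bootstrap(own_address, own_seed_list):
    """Per the comment above, only sense 1 is checked before bootstrapping."""
    return not considers_itself_seed(own_address, own_seed_list)


# Hypothetical addresses: the new node is temporarily left out of its own
# seed list (the workaround), while its peers already list it as a seed.
new_node = "10.0.0.3"
own_seeds = ["10.0.0.1", "10.0.0.2"]
peers_seeds = [["10.0.0.1", "10.0.0.3"], ["10.0.0.1", "10.0.0.3"]]

print(will_bootstrap(new_node, own_seeds))               # True
print(considered_seed_by_others(new_node, peers_seeds))  # True
```

In this model the node bootstraps even though everyone else already treats it as a seed, which is exactly the situation the question asks about.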


=Rob


Re: Accessing Cassandra Data from Excel / Tableau / R

2015-02-17 Thread Peter Lin
Hive can connect to Cassandra, so that means you can point Tableau to hive
using JDBC.

As long as you map Hive to cassandra, you should be able to query data just
like regular hive

On Tue, Feb 17, 2015 at 7:29 PM, Ashic Mahtab  wrote:

> What's a good way to load some cassandra data (perhaps result of a cql
> query) into Excel / Tableau? I see DSE has support, but that's not always
> an option. Simba do an ODBC connector that currently doesn't support UDTs
> + collections properly (and it's expensive). Is there a way to use Spark to
> provide a gateway to Cassandra data to the traditional BI tools? Perhaps
> with the ThriftServer? RCassandra also seems stuck in the distant past... is
> there any news on that front?
>
> The reason I ask is some non-programmers simply want to look at some data
> by themselves, with some cql. I'd like to be able to give them that without
> data exports.
>
> Thanks,
> Ashic.
>


Accessing Cassandra Data from Excel / Tableau / R

2015-02-17 Thread Ashic Mahtab
What's a good way to load some cassandra data (perhaps result of a cql query) 
into Excel / Tableau? I see DSE has support, but that's not always an option. 
Simba do an ODBC connector that currently doesn't support UDTs + collections 
properly (and it's expensive). Is there a way to use Spark to provide a gateway 
to Cassandra data to the traditional BI tools? Perhaps with the ThriftServer? 
RCassandra also seems stuck in the distant past... is there any news on that 
front?
The reason I ask is some non-programmers simply want to look at some data by 
themselves, with some cql. I'd like to be able to give them that without data 
exports.
Thanks,
Ashic.

Re: Adding new node to cluster

2015-02-17 Thread Eric Stevens
> Seed nodes apparently don’t bootstrap

That's right, if a node has itself in its own seeds list, it assumes it's a
foundational member of the cluster, and it will join immediately with no
bootstrap.

If you've done this by accident, you should do nodetool decommission on
that node, and when it's fully left the cluster, wipe its data directory,
edit the yaml and remove it from the seeds list.
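
The last step (taking the node out of its own seeds list) is just string editing on the comma-separated `seeds` value in cassandra.yaml. A minimal sketch, assuming you have already pulled that value out of the file as a plain string; the helper name and the addresses are hypothetical, and it does not touch the yaml file itself:

```python
def remove_from_seeds(seeds_value, address):
    """Return the comma-separated seeds string without the given address."""
    kept = []
    for entry in seeds_value.split(","):
        entry = entry.strip()
        # Drop empty fragments and the address being removed.
        if entry and entry != address:
            kept.append(entry)
    return ",".join(kept)


# Hypothetical seeds value as it might appear in the node's own yaml.
print(remove_from_seeds("10.0.0.1, 10.0.0.2,10.0.0.3", "10.0.0.2"))
# 10.0.0.1,10.0.0.3
```

The decommission and the data directory wipe still have to be done by hand, in the order given above, before the node is restarted with the edited list.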

On Tue, Feb 17, 2015 at 3:25 PM,  wrote:

>  SimpleSnitch is not rack aware. You would want to choose seed nodes and
> then not change them. Seed nodes apparently don’t bootstrap. All nodes need
> the same seeds in the yaml file. Here is more info:
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/initialize/initializeSingleDS.html
>
>
>
>
>
>
>
> Sean Durity – Cassandra Admin, Big Data Team
>
>
>
>
> *From:* Batranut Bogdan [mailto:batra...@yahoo.com]
> *Sent:* Tuesday, February 17, 2015 3:28 PM
>
> *To:* user@cassandra.apache.org; reynald.bourtembo...@esrf.fr
> *Subject:* Re: Adding new node to cluster
>
>
>
> Hello,
>
>
>
> I use SimpleSnitch. All the nodes are in the same datacenter. Not sure if
> all are in the same rack.
>
>
>
> On Tuesday, February 17, 2015 8:53 PM, "sean_r_dur...@homedepot.com" <
> sean_r_dur...@homedepot.com> wrote:
>
>
>
> What snitch are you using? You may need to do some work on your topology
> file (or rackdc) to make sure you have the topology you want. Also, it is
> possible you may need to restart OpsCenter agents and/or your browser to
> see the nodes represented properly in OpsCenter.
>
>
>
>
>
> Sean Durity – Cassandra Admin, Home Depot
>
>
>
> *From:* Batranut Bogdan [mailto:batra...@yahoo.com ]
> *Sent:* Tuesday, February 17, 2015 10:20 AM
> *To:* user@cassandra.apache.org; reynald.bourtembo...@esrf.fr
> *Subject:* Re: Adding new node to cluster
>
>
>
> Hello,
>
>
>
> I know that UN is good, but what troubles me is the addition of the node's
> own ip in its yaml seeds section.
>
>
>
> On Tuesday, February 17, 2015 3:40 PM, Reynald Bourtembourg <
> reynald.bourtembo...@esrf.fr> wrote:
>
>
>
> Hi Bogdan
>
> In nodetool status:
>
>- UJ: means your node is Up and Joining
>- UN: means your node is Up and in Normal state
>
>  UN in nodetool is good ;-)
>
>
>
> On 17/02/2015 13:56, Batranut Bogdan wrote:
>
>   Hello all,
>
>
>
> I have an existing cluster. When adding a new node, I saw that Opscenter
> saw the node in an unknown cluster. In the yaml, the cluster name is the
> same. So I stopped the node and added its ip address to the list of
> seeds. Now Opscenter sees my node. But nodetool status now sees it as UN,
> instead of UJ as when it first started. One other thing to mention is that
> even if I stop the node and remove its ip from the list of seeds, Opscenter
> sees the node in the known cluster but nodetool sees it as UN. I am not sure
> what the implications of adding a node's ip to its own seed list are, and I
> think that I might have done the same for the existing nodes, e.g. started
> them with their own ip in the seed list, but after removing it and having to
> restart the nodes for whatever reason, I did not see any changes.
>
>
>
> Is my cluster ok, or what do I need to do to bring the cluster to a good
> state?
>
>
>
> Thank you.
>
>
>
>
>
>
>  --
>
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
>
>
>
>

RE: Adding new node to cluster

2015-02-17 Thread SEAN_R_DURITY
SimpleSnitch is not rack aware. You would want to choose seed nodes and then 
not change them. Seed nodes apparently don’t bootstrap. All nodes need the same 
seeds in the yaml file. Here is more info: 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/initialize/initializeSingleDS.html



Sean Durity – Cassandra Admin, Big Data Team

From: Batranut Bogdan [mailto:batra...@yahoo.com]
Sent: Tuesday, February 17, 2015 3:28 PM
To: user@cassandra.apache.org; reynald.bourtembo...@esrf.fr
Subject: Re: Adding new node to cluster

Hello,

I use SimpleSnitch. All the nodes are in the same datacenter. Not sure if all 
are in the same rack.

On Tuesday, February 17, 2015 8:53 PM, 
"sean_r_dur...@homedepot.com" <sean_r_dur...@homedepot.com> wrote:

What snitch are you using? You may need to do some work on your topology file 
(or rackdc) to make sure you have the topology you want. Also, it is possible 
you may need to restart OpsCenter agents and/or your browser to see the nodes 
represented properly in OpsCenter.


Sean Durity – Cassandra Admin, Home Depot

From: Batranut Bogdan [mailto:batra...@yahoo.com]
Sent: Tuesday, February 17, 2015 10:20 AM
To: user@cassandra.apache.org; 
reynald.bourtembo...@esrf.fr
Subject: Re: Adding new node to cluster

Hello,

I know that UN is good, but what troubles me is the addition of the node's own 
ip in its yaml seeds section.

On Tuesday, February 17, 2015 3:40 PM, Reynald Bourtembourg 
<reynald.bourtembo...@esrf.fr> wrote:

Hi Bogdan

In nodetool status:

  *   UJ: means your node is Up and Joining
  *   UN: means your node is Up and in Normal state
UN in nodetool is good ;-)
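
In case it helps anyone reading along, the two-letter codes can be decoded mechanically from the legend that nodetool status prints in its header (Status=Up/Down, State=Normal/Leaving/Joining/Moving). A small sketch; the function name is made up and this is not part of nodetool:

```python
# Letter meanings taken from the legend in the nodetool status header.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_status(code):
    """Split a code such as 'UJ' into its (status, state) meanings."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError("unrecognised nodetool status code: %r" % code)
    return STATUS[code[0]], STATE[code[1]]


print(decode_status("UJ"))  # ('Up', 'Joining')
print(decode_status("UN"))  # ('Up', 'Normal')
```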

On 17/02/2015 13:56, Batranut Bogdan wrote:
Hello all,

I have an existing cluster. When adding a new node, I saw that Opscenter saw 
the node in an unknown cluster. In the yaml, the cluster name is the same. So I 
stopped the node and added its ip address to the list of seeds. Now 
Opscenter sees my node. But nodetool status now sees it as UN, instead of UJ 
as when it first started. One other thing to mention is that even if I stop the 
node and remove its ip from the list of seeds, Opscenter sees the node in the 
known cluster but nodetool sees it as UN. I am not sure what the implications of 
adding a node's ip to its own seed list are, and I think that I might have done 
the same for the existing nodes, e.g. started them with their own ip in the seed 
list, but after removing it and having to restart the nodes for whatever reason, 
I did not see any changes.

Is my cluster ok, or what do I need to do to bring the cluster to a good state?

Thank you.







Re: Adding new node to cluster

2015-02-17 Thread Batranut Bogdan
Hello,
I use SimpleSnitch. All the nodes are in the same datacenter. Not sure if all 
are in the same rack. 

 On Tuesday, February 17, 2015 8:53 PM, "sean_r_dur...@homedepot.com" 
 wrote:
   

What snitch are you using? You may need to do some work on your topology file 
(or rackdc) to make sure you have the topology you want. Also, it is possible 
you may need to restart OpsCenter agents and/or your browser to see the nodes 
represented properly in OpsCenter.

Sean Durity – Cassandra Admin, Home Depot

From: Batranut Bogdan [mailto:batra...@yahoo.com]
Sent: Tuesday, February 17, 2015 10:20 AM
To: user@cassandra.apache.org; reynald.bourtembo...@esrf.fr
Subject: Re: Adding new node to cluster

Hello,

I know that UN is good, but what troubles me is the addition of the node's own 
ip in its yaml seeds section.

On Tuesday, February 17, 2015 3:40 PM, Reynald Bourtembourg wrote:

Hi Bogdan

In nodetool status:
   - UJ: means your node is Up and Joining
   - UN: means your node is Up and in Normal state
 UN in nodetool is good ;-)

On 17/02/2015 13:56, Batranut Bogdan wrote:
Hello all,

I have an existing cluster. When adding a new node, I saw that Opscenter saw 
the node in an unknown cluster. In the yaml, the cluster name is the same. So I 
stopped the node and added its ip address to the list of seeds. Now 
Opscenter sees my node. But nodetool status now sees it as UN, instead of UJ 
as when it first started. One other thing to mention is that even if I stop the 
node and remove its ip from the list of seeds, Opscenter sees the node in the 
known cluster but nodetool sees it as UN. I am not sure what the implications of 
adding a node's ip to its own seed list are, and I think that I might have done 
the same for the existing nodes, e.g. started them with their own ip in the seed 
list, but after removing it and having to restart the nodes for whatever reason, 
I did not see any changes.

Is my cluster ok, or what do I need to do to bring the cluster to a good state?

Thank you.
      



   

Re: How to connect to Opscenter from outside the cloud?

2015-02-17 Thread Kai Wang
You can start from here:
http://www.datastax.com/docs/1.1/references/firewall_ref

By default ops site is hosted at port .
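
Once the security group rule is in place, a quick way to confirm from the outside machine that the port is actually reachable is a plain TCP connect. A minimal sketch using only the Python standard library; the function name is made up, and the host and port are whatever applies to your OpsCenter instance (nothing here is specific to OpsCenter):

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Example call (hypothetical host, substitute your own values):
# port_reachable("opscenter.example.com", your_opscenter_port)
```

A False result only says the connection could not be made; it does not tell you whether the security group, a host firewall or the service itself is the cause.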

On Tue, Feb 17, 2015 at 12:38 PM, Syed, Basit B. (NSN - FI/Espoo) <
basit.b.s...@nsn.com> wrote:

>  Hi,
> I have a two node cluster running on an openstack cloud. One of the nodes is
> also running Opscenter, while both are running datastax-agents.
>
> How can I use a browser on my Windows machine to connect to this instance of
> opscenter? Specifically, which ports should I open in the default security
> group to make it happen?
>
> Regards,
> Basit
>
> Datacenter: Cassandra
> =====================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load      Tokens  Owns   Host ID                               Rack
> UN  192.168.2.6  26.45 MB  1       14.8%  95bbe8a0-942b-4408-b152-88a1c4f4e2de  rack1
> UN  192.168.2.4  9.92 GB   1       85.2%  2527f7e0-e2f6-41d5-b6c1-48d1d922ef8e  rack1
>
>
>
>


RE: Two problems with Cassandra

2015-02-17 Thread SEAN_R_DURITY
Full table scans are not the best use case for Cassandra. Without some kind of 
pagination, the node taking the request (the coordinator node) will try to 
assemble the data from all nodes to return to the client. With a dataset of any 
decent size, it will overwhelm the single node.

Pagination is supported in newer versions of Cassandra (2.0.x+, I think) and 
some drivers. You can see there is other discussion on the list about the best 
ways to split your workload and do some parallel processing. Something I 
haven’t seen mentioned recently (but probably discussed before I joined the 
list) is setting up a separate, analytics DC. There you could integrate with 
hadoop or spark or just size your nodes differently to handle an analytics type 
workload.

We have found that it is better to use a list of known keys and pull back rows 
(aka partitions) individually for any table scan type operations. However, we 
are usually able to generate the list of keys outside of Cassandra…
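
To make the "list of known keys" approach concrete, here is a rough sketch of the loop shape. It is deliberately driver-agnostic: `fetch_partition` and `update_partition` are hypothetical callables standing in for whatever single-partition SELECT and UPDATE your driver issues, and a dict plays the part of the table so the example runs on its own.

```python
def process_by_key(keys, fetch_partition, update_partition, transform):
    """Read, transform and write back one partition at a time.

    Only a single partition is held in memory at any point, so the
    coordinator is never asked to assemble the whole table.
    """
    processed = 0
    for key in keys:
        row = fetch_partition(key)
        if row is None:
            # Key generated outside Cassandra but not present in the table.
            continue
        update_partition(key, transform(row))
        processed += 1
    return processed


# In-memory stand-in for a table, keyed by partition key.
table = {"a": {"value": 1}, "b": {"value": 2}}

count = process_by_key(
    keys=["a", "b", "missing"],
    fetch_partition=table.get,
    update_partition=table.__setitem__,
    transform=lambda row: {"value": row["value"] * 10},
)
print(count)  # 2
print(table)  # {'a': {'value': 10}, 'b': {'value': 20}}
```

The key list can be split across several workers if you want parallelism, since each iteration touches exactly one partition.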


Sean Durity – Cassandra Admin, Home Depot

From: Pavel Velikhov [mailto:pavel.velik...@gmail.com]
Sent: Thursday, February 12, 2015 4:23 AM
To: user@cassandra.apache.org
Subject: Re: Two problems with Cassandra


On Feb 12, 2015, at 12:37 AM, Robert Coli <rc...@eventbrite.com> wrote:

On Wed, Feb 11, 2015 at 2:22 AM, Pavel Velikhov <pavel.velik...@gmail.com> wrote:
  2. While trying to update the full dataset with a simple transformation 
(again via python driver), single node and clustered Cassandra run out of 
memory no matter what settings I try, even if I put a lot of sleeps into the mix. 
However, simpler transformations (updating just one column, especially when there 
is a lot of processing overhead) work just fine.

What does a "simple transformation" mean here? Assuming a reasonable sized 
heap, OOM sounds like you're trying to update a large number of large 
partitions in a single operation.

In general, in Cassandra, you're best off interacting with a single or small 
number of partitions in any given interaction.

=Rob


Hi Robert!

  Simple transformation is changing just a single column value (I usually 
do it for the whole dataset).
  But when I was running out of memory, I was reading in 5 columns and updating 
3. Some of them could be big, but I need to check and rerun this case.
  (I worked around this by dumping to files and then scanning the files and 
updating the database, but this stinks!)

  I don’t quite understand the fundamentals of Cassandra - if I’m just doing 
one scan with a reasonable number of columns that I fetch, and I’m updating at 
the same time, what’s happening there? Why eat up so much memory and die?





[RELEASE] Apache Cassandra 2.1.3 released

2015-02-17 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.3.

This release contains over 100 fixes for 2.1 so anyone on 2.1.X should
upgrade to this ASAP.


Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/xGm4Qq (CHANGES.txt)
[2]: http://goo.gl/dBGQa0 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: sstables remain after compaction

2015-02-17 Thread Robert Coli
On Fri, Feb 13, 2015 at 7:45 PM, Jason Wee  wrote:

> I trigger user defined compaction on big sstables (big as in the size per
> sstable reaches more than 50GB, some 100GB). Occasionally, after user defined
> compaction, I see some sstables remain, even after 12 hours have elapsed.
>

That is unexpected. What version of Cassandra?


> You mentioned a thread, could you tell me which threads those are, or perhaps
> highlight them in the code?
>

I'd presume READ_STAGE threads.

=Rob


RE: Adding new node to cluster

2015-02-17 Thread SEAN_R_DURITY
What snitch are you using? You may need to do some work on your topology file 
(or rackdc) to make sure you have the topology you want. Also, it is possible 
you may need to restart OpsCenter agents and/or your browser to see the nodes 
represented properly in OpsCenter.


Sean Durity – Cassandra Admin, Home Depot

From: Batranut Bogdan [mailto:batra...@yahoo.com]
Sent: Tuesday, February 17, 2015 10:20 AM
To: user@cassandra.apache.org; reynald.bourtembo...@esrf.fr
Subject: Re: Adding new node to cluster

Hello,

I know that UN is good, but what troubles me is the addition of the node's own 
ip in its yaml seeds section.

On Tuesday, February 17, 2015 3:40 PM, Reynald Bourtembourg 
<reynald.bourtembo...@esrf.fr> wrote:

Hi Bogdan

In nodetool status:

  *   UJ: means your node is Up and Joining
  *   UN: means your node is Up and in Normal state
UN in nodetool is good ;-)

On 17/02/2015 13:56, Batranut Bogdan wrote:
Hello all,

I have an existing cluster. When adding a new node, I saw that Opscenter saw 
the node in an unknown cluster. In the yaml, the cluster name is the same. So I 
stopped the node and added its ip address to the list of seeds. Now 
Opscenter sees my node. But nodetool status now sees it as UN, instead of UJ 
as when it first started. One other thing to mention is that even if I stop the 
node and remove its ip from the list of seeds, Opscenter sees the node in the 
known cluster but nodetool sees it as UN. I am not sure what the implications of 
adding a node's ip to its own seed list are, and I think that I might have done 
the same for the existing nodes, e.g. started them with their own ip in the seed 
list, but after removing it and having to restart the nodes for whatever reason, 
I did not see any changes.

Is my cluster ok, or what do I need to do to bring the cluster to a good state?

Thank you.







How to connect to Opscenter from outside the cloud?

2015-02-17 Thread Syed, Basit B. (NSN - FI/Espoo)
Hi,
I have a two node cluster running on an openstack cloud. One of the nodes is also 
running Opscenter, while both are running datastax-agents.

How can I use a browser on my Windows machine to connect to this instance of 
opscenter? Specifically, which ports should I open in the default security 
group to make it happen?

Regards,
Basit

Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns   Host ID                               Rack
UN  192.168.2.6  26.45 MB  1       14.8%  95bbe8a0-942b-4408-b152-88a1c4f4e2de  rack1
UN  192.168.2.4  9.92 GB   1       85.2%  2527f7e0-e2f6-41d5-b6c1-48d1d922ef8e  rack1
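
For anyone who wants to post-process output like the above, a node row can be split into named fields with plain whitespace splitting. This is only a sketch: it assumes each row sits on a single line with cleanly separated columns in the order shown by the header (status, address, load value, load unit, tokens, owns, host ID, rack), and the function and field names are my own.

```python
def parse_status_row(line):
    """Split one node row of nodetool status output into named fields."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError("unexpected number of columns: %r" % line)
    code, address, load_value, load_unit, tokens, owns, host_id, rack = parts
    return {
        "status": code,
        "address": address,
        "load": load_value + " " + load_unit,
        "tokens": int(tokens),
        "owns_percent": float(owns.rstrip("%")),
        "host_id": host_id,
        "rack": rack,
    }


row = parse_status_row(
    "UN  192.168.2.6  26.45 MB  1  14.8%  "
    "95bbe8a0-942b-4408-b152-88a1c4f4e2de  rack1"
)
print(row["address"], row["load"], row["owns_percent"])
# 192.168.2.6 26.45 MB 14.8
```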





Re: Adding new node to cluster

2015-02-17 Thread Batranut Bogdan
Hello,
I know that UN is good, but what troubles me is the addition of the node's own 
ip in its yaml seeds section. 

 On Tuesday, February 17, 2015 3:40 PM, Reynald Bourtembourg 
 wrote:
   

  Hi Bogdan
 
 In nodetool status:

   - UJ: means your node is Up and Joining
   - UN: means your node is Up and in Normal state
 UN in nodetool is good ;-)
  
 On 17/02/2015 13:56, Batranut Bogdan wrote:
  
  Hello all, 
  I have an existing cluster. When adding a new node, I saw that Opscenter saw 
the node in an unknown cluster. In the yaml, the cluster name is the same. So I 
stopped the node and added its ip address to the list of seeds. Now 
Opscenter sees my node. But nodetool status now sees it as UN, instead of UJ 
as when it first started. One other thing to mention is that even if I stop the 
node and remove its ip from the list of seeds, Opscenter sees the node in the 
known cluster but nodetool sees it as UN. I am not sure what the implications of 
adding a node's ip to its own seed list are, and I think that I might have done 
the same for the existing nodes, e.g. started them with their own ip in the seed 
list, but after removing it and having to restart the nodes for whatever reason, 
I did not see any changes. 
  Is my cluster ok, or what do I need to do to bring the cluster to a good 
state? 
  Thank you.  
 
 



Re: road map for Cassandra 3.0

2015-02-17 Thread Ernesto Reinaldo Barreiro
Thanks for your answer!

On Tue, Feb 17, 2015 at 2:24 PM, Ajaya Agrawal  wrote:

> It would be around April of this year. I asked the same thing in the
> cassandra-dev IRC channel some time back. This is by no means an official
> release date or month, just a guesstimate.
>
> Cheers,
> Ajaya
>
> On Wed, Feb 11, 2015 at 6:49 PM, Ernesto Reinaldo Barreiro <
> reier...@gmail.com> wrote:
>
>> Thanks for your answer!
>>
>> On Wed, Feb 11, 2015 at 1:03 PM, DuyHai Doan 
>> wrote:
>>
>>> Look at the JIRA, filter by 3.0. But it's not very accurate. There are
>>> a lot of new features scheduled for 3.0. Some of them will make it in time
>>> for 3.0.0, like User Defined Functions I guess. Some other features will be
>>> shipped with future 3.x middle/minor versions.
>>>
>>>
>>> On Wed, Feb 11, 2015 at 1:25 PM, Ernesto Reinaldo Barreiro <
>>> reier...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there a public road map for Cassandra 3.0? Are there any estimates
>>>> for the release date of 3.0?
>>>>
>>>> --
>>>> Regards - Ernesto Reinaldo Barreiro
>>>>
>>>
>>>
>>
>>
>> --
>> Regards - Ernesto Reinaldo Barreiro
>>
>
>


-- 
Regards - Ernesto Reinaldo Barreiro


Re: road map for Cassandra 3.0

2015-02-17 Thread Ajaya Agrawal
It would be around April of this year. I asked the same thing in the
cassandra-dev IRC channel some time back. This is by no means an official
release date or month, just a guesstimate.

Cheers,
Ajaya

On Wed, Feb 11, 2015 at 6:49 PM, Ernesto Reinaldo Barreiro <
reier...@gmail.com> wrote:

> Thanks for your answer!
>
> On Wed, Feb 11, 2015 at 1:03 PM, DuyHai Doan  wrote:
>
>> Look at the JIRA, filter by 3.0. But it's not very accurate. There are a
>> lot of new features scheduled for 3.0. Some of them will make it on time
>> for 3.0.0, like User Defined Functions I guess. Some other features will be
>> shipped with future 3.x minor versions.
>>
>>
>> On Wed, Feb 11, 2015 at 1:25 PM, Ernesto Reinaldo Barreiro <
>> reier...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a public road map for Cassandra 3.0? Are there any estimates
>>> for the release date of 3.0?
>>>
>>> --
>>> Regards - Ernesto Reinaldo Barreiro
>>>
>>
>>
>
>
> --
> Regards - Ernesto Reinaldo Barreiro
>


Re: Many pending compactions

2015-02-17 Thread Roni Balthazar
Hi,

Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
the solution...
The number of SSTables decreased from many thousands to fewer than a
hundred, and most of the SSTables are now much bigger, at several
gigabytes each.
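
For anyone wanting to try the same change, here is a minimal sketch of the
CQL involved, wrapped in a small Python helper. The helper name and the
keyspace/table names are placeholders, and it assumes the table already uses
SizeTieredCompactionStrategy, since the option only applies there. Note that
setting the compaction map replaces all of its sub-options, so any others you
rely on would need to be repeated in the statement.

```python
def build_cold_reads_statement(keyspace, table, cold_reads_to_omit=0.0):
    """Return a CQL ALTER TABLE statement that sets cold_reads_to_omit.

    keyspace and table are hypothetical names supplied by the caller.
    Assumes the table uses SizeTieredCompactionStrategy.
    """
    if not 0.0 <= cold_reads_to_omit <= 1.0:
        raise ValueError("cold_reads_to_omit must be between 0.0 and 1.0")
    return (
        "ALTER TABLE {ks}.{tbl} WITH compaction = "
        "{{'class': 'SizeTieredCompactionStrategy', "
        "'cold_reads_to_omit': {value}}};"
    ).format(ks=keyspace, tbl=table, value=cold_reads_to_omit)


if __name__ == "__main__":
    # Paste the printed statement into cqlsh.
    print(build_cold_reads_statement("my_keyspace", "my_table"))
```

After applying it, "nodetool compactionstats" should show whether pending
compactions start to drain.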

Cheers,

Roni Balthazar



On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam  wrote:
> After some diagnostics (we have not set cold_reads_to_omit yet): compactions
> are running, but VERY slowly, with "idle" IO.
>
> We have a lot of "Data files" in Cassandra. In DC_A there are about ~12 (only
> xxx-Data.db), while DC_B has only ~4000.
>
> I don't know if this changes anything, but:
> 1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big
> ones, but most are really small (almost 1 files are less than 100 MB).
> 2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.
>
> Do you think that the above flag will help us?
>
>
> On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam  wrote:
>>
>> I set setcompactionthroughput to 999 permanently and it doesn't change
>> anything. IO is still the same. CPU is idle.
>>
>> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar 
>> wrote:
>>>
>>> Hi,
>>>
>>> You can run "nodetool compactionstats" to view statistics on compactions.
>>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
>>> SSTables when you use Size-Tiered compaction.
>>> You can also create a cron job to increase the value of
>>> setcompactionthroughput during the night or when your IO is not busy.
>>>
>>> From http://wiki.apache.org/cassandra/NodeTool:
>>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>>>
>>> Cheers,
>>>
>>> Roni Balthazar
>>>
>>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam  wrote:
>>> > One thing I do not understand. In my case compaction is running
>>> > permanently.
>>> > Is there a way to check which compaction is pending? The only
>>> > information is
>>> > about total count.
>>> >
>>> >
>>> > On Monday, February 16, 2015, Ja Sam  wrote:
>>> >>
>>> >> Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is
>>> >> available from
>>> >> http://cassci.datastax.com/job/cassandra-2.1/
>>> >>
>>> >> I read about cold_reads_to_omit. It looks promising. Should I also set
>>> >> compaction throughput?
>>> >>
>>> >> p.s. I am really sad that I didn't read this before:
>>> >>
>>> >> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>>> >>
>>> >>
>>> >>
>>> >> On Monday, February 16, 2015, Carlos Rolo  wrote:
>>> >>>
>>> >>> Hi 100% in agreement with Roland,
>>> >>>
>>> >>> 2.1.x series is a pain! I would never recommend the current 2.1.x
>>> >>> series
>>> >>> for production.
>>> >>>
>>> >>> Clocks are a pain, and check your connectivity! Also check tpstats to
>>> >>> see
>>> >>> if your threadpools are being overrun.
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Carlos Juzarte Rolo
>>> >>> Cassandra Consultant
>>> >>>
>>> >>> Pythian - Love your data
>>> >>>
>>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>>> >>> linkedin.com/in/carlosjuzarterolo
>>> >>> Tel: 1649
>>> >>> www.pythian.com
>>> >>>
>>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>>> >>>  wrote:
>>> 
>>>  Hi,
>>> 
>>>  1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by
>>>  Al
>>>  Tobey from DataStax)
>>>  7) minimal reads (usually none, sometimes few)
>>> 
>>>  those two points keep me repeating an answer I got. First where did
>>>  you
>>>  get 2.1.3 from? Maybe I missed it, I will have a look. But if it is
>>>  2.1.2
>>>  which is the latest released version, that version has many bugs -
>>>  most of
>>>  them I got kicked by while testing 2.1.2. I got many problems with
>>>  compactions not being triggered on column families not being read,
>>>  compactions and repairs not being completed.  See
>>> 
>>> 
>>> 
>>>  https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
>>> 
>>>  https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
>>> 
>>>  Apart from that, how are those both datacenters connected? Maybe
>>>  there
>>>  is a bottleneck.
>>> 
>>>  Also do you have ntp up and running on all nodes to keep all clocks
>>>  in
>>>  tight sync?
>>> 
>>>  Note: I'm no expert (yet) - just sharing my 2 cents.
>>> 
>>>  Cheers,
>>>  Roland
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>>
>>> >>>
>>> >>>
>>> >
>>
>>
>


Re: Adding new node to cluster

2015-02-17 Thread Reynald Bourtembourg

Hi Bogdan

In nodetool status:

 * UJ: means your node is Up and Joining
 * UN: means your node is Up and in Normal state

UN in nodetool is good ;-)
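
As a quick reference, the two letters can be decoded mechanically: the first
is the status (Up/Down) and the second is the state
(Normal/Leaving/Joining/Moving). Below is a minimal Python sketch of that
decoding. The function names are my own, and it assumes the usual
"nodetool status" layout where each node line starts with the two-letter code
followed by the address.

```python
# First letter: whether the node is up; second letter: its ring state.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def describe(code):
    """Translate a two-letter nodetool status code such as 'UN' or 'UJ'."""
    if len(code) != 2 or code[0] not in STATUS or code[1] not in STATE:
        raise ValueError("unrecognised status code: %r" % code)
    return "%s/%s" % (STATUS[code[0]], STATE[code[1]])


def node_states(status_output):
    """Map each node address to its code, given text from `nodetool status`.

    Header lines and anything that does not start with a valid two-letter
    code are skipped.
    """
    nodes = {}
    for line in status_output.splitlines():
        fields = line.split()
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in STATUS and fields[0][1] in STATE):
            nodes[fields[1]] = fields[0]
    return nodes
```

Feeding it the output of "nodetool status" gives a dictionary of address to
code, so a node that is still bootstrapping shows up as UJ and a node that has
finished joining shows up as UN.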


On 17/02/2015 13:56, Batranut Bogdan wrote:

Hello all,

I have an existing cluster. When adding a new node, I saw that 
Opscenter saw the node in an unknown cluster. In the yaml, the cluster 
name is the same. So I have stopped the node and added its IP address 
to the list of seeds. Now Opscenter sees my node. But nodetool status 
now sees it as UN, instead of UJ as when it first started. One other 
thing to mention is that even if I stop the node and remove its IP from 
the list of seeds, Opscenter sees the node in the known cluster but 
nodetool sees it as UN. I am not sure what the implications of adding a 
node's IP to its own seed list are, and I think that I might have done 
the same for the existing nodes, e.g. started each with its IP in the 
seed list, but after removing it and having to restart the nodes for 
whatever reason, I did not see any changes.


Is my cluster ok, or what do I need to do to bring the cluster to a 
good state?


Thank you.




Re: Many pending compactions

2015-02-17 Thread Ja Sam
After some diagnostics (we have not set cold_reads_to_omit yet): compactions
are running, but VERY slowly, with "idle" IO.

We have a lot of "Data files" in Cassandra. In DC_A there are about ~12
(only xxx-Data.db), while DC_B has only ~4000.

I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big
ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.

Do you think that the above flag will help us?
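
In case it helps others compare their own nodes, figures like these can be
gathered with a short script. This is only a sketch: the function name, the
100 MB threshold argument, and the choice to skip "snapshots" and "backups"
subdirectories are my assumptions, and the data directory path has to be
supplied for your installation.

```python
import os


def data_file_stats(data_dir, small_limit=100 * 1024 * 1024):
    """Count *-Data.db files under data_dir and summarise their sizes.

    Returns (count, average_size_in_bytes, count_smaller_than_small_limit).
    Snapshot and backup subdirectories are skipped so that only live
    SSTables are counted.
    """
    sizes = []
    for root, dirs, files in os.walk(data_dir):
        # Prune directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in ("snapshots", "backups")]
        for name in files:
            if name.endswith("-Data.db"):
                sizes.append(os.path.getsize(os.path.join(root, name)))
    if not sizes:
        return 0, 0.0, 0
    small = sum(1 for size in sizes if size < small_limit)
    return len(sizes), sum(sizes) / float(len(sizes)), small
```

Running it against the data directory of one node in each datacenter gives the
file count, the average file size, and how many files fall under the
threshold.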


On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam  wrote:

> I set setcompactionthroughput to 999 permanently and it doesn't change
> anything. IO is still the same. CPU is idle.
>
> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar 
> wrote:
>
>> Hi,
>>
>> You can run "nodetool compactionstats" to view statistics on compactions.
>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
>> SSTables when you use Size-Tiered compaction.
>> You can also create a cron job to increase the value of
>> setcompactionthroughput during the night or when your IO is not busy.
>>
>> From http://wiki.apache.org/cassandra/NodeTool:
>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>>
>> Cheers,
>>
>> Roni Balthazar
>>
>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam  wrote:
>> > One thing I do not understand. In my case compaction is running
>> permanently.
>> > Is there a way to check which compaction is pending? The only
>> information is
>> > about total count.
>> >
>> >
>> > On Monday, February 16, 2015, Ja Sam  wrote:
>> >>
>> >> Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is
>> >> available from
>> >> http://cassci.datastax.com/job/cassandra-2.1/
>> >>
>> >> I read about cold_reads_to_omit. It looks promising. Should I also set
>> >> compaction throughput?
>> >>
>> >> p.s. I am really sad that I didn't read this before:
>> >>
>> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>> >>
>> >>
>> >>
>> >> On Monday, February 16, 2015, Carlos Rolo  wrote:
>> >>>
>> >>> Hi 100% in agreement with Roland,
>> >>>
>> >>> 2.1.x series is a pain! I would never recommend the current 2.1.x
>> series
>> >>> for production.
>> >>>
>> >>> Clocks are a pain, and check your connectivity! Also check tpstats to
>> see
>> >>> if your threadpools are being overrun.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Carlos Juzarte Rolo
>> >>> Cassandra Consultant
>> >>>
>> >>> Pythian - Love your data
>> >>>
>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>> >>> linkedin.com/in/carlosjuzarterolo
>> >>> Tel: 1649
>> >>> www.pythian.com
>> >>>
>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>> >>>  wrote:
>> 
>>  Hi,
>> 
>>  1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by
>> Al
>>  Tobey from DataStax)
>>  7) minimal reads (usually none, sometimes few)
>> 
>>  those two points keep me repeating an answer I got. First where did
>> you
>>  get 2.1.3 from? Maybe I missed it, I will have a look. But if it is
>> 2.1.2
>>  which is the latest released version, that version has many bugs -
>> most of
>>  them I got kicked by while testing 2.1.2. I got many problems with
>>  compactions not being triggered on column families not being read,
>>  compactions and repairs not being completed.  See
>> 
>> 
>> 
>> https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
>> 
>> https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
>> 
>>  Apart from that, how are those both datacenters connected? Maybe
>> there
>>  is a bottleneck.
>> 
>>  Also do you have ntp up and running on all nodes to keep all clocks
>> in
>>  tight sync?
>> 
>>  Note: I'm no expert (yet) - just sharing my 2 cents.
>> 
>>  Cheers,
>>  Roland
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>>
>> >>>
>> >
>>
>
>


Adding new node to cluster

2015-02-17 Thread Batranut Bogdan
Hello all,
I have an existing cluster. When adding a new node, I saw that Opscenter saw 
the node in an unknown cluster. In the yaml, the cluster name is the same. So I 
have stopped the node and added its IP address to the list of seeds. Now 
Opscenter sees my node. But nodetool status now sees it as UN, instead of UJ 
as when it first started. One other thing to mention is that even if I stop the 
node and remove its IP from the list of seeds, Opscenter sees the node in the 
known cluster but nodetool sees it as UN. I am not sure what the implications 
of adding a node's IP to its own seed list are, and I think that I might have 
done the same for the existing nodes, e.g. started each with its IP in the seed 
list, but after removing it and having to restart the nodes for whatever 
reason, I did not see any changes.
Is my cluster OK, or what do I need to do to bring the cluster to a good state?
Thank you.

Re: Many pending compactions

2015-02-17 Thread Ja Sam
I set setcompactionthroughput to 999 permanently and it doesn't change
anything. IO is still the same. CPU is idle.

On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar 
wrote:

> Hi,
>
> You can run "nodetool compactionstats" to view statistics on compactions.
> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
> SSTables when you use Size-Tiered compaction.
> You can also create a cron job to increase the value of
> setcompactionthroughput during the night or when your IO is not busy.
>
> From http://wiki.apache.org/cassandra/NodeTool:
> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>
> Cheers,
>
> Roni Balthazar
>
> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam  wrote:
> > One thing I do not understand. In my case compaction is running
> permanently.
> > Is there a way to check which compaction is pending? The only
> information is
> > about total count.
> >
> >
> > On Monday, February 16, 2015, Ja Sam  wrote:
> >>
> >> Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is
> >> available from
> >> http://cassci.datastax.com/job/cassandra-2.1/
> >>
> >> I read about cold_reads_to_omit. It looks promising. Should I also set
> >> compaction throughput?
> >>
> >> p.s. I am really sad that I didn't read this before:
> >>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
> >>
> >>
> >>
> >> On Monday, February 16, 2015, Carlos Rolo  wrote:
> >>>
> >>> Hi 100% in agreement with Roland,
> >>>
> >>> 2.1.x series is a pain! I would never recommend the current 2.1.x
> series
> >>> for production.
> >>>
> >>> Clocks are a pain, and check your connectivity! Also check tpstats to
> see
> >>> if your threadpools are being overrun.
> >>>
> >>> Regards,
> >>>
> >>> Carlos Juzarte Rolo
> >>> Cassandra Consultant
> >>>
> >>> Pythian - Love your data
> >>>
> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
> >>> linkedin.com/in/carlosjuzarterolo
> >>> Tel: 1649
> >>> www.pythian.com
> >>>
> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
> >>>  wrote:
> 
>  Hi,
> 
>  1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al
>  Tobey from DataStax)
>  7) minimal reads (usually none, sometimes few)
> 
>  those two points keep me repeating an answer I got. First where did
> you
>  get 2.1.3 from? Maybe I missed it, I will have a look. But if it is
> 2.1.2
>  which is the latest released version, that version has many bugs -
> most of
>  them I got kicked by while testing 2.1.2. I got many problems with
>  compactions not being triggered on column families not being read,
>  compactions and repairs not being completed.  See
> 
> 
> 
> https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
> 
> https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
> 
>  Apart from that, how are those both datacenters connected? Maybe there
>  is a bottleneck.
> 
>  Also do you have ntp up and running on all nodes to keep all clocks in
>  tight sync?
> 
>  Note: I'm no expert (yet) - just sharing my 2 cents.
> 
>  Cheers,
>  Roland
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>>
> >>>
> >
>