Re: sstablescrub fails with OOM

2017-11-02 Thread sai krishnam raju potturi
Yes. Move the corrupt sstable, and run a repair on this node so that it
gets back in sync with its peers.
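
A minimal sketch of that sequence (the keyspace, table, and sstable file names
below are placeholders; move the exact files reported by sstablescrub):

    sudo service cassandra stop
    mv /var/lib/cassandra/data/my_ks/my_cf/my_ks-my_cf-jb-1234-* /backup/corrupt/
    sudo service cassandra start
    nodetool repair my_ks my_cf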

On Thu, Nov 2, 2017 at 6:12 PM, Shashi Yachavaram 
wrote:

> We are on Cassandra 2.0.17 and have corrupted sstables. We ran offline
> sstablescrub but it fails with OOM. We increased MAX_HEAP_SIZE to 8G and it
> still fails.
>
> Can we move the corrupted sstable file and rerun sstablescrub followed by
> repair?
>
> -shashi..
>


Re: IPv6-only host, can't seem to get Cassandra to bind to a public port

2017-04-11 Thread sai krishnam raju potturi
We have included the IPv6 address with scope GLOBAL, and not the IPv6 address
with scope LINK, in the YAML and topology files.

inet6 addr: 2001: *** : ** : ** : * : * :  :   Scope:Global

inet6 addr: fe80 :: *** :  :  :  Scope:Link
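
For reference, one way to list only the global-scope IPv6 addresses on an
interface (the interface name is just an example):

    ip -6 addr show dev eth0 scope global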


Not sure if this might be of relevance to the issue you are facing.


thanks

Sai



On Tue, Apr 11, 2017 at 10:29 AM, Martijn Pieters <mjpiet...@fb.com> wrote:

> From: sai krishnam raju potturi <pskraj...@gmail.com>
> > I got a similar error, and commenting out the below line helped.
> > JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
> >
> > Did you also include "rpc_interface_prefer_ipv6: true" in the YAML file?
>
> No luck at all here. Yes, I had commented out that line (and also tried
> replacing it with `-Djava.net.preferIPv6Addresses=true`, included in my
> email. I also included an error to make sure it was the right file).
>
> It all *should* work, but doesn’t. :-(
>
> I just tried again with “rpc_interface_prefer_ipv6: true” set as well, but
> without luck. I note that I have the default “rpc_address: localhost”, so
> it’ll bind to the lo loopback, which has IPv4 configured already. Not that
> using “rpc_address: ‘::1’” instead works (same error, so I can’t bind to
> the IPv6 localhost address either).
>
> Martijn Pieters
>
>
>
>
>


Re: IPv6-only host, can't seem to get Cassandra to bind to a public port

2017-04-11 Thread sai krishnam raju potturi
I got a similar error, and commenting out the below line helped.

JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"


Did you also include "rpc_interface_prefer_ipv6: true" in the YAML file?


thanks

Sai



On Tue, Apr 11, 2017 at 6:37 AM, Martijn Pieters  wrote:

> I’m having issues getting a single-node Cassandra cluster to run on a
> Ubuntu 16.04 VM with only IPv6 available. I’m running Oracle Java 8
> (8u121-1~webupd8~2), Cassandra 3.10 (installed via the Cassandra
> http://www.apache.org/dist/cassandra/debian packages.)
>
>
>
> I consistently get a “Protocol family unavailable” exception:
>
>
>
> ERROR [main] 2017-04-11 09:54:23,991 CassandraDaemon.java:752 - Exception
> encountered during startup
>
> java.lang.RuntimeException: java.net.SocketException: Protocol family
> unavailable
>
> at 
> org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:730)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.net.MessagingService.listen(MessagingService.java:664)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.net.MessagingService.listen(MessagingService.java:648)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:773)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:666)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
> ~[apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394)
> [apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> [apache-cassandra-3.10.jar:3.10]
>
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
> [apache-cassandra-3.10.jar:3.10]
>
> Caused by: java.net.SocketException: Protocol family unavailable
>
> at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_121]
>
> at sun.nio.ch.Net.bind(Net.java:433) ~[na:1.8.0_121]
>
> at sun.nio.ch.Net.bind(Net.java:425) ~[na:1.8.0_121]
>
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> ~[na:1.8.0_121]
>
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> ~[na:1.8.0_121]
>
> at 
> org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:714)
> ~[apache-cassandra-3.10.jar:3.10]
>
> ... 8 common frames omitted
>
>
>
> `lo` (loopback) has both `inet` and `inet6` addresses, but `eth0` has no
> `inet` addresses, so only inet6 addr entries (both a local and a global
> scope address are configured).
>
>
>
> My configuration changes:
>
>
>
> listen_address: 
>
> listen_interface_prefer_ipv6: true
>
>
>
> Tracing through the source code the exception shows that it is the
> listen_address value above that throws the exception, changing it back to
> 127.0.0.1 makes the server work again (but then I don’t get to use it on my
> local network). I tried both the local and the global scope IPv6 address.
>
>
>
> I tried changing the JVM configuration to prefer IPv6 by editing
> /etc/cassandra/cassandra-env.sh:
>
>
>
> --- etc/cassandra/cassandra-env.sh  2017-01-31 16:29:32.0
> +
>
> +++ /etc/cassandra/cassandra-env.sh 2017-04-11 09:52:51.45600
> +
>
> @@ -290,6 +290,9 @@
>
> # to the location of the native libraries.
>
> JVM_OPTS="$JVM_OPTS -Djava.library.path=$CASSANDRA_HOME/lib/sigar-bin"
>
>
>
> +#JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
>
> +JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv6Addresses=true"
>
> +
>
> JVM_OPTS="$JVM_OPTS $MX4J_ADDRESS"
>
> JVM_OPTS="$JVM_OPTS $MX4J_PORT"
>
> JVM_OPTS="$JVM_OPTS $JVM_EXTRA_OPTS"
>
>
>
> But this makes no difference.
>
>
>
> I also tried using `listen_interface` instead, but that only changes the
> error message to:
>
>
>
> ERROR [main] 2017-04-11 10:35:16,426 CassandraDaemon.java:752 -
> Exception encountered during startup: Configured listen_interface "eth0"
> could not be found
>
>
>
> What else can I do?
>
>
>
> Martijn Pieters
>


Re: Re : Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

2017-01-27 Thread sai krishnam raju potturi
FYI: this issue is related to the CASSANDRA-10205
<https://issues.apache.org/jira/browse/CASSANDRA-10205> (Gossiper class)
patch introduced in 2.1.11. When we roll back the CASSANDRA-10205 changes in
2.1.12 and 2.1.15, everything works as expected. Further tests still need to
be done on our end, though.

One more thing observed was that the decommissioned nodes do not show up as
"UNREACHABLE" in the "nodetool describecluster" after 72 hours. Things are
normal.

Thanks, Pillai; but the IP address does not exist in the system.peers table
on any of the nodes. unsafeAssassinate is not our preferred option when we
decommission a datacenter consisting of more than 100 nodes.

Kurt; we have not tested the 2.1.7 and 2.1.8 versions yet.

Pratik; I'm not sure if your issue relates to this, as we observe the node
as UNREACHABLE in "nodetool describecluster". "nodetool gossipinfo" will
generally show information about decommissioned nodes for a while, which is
expected behaviour.
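
For reference, the checks we run on a node are simply:

    nodetool describecluster
    nodetool status
    cqlsh -e "SELECT peer FROM system.peers;"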

thanks
Sai




On Fri, Jan 27, 2017 at 12:54 PM, Harikrishnan Pillai <
hpil...@walmartlabs.com> wrote:

> Please remove the IPs from the system.peers table of all nodes, or you can
> use unsafeAssassinate from JMX.
>
>
> --
> *From:* Agrawal, Pratik <paagr...@amazon.com>
> *Sent:* Friday, January 27, 2017 9:05:43 AM
> *To:* user@cassandra.apache.org; k...@instaclustr.com; pskraj...@gmail.com
> *Cc:* Sun, Guan
>
> *Subject:* Re: Re : Decommissioned nodes show as DOWN in Cassandra
> versions 2.1.12 - 2.1.16
>
> We are seeing the same issue with Cassandra 2.0.8. The nodetool gossipinfo
> reports a node being down even after we decommission the node from the
> cluster.
>
> Thanks,
> Pratik
>
> From: kurt greaves <k...@instaclustr.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Friday, January 27, 2017 at 5:54 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Re : Decommissioned nodes show as DOWN in Cassandra versions
> 2.1.12 - 2.1.16
>
> we've seen this issue on a few clusters, including on 2.1.7 and 2.1.8.
> pretty sure it is an issue in gossip that's known about. in later versions
> it seems to be fixed.
>
> On 24 Jan 2017 06:09, "sai krishnam raju potturi" <pskraj...@gmail.com>
> wrote:
>
>> In the Cassandra versions 2.1.11 - 2.1.16, after we decommission a node
>> or datacenter, we observe the decommissioned nodes marked as DOWN in the
>> cluster when you do a "nodetool describecluster". The nodes however do not
>> show up in the "nodetool status" command.
>> The decommissioned node also does not show up in the "system_peers" table
>> on the nodes.
>>
>> The workaround we follow is rolling restart of the cluster, which removes
>> the decommissioned nodes from the "UNREACHABLE STATE", and shows the actual
>> state of the cluster. The workaround is tedious for huge clusters.
>>
>> We also verified the decommission process in CCM tool, and observed the
>> same issue for clusters with versions from 2.1.12 to 2.1.16. The issue was
>> not observed in versions prior to or later than the ones mentioned above.
>>
>>
>> Has anybody in the community observed similar issue? We've also raised a
>> JIRA issue regarding this: https://issues.apache.org/jira/browse/CASSANDRA-13144
>>
>>
>> Below are the observed logs from the version without the bug and the
>> version with the bug. The first set (2.1.1) shows the expected behaviour;
>> the second set (2.1.16) shows the case where the node is recognized as down
>> and shows up as UNREACHABLE.
>>
>>
>>
>> Cassandra 2.1.1 Logs showing the decommissioned node :  (Without the bug)
>>
>> 2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring
>> interval time of 2049943233 for /X.X.X.X
>> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node
>> /X.X.X.X state left, tokens [ 59353109817657926242901533144729725259,
>> 60254520910109313597677907197875221475, 
>> 75698727618038614819889933974570742305,
>> 84508739091270910297310401957975430578]
>> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire
>> time for endpoint : /X.X.X.X (1485116334088)
>> 2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing
>> tokens [100434964734820719895982857900842892337,
>> 114144647582686041354301802358217767299, 
>> 13209060517964702932350041942412177,
>> 138409460913

Re : Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

2017-01-23 Thread sai krishnam raju potturi
In the Cassandra versions 2.1.11 - 2.1.16, after we decommission a node or
datacenter, we observe the decommissioned nodes marked as DOWN in the
cluster when you do a "nodetool describecluster". The nodes however do not
show up in the "nodetool status" command.
The decommissioned node also does not show up in the "system_peers" table
on the nodes.

The workaround we follow is a rolling restart of the cluster, which removes
the decommissioned nodes from the "UNREACHABLE" state and shows the actual
state of the cluster. This workaround is tedious for huge clusters.

We also verified the decommission process in CCM tool, and observed the
same issue for clusters with versions from 2.1.12 to 2.1.16. The issue was
not observed in versions prior to or later than the ones mentioned above.


Has anybody in the community observed a similar issue? We've also raised a
JIRA ticket regarding this:
https://issues.apache.org/jira/browse/CASSANDRA-13144


Below are the observed logs from the version without the bug and the version
with the bug. The first set (2.1.1) shows the expected behaviour; the second
set (2.1.16) shows the case where the node is recognized as down and shows up
as UNREACHABLE.



Cassandra 2.1.1 Logs showing the decommissioned node :  (Without the bug)

2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 2049943233 for /X.X.X.X
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X
state left, tokens [ 59353109817657926242901533144729725259,
60254520910109313597677907197875221475,
75698727618038614819889933974570742305,
84508739091270910297310401957975430578]
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time
for endpoint : /X.X.X.X (1485116334088)
2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing
tokens [100434964734820719895982857900842892337,
114144647582686041354301802358217767299,
13209060517964702932350041942412177,
138409460913927199437556572481804704749] for /X.X.X.X
2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager
Deleting any stored hints for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting
version for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint
/X.X.X.X
2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state
change for dead or unknown endpoint: /X.X.X.X
2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection
attempting to connect to /X.X.X.X
2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection
Handshaking version with /X.X.X.X
2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting
version 7 for /X.X.X.X
2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 2074454222 for /X.X.X.X
2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 4302985797 for /X.X.X.X
2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 6 elapsed,
/X.X.X.X gossip quarantine over
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 3047826501 for /X.X.X.X
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state
change for dead or unknown endpoint: /X.X.X.X


Cassandra 2.1.16 logs showing the decommissioned node:  (The logs in
2.1.16 are the same as 2.1.1 up to "DEBUG Gossiper 6 elapsed, /X.X.X.X
gossip quarantine over", and are then followed by "NODE is now DOWN".)

017-01-19 19:52:23,687 [GossipStage:1] DEBUG StorageService.java:1883 -
Node /X.X.X.X state left, tokens [-1112888759032625467,
-228773855963737699, -311455042375
4381391, -4848625944949064281, -6920961603460018610, -8566729719076824066,
1611098831406674636, 7278843689020594771, 7565410054791352413, 9166885764,
8654747784805453046]
2017-01-19 19:52:23,688 [GossipStage:1] DEBUG Gossiper.java:1520 - adding
expire time for endpoint : /X.X.X.X (1485114743567)
2017-01-19 19:52:23,688 [GossipStage:1] INFO StorageService.java:1965 -
Removing tokens [-1112888759032625467, -228773855963737699,
-3114550423754381391, -48486259449
49064281, -6920961603460018610, 5690722015779071557, 6202373691525063547,
7191120402564284381, 7278843689020594771, 7565410054791352413,
8524200089166885764, 865474778
4805453046] for /X.X.X.X
2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO
HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
2017-01-19 19:52:23,689 [GossipStage:1] DEBUG MessagingService.java:840 -
Resetting version for /X.X.X.X
2017-01-19 19:52:23,690 [GossipStage:1] DEBUG Gossiper.java:417 - removing
endpoint /X.X.X.X
2017-01-19 19:52:23,691 [GossipStage:1] DEBUG StorageService.java:1552 -
Ignoring state change for dead or unknown endpoint: /X.X.X.X
2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG
OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO
OutboundTcpConnection.java:488 - Handshaking version 

Re: Re : Generic keystore when enabling SSL

2016-11-17 Thread sai krishnam raju potturi
hi Jacob;

 I would suggest you create your own Certificate Authority, and create
a generic keystore and truststore.

Cassandra by default does not implement hostname verification in its
code. All it does is check whether the peer certificate is signed by the
trusted authority (the root CA in the truststore).

In short: if you were to have a Comodo-signed certificate, and I
have a Comodo-signed certificate, I will be able to establish communication
with your node. The reason being, Cassandra only checks whether the peer
certificate is signed by a trusted authority, which it will be in this case.

   Even wildcard certificates with multiple SANs are of no use here,
as Cassandra does no SAN or CN verification.

   If you were to have your own CA, there would be no way for me to
establish the chain of trust.
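
A minimal sketch of that approach, along the lines of the Last Pickle post
referenced in the quoted messages below (every file name, alias, and password
here is a placeholder, and the exact flags may need adjusting for your
environment):

    # 1. create your own root CA (self-signed)
    openssl req -new -x509 -keyout ca-key.pem -out ca-cert.pem -days 3650 \
        -subj "/CN=MyCassandraRootCA" -passout pass:capass

    # 2. create one generic keypair/keystore and a CSR for it
    keytool -genkeypair -keyalg RSA -keysize 2048 -alias node \
        -keystore generic-keystore.jks -storepass changeit -keypass changeit \
        -dname "CN=cassandra-node" -validity 3650
    keytool -certreq -alias node -keystore generic-keystore.jks \
        -storepass changeit -file node.csr

    # 3. sign the CSR with your CA and import the chain back into the keystore
    openssl x509 -req -CA ca-cert.pem -CAkey ca-key.pem -in node.csr \
        -out node-signed.crt -days 3650 -CAcreateserial -passin pass:capass
    keytool -importcert -alias rootca -file ca-cert.pem \
        -keystore generic-keystore.jks -storepass changeit -noprompt
    keytool -importcert -alias node -file node-signed.crt \
        -keystore generic-keystore.jks -storepass changeit -noprompt

    # 4. the truststore only needs the root CA
    keytool -importcert -alias rootca -file ca-cert.pem \
        -keystore generic-truststore.jks -storepass changeit -noprompt

The same keystore and truststore can then be copied to every node, since no
hostname, SAN, or CN check is done.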

thanks
Sai




On Fri, Oct 28, 2016 at 2:06 AM, Vladimir Yudovin <vla...@winguzone.com>
wrote:

> Hi Jacob,
>
> there is no problem using the same certificate (whether issued by some
> authority or self-signed) on all nodes, as long as it's present in the
> truststore. CN doesn't matter in this case; it can be any string you want.
>
> Would this impact client-to-node encryption
>
> No, but clients should either add the node certificate to their truststore or
> disable validation (each Cassandra driver does this in its own way).
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra.
> Launch your cluster in minutes.*
>
>
> On Thu, 27 Oct 2016 16:45:48 -0400, Jacob Shadix <jacobsha...@gmail.com> wrote:
>
> I am interested if anyone has taken this approach to share the same
> keystore across all the nodes with the 3rd party root/intermediate CA
> existing only in the truststore. If so, please share your experience and
> lessons learned. Would this impact client-to-node encryption as the
> certificates used in internode would not have the hostnames represented in
> CN?
>
> -- Jacob Shadix
>
> On Wed, Sep 21, 2016 at 11:40 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> hi Evans;
>rather than having one individual certificate for every node, we are
> looking at getting one Comodo wild-card certificate, and importing that
> into the keystore, along with the intermediate CA provided by Comodo. As
> far as the trust-store is concerned, we are looking at importing the
> intermediate CA provided along with the signed wild-card cert by Comodo.
>
>So in this case we'll be having just one keystore (generic), and
> truststore we'll be copying to all the nodes. We've run into issues
> however, and are trying to iron that out. Interested to know if anybody in
> the community has taken a similar approach.
>
>We are pretty much going on the lines of following post by LastPickle
> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html.
> Instead of creating our own
> CA, we are relying on Comodo.
>
> thanks
> Sai
>
>
> On Wed, Sep 21, 2016 at 10:30 AM, Eric Evans <john.eric.ev...@gmail.com>
> wrote:
>
> On Tue, Sep 20, 2016 at 12:57 PM, sai krishnam raju potturi
> <pskraj...@gmail.com> wrote:
> > Due to the security policies in our company, we were asked to use 3rd
> party
> > signed certs. Since we'll require to manage 100's of individual certs, we
> > wanted to know if there is a work around with a generic keystore and
> > truststore.
>
> Can you explain what you mean by "generic keystore"?  Are you looking
> to create keystores signed by a self-signed root CA (distributed via a
> truststore)?
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
>
>


Re: Rebuild failing while adding new datacenter

2016-10-20 Thread sai krishnam raju potturi
We faced a similar issue earlier, but that was more related to firewall
rules. The newly added datacenter was not able to communicate with the
existing datacenters on port 7000 (inter-node communication). Yours
might be a different issue, but just saying.
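
A quick way to verify that from a node in the new datacenter (the IP below is
a placeholder for one of the existing nodes; use 7001 if internode SSL is on):

    nc -zv 10.0.0.1 7000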


On Thu, Oct 20, 2016 at 4:12 PM, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello All,
>
> I have single datacenter with 3 C* nodes and we are trying to expand the
> cluster to another region/DC. I am seeing the below error while doing a 
> "nodetool
> rebuild -- name_of_existing_data_center" .
>
> [user@machine ~]$ nodetool rebuild DC1
> nodetool: Unable to find sufficient sources for streaming range
> (-402178150752044282,-396707578307430827] in keyspace system_distributed
> See 'nodetool help' or 'nodetool help '.
> [user@machine ~]$
>
> user@cqlsh> SELECT * from system_schema.keyspaces where
> keyspace_name='system_distributed';
>
>  keyspace_name      | durable_writes | replication
> --------------------+----------------+--------------------------------------------------------------------------------------
>  system_distributed |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
>
> (1 rows)
>
> To overcome this I have updated system_distributed keyspace to DC1:3 and
> DC2:3 with NetworkTopologyStrategy
>
> C* Version - 3.0.8
>
> Is this a bug introduced in Cassandra 3.0.8? I haven't seen this issue
> with older versions.
>


Re: Open File Handles for Deleted sstables

2016-09-28 Thread sai krishnam raju potturi
Restarting the Cassandra service helped get rid of those open file handles in
our situation.
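
A rough way to see how many deleted sstable files a node is still holding open
(assuming the usual CassandraDaemon process name):

    lsof -p $(pgrep -f CassandraDaemon | head -1) | grep -c '(deleted)'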

thanks
Sai

On Wed, Sep 28, 2016 at 3:15 PM, Anuj Wadehra 
wrote:

> Hi,
>
> We are facing an issue where Cassandra has open file handles for deleted
> sstable files. These open file handles keep on increasing with time and
> eventually lead to disk crisis. This is visible via lsof command.
>
> There are no exceptions in the logs. We suspect a race condition where
> compactions/repairs and reads are done on the same sstable. I have gone
> through a few JIRAs but am somehow not able to correlate the issue with
> those tickets.
>
> We are using 2.0.14. OS is Red Hat Linux.
>
> Any suggestions?
>
> Thanks
> Anuj
>
>
>


Re: Re : Generic keystore when enabling SSL

2016-09-21 Thread sai krishnam raju potturi
hi Evans;
   rather than having one individual certificate for every node, we are
looking at getting one Comodo wild-card certificate, and importing that
into the keystore, along with the intermediate CA provided by Comodo. As
far as the trust-store is concerned, we are looking at importing the
intermediate CA provided along with the signed wild-card cert by Comodo.

   So in this case we'll be having just one keystore (generic), and
truststore we'll be copying to all the nodes. We've run into issues
however, and are trying to iron that out. Interested to know if anybody in
the community has taken a similar approach.

   We are pretty much going along the lines of the following post by The Last Pickle:
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html.
Instead of creating our own CA, we are relying on Comodo.
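
A rough sketch of the import steps (this assumes the generic keystore already
holds the private key the wildcard CSR was generated from, under the alias
"node"; all file names and aliases below are placeholders):

    # keystore: Comodo intermediate CA first, then the signed wildcard cert
    # (imported under the same alias as the private key)
    keytool -importcert -alias comodo-intermediate -file comodo-intermediate.crt \
        -keystore generic-keystore.jks -storepass changeit -noprompt
    keytool -importcert -alias node -file wildcard-signed.crt \
        -keystore generic-keystore.jks -storepass changeit -noprompt

    # truststore: just the intermediate CA
    keytool -importcert -alias comodo-intermediate -file comodo-intermediate.crt \
        -keystore generic-truststore.jks -storepass changeit -noprompt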

thanks
Sai

On Wed, Sep 21, 2016 at 10:30 AM, Eric Evans <john.eric.ev...@gmail.com>
wrote:

> On Tue, Sep 20, 2016 at 12:57 PM, sai krishnam raju potturi
> <pskraj...@gmail.com> wrote:
> > Due to the security policies in our company, we were asked to use 3rd
> party
> > signed certs. Since we'll require to manage 100's of individual certs, we
> > wanted to know if there is a work around with a generic keystore and
> > truststore.
>
> Can you explain what you mean by "generic keystore"?  Are you looking
> to create keystores signed by a self-signed root CA (distributed via a
> truststore)?
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: Re : Generic keystore when enabling SSL

2016-09-20 Thread sai krishnam raju potturi
thanks Robert; we followed the instructions mentioned in
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html.
It worked great.

 Due to the security policies in our company, we were asked to
use 3rd-party signed certs. Since we'd have to manage hundreds of
individual certs, we wanted to know if there is a workaround with a
generic keystore and truststore.

thanks
Sai


Re : Generic keystore when enabling SSL

2016-09-20 Thread sai krishnam raju potturi
hi;
  has anybody enabled SSL using a generic keystore for node-to-node
encryption? We're using 3rd-party signed certificates and want to avoid
the hassle of managing hundreds of certificates.

thanks
Sai


Re: Is to ok restart DECOMMISION

2016-09-15 Thread sai krishnam raju potturi
hi Laxmi;
  what's the size of the data per node? If the data is really huge, then let
the decommission process continue. Otherwise, stop the Cassandra process on
the decommissioning node, and from another node in the datacenter, do a
"nodetool removenode host-id". This might speed up the decommissioning
process since the streaming will be from two replicas rather than just one.
See if unthrottling the stream throughput might help.
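
For example (the host ID below is a placeholder; take it from "nodetool status"):

    nodetool removenode 11111111-2222-3333-4444-555555555555
    nodetool setstreamthroughput 0    # 0 = unthrottled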

   Make sure there are no TCP sessions in a hung state. If you see any TCP
sessions in a hung state, alter the TCP parameters:

sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.ipv4.tcp_window_scaling=1
sudo sysctl -w net.ipv4.tcp_keepalive_time=1800
sudo sysctl -w net.ipv4.tcp_keepalive_probes=9
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=75


thanks

On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula 
wrote:

> I started decommissioning a node in our Cassandra cluster.
> But it's taking too long (more than 12 hrs), so I would like to
> restart (stop/kill the node & run 'nodetool decommission' again).
>
> Will killing the node/stopping the decommission and restarting the
> decommission cause any issues to the cluster?
>
> Using C* 2.0.17, 2 datacenters, each DC with 3 groups, each group
> with 3 nodes, RF=3.
>
> --
> Thanks...!
>


Re: Re : Cluster performance after enabling SSL

2016-09-14 Thread sai krishnam raju potturi
Thanks a lot. That was really good info.

On Tue, Sep 13, 2016, 15:41 G P <gil.mpinhe...@hotmail.com> wrote:

> Read this:
>
> http://www.aifb.kit.edu/images/5/58/IC2E2014-Performance_Overhead_TLS.pdf
>
> It can cause bigger variances in latencies, but not much.
> Tuesday, 13 September 2016, 08:01PM +01:00 from sai krishnam raju potturi
> pskraj...@gmail.com:
>
>
> hi;
>   will enabling SSL (node-to-node) cause an overhead in the performance of
> Cassandra? We have tried it out on a small test cluster while running
> Cassandra-stress tool, and did not see much difference in terms of read and
> write latencies.
>  Could somebody throw some light regarding any impact SSL will have on
> large clusters in terms of performance. Thanks in advance.
>
> Cassandra-version (2.1.15)
>
> thanks
> Sai
>
>


Re: Re : Cluster performance after enabling SSL

2016-09-14 Thread sai krishnam raju potturi
thanks Surbhi; we'll do further tests regarding this. The per-node TPS is
low, but the overall cluster TPS is around 90k.

On Tue, Sep 13, 2016 at 3:25 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
wrote:

> We have seen a little overhead in latencies while enabling the
> client_encryption.
> Our cluster gets around 40-50K reads and writes per second.
>
> On 13 September 2016 at 12:01, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi;
>>   will enabling SSL (node-to-node) cause an overhead in the performance
>> of Cassandra? We have tried it out on a small test cluster while running
>> Cassandra-stress tool, and did not see much difference in terms of read and
>> write latencies.
>>  Could somebody throw some light regarding any impact SSL will have
>> on large clusters in terms of performance. Thanks in advance.
>>
>> Cassandra-version (2.1.15)
>>
>> thanks
>> Sai
>>
>
>


Re : Cluster performance after enabling SSL

2016-09-13 Thread sai krishnam raju potturi
hi;
  will enabling SSL (node-to-node) cause an overhead in the performance of
Cassandra? We have tried it out on a small test cluster while running
Cassandra-stress tool, and did not see much difference in terms of read and
write latencies.
 Could somebody throw some light regarding any impact SSL will have on
large clusters in terms of performance. Thanks in advance.

Cassandra-version (2.1.15)

thanks
Sai


Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread sai krishnam raju potturi
Make sure there is no spike in the load-avg on the existing nodes, as that
might affect your application read request latencies.

On Sun, Sep 11, 2016, 17:10 Jens Rantil  wrote:

> Hi Bhuvan,
>
> I have done such expansion multiple times and can really recommend
> bootstrapping a new DC and pointing your clients to it. The process is so
> much faster and the documentation you referred to has worked out fine for
> me.
>
> Cheers,
> Jens
>
>
> On Sunday, September 11, 2016, Bhuvan Rawal  wrote:
>
>> Hi,
>>
>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would
>> take 7-8 hours.
>>
>> If this activity is performed serially then it will take 5-6 days. I had
>> a look at CASSANDRA-7069
>>  and a bit of
>> discussion in the past at -
>> http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
>> Wanted to know if the limitation is still applicable and race condition
>> could occur in 3.6 version.
>>
>> If this is not the case can we add a new datacenter as mentioned here
>> opsAddDCToCluster
>> 
>>  and
>> bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in
>> cassandra.yaml and rebuilding nodes simultaneously in the new dc?
>>
>>
>> Thanks & Regards,
>> Bhuvan
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook  Linkedin
> 
>  Twitter 
>
>


Re: Re : Default values in Cassandra YAML file

2016-08-10 Thread sai krishnam raju potturi
thanks Romain, this had been a doubt for quite a while.

thanks

On Wed, Aug 10, 2016 at 4:59 PM, Romain Hardouin <romainh...@yahoo.fr>
wrote:

> Yes. You can even see that some caution is taken in the code
> https://github.com/apache/cassandra/blob/trunk/
> src/java/org/apache/cassandra/config/Config.java#L131
> (But if I were you I would not rely on this. It's always better to be
> explicit.)
>
> Best,
>
> Romain
>
> Le Mercredi 10 août 2016 17h50, sai krishnam raju potturi <
> pskraj...@gmail.com> a écrit :
>
>
> hi;
>if there are any missed attributes in the YAML file, will Cassandra
> pick up default values for those attributes.
>
> thanks
>
>
>
>


Re : Default values in Cassandra YAML file

2016-08-10 Thread sai krishnam raju potturi
hi;
   if there are any missed attributes in the YAML file, will Cassandra pick
up default values for those attributes?

thanks


Re: Re : Purging tombstones from a particular row in SSTable

2016-07-28 Thread sai krishnam raju potturi
thanks a lot Alain. That was really great info.

The issue here was that the tombstones were not in the SSTable, but rather in
the memtable. We had to do a nodetool flush and then run a nodetool compact to
get rid of the tombstones, a million of them. The size of the largest SSTable
was actually 48MB.
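
In other words, roughly (the keyspace and table names below are placeholders):

    nodetool flush my_ks my_cf
    nodetool compact my_ks my_cf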

This link was helpful in getting the count of tombstones in an sstable,
which was 0 in our case:
https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50
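
As an aside, in newer Cassandra versions the bundled sstablemetadata tool also
reports an estimated droppable-tombstone ratio (the path below is a placeholder):

    sstablemetadata /path/to/my_ks-my_cf-jb-1234-Data.db | grep -i droppable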

The application team did not have a good data model. They are working on a
new one.

thanks

On Wed, Jul 27, 2016 at 7:17 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
>
> I just released a detailed post about tombstones today that might be of
> some interest for you:
> http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
>
> 220kb worth of tombstones doesn’t seem like enough to worry about.
>
>
> +1
>
> I believe you might be missing some other bigger SSTable having a lot of
> tombstones as well. Finding the biggest sstable and reading the tombstone
> ratio from there might be more relevant.
>
> You also should give a try to: "unchecked_tombstone_compaction" set to
> true rather than tuning other options so aggressively. The "single SSTable
> compaction" section of my post might help you on this issue:
> http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html#single-sstable-compaction
>
> Other thoughts:
>
> Also if you use TTLs and timeseries, using TWCS instead of STCS could be
> more efficient evicting tombstones.
>
> we have a columnfamily that has around 1000 rows, with one row is really
>> huge (million columns)
>
>
> I am sorry to say that this model does not look that great. Imbalances
> might become an issue as a few nodes will handle a lot more load than the
> rest of the nodes. Also even if this is getting improved in newer versions
> of Cassandra, wide rows are something you want to avoid while using 2.0.14
> (which is no longer supported for about a year now). I know it is not
> always easy and never the good time, but maybe should you consider
> upgrading both your model and your version of Cassandra (regardless of the
> fact you manage to solve this issue or not with
> "unchecked_tombstone_compaction").
>
> Good luck,
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-07-28 0:00 GMT+02:00 sai krishnam raju potturi <pskraj...@gmail.com>:
>
>> The read queries are continuously failing though because of the
>> tombstones. "Request did not complete within rpc_timeout."
>>
>> thanks
>>
>>
>> On Wed, Jul 27, 2016 at 5:51 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> 220kb worth of tombstones doesn’t seem like enough to worry about.
>>>
>>>
>>>
>>>
>>>
>>> *From: *sai krishnam raju potturi <pskraj...@gmail.com>
>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>> *Date: *Wednesday, July 27, 2016 at 2:43 PM
>>> *To: *Cassandra Users <user@cassandra.apache.org>
>>> *Subject: *Re: Re : Purging tombstones from a particular row in SSTable
>>>
>>>
>>>
>>> and also the sstable size in question is like 220 kb in size.
>>>
>>>
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 27, 2016 at 5:41 PM, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>> it's set to 1800 Vinay.
>>>
>>>
>>>
>>>  bloom_filter_fp_chance=0.01 AND
>>>
>>>   caching='KEYS_ONLY' AND
>>>
>>>   comment='' AND
>>>
>>>   dclocal_read_repair_chance=0.10 AND
>>>
>>>   gc_grace_seconds=1800 AND
>>>
>>>   index_interval=128 AND
>>>
>>>   read_repair_chance=0.000000 AND
>>>
>>>   replicate_on_write='true' AND
>>>
>>>   populate_io_cache_on_flush='false' AND
>>>
>>>   default_time_to_live=0 AND
>>>
>>>   speculative_retry='99.0PERCENTILE' AND
>>>
>>>   memtable_flush_period_in_ms=0 AND
>>>
>>>   compaction={'min_sstable_size': '1024', 'tombstone_threshold': '0.01',
>>> 'tombstone_compaction_interval': '1800', 'class':
>>> 'SizeTieredCompactionStrategy'} AND
>>>
>>>   compression={'sstable_compression'

Re: Re : Purging tombstones from a particular row in SSTable

2016-07-27 Thread sai krishnam raju potturi
The read queries are continuously failing though because of the tombstones.
"Request did not complete within rpc_timeout."

thanks


On Wed, Jul 27, 2016 at 5:51 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> 220kb worth of tombstones doesn’t seem like enough to worry about.
>
>
>
>
>
> *From: *sai krishnam raju potturi <pskraj...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Wednesday, July 27, 2016 at 2:43 PM
> *To: *Cassandra Users <user@cassandra.apache.org>
> *Subject: *Re: Re : Purging tombstones from a particular row in SSTable
>
>
>
> and also the sstable size in question is like 220 kb in size.
>
>
>
> thanks
>
>
>
>
>
> On Wed, Jul 27, 2016 at 5:41 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> it's set to 1800 Vinay.
>
>
>
>  bloom_filter_fp_chance=0.01 AND
>
>   caching='KEYS_ONLY' AND
>
>   comment='' AND
>
>   dclocal_read_repair_chance=0.10 AND
>
>   gc_grace_seconds=1800 AND
>
>   index_interval=128 AND
>
>   read_repair_chance=0.00 AND
>
>   replicate_on_write='true' AND
>
>   populate_io_cache_on_flush='false' AND
>
>   default_time_to_live=0 AND
>
>   speculative_retry='99.0PERCENTILE' AND
>
>   memtable_flush_period_in_ms=0 AND
>
>   compaction={'min_sstable_size': '1024', 'tombstone_threshold': '0.01',
> 'tombstone_compaction_interval': '1800', 'class':
> 'SizeTieredCompactionStrategy'} AND
>
>   compression={'sstable_compression': 'LZ4Compressor'};
>
>
>
> thanks
>
>
>
>
>
> On Wed, Jul 27, 2016 at 5:34 PM, Vinay Kumar Chella <
> vinaykumar...@gmail.com> wrote:
>
> What is your GC_grace_seconds set to?
>
>
>
> On Wed, Jul 27, 2016 at 1:13 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> thanks Vinay and DuyHai.
>
>
>
> we are using verison 2.0.14. I did "user defined compaction" following
> the instructions in the below link, The tombstones still persist even after
> that.
>
>
>
> https://gist.github.com/jeromatron/e238e5795b3e79866b83
>
>
>
> Also, we changed the tombstone_compaction_interval : 1800
> and tombstone_threshold : 0.1, but it did not help.
>
>
>
> thanks
>
>
>
>
>
>
>
> On Wed, Jul 27, 2016 at 4:05 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> This feature is also exposed directly in nodetool from version Cassandra
> 3.4
>
>
>
> nodetool compact --user-defined 
>
>
>
> On Wed, Jul 27, 2016 at 9:58 PM, Vinay Chella <vche...@netflix.com> wrote:
>
> You can run file level compaction using JMX to get rid of tombstones in
> one SSTable. Ensure you set GC_Grace_seconds such that
>
>
>
> current time >= deletion(tombstone time)+ GC_Grace_seconds
>
>
>
> File level compaction
>
>
>
> /usr/bin/java -jar cmdline-jmxclient-0.10.3.jar - localhost:${port} org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction="'${KEYSPACE}','${SSTABLEFILENAME}'"
>
>
>
>
>
>
>
>
> On Wed, Jul 27, 2016 at 11:59 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> hi;
>
>   we have a columnfamily that has around 1000 rows, with one row is really
> huge (million columns). 95% of the row contains tombstones. Since there
> exists just one SSTable , there is going to be no compaction kicked in. Any
> way we can get rid of the tombstones in that row?
>
>
>
> Userdefined compaction nor nodetool compact had no effect. Any ideas folks?
>
>
>
> thanks
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: Re : Purging tombstones from a particular row in SSTable

2016-07-27 Thread sai krishnam raju potturi
it's set to 1800 Vinay.

 bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=1800 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'min_sstable_size': '1024', 'tombstone_threshold': '0.01',
'tombstone_compaction_interval': '1800', 'class':
'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

thanks


On Wed, Jul 27, 2016 at 5:34 PM, Vinay Kumar Chella <vinaykumar...@gmail.com
> wrote:

> What is your GC_grace_seconds set to?
>
> On Wed, Jul 27, 2016 at 1:13 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> thanks Vinay and DuyHai.
>>
>> we are using verison 2.0.14. I did "user defined compaction"
>> following the instructions in the below link, The tombstones still persist
>> even after that.
>>
>> https://gist.github.com/jeromatron/e238e5795b3e79866b83
>>
>> Also, we changed the tombstone_compaction_interval : 1800 and 
>> tombstone_threshold
>> : 0.1, but it did not help.
>>
>> thanks
>>
>>
>>
>> On Wed, Jul 27, 2016 at 4:05 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> This feature is also exposed directly in nodetool from version Cassandra
>>> 3.4
>>>
>>> nodetool compact --user-defined 
>>>
>>> On Wed, Jul 27, 2016 at 9:58 PM, Vinay Chella <vche...@netflix.com>
>>> wrote:
>>>
>>>> You can run file level compaction using JMX to get rid of tombstones in
>>>> one SSTable. Ensure you set GC_Grace_seconds such that
>>>>
>>>> current time >= deletion(tombstone time)+ GC_Grace_seconds
>>>>
>>>>
>>>> File level compaction
>>>>
>>>>> /usr/bin/java -jar cmdline-jmxclient-0.10.3.jar - localhost:${port} org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction="'${KEYSPACE}','${SSTABLEFILENAME}'"
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 27, 2016 at 11:59 AM, sai krishnam raju potturi <
>>>> pskraj...@gmail.com> wrote:
>>>>
>>>>> hi;
>>>>>   we have a columnfamily that has around 1000 rows, with one row is
>>>>> really huge (million columns). 95% of the row contains tombstones. Since
>>>>> there exists just one SSTable , there is going to be no compaction kicked
>>>>> in. Any way we can get rid of the tombstones in that row?
>>>>>
>>>>> Userdefined compaction nor nodetool compact had no effect. Any ideas
>>>>> folks?
>>>>>
>>>>> thanks
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Re : Purging tombstones from a particular row in SSTable

2016-07-27 Thread sai krishnam raju potturi
thanks Vinay and DuyHai.

We are using version 2.0.14. I did a "user defined compaction" following
the instructions in the link below. The tombstones still persist even after
that.

https://gist.github.com/jeromatron/e238e5795b3e79866b83

Also, we changed tombstone_compaction_interval to 1800 and
tombstone_threshold to 0.1, but it did not help.

thanks



On Wed, Jul 27, 2016 at 4:05 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> This feature is also exposed directly in nodetool from version Cassandra
> 3.4
>
> nodetool compact --user-defined 
>
> On Wed, Jul 27, 2016 at 9:58 PM, Vinay Chella <vche...@netflix.com> wrote:
>
>> You can run file level compaction using JMX to get rid of tombstones in
>> one SSTable. Ensure you set GC_Grace_seconds such that
>>
>> current time >= deletion(tombstone time)+ GC_Grace_seconds
>>
>>
>> File level compaction
>>
>> /usr/bin/java -jar cmdline-jmxclient-0.10.3.jar - localhost:${port} org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction="'${KEYSPACE}','${SSTABLEFILENAME}'"
>>>
>>>
>>
>>
>>
>> On Wed, Jul 27, 2016 at 11:59 AM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>> hi;
>>>   we have a columnfamily that has around 1000 rows, with one row is
>>> really huge (million columns). 95% of the row contains tombstones. Since
>>> there exists just one SSTable , there is going to be no compaction kicked
>>> in. Any way we can get rid of the tombstones in that row?
>>>
>>> Userdefined compaction nor nodetool compact had no effect. Any ideas
>>> folks?
>>>
>>> thanks
>>>
>>>
>>>
>>
>>
>


Re : Purging tombstones from a particular row in SSTable

2016-07-27 Thread sai krishnam raju potturi
hi;
  we have a column family that has around 1000 rows, with one row that is
really huge (a million columns). 95% of that row consists of tombstones.
Since there exists just one SSTable, no compaction is going to be kicked off.
Is there any way we can get rid of the tombstones in that row?

Neither user-defined compaction nor nodetool compact had any effect. Any
ideas, folks?

thanks


Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-26 Thread sai krishnam raju potturi
hi Nate;
thanks for the help. Upgrading to 2.1.12 seems to be the solution for
client-to-node encryption on the native port.

The other issue we are facing is with the STORAGE port. The reason is
that we need to switch back and forth between different internode_encryption
modes, and we need the C* servers to keep running in a transient state during
mode switching. Currently this is not possible. For example, we have an
internode_encryption=none cluster in a multi-region AWS environment and want
to set internode_encryption=dc by rolling-restarting the C* nodes. However, a
node with internode_encryption=dc does not open the non-SSL port to listen on.
As a result, we have a split-brain cluster.

Below is a ticket opened for the exact same issue. Has anybody overcome any
such issue on a production cluster? Thanks in advance.

https://issues.apache.org/jira/browse/CASSANDRA-8751

thanks
Sai

On Wed, Jul 20, 2016 at 5:25 PM, Nate McCall <n...@thelastpickle.com> wrote:

> If you migrate to the latest 2.1 first, you can make this a non-issue as
> 2.1.12 and above support simultaneous SSL and plain on the same port for
> exactly this use case:
> https://issues.apache.org/jira/browse/CASSANDRA-10559
>
> On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi ;
>>  if possible could someone shed some light on this. I followed a
>> post from the lastpickle which was very informative, but we had some
>> concerns when it came to enabling SSL on a live production cluster.
>>
>>
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>>
>> 1 : We generally remove application traffic from a DC which has ongoing
>> changes, just not to affect end customers if things go south during the
>> update.
>>
>> 2 : So once DC-A has been restarted after enabling SSL, this would be
>> missing writes during that period, as the DC-A would be shown as down by
>> the other DC's. We will not be able to put back application traffic on DC-A
>> until we run inter-dc repairs, which will happen only  when SSL has been
>> enabled on all DC's.
>>
>> 3 : Repeating the procedure for every DC will lead to some missed writes
>> across all DC's.
>>
>> 4 : We could do the rolling restart of a DC-A with application traffic
>> on, but we are concerned if for any infrastructure related reason we have
>> an issue, we will have to serve traffic from another DC-B, which might be
>> missing on writes to the DC-A during that period.
>>
>> We have 4 DC's which 50 nodes each.
>>
>>
>> thanks
>> Sai
>>
>> -- Forwarded message --
>> From: sai krishnam raju potturi <pskraj...@gmail.com>
>> Date: Mon, Jul 18, 2016 at 11:06 AM
>> Subject: Re : Recommended procedure for enabling SSL on a live production
>> cluster
>> To: user@cassandra.apache.org
>>
>>
>> Hi;
>>   We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
>> datacenters with 50 nodes each. We are planning to enable SSL between the
>> datacenters. We are following the standard procedure for enabling SSL (
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
>> . We were planning to enable SSL for each datacenter at a time.
>>
>> During the rolling restart, it's expected that the nodes in the
>> datacenter that had the service restarted, will show as down by the nodes
>> in other datacenters that have not restarted the service. This would lead
>> to missed writes among various nodes during this procedure.
>>
>> What would be the recommended procedure for enabling SSL on a live
>> production cluster without the chaos.
>>
>> thanks
>> Sai
>>
>>
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-21 Thread sai krishnam raju potturi
Thank you, Nate. We are currently on 2.0.14, so the only option we are left
with is the upgrade to 2.1.12.
For the earlier versions there is no way to enable SSL once the cluster
is in production. Thanks again, Nate.

Thanks
Sai

On Wed, Jul 20, 2016, 17:26 Nate McCall <n...@thelastpickle.com> wrote:

> If you migrate to the latest 2.1 first, you can make this a non-issue as
> 2.1.12 and above support simultaneous SSL and plain on the same port for
> exactly this use case:
> https://issues.apache.org/jira/browse/CASSANDRA-10559
>
> On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi ;
>>  if possible could someone shed some light on this. I followed a
>> post from the lastpickle which was very informative, but we had some
>> concerns when it came to enabling SSL on a live production cluster.
>>
>>
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>>
>> 1 : We generally remove application traffic from a DC which has ongoing
>> changes, just not to affect end customers if things go south during the
>> update.
>>
>> 2 : So once DC-A has been restarted after enabling SSL, this would be
>> missing writes during that period, as the DC-A would be shown as down by
>> the other DC's. We will not be able to put back application traffic on DC-A
>> until we run inter-dc repairs, which will happen only  when SSL has been
>> enabled on all DC's.
>>
>> 3 : Repeating the procedure for every DC will lead to some missed writes
>> across all DC's.
>>
>> 4 : We could do the rolling restart of a DC-A with application traffic
>> on, but we are concerned if for any infrastructure related reason we have
>> an issue, we will have to serve traffic from another DC-B, which might be
>> missing on writes to the DC-A during that period.
>>
>> We have 4 DC's which 50 nodes each.
>>
>>
>> thanks
>> Sai
>>
>> -- Forwarded message --
>> From: sai krishnam raju potturi <pskraj...@gmail.com>
>> Date: Mon, Jul 18, 2016 at 11:06 AM
>> Subject: Re : Recommended procedure for enabling SSL on a live production
>> cluster
>> To: user@cassandra.apache.org
>>
>>
>> Hi;
>>   We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
>> datacenters with 50 nodes each. We are planning to enable SSL between the
>> datacenters. We are following the standard procedure for enabling SSL (
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
>> . We were planning to enable SSL for each datacenter at a time.
>>
>> During the rolling restart, it's expected that the nodes in the
>> datacenter that had the service restarted, will show as down by the nodes
>> in other datacenters that have not restarted the service. This would lead
>> to missed writes among various nodes during this procedure.
>>
>> What would be the recommended procedure for enabling SSL on a live
>> production cluster without the chaos.
>>
>> thanks
>> Sai
>>
>>
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-20 Thread sai krishnam raju potturi
hi;
 if possible, could someone shed some light on this? I followed a
post from The Last Pickle which was very informative, but we had some
concerns when it came to enabling SSL on a live production cluster.

http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html

1 : We generally remove application traffic from a DC which has ongoing
changes, just not to affect end customers if things go south during the
update.

2 : So once DC-A has been restarted after enabling SSL, this would be
missing writes during that period, as the DC-A would be shown as down by
the other DC's. We will not be able to put back application traffic on DC-A
until we run inter-dc repairs, which will happen only  when SSL has been
enabled on all DC's.

3 : Repeating the procedure for every DC will lead to some missed writes
across all DC's.

4 : We could do the rolling restart of a DC-A with application traffic on,
but we are concerned if for any infrastructure related reason we have an
issue, we will have to serve traffic from another DC-B, which might be
missing on writes to the DC-A during that period.

We have 4 DCs with 50 nodes each.


thanks
Sai

-- Forwarded message --
From: sai krishnam raju potturi <pskraj...@gmail.com>
Date: Mon, Jul 18, 2016 at 11:06 AM
Subject: Re : Recommended procedure for enabling SSL on a live production
cluster
To: user@cassandra.apache.org


Hi;
  We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
datacenters with 50 nodes each. We are planning to enable SSL between the
datacenters. We are following the standard procedure for enabling SSL (
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
. We were planning to enable SSL for each datacenter at a time.

During the rolling restart, it's expected that the nodes in the
datacenter that had the service restarted will be shown as down by the nodes
in other datacenters that have not yet restarted the service. This would lead
to missed writes among various nodes during this procedure.

What would be the recommended procedure for enabling SSL on a live
production cluster without the chaos.

thanks
Sai


Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-18 Thread sai krishnam raju potturi
Hi;
  We have a Cassandra cluster ( version 2.0.14 ) spanning across 4
datacenters with 50 nodes each. We are planning to enable SSL between the
datacenters. We are following the standard procedure for enabling SSL (
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html)
. We were planning to enable SSL for each datacenter at a time.

During the rolling restart, it's expected that the nodes in the
datacenter that had the service restarted will be shown as down by the nodes
in other datacenters that have not yet restarted the service. This would lead
to missed writes among various nodes during this procedure.

What would be the recommended procedure for enabling SSL on a live
production cluster without the chaos.

thanks
Sai


Re: Setting bloom_filter_fp_chance < 0.01

2016-05-18 Thread sai krishnam raju potturi
hi Adarsh;
were there any drawbacks to setting the bloom_filter_fp_chance  to the
default value?

thanks
Sai

On Wed, May 18, 2016 at 2:21 AM, Adarsh Kumar  wrote:

> Hi,
>
> What is the impact of setting bloom_filter_fp_chance < 0.01.
>
> During performance tuning I was trying to tune bloom_filter_fp_chance and
> have following questions:
>
> 1). Why bloom_filter_fp_chance = 0 is not allowed. (
> https://issues.apache.org/jira/browse/CASSANDRA-5013)
> 2). What is the maximum/recommended value of bloom_filter_fp_chance (if we
> do not have any limitation for bloom filter size).
>
> NOTE: We are using default SizeTieredCompactionStrategy on
> cassandra  2.1.8.621
>
> Thanks in advance..:)
>
> Adarsh Kumar
>


Re: Re : Optimum value for native_transport_max_concurrent_connections and native_transport_max_concurrent_connections_per_ip

2016-05-11 Thread sai krishnam raju potturi
thanks Alain. We were planning to move to 2.0.17 first, and later
move to 2.1. We will try tweaking the number of connections from the
application side to begin with.
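
One quick check before and after any change, on 2.1+ as Alain notes below, is
whether the native transport pool is blocking requests:

    nodetool tpstats | grep -i native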

thanks
Sai

On Wed, May 11, 2016 at 10:12 AM, Alain RODRIGUEZ <arodr...@gmail.com>
wrote:

> Hi,
>
> tl;dr: Would you like to give Cassandra 2.1 a try?
>
> Longer answer:
>
> With the good versions of both the driver and Cassandra, you could be
> using the V3 protocol, which dramatically improved way connections work.
>
> http://www.datastax.com/dev/blog/java-driver-2-1-2-native-protocol-v3
>
> Also, in  C*2.1, nodetool tpstats outputs the number of blocked native
> requests, allowing to better tune the
> native_transport_max_concurrent_connections.
>
> If you don't want to upgrade to 2.1, just try tuning the number of
> connection on the client side and monitor. About
> native_transport_max_concurrent_connections, as for other parameters,
> starting from defaults and monitoring the changes, working as much as
> possible on a canary node is often the best way to go.
>
> Hope this helps,
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-04-29 16:39 GMT+02:00 sai krishnam raju potturi <pskraj...@gmail.com>
> :
>
>> hi;
>>   we are upgrading our cluster from apache-cassandra 2.0.14 to 2.0.17.
>> We have been facing SYN flooding issue (port 9042) in our current
>> version of Cassandra at times. We are hoping to tackle the SYN flooding
>> issues with the following attributes in the YAML file for 2.0.17
>>
>> native_transport_max_concurrent_connections
>>
>> native_transport_max_concurrent_connections_per_ip
>>
>>
>> Are there any observed limitations for the above mentioned attributes.
>> During the peak hours each node serves around 1500 connections. Please
>> suggest optimal values for the mentioned attributes.
>>
>>
>> thanks
>>
>
>


Re : Optimum value for native_transport_max_concurrent_connections and native_transport_max_concurrent_connections_per_ip

2016-04-29 Thread sai krishnam raju potturi
hi;
  we are upgrading our cluster from apache-cassandra 2.0.14 to 2.0.17. We
have been facing SYN flooding issues (port 9042) in our current version of
Cassandra at times. We are hoping to tackle the SYN flooding issues with
the following attributes in the YAML file for 2.0.17

native_transport_max_concurrent_connections

native_transport_max_concurrent_connections_per_ip


Are there any observed limitations for the above-mentioned attributes?
During the peak hours each node serves around 1500 connections. Please
suggest optimal values for the mentioned attributes.
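
For reference, both settings live in cassandra.yaml; a sketch with purely
illustrative values (not recommendations -- we still need to benchmark):

# defaults to -1 (no limit)
native_transport_max_concurrent_connections: 3000
native_transport_max_concurrent_connections_per_ip: 50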


thanks


Re : Optimum value for native_transport_max_concurrent_connections and native_transport_max_concurrent_connections_per_ip

2016-04-05 Thread sai krishnam raju potturi
hi;
  we are upgrading our cluster from apache-cassandra 2.0.14 to 2.0.17. We
have been facing SYN flooding issues (port 9042) in our current version of
Cassandra at times. We are hoping to tackle the SYN flooding issues with
the following attributes in the YAML file for 2.0.17

native_transport_max_concurrent_connections

native_transport_max_concurrent_connections_per_ip


Are there any observed limitations for the above-mentioned attributes?
During the peak hours each node serves around 1500 connections. Please
suggest optimal values for the mentioned attributes.


thanks
Sai Potturi


Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
thanks a lot Alain. We did rely on "unsafeAssassinate" earlier, which
worked. We were planning to upgrade from version 2.0.14 to 2.1.12 on all
our clusters.
  But we are still trying to figure out why decommissioned nodes are showing up
in "nodetool describecluster" as "UNREACHABLE".

thanks
Sai

On Wed, Feb 17, 2016 at 5:42 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
>
> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>
>
> I believe this is the expected behavior, we keep some a trace of leaving
> nodes for a few days, this shouldn't be an issue for you
>
> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>
> This is a weird behaviour I haven't see for a while. You might want to dig
> this some more.
>
> Restarting the entire cluster,  everytime a node is decommissioned does
>> not seem right
>>
>
> Meanwhile, if you are sure the node is out and streams have ended, I guess
> it could be ok to use a JMX client (MX4J, JConsole...) and then use the JMX
> method Gossiper.unsafeAssassinateEndpoints(ip_address) to assassinate the
> gone node from any of the remaining nodes.
>
> How to -->
> http://tumblr.doki-pen.org/post/22654515359/assassinating-cassandra-nodes
> (3 years old post, I partially read it, but I think it might still be
> relevant)
>
> Has anybody experienced similar behaviour
>
>
> FTR, 3 years old similar issue I faced -->
> http://grokbase.com/t/cassandra/user/127knx7nn0/unreachable-node-not-in-nodetool-ring
>
> FWIW, people using C* = 3.x, this is exposed through nodetool -->
> https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsAssassinate.html
>
> Keep in mind that something called 'unsafe' and 'assassinate' at the same
> time is not something you want to use in a regular decommissioning process
> as it drop the node with no file transfer, you basically totally lose a
> node (unless node is out already which seems to be your case, it should be
> safe to use it in your case). I only used it to fix gossip status in the
> past or at some point when forcing a removenode was not working, followed
> by full repairs on remaining nodes.
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-16 20:08 GMT+01:00 sai krishnam raju potturi <pskraj...@gmail.com>
> :
>
>> hi;
>> we have a 12 node cluster across 2 datacenters. We are currently
>> using cassandra 2.1.12 version.
>>
>> SNITCH : GossipingPropertyFileSnitch
>>
>> When we decommissioned few nodes in a particular datacenter and observed
>> the following :
>>
>> nodetool status shows only the live nodes in the cluster.
>>
>> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>>
>>
>> When the live nodes were restarted, "nodetool describecluster" shows
>> only the live nodes, which is expected.
>>
>> Purging the gossip info too did not help.
>>
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
>> 140213388002871593911508364312533329916,
>>  98576967436431350637134234839492449485] for /X.X.X.X
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
>> 111270759969411259938117902792984586225,
>> 138611464975439236357814418845450428175] for /X.X.X.X
>>
>> Has anybody experienced similar behaviour. Restarting the entire cluster,
>>  everytime a node is decommissioned does not seem right. Thanks in advance
>> for the help.
>>
>>
>> thanks
>> Sai
>>
>>
>>
>


Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
thank you Ben. We are using cassandra version 2.1.12. We did face the bug
mentioned, https://issues.apache.org/jira/browse/CASSANDRA-10371, in DSE
4.6.7 in another cluster. It's strange that we are seeing it even
in cassandra 2.1.12.

  The "nodetool describecluster" showing decommissioned nodes as
UNREACHABLE is something we are seeing for the first time.

thanks
Sai

On Wed, Feb 17, 2016 at 12:36 PM, Ben Bromhead <b...@instaclustr.com> wrote:

> I'm not sure what version of Cassandra you are running so here is some
> general advice:
>
>- Gossip entries for decommissioned nodes will hang around for a few
>days to help catch up nodes in the case of a partition. This is why you see
>the decommissioned nodes listed as LEFT. This is intentional
>- If you keep seeing those entries in your logs and you are on 2.0.x,
>you might be impacted by
>https://issues.apache.org/jira/browse/CASSANDRA-10371. In this case
>upgrade to 2.1 or you can try the work arounds listed in the ticket.
>
> Ben
>
> On Tue, 16 Feb 2016 at 11:09 sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi;
>> we have a 12 node cluster across 2 datacenters. We are currently
>> using cassandra 2.1.12 version.
>>
>> SNITCH : GossipingPropertyFileSnitch
>>
>> When we decommissioned few nodes in a particular datacenter and observed
>> the following :
>>
>> nodetool status shows only the live nodes in the cluster.
>>
>> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>>
>>
>> When the live nodes were restarted, "nodetool describecluster" shows
>> only the live nodes, which is expected.
>>
>> Purging the gossip info too did not help.
>>
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
>> 140213388002871593911508364312533329916,
>>  98576967436431350637134234839492449485] for /X.X.X.X
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
>> 111270759969411259938117902792984586225,
>> 138611464975439236357814418845450428175] for /X.X.X.X
>>
>> Has anybody experienced similar behaviour. Restarting the entire cluster,
>>  everytime a node is decommissioned does not seem right. Thanks in advance
>> for the help.
>>
>>
>> thanks
>> Sai
>>
>>
>> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-17 Thread sai krishnam raju potturi
thanks Rajesh. What we have observed is the decommissioned nodes show up as
"UNREACHABLE" in "nodetool describecluster" command. Their status shows up
as "LEFT" in "nodetool gossipinfo". This is observed in 2.1.12 version.

Decommissioned nodes did not show up in the "nodetool describecluster" and
"nodetool gossipinfo" in 2.0.14 version that we use in another cluster.


thanks
Sai

On Tue, Feb 16, 2016 at 2:08 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> hi;
> we have a 12 node cluster across 2 datacenters. We are currently using
> cassandra 2.1.12 version.
>
> SNITCH : GossipingPropertyFileSnitch
>
> When we decommissioned few nodes in a particular datacenter and observed
> the following :
>
> nodetool status shows only the live nodes in the cluster.
>
> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>
> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>
>
> When the live nodes were restarted, "nodetool describecluster" shows only
> the live nodes, which is expected.
>
> Purging the gossip info too did not help.
>
> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
> INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
> 140213388002871593911508364312533329916,
>  98576967436431350637134234839492449485] for /X.X.X.X
> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
> INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
> 111270759969411259938117902792984586225,
> 138611464975439236357814418845450428175] for /X.X.X.X
>
> Has anybody experienced similar behaviour. Restarting the entire cluster,
>  everytime a node is decommissioned does not seem right. Thanks in advance
> for the help.
>
>
> thanks
> Sai
>
>
>


Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-16 Thread sai krishnam raju potturi
hi;
we have a 12 node cluster across 2 datacenters. We are currently using
cassandra 2.1.12 version.

SNITCH : GossipingPropertyFileSnitch

When we decommissioned few nodes in a particular datacenter and observed
the following :

nodetool status shows only the live nodes in the cluster.

nodetool describecluster shows the decommissioned nodes as UNREACHABLE.

nodetool gossipinfo shows the decommissioned nodes as "LEFT"


When the live nodes were restarted, "nodetool describecluster" shows only
the live nodes, which is expected.

Purging the gossip info too did not help.

INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
140213388002871593911508364312533329916,
 98576967436431350637134234839492449485] for /X.X.X.X
INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
111270759969411259938117902792984586225,
138611464975439236357814418845450428175] for /X.X.X.X

Has anybody experienced similar behaviour. Restarting the entire cluster,
 everytime a node is decommissioned does not seem right. Thanks in advance
for the help.


thanks
Sai


Re: reducing disk space consumption

2016-02-10 Thread sai krishnam raju potturi
suggestion : try the following command, "lsof | grep DEL". If you see a lot
of SSTable files in the output, restart the node. The disk space will be
reclaimed.
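
A slightly narrower version of the same check (a sketch -- the pgrep pattern
assumes the process command line contains 'CassandraDaemon', adjust for DSE):

# count deleted-but-still-open SSTable data files held by the Cassandra JVM
lsof -p "$(pgrep -f CassandraDaemon | head -1)" | grep DEL | grep -c 'Data.db'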


thanks
Sai

On Wed, Feb 10, 2016 at 9:59 AM, Ted Yu  wrote:

> Hi,
> I am using DSE 4.8.4
> On one node, disk space is low where:
>
> 42G
> /var/lib/cassandra/data/usertable/data-0abea7f0cf9211e5a355bf8dafbfa99c
>
> Using CLI, I dropped keyspace usertable but the data dir above still
> consumes 42G.
>
> What action would free this part of disk (I don't need the data) ?
>
> Thanks
>


Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-03 Thread sai krishnam raju potturi
thanks a lot Robert. Greatly appreciate it.

thanks
Sai

On Tue, Feb 2, 2016 at 6:19 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> What is the possibility of using GossipingPropertFileSnitch on
>> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS?
>>
>
> You should just use GPFS everywhere.
>
> This is also the reason why you should not use EC2MRS if you might ever
> have a DC that is outside of AWS. Just use GPFS.
>
> =Rob
> PS - To answer your actual question... one "can" use different snitches on
> a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID
> APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP
> THEM :D
>


Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-02 Thread sai krishnam raju potturi
hi;
  we have a multi-DC cluster spanning across our own private cloud and AWS.
We are currently using Propertyfile snitch across our cluster.

What is the possibility of using GossipingPropertyFileSnitch on datacenters
in our private cloud, and Ec2MultiRegionSnitch in AWS?
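
For reference, with GossipingPropertyFileSnitch each node only needs its own
small cassandra-rackdc.properties; a sketch with placeholder names:

# cassandra-rackdc.properties (per node)
dc=DC1
rack=RACK1
# prefer_local=true    # optional; keeps intra-DC traffic on the local address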

Thanks in advance for the help.

thanks
Sai


Re: Cassandra Cleanup and disk space

2015-11-26 Thread sai krishnam raju potturi
Could it have been that you expanded your cluster a while back, but did not
run cleanup then?

On Thu, Nov 26, 2015, 07:51 Luigi Tagliamonte  wrote:

> I did it 2 times and in both times it freed a lot of space, don't think
> that it's just a coincidence.
> On Nov 26, 2015 10:56 AM, "Carlos Alonso"  wrote:
>
>> May it be a SizeTieredCompaction of big SSTables just finished and freed
>> some space?
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 26 November 2015 at 08:55, Luigi Tagliamonte  wrote:
>>
>>> Hi Everyone,
>>> I'd like to understand what cleanup does on a running cluster when there
>>> is no cluster topology change, i did a test and i saw the cluster disk
>>> space shrink of 200GB.
>>> I'm using cassandra 2.1.9.
>>> --
>>> Luigi
>>> ---
>>> “The only way to get smarter is by playing a smarter opponent.”
>>>
>>
>>


Re: handling down node cassandra 2.0.15

2015-11-16 Thread sai krishnam raju potturi
Is that a seed node?
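
For reference, a sketch of the replace flag as described in the linked posts
(set in cassandra-env.sh on the replacement node and removed once it has
rejoined); the data directories need to be empty, and a node listed as a seed
will not bootstrap this way:

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"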

On Mon, Nov 16, 2015, 05:21 Anishek Agarwal  wrote:

> Hello,
>
> We are having a 3 node cluster and one of the node went down due to a
> hardware memory failure looks like. We followed the steps below after the
> node was down for more than the default value of *max_hint_window_in_ms*
>
> I tried to restart cassandra by following the steps @
>
>
>1.
>
> http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_replace_node_t.html
>2.
>
> http://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html
>
> *except the "clear data" part as it was not specified in second blog
> above.*
>
> i was trying to restart the same node that went down, however I did not
> get the messages in log files as stated in 2 against "StorageService"
>
> instead it just tried to replay and then stopped with the error message as
> below:
>
> *ERROR [main] 2015-11-16 15:27:22,944 CassandraDaemon.java (line 584)
> Exception encountered during startup*
> *java.lang.RuntimeException: Cannot replace address with a node that is
> already bootstrapped*
>
> Can someone please help me if there is something i am doing wrong here.
>
> Thanks for the help in advance.
>
> Regards,
> Anishek
>


Re: Re : will Unsafeassaniate a dead node maintain the replication factor

2015-11-01 Thread sai krishnam raju potturi
thanks Surbhi. Will try that out.

On Sat, Oct 31, 2015 at 6:52 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
wrote:

> If it is using vnodes then just run nodetool repair . It should fix the
> issue related to data if any.
>
> And then run nodetool cleanup
>
> Sent from my iPhone
>
> On Oct 31, 2015, at 3:12 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> yes Surbhi.
>
> On Sat, Oct 31, 2015 at 1:13 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
> wrote:
>
>> Is the cluster using vnodes?
>>
>> Sent from my iPhone
>>
>> On Oct 31, 2015, at 9:16 AM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>> yes Surbhi.
>>
>> On Sat, Oct 31, 2015 at 12:10 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
>> wrote:
>>
>>> So have you already done unsafe assassination ?
>>>
>>> On 31 October 2015 at 08:37, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>>> it's dead; and we had to do unsafeassassinate as other 2 methods did
>>>> not work
>>>>
>>>> On Sat, Oct 31, 2015 at 11:30 AM, Surbhi Gupta <
>>>> surbhi.gupt...@gmail.com> wrote:
>>>>
>>>>> Whether the node is down or up which you want to decommission?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Oct 31, 2015, at 8:24 AM, sai krishnam raju potturi <
>>>>> pskraj...@gmail.com> wrote:
>>>>>
>>>>> Thanks Surabhi. Decommission nor removenode did not work. We did not
>>>>> capture the tokens of the dead node. Any way we could make sure the
>>>>> replication of 3 is maintained?
>>>>>
>>>>> On Sat, Oct 31, 2015, 11:14 Surbhi Gupta <surbhi.gupt...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You have to do few things before unsafe as sanitation . First run the
>>>>>> nodetool decommission if the node is up and wait till streaming happens .
>>>>>> You can check is the streaming is completed by nodetool netstats . If
>>>>>> streaming is completed you can do unsafe assanitation .
>>>>>>
>>>>>> To answer your question unsafe assanitation will not take care of
>>>>>> replication factor .
>>>>>> It is like forcing a node out from the cluster .
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> > On Oct 31, 2015, at 5:12 AM, sai krishnam raju potturi <
>>>>>> pskraj...@gmail.com> wrote:
>>>>>> >
>>>>>> > hi;
>>>>>> >would unsafeassasinating a dead node maintain the replication
>>>>>> factor like decommission process or removenode process?
>>>>>> >
>>>>>> > thanks
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re : will Unsafeassaniate a dead node maintain the replication factor

2015-10-31 Thread sai krishnam raju potturi
hi;
   Would unsafeAssassinating a dead node maintain the replication factor
like the decommission or removenode process does?

thanks


Re : Unable to bootstrap a new node

2015-10-31 Thread sai krishnam raju potturi
hi;
   we were trying to add a new node to the cluster. It fails during the
bootstrap process, unable to gossip with the seed nodes. We have not faced
this issue earlier.


2015-10-31 12:30:15,779 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection
Handshaking version with /X.X.X.X
2015-10-31 12:30:15,810 [HANDSHAKE-/Y.Y.Y.Y] INFO OutboundTcpConnection
Handshaking version with /Y.Y.Y.Y
2015-10-31 12:30:46,743 [main] ERROR CassandraDaemon Exception encountered
during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at
org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
at
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
at
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:437)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:423)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:641)
2015-10-31 12:30:46,748 [Thread-5] INFO DseDaemon DSE shutting down...
2015-10-31 12:30:46,760 [Thread-5] ERROR CassandraDaemon Exception in
thread Thread[Thread-5,5,main]
java.lang.NullPointerException
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1373)
at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:171)
at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:503)
at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:412)
2015-10-31 12:30:46,846 [StorageServiceShutdownHook] WARN Gossiper No local
state or state is in silent shutdown, not announcing shutdown
2015-10-31 12:30:46,846 [StorageServiceShutdownHook] INFO MessagingService
Waiting for messaging service to quiesce
2015-10-31 12:30:46,847 [ACCEPT-/Z.Z.Z.Z] INFO MessagingService
MessagingService has terminated the accept() thread
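
For reference, basic checks that narrow this down (paths and ports below are
the defaults and may differ, e.g. the yaml lives elsewhere under DSE):

# confirm the seed list the joining node actually sees
grep -A3 'seed_provider' /etc/cassandra/cassandra.yaml
# confirm the storage port on a seed is reachable (7000, or 7001 with SSL)
nc -zv <seed_ip> 7000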

Thanks in advance for the help.


Re: Re : will Unsafeassaniate a dead node maintain the replication factor

2015-10-31 Thread sai krishnam raju potturi
it's dead, and we had to do unsafeAssassinate as the other 2 methods did not
work.

On Sat, Oct 31, 2015 at 11:30 AM, Surbhi Gupta <surbhi.gupt...@gmail.com>
wrote:

> Whether the node is down or up which you want to decommission?
>
> Sent from my iPhone
>
> On Oct 31, 2015, at 8:24 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> Thanks Surabhi. Decommission nor removenode did not work. We did not
> capture the tokens of the dead node. Any way we could make sure the
> replication of 3 is maintained?
>
> On Sat, Oct 31, 2015, 11:14 Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:
>
>> You have to do few things before unsafe as sanitation . First run the
>> nodetool decommission if the node is up and wait till streaming happens .
>> You can check is the streaming is completed by nodetool netstats . If
>> streaming is completed you can do unsafe assanitation .
>>
>> To answer your question unsafe assanitation will not take care of
>> replication factor .
>> It is like forcing a node out from the cluster .
>>
>> Hope this helps.
>>
>> Sent from my iPhone
>>
>> > On Oct 31, 2015, at 5:12 AM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>> >
>> > hi;
>> >would unsafeassasinating a dead node maintain the replication factor
>> like decommission process or removenode process?
>> >
>> > thanks
>> >
>> >
>>
>


Re: Re : will Unsafeassaniate a dead node maintain the replication factor

2015-10-31 Thread sai krishnam raju potturi
Thanks Surbhi. Neither decommission nor removenode worked. We did not
capture the tokens of the dead node. Is there any way we could make sure the
replication factor of 3 is maintained?

On Sat, Oct 31, 2015, 11:14 Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:

> You have to do few things before unsafe as sanitation . First run the
> nodetool decommission if the node is up and wait till streaming happens .
> You can check is the streaming is completed by nodetool netstats . If
> streaming is completed you can do unsafe assanitation .
>
> To answer your question unsafe assanitation will not take care of
> replication factor .
> It is like forcing a node out from the cluster .
>
> Hope this helps.
>
> Sent from my iPhone
>
> > On Oct 31, 2015, at 5:12 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
> >
> > hi;
> >would unsafeassasinating a dead node maintain the replication factor
> like decommission process or removenode process?
> >
> > thanks
> >
> >
>


Re: Re : will Unsafeassaniate a dead node maintain the replication factor

2015-10-31 Thread sai krishnam raju potturi
yes Surbhi.

On Sat, Oct 31, 2015 at 12:10 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
wrote:

> So have you already done unsafe assassination ?
>
> On 31 October 2015 at 08:37, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> it's dead; and we had to do unsafeassassinate as other 2 methods did not
>> work
>>
>> On Sat, Oct 31, 2015 at 11:30 AM, Surbhi Gupta <surbhi.gupt...@gmail.com>
>> wrote:
>>
>>> Whether the node is down or up which you want to decommission?
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 31, 2015, at 8:24 AM, sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>> Thanks Surabhi. Decommission nor removenode did not work. We did not
>>> capture the tokens of the dead node. Any way we could make sure the
>>> replication of 3 is maintained?
>>>
>>> On Sat, Oct 31, 2015, 11:14 Surbhi Gupta <surbhi.gupt...@gmail.com>
>>> wrote:
>>>
>>>> You have to do few things before unsafe as sanitation . First run the
>>>> nodetool decommission if the node is up and wait till streaming happens .
>>>> You can check is the streaming is completed by nodetool netstats . If
>>>> streaming is completed you can do unsafe assanitation .
>>>>
>>>> To answer your question unsafe assanitation will not take care of
>>>> replication factor .
>>>> It is like forcing a node out from the cluster .
>>>>
>>>> Hope this helps.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On Oct 31, 2015, at 5:12 AM, sai krishnam raju potturi <
>>>> pskraj...@gmail.com> wrote:
>>>> >
>>>> > hi;
>>>> >would unsafeassasinating a dead node maintain the replication
>>>> factor like decommission process or removenode process?
>>>> >
>>>> > thanks
>>>> >
>>>> >
>>>>
>>>
>>
>


Re: Re : will Unsafeassaniate a dead node maintain the replication factor

2015-10-31 Thread sai krishnam raju potturi
yes Surbhi.

On Sat, Oct 31, 2015 at 1:13 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
wrote:

> Is the cluster using vnodes?
>
> Sent from my iPhone
>
> On Oct 31, 2015, at 9:16 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
> yes Surbhi.
>
> On Sat, Oct 31, 2015 at 12:10 PM, Surbhi Gupta <surbhi.gupt...@gmail.com>
> wrote:
>
>> So have you already done unsafe assassination ?
>>
>> On 31 October 2015 at 08:37, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>> it's dead; and we had to do unsafeassassinate as other 2 methods did not
>>> work
>>>
>>> On Sat, Oct 31, 2015 at 11:30 AM, Surbhi Gupta <surbhi.gupt...@gmail.com
>>> > wrote:
>>>
>>>> Whether the node is down or up which you want to decommission?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Oct 31, 2015, at 8:24 AM, sai krishnam raju potturi <
>>>> pskraj...@gmail.com> wrote:
>>>>
>>>> Thanks Surabhi. Decommission nor removenode did not work. We did not
>>>> capture the tokens of the dead node. Any way we could make sure the
>>>> replication of 3 is maintained?
>>>>
>>>> On Sat, Oct 31, 2015, 11:14 Surbhi Gupta <surbhi.gupt...@gmail.com>
>>>> wrote:
>>>>
>>>>> You have to do few things before unsafe as sanitation . First run the
>>>>> nodetool decommission if the node is up and wait till streaming happens .
>>>>> You can check is the streaming is completed by nodetool netstats . If
>>>>> streaming is completed you can do unsafe assanitation .
>>>>>
>>>>> To answer your question unsafe assanitation will not take care of
>>>>> replication factor .
>>>>> It is like forcing a node out from the cluster .
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> > On Oct 31, 2015, at 5:12 AM, sai krishnam raju potturi <
>>>>> pskraj...@gmail.com> wrote:
>>>>> >
>>>>> > hi;
>>>>> >would unsafeassasinating a dead node maintain the replication
>>>>> factor like decommission process or removenode process?
>>>>> >
>>>>> > thanks
>>>>> >
>>>>> >
>>>>>
>>>>
>>>
>>
>


Re : Data restore to a new cluster

2015-10-26 Thread sai krishnam raju potturi
hi;
   we are working on a data backup and restore procedure to a new cluster.
We are following the datastax documentation. It mentions a step

"Restore the SSTable files snapshotted from the old cluster onto the new
cluster using the same directories"

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

Could not find a mention of "SCHEMA" creation. Could somebody shed some
light on this? At what point do we create the "SCHEMA", if required?
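
Our working assumption is that the schema has to exist before the copied
files can be loaded, roughly in this order (a sketch; paths are the 2.0
defaults):

# 1. recreate keyspaces/tables on the new cluster from a schema dump
cqlsh <node_ip> -f schema.cql
# 2. copy each snapshot's SSTables into the matching data directory, e.g.
#    /var/lib/cassandra/data/<keyspace>/<table>/
# 3. make Cassandra pick up the newly copied files
nodetool refresh <keyspace> <table>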


thanks
Sai


Re: Re : Replication factor for system_auth keyspace

2015-10-16 Thread sai krishnam raju potturi
thanks guys for the advice. We were running parallel repairs earlier, with
cassandra version 2.0.14. As pointed out, having set the replication factor
really high for system_auth was what caused the repair to take so long.
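
For anyone else tuning this, the change itself is just an ALTER plus a
repair -- a sketch assuming NetworkTopologyStrategy and two DCs named
DC1/DC2 (placeholders), with a small fixed replica count per DC instead of
one replica per node:

ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
-- followed by 'nodetool repair system_auth' run across the nodes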

thanks
Sai

On Fri, Oct 16, 2015 at 9:56 AM, Victor Chen <victor.h.c...@gmail.com>
wrote:

> To elaborate on what Robert said, I think with most things technology
> related, the answer with these sorts of questions (i.e. "ideal settings")
> is usually "it depends." Remember that technology is a tool that we use to
> accomplish something we want. It's just a mechanism that we as humans use
> to exert our wishes on other things. In this case, cassandra allows us to
> exert our wishes on the data we need to have available. So think for a
> second about what you want? To be less philosophical and more practical,
> how many nodes you are comfortable losing or likely to lose? How many
> copies of your system_auth keyspace do you want to have always available?
>
> Also, what do you mean by "really long?" What version of cassandra are you
> using? If you are on 2.1, look at migrating to incremental repair. That it
> takes so long for such a small keyspace leads me to believe you're using
> sequential repair ...
>
> -V
>
> On Thu, Oct 15, 2015 at 7:46 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Thu, Oct 15, 2015 at 10:24 AM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>>   we are deploying a new cluster with 2 datacenters, 48 nodes in each
>>> DC. For the system_auth keyspace, what should be the ideal
>>> replication_factor set?
>>>
>>> We tried setting the replication factor equal to the number of nodes in
>>> a datacenter, and the repair for the system_auth keyspace took really long.
>>> Your suggestions would be of great help.
>>>
>>
>> More than 1 and a lot less than 48.
>>
>> =Rob
>>
>>
>


Re : Replication factor for system_auth keyspace

2015-10-15 Thread sai krishnam raju potturi
hi;
  we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
For the system_auth keyspace, what would be the ideal replication_factor to
set?

We tried setting the replication factor equal to the number of nodes in a
datacenter, and the repair for the system_auth keyspace took really long.
Your suggestions would be of great help.

thanks
Sai


Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-09 Thread sai krishnam raju potturi
thanks Jonathan. I see an advantage in doing it one AZ or rack at a time.
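
For reference, the per-node commands are straightforward (a sketch; the
keyspace/table arguments are optional):

nodetool cleanup                 # or: nodetool cleanup <keyspace> [<table>]
nodetool compactionstats         # cleanup shows up here, so progress can be watched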

On Thu, Oct 8, 2015 at 6:41 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> My hunch is the bigger your cluster the less impact it will have, as each
> node takes part in smaller and smaller % of total queries.  Considering
> that compaction is always happening, I'd wager if you've got a big cluster
> (as you say you do) you'll probably be ok running several cleanups at a
> time.
>
> I'd say start one, see how your perf is impacted (if at all) and go from
> there.
>
> If you're running a proper snitch you could probably do an entire rack /
> AZ at a time.
>
>
> On Thu, Oct 8, 2015 at 3:08 PM sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> We plan to do it during non-peak hours when customer traffic is less.
>> That sums up to 10 nodes a day, which is concerning as we have other data
>> centers to be expanded eventually.
>>
>> Since cleanup is similar to compaction, which is CPU intensive and will
>> effect reads  if this data center were to serve traffic. Is running cleanup
>> in parallel advisable??
>>
>> On Thu, Oct 8, 2015, 17:53 Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> Unless you're close to running out of disk space, what's the harm in it
>>> taking a while?  How big is your DC?  At 45 min per node, you can do 32
>>> nodes a day.  Diverting traffic away from a DC just to run cleanup feels
>>> like overkill to me.
>>>
>>>
>>>
>>> On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <
>>> pskraj...@gmail.com> wrote:
>>>
>>>> hi;
>>>>our cassandra cluster currently uses DSE 4.6. The underlying
>>>> cassandra version is 2.0.14.
>>>>
>>>> We are planning on adding multiple nodes to one of our datacenters.
>>>> This requires "nodetool cleanup". The "nodetool cleanup" operation
>>>> takes around 45 mins for each node.
>>>>
>>>> Datastax documentation recommends running "nodetool cleanup" for one
>>>> node at a time. That would be really long, owing to the size of our
>>>> datacenter.
>>>>
>>>> If we were to divert the read and write traffic away from a particular
>>>> datacenter, could we run "cleanup" on multiple nodes in parallel for
>>>> that datacenter??
>>>>
>>>>
>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>>>>
>>>>
>>>> thanks
>>>> Sai
>>>>
>>>


Re: CQL error when adding multiple conditional update statements in the same batch

2015-10-08 Thread sai krishnam raju potturi
Could you also provide the column family schema?
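
In the meantime, the only workaround I know of is the one mentioned at the
end of your mail -- running each conditional update as its own statement
instead of batching them; a sketch:

UPDATE activities SET state='CLAIMED',   version=11 WHERE key='Key1' IF version=10;
UPDATE activities SET state='ALLOCATED', version=2  WHERE key='Key2' IF version=1;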

On Thu, Oct 8, 2015 at 4:13 PM, Peddi, Praveen  wrote:

> Hi,
>
> I am trying to understand this error message that CQL is throwing when I
> try to update 2 different rows with different values on same conditional
> columns. Doesn't CQL support that? I am wondering why CQL has this
> restriction (since condition applies to each row independently, why does
> CQL even care if the values of the condition is same or different).
>
> BEGIN BATCH
> UPDATE activities SET state='CLAIMED',version=11 WHERE key='Key1' IF 
> version=10;
> UPDATE activities SET state='ALLOCATED',version=2 WHERE key='Key2' IF 
> version=1;
> APPLY BATCH;
>
> gives the following error
>
> Bad Request: Duplicate and incompatible conditions for column version
>
> Is there anyway to update more than 1 row with different conditional value
> for each row (other than executing these statements individually)?
> -Praveen
>
>


Re: Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread sai krishnam raju potturi
We plan to do it during non-peak hours when customer traffic is lower. That
sums up to 10 nodes a day, which is concerning as we have other datacenters
to be expanded eventually.

Since cleanup is similar to compaction, which is CPU intensive and will
affect reads if this datacenter were to serve traffic, is running cleanup
in parallel advisable?

On Thu, Oct 8, 2015, 17:53 Jonathan Haddad <j...@jonhaddad.com> wrote:

> Unless you're close to running out of disk space, what's the harm in it
> taking a while?  How big is your DC?  At 45 min per node, you can do 32
> nodes a day.  Diverting traffic away from a DC just to run cleanup feels
> like overkill to me.
>
>
>
> On Thu, Oct 8, 2015 at 2:39 PM sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi;
>>our cassandra cluster currently uses DSE 4.6. The underlying cassandra
>> version is 2.0.14.
>>
>> We are planning on adding multiple nodes to one of our datacenters. This
>> requires "nodetool cleanup". The "nodetool cleanup" operation takes
>> around 45 mins for each node.
>>
>> Datastax documentation recommends running "nodetool cleanup" for one
>> node at a time. That would be really long, owing to the size of our
>> datacenter.
>>
>> If we were to divert the read and write traffic away from a particular
>> datacenter, could we run "cleanup" on multiple nodes in parallel for
>> that datacenter??
>>
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>>
>>
>> thanks
>> Sai
>>
>


Re : Nodetool Cleanup on multiple nodes in parallel

2015-10-08 Thread sai krishnam raju potturi
hi;
   our cassandra cluster currently uses DSE 4.6. The underlying cassandra
version is 2.0.14.

We are planning on adding multiple nodes to one of our datacenters. This
requires "nodetool cleanup". The "nodetool cleanup" operation takes around
45 mins for each node.

Datastax documentation recommends running "nodetool cleanup" for one node
at a time. That would be really long, owing to the size of our datacenter.

If we were to divert the read and write traffic away from a particular
datacenter, could we run "cleanup" on multiple nodes in parallel for that
datacenter??

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html


thanks
Sai


Re: Node won't go away

2015-10-08 Thread sai krishnam raju potturi
the below solution should work.

For each node in the cluster :
 a : Stop cassandra service on the node.
 b : manually delete data under $data_directory/system/peers/  directory.
 c : In cassandra-env.sh file, add the line JVM_OPTS="$JVM_OPTS
-Dcassandra.load_ring_state=false".
 d : Restart service on the node.
 e : delete the added line in cassandra-env.sh  JVM_OPTS="$JVM_OPTS
-Dcassandra.load_ring_state=false".
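
A quick way to confirm the stale entry is gone after step d (a sketch):

cqlsh> SELECT peer, host_id FROM system.peers;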

thanks
Sai Potturi



On Thu, Oct 8, 2015 at 11:27 AM, Robert Wille  wrote:

> We had some problems with a node, so we decided to rebootstrap it. My IT
> guy screwed up, and when he added -Dcassandra.replace_address to
> cassandra-env.sh, he forgot the closing quote. The node bootstrapped, and
> then refused to join the cluster. We shut it down, and then noticed that
> nodetool status no longer showed that node, and the “Owns” column had
> increased from ~10% per node to ~11% (we originally had 10 nodes). I don’t
> know why Cassandra decided to automatically remove the node from the
> cluster, but it did. We figured it would be best to make sure the node was
> completely forgotten, and then add it back into the cluster as a new node.
> Problem is, it won’t completely go away.
>
> nodetool status doesn’t list it, but its still in system.peers, and
> OpsCenter still shows it. When I run nodetool removenode, it says that it
> can’t find the node.
>
> How do I completely get rid of it?
>
> Thanks in advance
>
> Robert
>
>


Re : List users getting stuck and not returning results

2015-10-02 Thread sai krishnam raju potturi
We have 2 clusters running DSE. On one of the clusters we recently added
additional nodes to a datacenter.

On the cluster where we added the nodes, we are getting authentication issues
from the clients. We are also unable to run "list users" against the
system_auth keyspace; it gets stuck.

InvalidRequestException(why: User has no SELECT permission on <>
or any of its parents)  -> client-side error

The other clusters perform fine.

Thanks in advance.


Re: High read latency

2015-09-25 Thread sai krishnam raju potturi
Jaydeep; since your primary key involves a clustering column, you may have
pretty wide rows. The read would be sequential. The latency could be
acceptable if the read involves really wide rows.

If your primary key were just ((a,b)) without the clustering column, it would
be like reading a key-value pair, and a 40ms latency might be a concern.

Bottom line: the latency depends on how wide the row is.
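
One way to sanity-check how wide those partitions actually are (keyspace and
table names are placeholders):

nodetool cfhistograms <keyspace> <table>   # look at the partition/row size percentiles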

On Tue, Sep 22, 2015 at 1:27 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> thanks for the information. Posting the query too would be of help.
>
> On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Please find required details here:
>>
>> -  Number of req/s
>>
>> 2k reads/s
>>
>> -  Schema details
>>
>> create table test {
>>
>> a timeuuid,
>>
>> b bigint,
>>
>> c int,
>>
>> d int static,
>>
>> e int static,
>>
>> f int static,
>>
>> g int static,
>>
>> h int,
>>
>> i text,
>>
>> j text,
>>
>> k text,
>>
>> l text,
>>
>> m set
>>
>> n bigint
>>
>> o bigint
>>
>> p bigint
>>
>> q bigint
>>
>> r int
>>
>> s text
>>
>> t bigint
>>
>> u text
>>
>> v text
>>
>> w text
>>
>> x bigint
>>
>> y bigint
>>
>> z bigint,
>>
>> primary key ((a, b), c)
>>
>> };
>>
>> -  JVM settings about the heap
>>
>> Default settings
>>
>> -  Execution time of the GC
>>
>> Avg. 400ms. I do not see long pauses of GC anywhere in the log file.
>>
>> On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric <eric.le...@worldline.com>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>
>>> Before speaking about tuning, can you provide some additional
>>> information ?
>>>
>>>
>>>
>>> -  Number of req/s
>>>
>>> -  Schema details
>>>
>>> -  JVM settings about the heap
>>>
>>> -  Execution time of the GC
>>>
>>>
>>>
>>> 43ms for a read latency may be acceptable according to the number of
>>> request per second.
>>>
>>>
>>>
>>>
>>>
>>> Eric
>>>
>>>
>>>
>>> *De :* Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
>>> *Envoyé :* mardi 22 septembre 2015 00:07
>>> *À :* user@cassandra.apache.org
>>> *Objet :* High read latency
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> My application issues more read requests than write, I do see that under
>>> load cfstats for one of the table is quite high around 43ms
>>>
>>>
>>>
>>> Local read count: 114479357
>>>
>>> Local read latency: 43.442 ms
>>>
>>> Local write count: 22288868
>>>
>>> Local write latency: 0.609 ms
>>>
>>>
>>>
>>>
>>>
>>> Here is my node configuration:
>>>
>>> RF=3, Read/Write with QUORUM, 64GB RAM, 48 CPU core. I have only 5 GB of
>>> data on each node (and for experiment purpose I stored data in tmpfs)
>>>
>>>
>>>
>>> I've tried increasing concurrent_read count upto 512 but no help in read
>>> latency. CPU/Memory/IO looks fine on system.
>>>
>>>
>>>
>>> Any idea what should I tune?
>>>
>>>
>>>
>>> Jaydeep
>>>
>>> --
>>>
>>> Ce message et les pièces jointes sont confidentiels et réservés à
>>> l'usage exclusif de ses destinataires. Il peut également être protégé par
>>> le secret professionnel. Si vous recevez ce message par erreur, merci d'en
>>> avertir immédiatement l'expéditeur et de le détruire. L'intégrité du
>>> message ne pouvant être assurée sur Internet, la responsabilité de
>>> Worldline ne pourra être recherchée quant au contenu de ce message. Bien
>>> que les meilleurs efforts soient faits pour maintenir cette transmission
>>> exempte de tout virus, l'expéditeur ne donne aucune garantie à cet égard et
>>> sa responsabilité ne saurait être recherchée pour tout dommage résultant
>>> d'un virus transmis.
>>>
>>> This e-mail and the documents attached are confidential and intended
>>> solely for the addressee; it may also be privileged. If you receive this
>>> e-mail in error, please notify the sender immediately and destroy it. As
>>> its integrity cannot be secured on the Internet, the Worldline liability
>>> cannot be triggered for the message content. Although the sender endeavours
>>> to maintain a computer virus-free network, the sender does not warrant that
>>> this transmission is virus-free and will not be liable for any damages
>>> resulting from any virus transmitted.
>>>
>>
>>
>


Re: High read latency

2015-09-22 Thread sai krishnam raju potturi
thanks for the information. Posting the query too would be of help.

On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Please find required details here:
>
> -  Number of req/s
>
> 2k reads/s
>
> -  Schema details
>
> create table test {
>
> a timeuuid,
>
> b bigint,
>
> c int,
>
> d int static,
>
> e int static,
>
> f int static,
>
> g int static,
>
> h int,
>
> i text,
>
> j text,
>
> k text,
>
> l text,
>
> m set
>
> n bigint
>
> o bigint
>
> p bigint
>
> q bigint
>
> r int
>
> s text
>
> t bigint
>
> u text
>
> v text
>
> w text
>
> x bigint
>
> y bigint
>
> z bigint,
>
> primary key ((a, b), c)
>
> };
>
> -  JVM settings about the heap
>
> Default settings
>
> -  Execution time of the GC
>
> Avg. 400ms. I do not see long pauses of GC anywhere in the log file.
>
> On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric 
> wrote:
>
>> Hi,
>>
>>
>>
>>
>>
>> Before speaking about tuning, can you provide some additional information
>> ?
>>
>>
>>
>> -  Number of req/s
>>
>> -  Schema details
>>
>> -  JVM settings about the heap
>>
>> -  Execution time of the GC
>>
>>
>>
>> 43ms for a read latency may be acceptable according to the number of
>> request per second.
>>
>>
>>
>>
>>
>> Eric
>>
>>
>>
>> *De :* Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
>> *Envoyé :* mardi 22 septembre 2015 00:07
>> *À :* user@cassandra.apache.org
>> *Objet :* High read latency
>>
>>
>>
>> Hi,
>>
>>
>>
>> My application issues more read requests than write, I do see that under
>> load cfstats for one of the table is quite high around 43ms
>>
>>
>>
>> Local read count: 114479357
>>
>> Local read latency: 43.442 ms
>>
>> Local write count: 22288868
>>
>> Local write latency: 0.609 ms
>>
>>
>>
>>
>>
>> Here is my node configuration:
>>
>> RF=3, Read/Write with QUORUM, 64GB RAM, 48 CPU core. I have only 5 GB of
>> data on each node (and for experiment purpose I stored data in tmpfs)
>>
>>
>>
>> I've tried increasing concurrent_read count upto 512 but no help in read
>> latency. CPU/Memory/IO looks fine on system.
>>
>>
>>
>> Any idea what should I tune?
>>
>>
>>
>> Jaydeep
>>
>> --
>>
>> Ce message et les pièces jointes sont confidentiels et réservés à l'usage
>> exclusif de ses destinataires. Il peut également être protégé par le secret
>> professionnel. Si vous recevez ce message par erreur, merci d'en avertir
>> immédiatement l'expéditeur et de le détruire. L'intégrité du message ne
>> pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra
>> être recherchée quant au contenu de ce message. Bien que les meilleurs
>> efforts soient faits pour maintenir cette transmission exempte de tout
>> virus, l'expéditeur ne donne aucune garantie à cet égard et sa
>> responsabilité ne saurait être recherchée pour tout dommage résultant d'un
>> virus transmis.
>>
>> This e-mail and the documents attached are confidential and intended
>> solely for the addressee; it may also be privileged. If you receive this
>> e-mail in error, please notify the sender immediately and destroy it. As
>> its integrity cannot be secured on the Internet, the Worldline liability
>> cannot be triggered for the message content. Although the sender endeavours
>> to maintain a computer virus-free network, the sender does not warrant that
>> this transmission is virus-free and will not be liable for any damages
>> resulting from any virus transmitted.
>>
>
>


Fwd: Re : Restoring nodes in a new datacenter, from snapshots in an existing datacenter

2015-08-28 Thread sai krishnam raju potturi
hi;
 We have a cassandra cluster with vnodes spanning across 3 data centers.
We take backups of the snapshots from one datacenter.
   In a doomsday scenario, we want to restore a downed datacenter with
snapshots from another datacenter. We have the same number of nodes in each
datacenter.

1 : We know it requires copying the snapshots and their corresponding token
ranges to the nodes in the new datacenter, and running nodetool refresh.

2 : The question is, we will now have 2 datacenters with the exact same
token ranges. Will that cause any problems?

DC1 : Node-1 : token1..token10
  Node-2 : token11 .token20
  Node-3 : token21 . token30
  Node-4 : token31 . token40

 DC2 : Node-1 : token1.token10
   Node-2 : token11token20
   Node-3 : token21token30
   Node-4 : token31token40


thanks
Sai


Re: Re : Restoring nodes in a new datacenter, from snapshots in an existing datacenter

2015-08-28 Thread sai krishnam raju potturi
thanks Nate. But regarding our situation: of the 3 datacenters we have (DC1,
DC2 and DC3), we take backups of snapshots on DC1.

If DC3 were to go down, would we not be able to bring up a new DC4 with the
snapshots and token ranges from DC1?
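
For what it's worth, a sketch of how the per-node token information Nate
mentions can be captured ahead of time and stored alongside the snapshots:

nodetool info -T | grep -i token       # tokens owned by the local node
nodetool ring > ring-$(date +%F).txt   # full token-to-node map for the cluster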

On Fri, Aug 28, 2015 at 3:19 PM, Nate McCall n...@thelastpickle.com wrote:

 You cannot use the identical token ranges. You have to capture membership
 information somewhere for each datacenter, and use that token information
 when bringing up the replacement DC.

 You can find details on this process here:

 http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

 That process is straight forward, but it can go south pretty quickly if
 you miss a step. It's a really good idea to set asside some time to try
 this out in a staging/test system and build a runbook for the process
 targeting your specific environment.

 On Fri, Aug 28, 2015 at 1:12 PM, sai krishnam raju potturi 
 pskraj...@gmail.com wrote:


 hi;
  We have cassandra cluster with Vnodes spanning across 3 data
 centers. We take backup of the snapshots from one datacenter.
In a doomsday scenario, we want to restore a downed datacenter, with
 snapshots from another datacenter. We have same number of nodes in each
 datacenter.

 1 : We know it requires copying the snapshots and their corresponding
 token ranges to the nodes in new datacenter, and running nodetool
 refresh.

 2 : The question is, we will now have 2 datacenters, with the same exact
 token ranges. Will that cause any problem.

 DC1 : Node-1 : token1..token10
   Node-2 : token11 .token20
   Node-3 : token21 . token30
   Node-4 : token31 . token40

  DC2 : Node-1 : token1.token10
Node-2 : token11token20
Node-3 : token21token30
Node-4 : token31token40


 thanks
 Sai






 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder  Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com



Re : Decommissioned node appears in logs, and is sometimes marked as UNREACHEABLE in `nodetool describecluster`

2015-08-28 Thread sai krishnam raju potturi
hi;
we decommissioned nodes in a datacenter a while back. Those nodes keep
showing up in the logs, and are also sometimes marked as UNREACHABLE when
`nodetool describecluster` is run.

However, these nodes do not show up in `nodetool status` or
`nodetool ring`.

Below are a couple lines from the logs.

2015-08-27 04:38:16,180 [GossipStage:473] INFO Gossiper InetAddress /
10.0.0.1 is now DOWN
2015-08-27 04:38:16,183 [GossipStage:473] INFO StorageService Removing
tokens [85070591730234615865843651857942052865] for /10.0.0.1

thanks
Sai