Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
Hi Hannu.

On Wed, Jan 11, 2017 at 8:31 PM, Hannu Kröger <hkro...@gmail.com> wrote:

> One possible reason is that the cassandra process runs as a different user when run
> differently. Check who owns the data files, and also check what gets written
> into /var/log/cassandra/system.log (or whatever that was).
>

Absolutely nothing gets written to /var/log/cassandra/system.log (when
trying to invoke cassandra via cron).


>
> Hannu
>
>
> On 11 Jan 2017, at 16.42, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
> Tried everything.
> Every other cron job/script I try works, just the cassandra-service does
> not.
>
> On Wed, Jan 11, 2017 at 8:51 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>>
>> On Tuesday, January 10, 2017, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> Last I checked, cron doesn't load the same, full environment you see
>>> when you log in. Also, why put Cassandra on a cron?
>>> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>>> Hi Ajay,
>>>>
>>>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>>>
>>>> Thanks & Regards,
>>>>
>>>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg <ajaygargn...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All.
>>>>>
>>>>> Facing a very weird issue, wherein the command
>>>>>
>>>>> */etc/init.d/cassandra start*
>>>>>
>>>>> causes cassandra to start when the command is run from command-line.
>>>>>
>>>>>
>>>>> However, if I put the above as a cron job
>>>>>
>>>>>
>>>>>
>>>>> ** * * * * /etc/init.d/cassandra start*
>>>>> cassandra never starts.
>>>>>
>>>>>
>>>>> I have checked, and "cron" service is running.
>>>>>
>>>>>
>>>>> Any ideas what might be wrong?
>>>>> I am pasting the cassandra script for brevity.
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>> Ajay
>>>>>
>>>>>
>>>>> 
>>>>> 
>>>>> #! /bin/sh
>>>>> ### BEGIN INIT INFO
>>>>> # Provides:  cassandra
>>>>> # Required-Start:$remote_fs $network $named $time
>>>>> # Required-Stop: $remote_fs $network $named $time
>>>>> # Should-Start:  ntp mdadm
>>>>> # Should-Stop:   ntp mdadm
>>>>> # Default-Start: 2 3 4 5
>>>>> # Default-Stop:  0 1 6
>>>>> # Short-Description: distributed storage system for structured data
>>>>> # Description:   Cassandra is a distributed (peer-to-peer) system
>>>>> for
>>>>> #the management and storage of structured data.
>>>>> ### END INIT INFO
>>>>>
>>>>> # Author: Eric Evans <eev...@racklabs.com>
>>>>>
>>>>> DESC="Cassandra"
>>>>> NAME=cassandra
>>>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>>>> SCRIPTNAME=/etc/init.d/$NAME
>>>>> CONFDIR=/etc/cassandra
>>>>> WAIT_FOR_START=10
>>>>> CASSANDRA_HOME=/usr/share/cassandra
>>>>> FD_LIMIT=10
>>>>>
>>>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>>>
>>>>> # Read configuration variable file if it is present
>>>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>>>
>>>>> # Read Cassandra environment file.
>>>>> . /etc/cassandra/cassandra-env.sh
>>>>>
>>>>> if [ -z "$JVM_OPTS" ]; then
>>>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>>>> exit 3
>>>>> fi
>>>>>
>>>>> export JVM_OPTS
>>>>>
>>>>> # Export JAVA_HOME, if set.
>>>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>>>
>>>>> # Load the VERBOSE setting and other rcS variables
>>>>>

Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
On Wed, Jan 11, 2017 at 8:29 PM, Martin Schröder <mar...@oneiros.de> wrote:

> 2017-01-11 15:42 GMT+01:00 Ajay Garg <ajaygargn...@gmail.com>:
> > Tried everything.
>
> Then try
>service cassandra start
> or
>systemctl start cassandra
>
> You still haven't explained to us why you want to start cassandra every
> minute.
>

Hi Martin.

Sometimes the cassandra process gets killed (reason unknown as of now).
Doing a manual "service cassandra start" brings it back.

Adding this to cron would at least ensure that the maximum downtime is 59
seconds (until the root cause of the cassandra crashes is known).



>
> Best
>Martin
>



-- 
Regards,
Ajay
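
Since cron runs jobs with a minimal environment (as noted elsewhere in this
thread, it does not load the same, full environment you see when you log in),
a first debugging step is to set the environment explicitly in the crontab and
capture everything the command prints. A minimal sketch, assuming a root
crontab and Debian-style paths (the JAVA_HOME value and the log path are
placeholders, not recommendations):

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Append all output of the init script, so a silent failure leaves a trace:
* * * * * /etc/init.d/cassandra start >> /var/log/cassandra-cron.log 2>&1

If that log stays empty, cron may not be attempting the job at all; on
Debian/Ubuntu, /var/log/syslog (or /var/log/cron where it exists, as Bhuvan
mentioned) records each attempted cron invocation.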


Re: Strange issue wherein cassandra not being started from cron

2017-01-11 Thread Ajay Garg
Tried everything.
Every other cron job/script I try works, just the cassandra-service does
not.

On Wed, Jan 11, 2017 at 8:51 AM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Tuesday, January 10, 2017, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> Last I checked, cron doesn't load the same, full environment you see when
>> you log in. Also, why put Cassandra on a cron?
>> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>
>>> Hi Ajay,
>>>
>>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>>
>>> Thanks & Regards,
>>>
>>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg <ajaygargn...@gmail.com>
>>> wrote:
>>>
>>>> Hi All.
>>>>
>>>> Facing a very weird issue, wherein the command
>>>>
>>>> */etc/init.d/cassandra start*
>>>>
>>>> causes cassandra to start when the command is run from command-line.
>>>>
>>>>
>>>> However, if I put the above as a cron job
>>>>
>>>>
>>>>
>>>> ** * * * * /etc/init.d/cassandra start*
>>>> cassandra never starts.
>>>>
>>>>
>>>> I have checked, and "cron" service is running.
>>>>
>>>>
>>>> Any ideas what might be wrong?
>>>> I am pasting the cassandra script for brevity.
>>>>
>>>>
>>>> Thanks and Regards,
>>>> Ajay
>>>>
>>>>
>>>> 
>>>> 
>>>> #! /bin/sh
>>>> ### BEGIN INIT INFO
>>>> # Provides:  cassandra
>>>> # Required-Start:$remote_fs $network $named $time
>>>> # Required-Stop: $remote_fs $network $named $time
>>>> # Should-Start:  ntp mdadm
>>>> # Should-Stop:   ntp mdadm
>>>> # Default-Start: 2 3 4 5
>>>> # Default-Stop:  0 1 6
>>>> # Short-Description: distributed storage system for structured data
>>>> # Description:   Cassandra is a distributed (peer-to-peer) system
>>>> for
>>>> #the management and storage of structured data.
>>>> ### END INIT INFO
>>>>
>>>> # Author: Eric Evans <eev...@racklabs.com>
>>>>
>>>> DESC="Cassandra"
>>>> NAME=cassandra
>>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>>> SCRIPTNAME=/etc/init.d/$NAME
>>>> CONFDIR=/etc/cassandra
>>>> WAIT_FOR_START=10
>>>> CASSANDRA_HOME=/usr/share/cassandra
>>>> FD_LIMIT=10
>>>>
>>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>>
>>>> # Read configuration variable file if it is present
>>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>>
>>>> # Read Cassandra environment file.
>>>> . /etc/cassandra/cassandra-env.sh
>>>>
>>>> if [ -z "$JVM_OPTS" ]; then
>>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>>> exit 3
>>>> fi
>>>>
>>>> export JVM_OPTS
>>>>
>>>> # Export JAVA_HOME, if set.
>>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>>
>>>> # Load the VERBOSE setting and other rcS variables
>>>> . /lib/init/vars.sh
>>>>
>>>> # Define LSB log_* functions.
>>>> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
>>>> . /lib/lsb/init-functions
>>>>
>>>> #
>>>> # Function that returns 0 if process is running, or nonzero if not.
>>>> #
>>>> # The nonzero value is 3 if the process is simply not running, and 1 if
>>>> the
>>>> # process is not running but the pidfile exists (to match the exit
>>>> codes for
>>>> # the "status" command; see LSB core spec 3.1, section 20.2)
>>>> #
>>>> CMD_PATT="cassandra.+CassandraDaemon"
>>>> is_running()
>>>> {
>>>> if [ -f $PIDFILE ]; then
>>>> pid=`cat $PIDFILE`
>>>> grep -Eq "$CMD_PATT" &

Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Ajay Garg
Hi All.

Facing a very weird issue, wherein the command

*/etc/init.d/cassandra start*

causes cassandra to start when the command is run from the command line.


However, if I put the above as a cron job



** * * * * /etc/init.d/cassandra start*
cassandra never starts.


I have checked, and "cron" service is running.


Any ideas what might be wrong?
I am pasting the cassandra init script below for reference.


Thanks and Regards,
Ajay



#! /bin/sh
### BEGIN INIT INFO
# Provides:  cassandra
# Required-Start:$remote_fs $network $named $time
# Required-Stop: $remote_fs $network $named $time
# Should-Start:  ntp mdadm
# Should-Stop:   ntp mdadm
# Default-Start: 2 3 4 5
# Default-Stop:  0 1 6
# Short-Description: distributed storage system for structured data
# Description:   Cassandra is a distributed (peer-to-peer) system for
#the management and storage of structured data.
### END INIT INFO

# Author: Eric Evans <eev...@racklabs.com>

DESC="Cassandra"
NAME=cassandra
PIDFILE=/var/run/$NAME/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
CONFDIR=/etc/cassandra
WAIT_FOR_START=10
CASSANDRA_HOME=/usr/share/cassandra
FD_LIMIT=10

[ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
[ -e /etc/cassandra/cassandra.yaml ] || exit 0
[ -e /etc/cassandra/cassandra-env.sh ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Read Cassandra environment file.
. /etc/cassandra/cassandra-env.sh

if [ -z "$JVM_OPTS" ]; then
echo "Initialization failed; \$JVM_OPTS not set!" >&2
exit 3
fi

export JVM_OPTS

# Export JAVA_HOME, if set.
[ -n "$JAVA_HOME" ] && export JAVA_HOME

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that returns 0 if process is running, or nonzero if not.
#
# The nonzero value is 3 if the process is simply not running, and 1 if the
# process is not running but the pidfile exists (to match the exit codes for
# the "status" command; see LSB core spec 3.1, section 20.2)
#
CMD_PATT="cassandra.+CassandraDaemon"
is_running()
{
if [ -f $PIDFILE ]; then
pid=`cat $PIDFILE`
grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
return 1
fi
return 3
}
#
# Function that starts the daemon/service
#
do_start()
{
# Return
#   0 if daemon has been started
#   1 if daemon was already running
#   2 if daemon could not be started

ulimit -l unlimited
ulimit -n "$FD_LIMIT"

cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
error_log_f="$cassandra_home/hs_err_`date +%s`.log"

[ -e `dirname "$PIDFILE"` ] || \
install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`



start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q \
    -p "$PIDFILE" -t >/dev/null || return 1

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2

}

#
# Function that stops the daemon/service
#
do_stop()
{
# Return
#   0 if daemon has been stopped
#   1 if daemon was already stopped
#   2 if daemon could not be stopped
#   other if a failure occurred
start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
RET=$?
rm -f "$PIDFILE"
return $RET
}

case "$1" in
  start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  restart|force-reload)
log_daemon_msg "Restarting $DESC" "$NAME"
do_stop
case "$?" in
  0|1)
do_start
case "$?" in
  0|1)
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log_

Re: Basic query in setting up secure inter-dc cluster

2016-04-25 Thread Ajay Garg
Hi Everyone.

Kindly reply with a "yes" or "no": is it possible to set up encryption only
between a particular pair of nodes?
Or is it an "all or none" feature, where encryption is present between
EVERY pair of nodes or between NO pair of nodes?


Thanks and Regards,
Ajay

On Mon, Apr 18, 2016 at 9:55 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Also, wondering what is the difference between "all" and "dc" in
> "internode_encryption".
> Perhaps my answer lies in this?
>
> On Mon, Apr 18, 2016 at 9:51 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Ok, trying to wake up this thread again.
>>
>> I went through the following links ::
>>
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
>>
>>
>> and I am wondering *if it is possible to setup secure
>> inter-communication only between some nodes*.
>>
>> In particular, if I have a 2*2 cluster, is it possible to setup secure
>> communication ONLY between the nodes of DC2?
>> Once it works well, we would then setup secure-communication everywhere.
>>
>> We are wanting this, because DC2 is the backup centre, while DC1 is the
>> primary-centre connected directly to the application-server. We don't want
>> to screw things if something goes bad in DC1.
>>
>>
>> Will be grateful for pointers.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg <ajaygargn...@gmail.com>
>> wrote:
>>
>>> Hi All.
>>>
>>> A gentle query-reminder.
>>>
>>> I will be grateful if I could be given a brief technical overview, as to
>>> how secure-communication occurs between two nodes in a cluster.
>>>
>>> Please note that I wish for some information on the "how it works below
>>> the hood", and NOT "how to set it up".
>>>
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg <ajaygargn...@gmail.com>
>>> wrote:
>>>
>>>> Thanks everyone for the reply.
>>>>
>>>> I actually have a fair bit of questions, but it will be nice if someone
>>>> could please tell me the flow (implementation-wise), as to how node-to-node
>>>> encryption works in a cluster.
>>>>
>>>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>>>> (with *"require_client_auth: false*").
>>>> I presume it would be like below (please correct me if am wrong) ::
>>>>
>>>> a)
>>>> node1 tries to connect to node2, using the certificate *as defined on
>>>> node1* in cassandra.yaml.
>>>>
>>>> b)
>>>> node2 will confirm if the certificate being offered by node1 is in the
>>>> truststore *as defined on node2* in cassandra.yaml.
>>>> if it is, secure-communication is allowed.
>>>>
>>>>
>>>> Is my thinking right?
>>>> I
>>>>
>>>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave <nehajtriv...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ajay,
>>>>> Have a look here :
>>>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>>>
>>>>> You can configure for DC level Security:
>>>>>
>>>>> Procedure
>>>>>
>>>>> On each node under server_encryption_options:
>>>>>
>>>>>- Enable internode_encryption.
>>>>>The available options are:
>>>>>   - all
>>>>>   - none
>>>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>>>
>>>>> regards
>>>>>
>>>>> Neha
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>>>> absi...@informatica.com> wrote:
>>>>>
>>>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All.
>>>>>>
>>>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>>>
>>>>>> As a first stage, we wish to implement inter-dc security.
>>>>>>
>>>>>> Is it possible to enable security one machine at a time?
>>>>>>
>>>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>>>
>>>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>>>> AFTER the changes are made in all the 4 machines?
>>>>>>
>>>>>> Asking here, because I don't want to screw up a live cluster due to
>>>>>> my lack of experience.
>>>>>>
>>>>>> Looking forward to some pointers.
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Regards,
>>>>>> Ajay
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay
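
For reference, the setting being discussed lives in each node's cassandra.yaml
under server_encryption_options. A minimal sketch (keystore/truststore paths
and passwords are placeholders); per the documentation quoted above, the
available granularities for internode_encryption are all, none, dc and rack:

server_encryption_options:
    internode_encryption: dc          # one of: all, none, dc, rack
    keystore: /etc/cassandra/conf/.keystore
    keystore_password: <keystore-password>
    truststore: /etc/cassandra/conf/.truststore
    truststore_password: <truststore-password>
    require_client_auth: false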


Re: Basic query in setting up secure inter-dc cluster

2016-04-17 Thread Ajay Garg
Also, I am wondering what the difference is between "all" and "dc" in
"internode_encryption".
Perhaps my answer lies in this?

On Mon, Apr 18, 2016 at 9:51 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Ok, trying to wake up this thread again.
>
> I went through the following links ::
>
>
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html
>
>
> and I am wondering *if it is possible to setup secure inter-communication
> only between some nodes*.
>
> In particular, if I have a 2*2 cluster, is it possible to setup secure
> communication ONLY between the nodes of DC2?
> Once it works well, we would then setup secure-communication everywhere.
>
> We are wanting this, because DC2 is the backup centre, while DC1 is the
> primary-centre connected directly to the application-server. We don't want
> to screw things if something goes bad in DC1.
>
>
> Will be grateful for pointers.
>
>
> Thanks and Regards,
> Ajay
>
> On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Hi All.
>>
>> A gentle query-reminder.
>>
>> I will be grateful if I could be given a brief technical overview, as to
>> how secure-communication occurs between two nodes in a cluster.
>>
>> Please note that I wish for some information on the "how it works below
>> the hood", and NOT "how to set it up".
>>
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>
>>> Thanks everyone for the reply.
>>>
>>> I actually have a fair bit of questions, but it will be nice if someone
>>> could please tell me the flow (implementation-wise), as to how node-to-node
>>> encryption works in a cluster.
>>>
>>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>>> (with *"require_client_auth: false*").
>>> I presume it would be like below (please correct me if am wrong) ::
>>>
>>> a)
>>> node1 tries to connect to node2, using the certificate *as defined on
>>> node1* in cassandra.yaml.
>>>
>>> b)
>>> node2 will confirm if the certificate being offered by node1 is in the
>>> truststore *as defined on node2* in cassandra.yaml.
>>> if it is, secure-communication is allowed.
>>>
>>>
>>> Is my thinking right?
>>> I
>>>
>>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave <nehajtriv...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ajay,
>>>> Have a look here :
>>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>>
>>>> You can configure for DC level Security:
>>>>
>>>> Procedure
>>>>
>>>> On each node under server_encryption_options:
>>>>
>>>>- Enable internode_encryption.
>>>>The available options are:
>>>>   - all
>>>>   - none
>>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>>
>>>> regards
>>>>
>>>> Neha
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>>> absi...@informatica.com> wrote:
>>>>
>>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>>
>>>>>
>>>>>
>>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>>
>>>>>
>>>>>
>>>>> Hi All.
>>>>>
>>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>>
>>>>> As a first stage, we wish to implement inter-dc security.
>>>>>
>>>>> Is it possible to enable security one machine at a time?
>>>>>
>>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>>
>>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>>> AFTER the changes are made in all the 4 machines?
>>>>>
>>>>> Asking here, because I don't want to screw up a live cluster due to my
>>>>> lack of experience.
>>>>>
>>>>> Looking forward to some pointers.
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ajay
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-04-17 Thread Ajay Garg
Ok, trying to wake up this thread again.

I went through the following links ::

https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html


and I am wondering *if it is possible to set up secure inter-communication
only between some nodes*.

In particular, if I have a 2*2 cluster, is it possible to set up secure
communication ONLY between the nodes of DC2?
Once it works well, we would then set up secure communication everywhere.

We want this because DC2 is the backup centre, while DC1 is the
primary centre connected directly to the application server. We don't want
to screw things up if something goes bad in DC1.


Will be grateful for pointers.


Thanks and Regards,
Ajay

On Sun, Jan 17, 2016 at 9:09 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Hi All.
>
> A gentle query-reminder.
>
> I will be grateful if I could be given a brief technical overview, as to
> how secure-communication occurs between two nodes in a cluster.
>
> Please note that I wish for some information on the "how it works below
> the hood", and NOT "how to set it up".
>
>
>
> Thanks and Regards,
> Ajay
>
> On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Thanks everyone for the reply.
>>
>> I actually have a fair bit of questions, but it will be nice if someone
>> could please tell me the flow (implementation-wise), as to how node-to-node
>> encryption works in a cluster.
>>
>> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2
>> (with *"require_client_auth: false*").
>> I presume it would be like below (please correct me if am wrong) ::
>>
>> a)
>> node1 tries to connect to node2, using the certificate *as defined on
>> node1* in cassandra.yaml.
>>
>> b)
>> node2 will confirm if the certificate being offered by node1 is in the
>> truststore *as defined on node2* in cassandra.yaml.
>> if it is, secure-communication is allowed.
>>
>>
>> Is my thinking right?
>> I
>>
>> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave <nehajtriv...@gmail.com> wrote:
>>
>>> Hi Ajay,
>>> Have a look here :
>>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>>
>>> You can configure for DC level Security:
>>>
>>> Procedure
>>>
>>> On each node under server_encryption_options:
>>>
>>>- Enable internode_encryption.
>>>The available options are:
>>>   - all
>>>   - none
>>>   - dc: Cassandra encrypts the traffic between the data centers.
>>>   - rack: Cassandra encrypts the traffic between the racks.
>>>
>>> regards
>>>
>>> Neha
>>>
>>>
>>>
>>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <
>>> absi...@informatica.com> wrote:
>>>
>>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>>
>>>>
>>>>
>>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>>
>>>>
>>>>
>>>> Hi All.
>>>>
>>>> We have a 2*2 cluster deployed, but no security as of now.
>>>>
>>>> As a first stage, we wish to implement inter-dc security.
>>>>
>>>> Is it possible to enable security one machine at a time?
>>>>
>>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>>
>>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>>> AFTER the changes are made in all the 4 machines?
>>>>
>>>> Asking here, because I don't want to screw up a live cluster due to my
>>>> lack of experience.
>>>>
>>>> Looking forward to some pointers.
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Can we set TTL on individual fields (columns) using the Datastax java-driver

2016-02-08 Thread Ajay Garg
Something like ::


##
class A {

  @Id
  @Column (name = "pojo_key")
  int key;

  @Ttl(10)
  @Column (name = "pojo_temporary_guest")
  String guest;

}
##


When I persist, say, the value "ajay" in the guest field (the
pojo_temporary_guest column), it stays forever and does not become "null"
after 10 seconds.

Kindly point out what I am doing wrong.
I will be grateful.


Thanks and Regards,
Ajay
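
Whether a field-level @Ttl annotation is honoured depends on the
object-mapping library that generates the queries; at the CQL level, a TTL is
applied per write and affects only the columns written by that statement. A
minimal sketch using the plain DataStax Java driver (the keyspace, table and
column names are illustrative, loosely matching the POJO above):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TtlExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // The TTL applies to the cells written by this statement; after 10
        // seconds the pojo_temporary_guest cell expires and reads return null.
        session.execute(
            "INSERT INTO pojo_table (pojo_key, pojo_temporary_guest) " +
            "VALUES (1, 'ajay') USING TTL 10");

        // The remaining TTL of a cell can be inspected directly:
        Row row = session.execute(
            "SELECT TTL(pojo_temporary_guest) FROM pojo_table WHERE pojo_key = 1").one();
        System.out.println("Remaining TTL (seconds): " + row.getInt(0));

        cluster.close();
    }
}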


Re: Basic query in setting up secure inter-dc cluster

2016-01-17 Thread Ajay Garg
Hi All.

A gentle query-reminder.

I will be grateful if I could be given a brief technical overview, as to
how secure-communication occurs between two nodes in a cluster.

Please note that I wish for some information on "how it works under the
hood", and NOT on "how to set it up".



Thanks and Regards,
Ajay

On Wed, Jan 6, 2016 at 4:16 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Thanks everyone for the reply.
>
> I actually have a fair bit of questions, but it will be nice if someone
> could please tell me the flow (implementation-wise), as to how node-to-node
> encryption works in a cluster.
>
> Let's say node1 from DC1, wishes to talk securely to node 2 from DC2 (with 
> *"require_client_auth:
> false*").
> I presume it would be like below (please correct me if am wrong) ::
>
> a)
> node1 tries to connect to node2, using the certificate *as defined on
> node1* in cassandra.yaml.
>
> b)
> node2 will confirm if the certificate being offered by node1 is in the
> truststore *as defined on node2* in cassandra.yaml.
> if it is, secure-communication is allowed.
>
>
> Is my thinking right?
> I
>
> On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave <nehajtriv...@gmail.com> wrote:
>
>> Hi Ajay,
>> Have a look here :
>> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>>
>> You can configure for DC level Security:
>>
>> Procedure
>>
>> On each node under server_encryption_options:
>>
>>- Enable internode_encryption.
>>The available options are:
>>   - all
>>   - none
>>   - dc: Cassandra encrypts the traffic between the data centers.
>>   - rack: Cassandra encrypts the traffic between the racks.
>>
>> regards
>>
>> Neha
>>
>>
>>
>> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <absi...@informatica.com
>> > wrote:
>>
>>> Security is a very wide concept. What exactly do you want to achieve ?
>>>
>>>
>>>
>>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Basic query in setting up secure inter-dc cluster
>>>
>>>
>>>
>>> Hi All.
>>>
>>> We have a 2*2 cluster deployed, but no security as of now.
>>>
>>> As a first stage, we wish to implement inter-dc security.
>>>
>>> Is it possible to enable security one machine at a time?
>>>
>>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>>
>>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>>> AFTER the changes are made in all the 4 machines?
>>>
>>> Asking here, because I don't want to screw up a live cluster due to my
>>> lack of experience.
>>>
>>> Looking forward to some pointers.
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Ajay
>>>
>>
>>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Basic query in setting up secure inter-dc cluster

2016-01-06 Thread Ajay Garg
Thanks everyone for the reply.

I actually have a fair number of questions, but it would be nice if someone
could please describe the flow (implementation-wise) of how node-to-node
encryption works in a cluster.

Let's say node1 from DC1 wishes to talk securely to node2 from DC2
(with *"require_client_auth: false"*).
I presume it would be like below (please correct me if I am wrong) ::

a)
node1 tries to connect to node2, using the certificate *as defined on node1*
in cassandra.yaml.

b)
node2 will confirm if the certificate being offered by node1 is in the
truststore *as defined on node2* in cassandra.yaml.
If it is, secure communication is allowed.


Is my thinking right?
I

On Wed, Jan 6, 2016 at 1:55 PM, Neha Dave <nehajtriv...@gmail.com> wrote:

> Hi Ajay,
> Have a look here :
> https://docs.datastax.com/en/cassandra/1.2/cassandra/security/secureSSLNodeToNode_t.html
>
> You can configure for DC level Security:
>
> Procedure
>
> On each node under server_encryption_options:
>
>- Enable internode_encryption.
>The available options are:
>   - all
>   - none
>   - dc: Cassandra encrypts the traffic between the data centers.
>   - rack: Cassandra encrypts the traffic between the racks.
>
> regards
>
> Neha
>
>
>
> On Wed, Jan 6, 2016 at 12:48 PM, Singh, Abhijeet <absi...@informatica.com>
> wrote:
>
>> Security is a very wide concept. What exactly do you want to achieve ?
>>
>>
>>
>> *From:* Ajay Garg [mailto:ajaygargn...@gmail.com]
>> *Sent:* Wednesday, January 06, 2016 11:27 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Basic query in setting up secure inter-dc cluster
>>
>>
>>
>> Hi All.
>>
>> We have a 2*2 cluster deployed, but no security as of now.
>>
>> As a first stage, we wish to implement inter-dc security.
>>
>> Is it possible to enable security one machine at a time?
>>
>> For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
>>
>> If I make the changes JUST IN DC2M2 and restart it, will the traffic
>> between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
>> AFTER the changes are made in all the 4 machines?
>>
>> Asking here, because I don't want to screw up a live cluster due to my
>> lack of experience.
>>
>> Looking forward to some pointers.
>>
>>
>> --
>>
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Basic query in setting up secure inter-dc cluster

2016-01-05 Thread Ajay Garg
Hi All.

We have a 2*2 cluster deployed, but no security as of now.
As a first stage, we wish to implement inter-dc security.

Is it possible to enable security one machine at a time?

For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2.
If I make the changes JUST IN DC2M2 and restart it, will the traffic
between DC1M1/DC1M2 and DC2M2 be secure? Or security will kick in ONLY
AFTER the changes are made in all the 4 machines?

Asking here, because I don't want to screw up a live cluster due to my lack
of experience.

Looking forward to some pointers.

-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-04 Thread Ajay Garg
Hi All.

I think we got the root-cause.

One of the fields in one of the classes was marked with the "@Version"
annotation, which was causing the Cassandra Java driver to add "IF NOT
EXISTS" to the insert query, thus invoking the SERIAL consistency level.

We removed the annotation (we didn't really need it), and we have not
observed the error for about an hour or so.


Thanks Eric and Bryan for the help !!!


Thanks and Regards,
Ajay

On Wed, Nov 4, 2015 at 8:51 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Hmm... ok.
>
> Ideally, we require ::
>
> a)
> The intra-DC-node-syncing takes place at the statement/query level.
>
> b)
> The inter-DC-node-syncing takes place at cassandra level.
>
>
> That way, we don't spend too much delay at the statement/query level.
>
>
> For the so-called CAS/lightweight transactions, the above are impossible
> then?
>
> On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng <br...@blockcypher.com> wrote:
>
>> What Eric means is that SERIAL consistency is a special type of
>> consistency that is only invoked for a subset of operations: those that use
>> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>>
>> The differences between CAS operations and standard operations are
>> significant and there are large repercussions for tunable consistency. The
>> amount of time such an operation takes is greatly increased as well; you
>> may need to increase your internal node-to-node timeouts .
>>
>> On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>
>>> Hi Eric,
>>>
>>> I am sorry, but I don't understand.
>>>
>>> If there had been some issue in the configuration, then the
>>> consistency-issue would be seen everytime (I guess).
>>> As of now, the error is seen sometimes (probably 30% of times).
>>>
>>> On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>>> Serial consistency gets invoked at the protocol level when doing
>>>> lightweight transactions such as CAS operations.  If you're expecting that
>>>> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
>>>> there aren't enough nodes available to satisfy serial consistency.
>>>>
>>>> See
>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>>>>
>>>> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg <ajaygargn...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All.
>>>>>
>>>>> I have a 2*2 Network-Topology Replication setup, and I run my
>>>>> application via DataStax-driver.
>>>>>
>>>>> I frequently get the errors of type ::
>>>>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>>>>> were required but only 0 acknowledged the write)*
>>>>>
>>>>> I have already tried passing a "write-options with LOCAL_QUORUM
>>>>> consistency-level" in all create/save statements, but I still get this
>>>>> error.
>>>>>
>>>>> Does something else need to be changed in
>>>>> /etc/cassandra/cassandra.yaml too?
>>>>> Or may be some another place?
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Ajay
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay
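
For reference, this is how the two levels are set with the DataStax Java
driver (2.x API): the regular consistency level applies to ordinary reads and
writes, while the serial consistency level is only consulted for conditional
(IF ...) statements, i.e. lightweight transactions. A minimal sketch; the
contact points reuse the node names from this thread, and the keyspace/table
names are placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
            .addContactPoints("CAS11", "CAS12", "CAS21", "CAS22")
            .withQueryOptions(new QueryOptions()
                // Default for ordinary reads and writes.
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
                // Used only for the Paxos phase of conditional statements.
                .setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL))
            .build();
        Session session = cluster.connect("my_keyspace");

        // A conditional insert (lightweight transaction) is what triggers the
        // SERIAL path, which is why removing the unintended "IF NOT EXISTS"
        // made the timeouts disappear.
        SimpleStatement stmt = new SimpleStatement(
            "INSERT INTO users (id, name) VALUES (1, 'ajay') IF NOT EXISTS");
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        stmt.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);
        session.execute(stmt);

        cluster.close();
    }
}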


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-03 Thread Ajay Garg
Hmm... ok.

Ideally, we require ::

a)
The intra-DC-node-syncing takes place at the statement/query level.

b)
The inter-DC-node-syncing takes place at cassandra level.


That way, we don't spend too much delay at the statement/query level.


For the so-called CAS/lightweight transactions, is the above impossible
then?

On Wed, Nov 4, 2015 at 5:58 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> What Eric means is that SERIAL consistency is a special type of
> consistency that is only invoked for a subset of operations: those that use
> CAS/lightweight transactions, for example "IF NOT EXISTS" queries.
>
> The differences between CAS operations and standard operations are
> significant and there are large repercussions for tunable consistency. The
> amount of time such an operation takes is greatly increased as well; you
> may need to increase your internal node-to-node timeouts .
>
> On Mon, Nov 2, 2015 at 8:01 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Hi Eric,
>>
>> I am sorry, but I don't understand.
>>
>> If there had been some issue in the configuration, then the
>> consistency-issue would be seen everytime (I guess).
>> As of now, the error is seen sometimes (probably 30% of times).
>>
>> On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> Serial consistency gets invoked at the protocol level when doing
>>> lightweight transactions such as CAS operations.  If you're expecting that
>>> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
>>> there aren't enough nodes available to satisfy serial consistency.
>>>
>>> See
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>>>
>>> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg <ajaygargn...@gmail.com> wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have a 2*2 Network-Topology Replication setup, and I run my
>>>> application via DataStax-driver.
>>>>
>>>> I frequently get the errors of type ::
>>>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>>>> were required but only 0 acknowledged the write)*
>>>>
>>>> I have already tried passing a "write-options with LOCAL_QUORUM
>>>> consistency-level" in all create/save statements, but I still get this
>>>> error.
>>>>
>>>> Does something else need to be changed in /etc/cassandra/cassandra.yaml
>>>> too?
>>>> Or may be some another place?
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi All.

I have a 2*2 Network-Topology Replication setup, and I run my application
via the DataStax driver.

I frequently get errors of this type ::
*Cassandra timeout during write query at consistency SERIAL (3 replica were
required but only 0 acknowledged the write)*

I have already tried passing a "write-options with LOCAL_QUORUM
consistency-level" in all create/save statements, but I still get this
error.

Does something else need to be changed in /etc/cassandra/cassandra.yaml too?
Or may be some another place?

-- 
Regards,
Ajay


Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-02 Thread Ajay Garg
Hi Eric,

I am sorry, but I don't understand.

If there had been some issue in the configuration, then the
consistency issue would be seen every time (I guess).
As of now, the error is seen sometimes (probably 30% of the time).

On Mon, Nov 2, 2015 at 10:24 PM, Eric Stevens <migh...@gmail.com> wrote:

> Serial consistency gets invoked at the protocol level when doing
> lightweight transactions such as CAS operations.  If you're expecting that
> your topology is RF=2, N=2, it seems like some keyspace has RF=3, and so
> there aren't enough nodes available to satisfy serial consistency.
>
> See
> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
>
> On Mon, Nov 2, 2015 at 1:29 AM Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Hi All.
>>
>> I have a 2*2 Network-Topology Replication setup, and I run my application
>> via DataStax-driver.
>>
>> I frequently get the errors of type ::
>> *Cassandra timeout during write query at consistency SERIAL (3 replica
>> were required but only 0 acknowledged the write)*
>>
>> I have already tried passing a "write-options with LOCAL_QUORUM
>> consistency-level" in all create/save statements, but I still get this
>> error.
>>
>> Does something else need to be changed in /etc/cassandra/cassandra.yaml
>> too?
>> Or may be some another place?
>>
>>
>> --
>> Regards,
>> Ajay
>>
>


-- 
Regards,
Ajay


Can consistency-levels be different for "read" and "write" in Datastax Java-Driver?

2015-10-26 Thread Ajay Garg
Right now, I have set up "LOCAL_QUORUM" as the consistency level in the
driver, but it seems that "SERIAL" is being used during writes, and I
consistently get this type of error ::

*Cassandra timeout during write query at consistency SERIAL (3 replica were
required but only 0 acknowledged the write)*


Am I missing something?


-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-25 Thread Ajay Garg
Some more observations ::

a)
CAS11 and CAS12 are down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, even then the exception occurs.

b)
CAS11 down, CAS12 up, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.

c)
CAS11 up, CAS12 down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.


It seems the Java driver always requires at least one of CAS11 or
CAS12 to be up (although the expectation is that the driver should work fine
if ANY of the 4 nodes is up).


Thoughts, experts !? :)



On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Ideas please, on what I may be doing wrong?
>
> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Hi All.
>>
>> I have been doing extensive testing, and replication works fine, even if
>> any permuatation of CAS11, CAS12, CAS21, CAS22 are downed and brought up.
>> Syncing always takes place (obviously, as long as continuous-downtime-value
>> does not exceed *max_hint_window_in_ms*).
>>
>>
>> However, things behave weird when I try connecting via DataStax
>> Java-Driver.
>> I always add the nodes to the cluster in the order ::
>>
>>  CAS11, CAS12, CAS21, CAS22
>>
>> during "cluster.connect" method.
>>
>>
>> Now, following happens ::
>>
>> a)
>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>> and later replicated to CAS21 and CAS22).
>>
>> b)
>> If CAS11 and CAS12 go down, data is NOT persisted.
>> Instead the following exceptions are observed in the Java-Driver ::
>>
>>
>> ##
>> Exception in thread "main"
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>> tried for query failed (no host was tried)
>> at
>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>> at
>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>> at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>> at
>> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>> at
>> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
>> All host(s) tried for query failed (no host was tried)
>> at
>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>> at
>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>> at
>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>> at
>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>> at
>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>> at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>> ... 3 more
>>
>> ###
>>
>>
>> I have already tried ::
>>
>> 1)
>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>
>> 2)
>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>
>> 3)
>> I have also confirmed that each of the 4 nodes are telnet-able over ports
>> 9042 and 9160 each.
>>
>>
>> Definitely seems to be some driver-issue, since
>> data-persistence/replication works perfect (with any permutation) if
>> data-persistence is done via "cqlsh".
>>
>>
>> Kindly provide some pointers.
>> Ultimately, it is the Java-driver that will be used in production, so it
>> is imperative that data-persistence/replication happens for any downing of
>> any permutation of node(s).
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-25 Thread Ajay Garg
Bingo !!!

Using "LoadBalancingPolicy" did the trick.
Exactly what was needed !!!


Thanks and Regards,
Ajay

On Sun, Oct 25, 2015 at 5:52 PM, Ryan Svihla <r...@foundev.pro> wrote:

> Ajay,
>
> So It's the default driver behavior to pin requests to the first data
> center it connects to (DCAwareRoundRobin strategy). but let me explain why
> this is.
>
> I think you're thinking about data centers in Cassandra as a unit of
> failure, and while you can have, say, a rack fail, as you scale up and use
> rack awareness, it's rare to lose a whole "data center" in the sense
> you're thinking about, so let's reset a bit:
>
>1. If I'm designing a multi-DC architecture, then given the nature of
>latency, I will not want my app servers connecting _across_ data centers.
>2. So since the common desire is not to magically have very high
>latency requests bleed out to remote data centers, the default behavior of
>the driver is to pin to the first data center it connects to; you can
>change this with a different Load Balancing Policy (
>
> http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/LoadBalancingPolicy.html
>)
>3. However, I generally do NOT advise users connecting to an app
>server from another data center; since Cassandra is a masterless
>architecture, you typically have issues that affect nodes, not an entire
>data center, and if they do affect an entire data center (say the intra-DC
>link is down) then it's going to affect your app server as well!
>
> So for new users, I typically just recommend pinning an app server to a DC
> and doing your data-center-level switching further up. You can get more
> advanced and handle bleed-out later, but you have to think about latencies.
>
> Final point, rely on repairs for your data consistency, hints are great
> and all but repair is how you make sure you're in sync.
>
> On Sun, Oct 25, 2015 at 3:10 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Some more observations ::
>>
>> a)
>> CAS11 and CAS12 are down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, even then the exception occurs.
>>
>> b)
>> CAS11 down, CAS12 up, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>> c)
>> CAS11 up, CAS12 down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>>
>> Seems the java-driver is kinda always requiring either one of CAS11 or
>> CAS12 to be up (although the expectation is that the driver must work fine
>> if ANY of the 4 nodes is up).
>>
>>
>> Thoughts, experts !? :)
>>
>>
>>
>> On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg <ajaygargn...@gmail.com>
>> wrote:
>>
>>> Ideas please, on what I may be doing wrong?
>>>
>>> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <ajaygargn...@gmail.com>
>>> wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have been doing extensive testing, and replication works fine, even
>>>> if any permuatation of CAS11, CAS12, CAS21, CAS22 are downed and brought
>>>> up. Syncing always takes place (obviously, as long as
>>>> continuous-downtime-value does not exceed *max_hint_window_in_ms*).
>>>>
>>>>
>>>> However, things behave weird when I try connecting via DataStax
>>>> Java-Driver.
>>>> I always add the nodes to the cluster in the order ::
>>>>
>>>>  CAS11, CAS12, CAS21, CAS22
>>>>
>>>> during "cluster.connect" method.
>>>>
>>>>
>>>> Now, following happens ::
>>>>
>>>> a)
>>>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>>>> and later replicated to CAS21 and CAS22).
>>>>
>>>> b)
>>>> If CAS11 and CAS12 go down, data is NOT persisted.
>>>> Instead the following exceptions are observed in the Java-Driver ::
>>>>
>>>>
>>>> ##
>>>> Exception in thread "main"
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>> tried for query failed (no host was tried)
>>>> at
>&g
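
For reference, the policy referred to above is wired in when building the
Cluster. A minimal sketch (driver 2.x API; the local data-center name "DC1"
and the remote-host fallback count are assumptions for illustration, and the
DC name must match what "nodetool status" reports):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class DcAwareConnect {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
            .addContactPoints("CAS11", "CAS12", "CAS21", "CAS22")
            // Route queries to DC1 first; allow up to 2 hosts per remote DC
            // to be tried only after the local hosts are exhausted.
            .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1", 2))
            .build();
        Session session = cluster.connect();
        System.out.println("Connected to: " + cluster.getClusterName());
        cluster.close();
    }
}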

Re: Is replication possible with already existing data?

2015-10-24 Thread Ajay Garg
Ideas please, on what I may be doing wrong?

On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Hi All.
>
> I have been doing extensive testing, and replication works fine, even if
> any permuatation of CAS11, CAS12, CAS21, CAS22 are downed and brought up.
> Syncing always takes place (obviously, as long as continuous-downtime-value
> does not exceed *max_hint_window_in_ms*).
>
>
> However, things behave weird when I try connecting via DataStax
> Java-Driver.
> I always add the nodes to the cluster in the order ::
>
>  CAS11, CAS12, CAS21, CAS22
>
> during "cluster.connect" method.
>
>
> Now, following happens ::
>
> a)
> If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
> later replicated to CAS21 and CAS22).
>
> b)
> If CAS11 and CAS12 go down, data is NOT persisted.
> Instead the following exceptions are observed in the Java-Driver ::
>
>
> ##
> Exception in thread "main"
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (no host was tried)
> at
> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
> at
> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
> at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
> at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
> at
> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
> at
> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
> All host(s) tried for query failed (no host was tried)
> at
> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
> at
> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
> at
> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
> at
> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
> at
> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
> at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
> ... 3 more
>
> ###
>
>
> I have already tried ::
>
> 1)
> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>
> 2)
> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>
> 3)
> I have also confirmed that each of the 4 nodes are telnet-able over ports
> 9042 and 9160 each.
>
>
> Definitely seems to be some driver-issue, since
> data-persistence/replication works perfect (with any permutation) if
> data-persistence is done via "cqlsh".
>
>
> Kindly provide some pointers.
> Ultimately, it is the Java-driver that will be used in production, so it
> is imperative that data-persistence/replication happens for any downing of
> any permutation of node(s).
>
>
> Thanks and Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Never mind Vasileios, you have been a great help !!
Thanks a ton again !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 10:17 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> I am not sure I fully understand the question, because nodetool repair is
> one of the three ways for Cassandra to ensure consistency. If by "affect"
> you mean "make your data consistent and ensure all replicas are
> up-to-date", then yes, that's what I think it does.
>
> And yes, I would expect nodetool repair (especially depending on the
> options appended to it) to have a performance impact, but how big that
> impact is going to be depends on many things.
>
> We currently perform no scheduled repairs because of our workload and the
> consistency level that we use. So, as you can understand I am certainly not
> the best person to analyse that bit...
>
> Regards,
> Vasilis
>
> On Sat, Oct 24, 2015 at 5:09 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Thanks a ton Vasileios !!
>>
>> Just one last question ::
>> Does running "nodetool repair" affect the functionality of cluster for
>> current-live data?
>>
>> It's ok if the insertions/deletions of current-live data become a little
>> slow during the process, but data-consistency must be maintained. If that
>> is the case, I think we are good.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sat, Oct 24, 2015 at 6:03 PM, Vasileios Vlachos <
>> vasileiosvlac...@gmail.com> wrote:
>>
>>> Hello Ajay,
>>>
>>> Here is a good link:
>>>
>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesManualRepair.html
>>>
>>> Generally, I find the DataStax docs to be OK. You could consult them for
>>> all usual operations etc. Ofc there are occasions where a given concept is
>>> not as clear, but you can always ask this list for clarification.
>>>
>>> If you find that something is wrong in the docs just email them (more
>>> info and contact email here: http://docs.datastax.com/en/ ).
>>>
>>> Regards,
>>> Vasilis
>>>
>>> On Sat, Oct 24, 2015 at 1:04 PM, Ajay Garg <ajaygargn...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Vasileios for the reply !!!
>>>> That makes sense !!!
>>>>
>>>> I will be grateful if you could point me to the node-repair command for
>>>> Cassandra-2.1.10.
>>>> I don't want to get stuck in a wrong-versioned documentation (already
>>>> bitten once hard when setting up replication).
>>>>
>>>> Thanks again...
>>>>
>>>>
>>>> Thanks and Regards,
>>>> Ajay
>>>>
>>>> On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
>>>> vasileiosvlac...@gmail.com> wrote:
>>>>
>>>>> Hello Ajay,
>>>>>
>>>>> Have a look in the *max_hint_window_in_ms* :
>>>>>
>>>>>
>>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>>>>
>>>>> My understanding is that if a node remains down for more than
>>>>> *max_hint_window_in_ms*, then you will need to repair that node.
>>>>>
>>>>> Thanks,
>>>>> Vasilis
>>>>>
>>>>> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg <ajaygargn...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If a node in the cluster goes down and comes up, the data gets synced
>>>>>> up on this downed node.
>>>>>> Is there a limit on the interval for which the node can remain down?
>>>>>> Or the data will be synced up even if the node remains down for
>>>>>> weeks/months/years?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Ajay
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Thanks a ton Vasileios !!

Just one last question ::
Does running "nodetool repair" affect the functionality of the cluster for
current live data?

It's OK if insertions/deletions of current live data become a little
slow during the process, but data consistency must be maintained. If that
is the case, I think we are good.


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 6:03 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> Hello Ajay,
>
> Here is a good link:
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesManualRepair.html
>
> Generally, I find the DataStax docs to be OK. You could consult them for
> all usual operations etc. Ofc there are occasions where a given concept is
> not as clear, but you can always ask this list for clarification.
>
> If you find that something is wrong in the docs just email them (more info
> and contact email here: http://docs.datastax.com/en/ ).
>
> Regards,
> Vasilis
>
> On Sat, Oct 24, 2015 at 1:04 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Thanks Vasileios for the reply !!!
>> That makes sense !!!
>>
>> I will be grateful if you could point me to the node-repair command for
>> Cassandra-2.1.10.
>> I don't want to get stuck in a wrong-versioned documentation (already
>> bitten once hard when setting up replication).
>>
>> Thanks again...
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>> On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
>> vasileiosvlac...@gmail.com> wrote:
>>
>>> Hello Ajay,
>>>
>>> Have a look in the *max_hint_window_in_ms* :
>>>
>>>
>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>>
>>> My understanding is that if a node remains down for more than
>>> *max_hint_window_in_ms*, then you will need to repair that node.
>>>
>>> Thanks,
>>> Vasilis
>>>
>>> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg <ajaygargn...@gmail.com>
>>> wrote:
>>>
>>>> If a node in the cluster goes down and comes up, the data gets synced
>>>> up on this downed node.
>>>> Is there a limit on the interval for which the node can remain down? Or
>>>> the data will be synced up even if the node remains down for
>>>> weeks/months/years?
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Ajay
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay
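
For reference, on Cassandra 2.1 the repair is run from the shell with
nodetool; a minimal sketch (the keyspace name is a placeholder, and the full
set of options is described in the repair documentation linked above):

# Repair only the primary token ranges owned by this node
# (run it on each node in turn):
nodetool repair -pr my_keyspace

# Or repair every keyspace and every range this node replicates:
nodetool repair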


Some questions about setting public/private IP-Addresses in Cassandra Cluster

2015-10-24 Thread Ajay Garg
Hi All.

We have a scenario, where the Application-Server (APP), Node-1 (CAS11), and
Node-2 (CAS12) are hosted in DC1.
Node-3 (CAS21) and Node-4 (CAS22) are in DC2.

The intention is that we provide 4-way redundancy to APP, by specifying
CAS11, CAS12, CAS21 and CAS22 as the addresses via Java-Cassandra-connector.
That means, as long as at least one of the 4 nodes is up, the APP should
work.

We are using Network-Topology, with the Murmur3Partitioner.
Each Cassandra-Node has two IPs :: one public, and one
private-within-the-same-data-center.


Following are our IP-Addresses configuration ::

a)
Everywhere in "cassandra-topology.properties", we have specified
Public-IP-Addresses of all 4 nodes.

b)
In each of "listen_address" in /etc/cassandra/cassandra.yaml, we have
specified the corresponding Public-IP-Address of the node.

c)
For CAS11 and CAS12, we have specified the corresponding private-IP-Address
for "rpc_address" in /etc/cassandra/cassandra.yaml (since APP is hosted in
the same data-center).
For CAS21 and CAS22, we have specified the corresponding public-IP-Address
for "rpc_address" in /etc/cassandra/cassandra.yaml (since APP can only
communicate over public IP-Addresses with these nodes).


Are any further optimizations possible, in the sense of specifying
private-IP-Addresses in more places?
I ask this because we need to minimize network-latency, so using
private-IP-Addresses wherever possible would help in this regard.
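
For what it's worth, one common pattern (an assumption here, not something
verified against this cluster) is to keep listen_address on the private
interface and set broadcast_address to the public IP, so intra-DC traffic can
stay on the private network while cross-DC traffic uses the public addresses.
A quick way to see what each node is currently using:

grep -E '^(listen_address|broadcast_address|rpc_address|broadcast_rpc_address):' \
    /etc/cassandra/cassandra.yaml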


Thanks and Regards,
Ajay


Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
If a node in the cluster goes down and comes up, the data gets synced up on
this downed node.
Is there a limit on the interval for which the node can remain down? Or the
data will be synced up even if the node remains down for weeks/months/years?



-- 
Regards,
Ajay


Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Ajay Garg
Thanks Vasileios for the reply !!!
That makes sense !!!

I will be grateful if you could point me to the node-repair command for
Cassandra-2.1.10.
I don't want to get stuck in a wrong-versioned documentation (already
bitten once hard when setting up replication).

Thanks again...


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 4:14 PM, Vasileios Vlachos <
vasileiosvlac...@gmail.com> wrote:

> Hello Ajay,
>
> Have a look in the *max_hint_window_in_ms* :
>
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>
> My understanding is that if a node remains down for more than
> *max_hint_window_in_ms*, then you will need to repair that node.
>
> Thanks,
> Vasilis
>
> On Sat, Oct 24, 2015 at 7:48 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> If a node in the cluster goes down and comes up, the data gets synced up
>> on this downed node.
>> Is there a limit on the interval for which the node can remain down? Or
>> the data will be synced up even if the node remains down for
>> weeks/months/years?
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-24 Thread Ajay Garg
Hi All.

I have been doing extensive testing, and replication works fine, even if
any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
Syncing always takes place (obviously, as long as continuous-downtime-value
does not exceed *max_hint_window_in_ms*).


However, things behave weirdly when I try connecting via the DataStax Java-Driver.
I always add the nodes to the cluster in the order ::

 CAS11, CAS12, CAS21, CAS22

during "cluster.connect" method.


Now, following happens ::

a)
If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
later replicated to CAS21 and CAS22).

b)
If CAS11 and CAS12 go down, data is NOT persisted.
Instead the following exceptions are observed in the Java-Driver ::

##
Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (no host was tried)
at
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
at
com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
at com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (no host was tried)
at
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at
com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
at
com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
at
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
at
com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
... 3 more
###


I have already tried ::

1)
Increasing driver-read-timeout from 12 seconds to 30 seconds.

2)
Increasing driver-connect-timeout from 5 seconds to 30 seconds.

3)
I have also confirmed that each of the 4 nodes are telnet-able over ports
9042 and 9160 each.


Definitely seems to be some driver-issue, since
data-persistence/replication works perfectly (with any permutation) if
data-persistence is done via "cqlsh".


Kindly provide some pointers.
Ultimately, it is the Java-driver that will be used in production, so it is
imperative that data-persistence/replication keeps working when any
permutation of node(s) goes down.
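
One thing that may be worth checking (an assumption based on the symptoms, not
a confirmed diagnosis): the driver discovers the rest of the ring from the
rpc_address each node advertises in the system tables, so if CAS21/CAS22
advertise addresses the APP host cannot reach, the driver can be left with "no
host was tried" once CAS11/CAS12 are down. A rough sketch of the checks (the
host names below are placeholders for the actual IPs):

# what the cluster advertises to clients, as seen from a node that is still up
cqlsh CAS21 -e "SELECT peer, rpc_address FROM system.peers;"
cqlsh CAS21 -e "SELECT broadcast_address, rpc_address FROM system.local;"

# basic reachability of the native-protocol port from the APP host
for ip in CAS11 CAS12 CAS21 CAS22; do nc -zv -w 5 "$ip" 9042; done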


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Any ideas, please?
To repeat, we are using the exact same cassandra-version on all 4 nodes
(2.1.10).

On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:

> Hi Michael.
>
> Please find below the contents of cassandra.yaml for CAS11 (the files on
> the rest of the three nodes are also exactly the same, except the
> "initial_token" and "listen_address" fields) ::
>
> CAS11 ::
>
> 
> cluster_name: 'InstaMsg Cluster'
> num_tokens: 256
> initial_token: -9223372036854775808
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 10800000 # 3 hours
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> batchlog_replay_throttle_in_kb: 1024
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
> - /var/lib/cassandra/data
>
> commitlog_directory: /var/lib/cassandra/commitlog
>
> disk_failure_policy: stop
> commit_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> counter_cache_size_in_mb:
> counter_cache_save_period: 7200
> saved_caches_directory: /var/lib/cassandra/saved_caches
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 1
> commitlog_segment_size_in_mb: 32
> seed_provider:
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   - seeds: "104.239.200.33,119.9.92.77"
>
> concurrent_reads: 32
> concurrent_writes: 32
> concurrent_counter_writes: 32
>
> memtable_allocation_type: heap_buffers
>
> index_summary_capacity_in_mb:
> index_summary_resize_interval_in_minutes: 60
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: 104.239.200.33
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: localhost
> rpc_port: 9160
> rpc_keepalive: true
>
> rpc_server_type: sync
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
>
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 10
>
> column_index_size_in_kb: 64
> batch_size_warn_threshold_in_kb: 5
>
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>
> sstable_preemptive_open_interval_in_mb: 50
>
> read_request_timeout_in_ms: 5000
> range_request_timeout_in_ms: 1
>
> write_request_timeout_in_ms: 2000
> counter_write_request_timeout_in_ms: 5000
> cas_contention_timeout_in_ms: 1000
> truncate_request_timeout_in_ms: 6
> request_timeout_in_ms: 1
> cross_node_timeout: false
> endpoint_snitch: PropertyFileSnitch
>
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 60
> dynamic_snitch_badness_threshold: 0.1
>
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>
> server_encryption_options:
> internode_encryption: none
> keystore: conf/.keystore
> keystore_password: cassandra
> truststore: conf/.truststore
> truststore_password: cassandra
>
> client_encryption_options:
> enabled: false
> keystore: conf/.keystore
> keystore_password: cassandra
>
> internode_compression: all
> inter_dc_tcp_nodelay: false
> 
>
>
> What changes need to be made, so that whenever a downed server comes back
> up, the missing data comes back over to it?
>
> Thanks and Regards,
> Ajay
>
>
>
> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mich...@pbandjelly.org>
> wrote:
>
>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>
>>> However, CAS11 refuses to come up now.
>>> Following is the error in /var/log/cassandra/system.log ::
>>>
>>>
>>> 
>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>> the number of tokens from 1 to 256
>>>
>>
>> Check your cassandra.yaml - this node has vnodes enabled in the
>> configuration when it did not, previously. Check all nodes. Something
>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>
>> --
>> Kind regards,
>> Michael
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-23 Thread Ajay Garg
Thanks Steve and Michael.

Simply uncommenting "initial_token" did the trick !!!

Right now, I was evaluating replication for the case where everything is a
clean install.
I will now try my hand at integrating/starting replication with
pre-existing data.
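
As a small sanity check going forward (a sketch, assuming the stock Debian
packaging paths), every node of a vnode cluster should have num_tokens set and
initial_token left commented out, which can be verified per node with:

grep -E '^ *#? *(num_tokens|initial_token):' /etc/cassandra/cassandra.yaml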


Once again, thanks a ton for all the help guys !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 2:06 AM, Steve Robenalt <sroben...@highwire.org>
wrote:

> Hi Ajay,
>
> Please take a look at the cassandra.yaml configuration reference regarding
> intial_token and num_tokens:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token
>
> This is basically what Michael was referring to in his earlier message.
> Setting an initial token overrode your num_tokens setting on initial
> startup, but after initial startup, the initial token setting is ignored,
> so num_tokens comes into play, attempting to start up with 256 vnodes.
> That's where your error comes from.
>
> It's likely that all of your nodes started up like this since you have the
> same config on all of them (hopefully, you at least changed initial_token
> for each node).
>
> After reviewing the doc on the two sections above, you'll need to decide
> which path to take to recover. You can likely bring the downed node up by
> setting num_tokens to 1 (which you'd need to do on all nodes), in which
> case you're not really running vnodes. Alternately, you can migrate the
> cluster to vnodes:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html
>
> BTW, I recommend carefully reviewing the cassandra.yaml configuration
> reference for ANY change you make from the default. As you've experienced
> here, not all settings are intended to work together.
>
> HTH,
> Steve
>
>
>
> On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg <ajaygargn...@gmail.com>
> wrote:
>
>> Any ideas, please?
>> To repeat, we are using the exact same cassandra-version on all 4 nodes
>> (2.1.10).
>>
>> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg <ajaygargn...@gmail.com>
>> wrote:
>>
>>> Hi Michael.
>>>
>>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>>> the rest of the three nodes are also exactly the same, except the
>>> "initial_token" and "listen_address" fields) ::
>>>
>>> CAS11 ::
>>>
>>>
>>>
>>> What changes need to be made, so that whenever a downed server comes
>>> back up, the missing data comes back over to it?
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mich...@pbandjelly.org>
>>> wrote:
>>>
>>>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>>>
>>>>> However, CAS11 refuses to come up now.
>>>>> Following is the error in /var/log/cassandra/system.log ::
>>>>>
>>>>>
>>>>> 
>>>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>>>> configuration error
>>>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>>>> the number of tokens from 1 to 256
>>>>>
>>>>
>>>> Check your cassandra.yaml - this node has vnodes enabled in the
>>>> configuration when it did not, previously. Check all nodes. Something
>>>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org <bza...@highwire.org>
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Carlos.


I setup a following setup ::

CAS11 and CAS12 in DC1
CAS21 and CAS22 in DC2

a)
Brought all the 4 up, replication worked perfect !!!

b)
Thereafter, downed CAS11 via "sudo service cassandra stop".
Replication continued to work fine on CAS12, CAS21 and CAS22.

c)
Thereafter, upped CAS11 via "sudo service cassandra start".


However, CAS11 refuses to come up now.
Following is the error in /var/log/cassandra/system.log ::



ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the
number of tokens from 1 to 256
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:966)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:734)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
[apache-cassandra-2.1.10.jar:2.1.10]
INFO  [StorageServiceShutdownHook] 2015-10-23 03:07:34,271
Gossiper.java:1442 - Announcing shutdown
INFO  [GossipStage:1] 2015-10-23 03:07:34,282 OutboundTcpConnection.java:97
- OutboundTcpConnection using coalescing strategy DISABLED
ERROR [StorageServiceShutdownHook] 2015-10-23 03:07:34,305
CassandraDaemon.java:227 - Exception in thread
Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
at
org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1624)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1632)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1686)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1510)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:1412)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1427)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1417)
~[apache-cassandra-2.1.10.jar:2.1.10]
at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1443)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:678)
~[apache-cassandra-2.1.10.jar:2.1.10]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
~[apache-cassandra-2.1.10.jar:2.1.10]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]



Ideas?


Thanks and Regards,
Ajay



On Mon, Oct 12, 2015 at 3:46 PM, Carlos Alonso <i...@mrcalonso.com> wrote:

> Yes Ajay, in your particular scenario, after all hints are delivered, both
> CAS11 and CAS12 will have the exact same data.
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 11 October 2015 at 05:21, Ajay Garg <ajaygargn...@gmail.com> wrote:
>
>> Thanks a ton Anuja for the help !!!
>>
>> On Fri, Oct 9, 2015 at 12:38 PM, anuja jain <anujaja...@gmail.com> wrote:
>> > Hi Ajay,
>> >
>> >
>> > On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <ajaygargn...@gmail.com>
>> wrote:
>> >>
>> > In this case, it will be the responsibility of APP1 to start connection
>> to
>> > CAS12. On the other hand if your APP1 is connecting to cassandra using
>> Java
>> > driver, you can add multiple contact points(CAS11 and CAS12 here) so
>> that if
>> > CAS11 is down it will directly connect to CAS12.
>>
>> Great .. Java-driver it will be :)
>>
>>
>>
>>
>> >>
>> > In such a case, CAS12 will store hints for the data to be stored on
>> CAS11
>> > (the tokens of which lies within the range of tokens CAS11 holds)  and
>> > whenever CAS11 is up again, the hints will be transferred to it and the
>> data
>> > will be di

Re: Is replication possible with already existing data?

2015-10-22 Thread Ajay Garg
Hi Michael.

Please find below the contents of cassandra.yaml for CAS11 (the files on
the rest of the three nodes are also exactly the same, except the
"initial_token" and "listen_address" fields) ::

CAS11 ::


cluster_name: 'InstaMsg Cluster'
num_tokens: 256
initial_token: -9223372036854775808
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: "104.239.200.33,119.9.92.77"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

memtable_allocation_type: heap_buffers

index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 104.239.200.33
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true

rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10

column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5

compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 1

write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 1
cross_node_timeout: false
endpoint_snitch: PropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra

client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra

internode_compression: all
inter_dc_tcp_nodelay: false



What changes need to be made, so that whenever a downed server comes back
up, the missing data comes back over to it?
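
A rough sketch of what can be watched here (assuming defaults; nothing below is
specific to this cluster): short outages are covered by hinted handoff
automatically, and anything longer than max_hint_window_in_ms needs an explicit
repair on the recovered node.

# watch hint delivery / streaming activity once the node is back
nodetool -h localhost tpstats | grep -i hint
nodetool -h localhost netstats

# for outages longer than the hint window, repair the recovered node
nodetool -h localhost repair our_db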

Thanks and Regards,
Ajay



On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mich...@pbandjelly.org>
wrote:

> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>
>> However, CAS11 refuses to come up now.
>> Following is the error in /var/log/cassandra/system.log ::
>>
>>
>> 
>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>> the number of tokens from 1 to 256
>>
>
> Check your cassandra.yaml - this node has vnodes enabled in the
> configuration when it did not, previously. Check all nodes. Something
> changed. Mixed vnode/non-vnode clusters is bad juju.
>
> --
> Kind regards,
> Michael
>



-- 
Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-10 Thread Ajay Garg
Thanks a ton Anuja for the help !!!

On Fri, Oct 9, 2015 at 12:38 PM, anuja jain <anujaja...@gmail.com> wrote:
> Hi Ajay,
>
>
> On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>
> In this case, it will be the responsibility of APP1 to start connection to
> CAS12. On the other hand if your APP1 is connecting to cassandra using Java
> driver, you can add multiple contact points(CAS11 and CAS12 here) so that if
> CAS11 is down it will directly connect to CAS12.

Great .. Java-driver it will be :)




>>
> In such a case, CAS12 will store hints for the data to be stored on CAS11
> (the tokens of which lies within the range of tokens CAS11 holds)  and
> whenever CAS11 is up again, the hints will be transferred to it and the data
> will be distributed evenly.
>

Evenly?

Should not the data be """EXACTLY""" equal after CAS11 comes back up
and the sync/transfer/whatever happens?
After all, before CAS11 went down, CAS11 and CAS12 were replicating all data.


Once again, thanks for your help.
I will be even more grateful if you would help me clear the lingering
doubt to second point.


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-08 Thread Ajay Garg
On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
> Thanks Eric for the reply.
>
>
> On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens <migh...@gmail.com> wrote:
>> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
>> ought to be able to increase RF to 3 before bootstrapping your new nodes,
>> with no downtime and no loss of data (even temporary).  Effective RF is
>> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
>>
>> If you're starting at N > RF and you want to increase RF, things get
>> harrier
>> if you can't afford temporary consistency issues.
>>
>
> We are ok with temporary consistency issues.
>
> Also, I was going through the following articles
> https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/
>
> and following doubts came up in my mind ::
>
>
> a)
> Let's say at site-1, Application-Server (APP1) uses the two
> Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for all
> its needs (of course, whatever happens on CAS11, the same is replicated to
> CAS12 at Cassandra-level).
>
> Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
> this and pick up CAS12 for its needs?
> Or some automatic Cassandra-magic will happen?
>
>
> b)
> In the same above scenario, let's say before CAS11 goes down, the amount of
> data in both CAS11 and CAS12 was "x".
>
> After CAS11 goes down, the data is being put in CAS12 only.
> After some time, CAS11 comes back up.
>
> Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously, "y"
>> "x").
>
> Now, will the additional ("y" - "x") data be automatically
> put/replicated/whatever back in CAS11 through Cassandra?
> Or it has to be done manually?
>

Any pointers, please ???

>
> If there are easy recommended solutions to above, I am beginning to think
> that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
> (allowing failures of entire site, or a few nodes on the same site).
>
> I am sorry for asking such newbie questions, and I will be grateful if these
> silly questions could be answered by the experts :)
>
>
> Thanks and Regards,
> Ajay



-- 
Regards,
Ajay


Is replication possible with already existing data?

2015-10-07 Thread Ajay Garg
Hi All.

We have a scenario, where till now we had been using a plain, simple
single node, with the keyspace created using ::

CREATE KEYSPACE our_db WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}  AND durable_writes = true;


We now plan to introduce replication (in the true sense) in our scheme
of things, but cannot afford to lose any data.
We, however can take a bit of downtime, and do any data-migration if
required (we have already done data-migration once in the past, when
we moved our plain, simple single node from one physical machine to
another).


So,

a)
Is it possible at all to introduce replication in our scenario?
If yes, what needs to be done to NOT LOSE our current existing data?

b)
Also, will "NetworkTopologyStrategy" work in our scenario (since
NetworkTopologyStrategy seems to be more robust)?


Brief pointers to above will give huge confidence-boosts in our endeavours.


Thanks and Regards,
Ajay


Re: Is replication possible with already existing data?

2015-10-07 Thread Ajay Garg
Hi Sean.

Thanks for the reply.

On Wed, Oct 7, 2015 at 10:13 PM,  <sean_r_dur...@homedepot.com> wrote:
> How many nodes are you planning to add?

I guess 2 more.

> How many replicas do you want?

1 (original) + 2 (replicas).
That makes it a total of 3 copies of every row of data.



> In general, there shouldn't be a problem adding nodes and then altering the 
> keyspace to change replication.

Great !!
I guess 
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/alter_keyspace_r.html
will do the trick for changing schema-replication-details !!


> You will want to run repairs to stream the data to the new replicas.

Hmm.. we'll be really grateful if you could point us to a suitable
link for the above step.
If there is a nice utility, we would be perfectly set up to start our
fun-exercise, consisting of the following steps (a rough sketch of the
corresponding commands appears after the list) ::

a)
(As advised by you) Changing the schema, to allow a replication_factor of 3.

b)
(As advised by you) Duplicating the already-existing-data on the other 2 nodes.

c)
Thereafter, let Cassandra create a total of 3 copies for every row of
new-incoming-data.
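
A rough sketch of the corresponding commands (assuming the keyspace is our_db,
NetworkTopologyStrategy, and a data-centre name of DC1 -- all of these are
placeholders that must match the snitch configuration):

cqlsh -e "ALTER KEYSPACE our_db WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};"

# then, on each node in turn, stream the pre-existing rows to the new replicas
nodetool repair -pr our_db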


Once again, thanks a ton for the help !!


Thanks and Regards,
Ajay


> You shouldn't need downtime or data migration -- this is the beauty of
> Cassandra.




>
>
> Sean Durity – Lead Cassandra Admin
>

> 
>
> The information in this Internet Email is confidential and may be legally 
> privileged. It is intended solely for the addressee. Access to this Email by 
> anyone else is unauthorized. If you are not the intended recipient, any 
> disclosure, copying, distribution or any action taken or omitted to be taken 
> in reliance on it, is prohibited and may be unlawful. When addressed to our 
> clients any opinions or advice contained in this Email are subject to the 
> terms and conditions expressed in any applicable governing The Home Depot 
> terms of business or client engagement letter. The Home Depot disclaims all 
> responsibility and liability for the accuracy and content of this attachment 
> and for any damages or losses arising from any inaccuracies, errors, viruses, 
> e.g., worms, trojan horses, etc., or other items of a destructive nature, 
> which may be contained in this attachment and shall not be liable for direct, 
> indirect, consequential or special damages in connection with this e-mail 
> message or its attachment.



-- 
Regards,
Ajay


Re: Possible to restore ENTIRE data from Cassandra-Schema in one go?

2015-09-15 Thread Ajay Garg
Thanks Mam for the reply.

I guess there is manual work needed to bring all the SSTable files
into one directory, so it doesn't really solve the purpose. So,
going the "vanilla" way might be simpler :)

Thanks anyways for the help !!!

Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 11:34 AM, Neha Dave <nehajtriv...@gmail.com> wrote:
> Havent used it.. but u can try SSTaable Bulk Loader:
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html
>
> regards
> Neha
>
> On Tue, Sep 15, 2015 at 11:21 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>
>> Hi All.
>>
>> We have a schema on one Cassandra-node, and wish to duplicate the
>> entire schema on another server.
>> Think of this as 2 clusters, each cluster containing one node.
>>
>> We have found the way to dump/restore schema-metainfo at ::
>>
>> https://dzone.com/articles/dumpingloading-schema
>>
>>
>> And dumping/restoring data at ::
>>
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html
>>
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
>>
>>
>> For the restoring data step, it seems that restoring every "table"
>> requires a dedicated step.
>> So, if the schema has 100 "tables", we would need 100 steps.
>>
>>
>> Is it so? If yes, can the entire data be dumped/restored in one go?
>> Just asking, to save time, if it could :)
>>
>>
>>
>>
>> Thanks and Regards,
>> Ajay
>
>



-- 
Regards,
Ajay


Re: Getting intermittent errors while taking snapshot

2015-09-15 Thread Ajay Garg
Hi All.

Granting complete-permissions to the keyspace-folder
(/var/lib/cassandra/data/instamsg) fixed the issue.
Now, multiple, successive snapshot-commands run to completion fine.


sudo chmod -R 777 /var/lib/cassandra/data/instamsg
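
(A note in passing: assuming the Debian package runs the daemon as the
"cassandra" user, a possibly safer alternative to chmod 777 is restoring
ownership instead:)

sudo chown -R cassandra:cassandra /var/lib/cassandra/data/instamsg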



Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 12:04 PM, Ajay Garg <ajaygargn...@gmail.com> wrote:
> Hi All.
>
> Taking snapshots sometimes works, sometimes don't.
> Following is the stacktrace whenever the process fails ::
>
>
> ######
> ajay@ajay-HP-15-Notebook-PC:/var/lib/cassandra/data/instamsg$ nodetool
> -h localhost -p 7199 snapshot instamsgRequested creating snapshot(s)
> for [instamsg] with snapshot name [1442298538121]
> error: 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
> -> 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
> Operation not permitted
> -- StackTrace --
> java.nio.file.FileSystemException:
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
> -> 
> /var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
> Operation not permitted
> at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
> at java.nio.file.Files.createLink(Files.java:1086)
> at 
> org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:94)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1842)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2279)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2361)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2355)
> at org.apache.cassandra.db.Keyspace.snapshot(Keyspace.java:207)
> at 
> org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2388)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
>  

Getting intermittent errors while taking snapshot

2015-09-15 Thread Ajay Garg
Hi All.

Taking snapshots sometimes works, sometimes don't.
Following is the stacktrace whenever the process fails ::


##
ajay@ajay-HP-15-Notebook-PC:/var/lib/cassandra/data/instamsg$ nodetool
-h localhost -p 7199 snapshot instamsgRequested creating snapshot(s)
for [instamsg] with snapshot name [1442298538121]
error: 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
-> 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
Operation not permitted
-- StackTrace --
java.nio.file.FileSystemException:
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/snapshots/1442298538121/instamsg-clients-ka-15-TOC.txt
-> 
/var/lib/cassandra/data/instamsg/clients-b32f01b02eec11e5866887c3880d7c45/instamsg-clients-ka-15-TOC.txt:
Operation not permitted
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
at java.nio.file.Files.createLink(Files.java:1086)
at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:94)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1842)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:2279)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2361)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:2355)
at org.apache.cassandra.db.Keyspace.snapshot(Keyspace.java:207)
at 
org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:2388)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$251(TCPTransport.java:683)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$1/13812661.run(Unknown
Source)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav

Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

Thanks for your replies.

a)
cqlsh  does not work either :(


b)
Following are the parameters as asked ::

listen_address: localhost
rpc_address: localhost

broadcast_rpc_address is not set.
According to the yaml file ::

# RPC address to broadcast to drivers and other Cassandra nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
# broadcast_rpc_address: 1.2.3.4


c)
Following is the netstat-output, with process information ::

###
ajay@comp:~$ sudo netstat -apn | grep 9042
[sudo] password for admin:
tcp6   0  0 127.0.0.1:9042  :::*
LISTEN  10169/java
###


Kindly let me know what else we can try .. it is really driving us nuttsss :(
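
A few more low-level checks that might help narrow it down (a generic sketch,
nothing specific to this machine):

dpkg -l | grep -i cassandra      # look for leftovers from an earlier 2.0.x install
cqlsh --version                  # confirm the cqlsh on the PATH matches the server
nodetool -h 127.0.0.1 status     # confirm the node itself reports Up/Normal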

On Mon, Sep 14, 2015 at 9:40 PM, Jared Biel
<jared.b...@bolderthinking.com> wrote:
> Whoops, I accidentally pressed a hotkey and sent my message prematurely.
> Here's what netstat should look like with those settings:
>
> sudo netstat -apn | grep 9042
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
> 21248/java
>
> -Jared
>
> On 14 September 2015 at 16:09, Jared Biel <jared.b...@bolderthinking.com>
> wrote:
>>
>> I assume "@ Of node" is ethX's IP address? Has cassandra been restarted
>> since changes were made to cassandra.yaml? The netstat output that you
>> posted doesn't look right; we use settings similar to what you've posted.
>> Here's what it looks like on one of our nodes.
>>
>>
>> -Jared
>>
>> On 14 September 2015 at 10:34, Ahmed Eljami <ahmed.elj...@gmail.com>
>> wrote:
>>>
>>> In cassanrda.yaml:
>>> listen_address:@ Of node
>>> rpc_address:0.0.0.0
>>>
>>> brodcast_rpc_address:@ Of node
>>>
>>> 2015-09-14 11:31 GMT+01:00 Neha Dave <nehajtriv...@gmail.com>:
>>>>
>>>> Try
>>>> >cqlsh 
>>>>
>>>> regards
>>>> Neha
>>>>
>>>> On Mon, Sep 14, 2015 at 3:53 PM, Ajay Garg <ajaygargn...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi All.
>>>>>
>>>>> We have setup a Ubuntu-14.04 server, and followed the steps exactly as
>>>>> per http://wiki.apache.org/cassandra/DebianPackaging
>>>>>
>>>>> Installation completes fine, Cassandra starts fine, however cqlsh does
>>>>> not work.
>>>>> We get the error ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ cqlsh
>>>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>>>> error(None, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>>>>> None")})
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> Version-Info ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ dpkg -l | grep cassandra
>>>>> ii  cassandra   2.1.9
>>>>>  all  distributed storage system for structured data
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> The port "seems" to be opened fine.
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ netstat -an | grep 9042
>>>>> tcp6   0  0 127.0.0.1:9042  :::*
>>>>> LISTEN
>>>>>
>>>>> ###
>>>>>
>>>>>
>>>>>
>>>>> Firewall-filters ::
>>>>>
>>>>>
>>>>> ###
>>>>> ajay@comp:~$ sudo

Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

I re-established my server from scratch, and installed the 2.1.x server.
Now, cqlsh works right out of the box.

When I had last set up the server, I had (accidentally) installed the
2.0.x server on the first attempt, removed it, and then installed the 2.1.x
series server. It seems that caused some hidden problem.


I am heartfully grateful to everyone for bearing with me.


Thanks and Regards,
Ajay

On Tue, Sep 15, 2015 at 10:16 AM, Ajay Garg <ajaygargn...@gmail.com> wrote:
> Hi Jared.
>
> Thanks for your help.
>
> I made the config-changes.
> Also, I changed the seed (right now, we are just trying to get one
> instance up and running) ::
>
> 
> seed_provider:
> # Addresses of hosts that are deemed contact points.
> # Cassandra nodes use this list of hosts to find each other and learn
> # the topology of the ring.  You must change this if you are running
> # multiple nodes!
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   # seeds is actually a comma-delimited list of addresses.
>   # Ex: ",,"
>   - seeds: "our.ip.address.here"
> 
>
>
>
>
> Following is the netstat output ::
>
> 
> ajay@comp:~$ sudo netstat -apn | grep 9042
> tcp6   0  0 0.0.0.0:9042:::*
> LISTEN  22469/java
> ####
>
>
>
> Still, when I try, we get ::
>
> 
> ajay@comp:~$ cqlsh our.ip.address.here
> Connection error: ('Unable to connect to any servers',
> {'our.ip.address.here': error(None, "Tried connecting to
> [('our.ip.address.here', 9042)]. Last error: None")})
> 
>
>
> :( :(
>
> On Mon, Sep 14, 2015 at 11:00 PM, Jared Biel
> <jared.b...@bolderthinking.com> wrote:
>> Is there a reason that you're setting listen_address and rpc_address to
>> localhost?
>>
>> listen_address doc: "the Right Thing is to use the address associated with
>> the hostname". So, set the IP address of this to eth0 for example. I believe
>> if it is set to localhost then you won't be able to form a cluster with
>> other nodes.
>>
>> rpc_address: this is the address to which clients will connect. I recommend
>> 0.0.0.0 here so clients can connect to IP address of the server as well as
>> localhost if they happen to reside on the same instance.
>>
>>
>> Here are all of the address settings from our config file. 192.168.1.10 is
>> the IP address of eth0 and broadcast_address is commented out.
>>
>> listen_address: 192.168.1.10
>> # broadcast_address: 1.2.3.4
>> rpc_address: 0.0.0.0
>> broadcast_rpc_address: 192.168.1.10
>>
>> Follow these directions to get up and running with the first node
>> (destructive process):
>>
>> 1. Stop cassandra
>> 2. Remove data from cassandra var directory (rm -rf /var/lib/cassandra/*)
>> 3. Make above changes to config file. Also set seeds to the eth0 IP address
>> 4. Start cassandra
>> 5. Set seeds in config file back to "" after cassandra is up and running.
>>
>> After following that process, you'll be able to connect to the node from any
>> host that can reach Cassandra's ports on that node ("cqlsh" command will
>> work.) To join more nodes to the cluster, follow the steps same steps as
>> above, except the seeds value to the IP address of an already running node.
>>
>> Regarding the empty "seeds" config entry: our configs are automated with
>> configuration management. During the node bootstrap process a script
>> performs the above. The reason that we set seeds back to empty is that we
>> don't want nodes coming up/down to cause the config file to change and thus
>> cassandra to restart needlessly. So far we haven't had any issues with seeds
>> being set to empty after a node has joined the cluster, but this may not be
>> the recommended way of doing things.
>>
>> -Jared
>>
>> On 14 September 2015 at 16:46, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>>
>>> Hi All.
>>>
>>> Thanks for your replies.
>>>
>>> a)
>>> cqlsh  does not work either :(
>>>
>>>
>>> b)
>>> Following are the parameters as asked ::
>>>
>>> listen_address: localhost
>&

Re: Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi Jared.

Thanks for your help.

I made the config-changes.
Also, I changed the seed (right now, we are just trying to get one
instance up and running) ::


seed_provider:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring.  You must change this if you are running
# multiple nodes!
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  # seeds is actually a comma-delimited list of addresses.
  # Ex: ",,"
  - seeds: "our.ip.address.here"





Following is the netstat output ::

####
ajay@comp:~$ sudo netstat -apn | grep 9042
tcp6   0  0 0.0.0.0:9042:::*
LISTEN  22469/java




Still, when I try, we get ::

####
ajay@comp:~$ cqlsh our.ip.address.here
Connection error: ('Unable to connect to any servers',
{'our.ip.address.here': error(None, "Tried connecting to
[('our.ip.address.here', 9042)]. Last error: None")})



:( :(

On Mon, Sep 14, 2015 at 11:00 PM, Jared Biel
<jared.b...@bolderthinking.com> wrote:
> Is there a reason that you're setting listen_address and rpc_address to
> localhost?
>
> listen_address doc: "the Right Thing is to use the address associated with
> the hostname". So, set the IP address of this to eth0 for example. I believe
> if it is set to localhost then you won't be able to form a cluster with
> other nodes.
>
> rpc_address: this is the address to which clients will connect. I recommend
> 0.0.0.0 here so clients can connect to IP address of the server as well as
> localhost if they happen to reside on the same instance.
>
>
> Here are all of the address settings from our config file. 192.168.1.10 is
> the IP address of eth0 and broadcast_address is commented out.
>
> listen_address: 192.168.1.10
> # broadcast_address: 1.2.3.4
> rpc_address: 0.0.0.0
> broadcast_rpc_address: 192.168.1.10
>
> Follow these directions to get up and running with the first node
> (destructive process):
>
> 1. Stop cassandra
> 2. Remove data from cassandra var directory (rm -rf /var/lib/cassandra/*)
> 3. Make above changes to config file. Also set seeds to the eth0 IP address
> 4. Start cassandra
> 5. Set seeds in config file back to "" after cassandra is up and running.
>
> After following that process, you'll be able to connect to the node from any
> host that can reach Cassandra's ports on that node ("cqlsh" command will
> work.) To join more nodes to the cluster, follow the steps same steps as
> above, except the seeds value to the IP address of an already running node.
>
> Regarding the empty "seeds" config entry: our configs are automated with
> configuration management. During the node bootstrap process a script
> performs the above. The reason that we set seeds back to empty is that we
> don't want nodes coming up/down to cause the config file to change and thus
> cassandra to restart needlessly. So far we haven't had any issues with seeds
> being set to empty after a node has joined the cluster, but this may not be
> the recommended way of doing things.
>
> -Jared
>
> On 14 September 2015 at 16:46, Ajay Garg <ajaygargn...@gmail.com> wrote:
>>
>> Hi All.
>>
>> Thanks for your replies.
>>
>> a)
>> cqlsh  does not work either :(
>>
>>
>> b)
>> Following are the parameters as asked ::
>>
>> listen_address: localhost
>> rpc_address: localhost
>>
>> broadcast_rpc_address is not set.
>> According to the yaml file ::
>>
>> # RPC address to broadcast to drivers and other Cassandra nodes. This
>> cannot
>> # be set to 0.0.0.0. If left blank, this will be set to the value of
>> # rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address
>> must
>> # be set.
>> # broadcast_rpc_address: 1.2.3.4
>>
>>
>> c)
>> Following is the netstat-output, with process information ::
>>
>>
>> ###
>> ajay@comp:~$ sudo netstat -apn | grep 9042
>> [sudo] password for admin:
>> tcp6   0  0 127.0.0.1:9042  :::*
>> LISTEN  10169/java
>>
>> 

Possible to restore ENTIRE data from Cassandra-Schema in one go?

2015-09-14 Thread Ajay Garg
Hi All.

We have a schema on one Cassandra-node, and wish to duplicate the
entire schema on another server.
Think of this as 2 clusters, each cluster containing one node.

We have found the way to dump/restore schema-metainfo at ::

https://dzone.com/articles/dumpingloading-schema


And dumping/restoring data at ::

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_takes_snapshot_t.html
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html


For the restoring data step, it seems that restoring every "table"
requires a dedicated step.
So, if the schema has 100 "tables", we would need 100 steps.


Is it so? If yes, can the entire data be dumped/restored in one go?
Just asking, to save time, if it could :)
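
For the dump side at least, a single snapshot call covers every table in the
keyspace, and the copy-back can be scripted as one loop. A rough sketch
(assuming the default data directory and a keyspace named our_db; the snapshot
tag is arbitrary):

nodetool snapshot -t full_backup our_db

# copy each table's snapshot files back into its live directory in one pass
for snap in /var/lib/cassandra/data/our_db/*/snapshots/full_backup; do
    table_dir="${snap%/snapshots/*}"
    sudo cp "$snap"/* "$table_dir"/
done
# then restart the node, or run "nodetool refresh our_db <table>" per table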




Thanks and Regards,
Ajay


Test Subject

2015-09-14 Thread Ajay Garg
Testing simple content, as my previous email bounced :(

-- 
Regards,
Ajay


Not able to cqlsh on 2.1.9 on Ubuntu 14.04

2015-09-14 Thread Ajay Garg
Hi All.

We have set up an Ubuntu-14.04 server, and followed the steps exactly as
per http://wiki.apache.org/cassandra/DebianPackaging

Installation completes fine and Cassandra starts fine; however, cqlsh does not work.
We get the error ::

###
ajay@comp:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(None, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
None")})
###



Version-Info ::

###
ajay@comp:~$ dpkg -l | grep cassandra
ii  cassandra   2.1.9
 all  distributed storage system for structured data
###



The port "seems" to be opened fine.

###
ajay@comp:~$ netstat -an | grep 9042
tcp6   0  0 127.0.0.1:9042  :::*LISTEN
###



Firewall-filters ::

###
ajay@comp:~$ sudo iptables -L
[sudo] password for ajay:
Chain INPUT (policy ACCEPT)
target prot opt source   destination
ACCEPT all  --  anywhere anywhere state
RELATED,ESTABLISHED
ACCEPT tcp  --  anywhere anywhere tcp dpt:ssh
DROP   all  --  anywhere anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source   destination

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination
###



Even telnet fails :(

###
ajay@comp:~$ telnet localhost 9042
Trying 127.0.0.1...
###



Any ideas please?? We have been stuck on this for a good 3 hours now :(
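
One thing that stands out in the iptables output above (an observation, not a
confirmed diagnosis): there is no ACCEPT rule for the loopback interface or for
port 9042 before the final DROP, which would make even local connections hang
exactly like the telnet attempt. A quick test:

sudo iptables -I INPUT 1 -i lo -j ACCEPT
sudo iptables -I INPUT 2 -p tcp --dport 9042 -j ACCEPT
# then retry:  telnet localhost 9042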



Thanks and Regards,
Ajay


Re: Can't connect to Cassandra server

2015-07-19 Thread Ajay
Try with the correct IP address as below:

cqlsh 192.248.15.219 -u sinmin -p xx

CQL documentation -
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/cqlsh.html

On Sun, Jul 19, 2015 at 2:00 PM, Chamila Wijayarathna 
cdwijayarat...@gmail.com wrote:

 Hello all,

 After starting cassandra, I tried to connect to cassandra from cqlsh and
 java, but it fails to do so.

 Following is the error I get while trying to connect to cqlsh.

 cqlsh -u sinmin -p xx
 Connection error: ('Unable to connect to any servers', {'127.0.0.1':
 error(111, Tried connecting to [('127.0.0.1', 9042)]. Last error:
 Connection refused)})

 I have set listen_address and rpc_address in cassandra.yaml to the ip
 address of server address like follows.

 listen_address:192.248.15.219
 rpc_address:192.248.15.219

 Following is what I found from cassandra system.log.

 https://gist.githubusercontent.com/cdwijayarathna/a14586a9e39a943f89a0/raw/system%20log

 Following is the netstat result I got.

 maduranga@ubuntu:/var/log/cassandra$ netstat
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address   Foreign Address State

 tcp0  0 ubuntu:ssh  103.21.166.35:54417
 ESTABLISHED
 tcp0  0 ubuntu:1522 ubuntu:30820
  ESTABLISHED
 tcp0  0 ubuntu:30820ubuntu:1522
 ESTABLISHED
 tcp0256 ubuntu:ssh  175.157.41.209:42435
  ESTABLISHED
 Active UNIX domain sockets (w/o servers)
 Proto RefCnt Flags   Type   State I-Node   Path
 unix  9  [ ] DGRAM7936 /dev/log
 unix  3  [ ] STREAM CONNECTED 11737
 unix  3  [ ] STREAM CONNECTED 11736
 unix  3  [ ] STREAM CONNECTED 10949
  /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 10948
 unix  2  [ ] DGRAM10947
 unix  2  [ ] STREAM CONNECTED 10801
 unix  3  [ ] STREAM CONNECTED 10641
 unix  3  [ ] STREAM CONNECTED 10640
 unix  3  [ ] STREAM CONNECTED 10444
  /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 10443
 unix  3  [ ] STREAM CONNECTED 10437
  /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 10436
 unix  3  [ ] STREAM CONNECTED 10430
  /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 10429
 unix  2  [ ] DGRAM10424
 unix  3  [ ] STREAM CONNECTED 10422
  /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 10421
 unix  2  [ ] DGRAM10420
 unix  2  [ ] STREAM CONNECTED 10215
 unix  2  [ ] STREAM CONNECTED 10296
 unix  2  [ ] STREAM CONNECTED 9988
 unix  2  [ ] DGRAM9520
 unix  3  [ ] STREAM CONNECTED 8769
 /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 8768
 unix  2  [ ] DGRAM8753
 unix  2  [ ] DGRAM9422
 unix  3  [ ] STREAM CONNECTED 7000
 @/com/ubuntu/upstart
 unix  3  [ ] STREAM CONNECTED 8485
 unix  2  [ ] DGRAM7947
 unix  3  [ ] STREAM CONNECTED 6712
 /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 6711
 unix  3  [ ] STREAM CONNECTED 7760
 /var/run/dbus/system_bus_socket
 unix  3  [ ] STREAM CONNECTED 7759
 unix  3  [ ] STREAM CONNECTED 7754
 unix  3  [ ] STREAM CONNECTED 7753
 unix  3  [ ] DGRAM7661
 unix  3  [ ] DGRAM7660
 unix  3  [ ] STREAM CONNECTED 6490
 @/com/ubuntu/upstart
 unix  3  [ ] STREAM CONNECTED 6475

 What is the issue here? Why I can't connect to Cassandra server? How can I
 fix this?

 Thank You!

 --
 *Chamila Dilshan Wijayarathna,*
 Software Engineer
 Mobile:(+94)788193620
 WSO2 Inc., http://wso2.com/




Re: Cassandra counters

2015-07-10 Thread Ajay
Any pointers on this?

In 2.1, updating a counter with an UNLOGGED batch using a timestamp isn't as
safe as other column updates at a given consistency level (can a counter
update be made idempotent with a timestamp?).

Thanks
Ajay

On 09-Jul-2015 11:47 am, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 What is the accuracy improvement of counter in 2.1 over 2.0?

 This below post, it mentioned 2.0.x issues fixed in 2.1 and perfomance
improvement.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

 But how accurate are the counter 2.1.x or any known issues in 2.1 using
UNLOGGED batch for counter update with timestamp?

 Thanks
 Ajay


Cassandra counters

2015-07-09 Thread Ajay
Hi,

What is the accuracy improvement of counters in 2.1 over 2.0?

The post below mentions 2.0.x issues fixed in 2.1 and performance
improvements.
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

But how accurate are counters in 2.1.x, and are there any known issues in 2.1
with using an UNLOGGED batch for counter updates?

Thanks
Ajay


Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
 Getting started is easier with Cassandra. For HBase you need to run HDFS
and Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small
clusters (< 50 nodes) and quickly degrading above that.
* HBase does not have a query language (but you can use Phoenix for full
SQL support)
* HBase does not have secondary indexes (having an eventually consistent
index, similar to what Cassandra has, is easy in HBase, but making it as
consistent as the rest of HBase is hard)

Thanks
Ajay



 On May 29, 2015, at 12:09 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I need some info on Hbase vs Cassandra as a data store (in general plus
 specific to time series data).

 The comparison in the following helps:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else

 Thanks
 Ajay




Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi Jens,

All the points listed weren't from me. I posted the HBase Vs Cassandra in
both the forums and consolidated here for the discussion.


On Mon, Jun 8, 2015 at 2:27 PM, Jens Rantil jens.ran...@tink.se wrote:

 Hi,

 Some minor comments:

  2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for Cassandra but it doesn't support vnodes.

 Not entirely sure what you mean here, but we ran Cloudera for a while and
 Cloudera Manager was buggy and hard to debug. Overall, our experience
 wasn't very good. This was definitely also due to us not knowing how all
 the Cloudera packages were configured.


* This is one of the responses I got from the HBase forum. DataStax
OpsCenter is there, but it seems it doesn't support the latest Cassandra
versions (we tried it a couple of times and there were bugs too). *


  HBase is always consistent. Machine outages lead to inability to read
 or write data on that machine. With Cassandra you can always write.

 Sort of true. You can decide write consistency and throw an exception if
 write didn't go through consistently. However, do note that Cassandra will
 never rollback failed writes which means writes aren't atomic (as in ACID).

  * If I understand correctly, you mean that when we write with QUORUM,
 Cassandra writes to some machines, fails to write to others, and
 throws an exception if QUORUM isn't satisfied, leaving the data inconsistent
 without rolling back? *


 We chose Cassandra over HBase mostly due to ease of managability. We are a
 small team, and my feeling is that you will want dedicated people taking
 care of a Hadoop cluster if you are going down the HBase path. A Cassandra
 cluster can be handled by a single engineer and is, in my opinion, easier
 to maintain.


* This is the most popular reason for Cassandra over HBase. But this
alone is not a sufficient driver. *


 Cheers,
 Jens

 On Mon, Jun 8, 2015 at 9:59 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi All,

 Thanks for all the input. I posted the same question in HBase forum and
 got more response.

 Posting the consolidated list here.

  Our case is that a central team builds and maintains the platform
  (Cassandra as a service). We have a couple of use cases which fit Cassandra,
  like time-series data. But as a platform team, we need to know more
  features and use cases which fit or are best handled in Cassandra, and also to
  understand the use cases where HBase performs better (we might need to offer
  it as a service too).

 *Cassandra:*

 1) From 2013 both can still be relevant:
 http://www.pythian.com/blog/watch-hbase-vs-cassandra/

 2) Here are some use cases from PlanetCassandra.org of companies who
 chose Cassandra over HBase after evaluation, or migrated to Cassandra from
 HBase.
 The eComNext interview cited on the page touches on time-series data;
 http://planetcassandra.org/hbase-to-cassandra-migration/

  3) From googling, the most popular advantages of Cassandra over HBase are
  that it is easy to deploy, maintain & monitor, and has no single point of failure.

  4) From our six months of research and POC experience with Cassandra, CQL is
  pretty limited. Though CQL is targeted at real-time reads and writes, there
  are cases where we need to pull out data differently and are OK with a little
  more latency, but Cassandra doesn't support that; we need MapReduce or
  Spark for those. Then the debate starts: why Cassandra and not HBase, if
  we need Hadoop/Spark for MapReduce anyway?

 Expected a few more technical features/usecases that is best handled by
 Cassandra (and how it works).

 *HBase:*

 1) As for the #4 you might be interested in reading
 https://aphyr.com/posts/294-call-me-maybe-cassandra
 Not sure if there is comparable article about HBase (anybody knows?) but
 it can give you another perspective about what else to keep an eye on
 regarding these systems.

 2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe

 3) http://blog.parsely.com/post/1928/cass/
 *Anyone have any comments on this?*

 4) 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for Cassandra but it doesn't support vnodes.
  3. Rumors say it is fast when it works ;) the reason: it can silently drop
  data you try to write.
  4. Time series is a nightmare. The easiest approach is to just replicate data
  to HDFS, partition it by hour/day and run spark/scalding/pig/hive/Impala

 5)  Migrated from Cassandra to HBase.
 Reasons:
  Scan is fast with HBase. It fits better with the time series data model.
  Please look at OpenTSDB. Cassandra models it with large rows.
  Server-side filtering: you can use it to filter some of your time series
  data on the server side.
  HBase has better integration with Hadoop in general. We had to write
  our own bulk loader using MapReduce for Cassandra; HBase already has a
  tool for that. There is a nice integration with Flume and Kite.
  High availability didn't matter for us; 10 seconds down is fine for our use
  cases. HBase started to support eventually

Hbase vs Cassandra

2015-05-29 Thread Ajay
Hi,

I need some info on Hbase vs Cassandra as a data store (in general plus
specific to time series data).

The comparison in the following helps:
1: features
2: deployment and monitoring
3: performance
4: anything else

Thanks
Ajay


Re: Caching the PreparedStatement (Java driver)

2015-05-15 Thread Ajay
Hi Joseph,

The Java driver currently caches prepared statements, but only via weak
references, i.e. the cache holds a statement only as long as the client code
still references it. That in turn means we need to cache them ourselves.

I am also not sure what happens when a cached prepared statement is
executed after the Cassandra nodes restart. Is the server-side prepared statement
cache persisted, or is it in memory only? If it is in memory, how do we handle
stale prepared statements in the cache?

Thanks
Ajay


On Fri, May 15, 2015 at 6:28 PM, ja jaa...@gmail.com wrote:

 Hi,

  Isn't it a good-to-have feature for the Java driver to maintain a cache of
  PreparedStatements (PS)? Any reason why it's left to the application to do
  the same? I am currently implementing a cache of PS that is loaded at app
  startup, but how do I ensure this cache is always good to use? Say
  there's a restart on the Cassandra server side: this cache would be stale,
  and I assume the next use of a PS from the cache would fail. Any way to recover
  from this?

 Thanks,
 Joseph

 On Sunday, March 1, 2015 at 12:46:14 AM UTC+5:30, Vishy Kasar wrote:


 On Feb 28, 2015, at 4:25 AM, Ajay ajay@gmail.com wrote:

 Hi,

 My earlier question was whether it is safe to cache PreparedStatement
 (using Java driver) in the client side for which I got it confirmed by
 Olivier.

 Now the question is do we really need to cache the PreparedStatement in
 the client side?.

 Lets take a scenario as below:

 1) Client fires a REST query SELECT * from Test where Pk = val1;
 2) REST service prepares a statement SELECT * from Test where Pk = ?
 3) Executes the PreparedStatement by setting the values.
 4) Assume we don't cache the PreparedStatement
 5) Client fires another REST query SELECT * from Test where Pk = val2;
 6) REST service prepares a statement SELECT * from Test where Pk = ?
 7) Executes the PreparedStatement by setting the values.


 You should avoid re-preparing the statement (step 6 above). When you
 create a prepared statement, a round trip to server is involved. So you
 should create it once and reuse it. You can bind it with different values
 and execute the bound statement each time.
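
A minimal sketch of the prepare-once, bind-many pattern described above, assuming the
DataStax Java driver 2.x and the Test/Pk names from the example (error handling omitted):

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

class TestDao {
    private final Session session;
    private final PreparedStatement selectByPk;

    TestDao(Session session) {
        this.session = session;
        // Prepared once per process: this is the only round trip spent on preparation.
        this.selectByPk = session.prepare("SELECT * FROM Test WHERE Pk = ?");
    }

    Row findByPk(Object pkValue) {
        // Bind fresh values for each request; the statement itself is never re-prepared.
        return session.execute(selectByPk.bind(pkValue)).one();
    }
}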

 In this case, is there any benefit of using the PreparedStatement?

 From the Java driver code, the Session.prepare(query) doesn't check
 whether a similar query was prepared earlier or not. It directly call the
 server passing the query. The return from the server is a PreparedId. Do
 the server maintains a cache of Prepared queries or it still perform the
 all the steps to prepare a query if the client calls to prepare the same
 query more than once (using the same Session and Cluster instance which I
 think doesn't matter)?.

 Thanks
 Ajay


 On Sat, Feb 28, 2015 at 9:17 AM, Ajay ajay@gmail.com wrote:

 Thanks Olivier.

 Most of the REST query calls would come from other applications to
 write/read to/from Cassandra which means most queries from an application
 would be same (same column families but different  values).

 Thanks
 Ajay
 On 28-Feb-2015 6:05 am, Olivier Michallat olivier@datastax.com
 wrote:

 Hi Ajay,

 Yes, it is safe to hold a reference to PreparedStatement instances in
 your client code. If you always run the same pre-defined statements, you
 can store them as fields in your resource classes.

 If your statements are dynamically generated (for example, inserting
 different subsets of the columns depending on what was provided in the REST
 payload), your caching approach is valid. When you evict a
 PreparedStatement from your cache, the driver will also remove the
 corresponding id from its internal cache. If you re-prepare it later it
 might still be in the Cassandra-side cache, but that is not a problem.

 One caveat: you should be reasonably confident that your prepared
 statements will be reused. If your query strings are always different,
 preparing will bring no advantage.

 --
 Olivier Michallat
 Driver  tools engineer, DataStax

 On Fri, Feb 27, 2015 at 7:04 PM, Ajay ajay@gmail.com wrote:

 Hi,

 We are building REST APIs for Cassandra using the Cassandra Java
 Driver.

 So as per the below guidlines from the documentation, we are caching
 the Cluster instance (per cluster) and the Session instance (per keyspace)
 as they are multi thread safe.

 http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html

 As the Cluster and Session instance(s) are cached in the application
 already and also as the PreparedStatement provide better performance, we
 thought to build the PreparedStatement for REST query implicitly (as REST
 calls are stateless) and cache the PreparedStatemen. Whenever a REST query
 is invoked, we look for a PreparedStatement in the cache and create and 
 put
 it in the cache if it doesn't exists. (The cache is a in-memory fixed size
 LRU based).

 Is a safe approach to cache PreparedStatement in the client side?.
 Looking at the Java driver code, the Cluster class stores the
 PreparedStatements

Re: Hive support on Cassandra

2015-05-07 Thread Ajay
Thanks everyone.

Basically we are looking at Hive because it supports advanced queries (CQL
is limited to the data model).

Does Stratio support something similar to Hive?

Thanks
Ajay


On Thu, May 7, 2015 at 10:33 PM, Andres de la Peña adelap...@stratio.com
wrote:

 You may also find interesting https://github.com/Stratio/crossdata. This
  project provides batch and streaming capabilities for Cassandra and other
  databases through a SQL-like language.

 Disclaimer: I am an employee of Stratio

 2015-05-07 17:29 GMT+02:00 l...@airstreamcomm.net:

 You might also look at Apache Drill, which has support (I think alpha)
 for ANSI SQL queries against Cassandra if that would suit your needs.


  On May 6, 2015, at 12:57 AM, Ajay ajay.ga...@gmail.com wrote:
 
  Hi,
 
  Does Apache Cassandra (not DSE) support Hive Integration?
 
  I found couple of open source efforts but nothing is available
 currently.
 
  Thanks
  Ajay





 --

 Andrés de la Peña


 http://www.stratio.com/
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón, Madrid
 Tel: +34 91 352 59 42 // *@stratiobd https://twitter.com/StratioBD*



Hive support on Cassandra

2015-05-05 Thread Ajay
Hi,

Does Apache Cassandra (not DSE) support Hive Integration?

I found couple of open source efforts but nothing is available currently.

Thanks
Ajay


When to use STCS/DTCS/LCS

2015-04-08 Thread Ajay
Hi,

What are the guidelines on when to use STCS/DTCS/LCS? The most reliable way is
to test with each of them and find the best fit, but are there any
guidelines or best practices (from experience) on which one to use when?

Thanks
Ajay


Re: Availability testing of Cassandra nodes

2015-04-08 Thread Ajay
Adding Java driver forum.

Even we like to know more on this.

-
Ajay

On Wed, Apr 8, 2015 at 8:15 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Just a couple of quick comments:

 1. The driver is supposed to be doing availability and load balancing
 already.
 2. If your cluster is lightly loaded, it isn't necessary to be so precise
 with load balancing.
 3. If your cluster is heavily loaded, it won't help. Solution is to expand
 your cluster so that precise balancing of requests (beyond what the driver
 does) is not required.

 Is there anything special about your use case that you feel is worth the
 extra treatment?

  If you are having problems with the driver balancing requests and properly
  detecting available nodes, or see some room for improvement, make sure to
  report the issues so that they can be fixed.


 -- Jack Krupansky

 On Wed, Apr 8, 2015 at 10:31 AM, Jiri Horky ho...@avast.com wrote:

 Hi all,

 we are thinking of how to best proceed with availability testing of
  Cassandra nodes. It is becoming more and more apparent that it is a rather
  complex task. We thought that we should try to read from and write to each
  Cassandra node, using a monitoring keyspace with a unique value and a low
  TTL. This helps to find an issue, but it also triggers flapping of
  unaffected hosts, as the key of the value which is being inserted
 sometimes belongs to an affected host and sometimes not. Now, we could
 calculate the right value to insert so we can be sure it will hit the
 host we are connecting to, but then, you have replication factor and
 consistency level, so you can not be really sure that it actually tests
 ability of the given host to write values.

 So we ended up thinking that the best approach is to connect to each
 individual host, read some system keyspace (which might be on a
 different disk drive...), which should be local, and then check several
  JMX values that could indicate an error + JVM statistics (full heap, GC
  overhead). Moreover, we will also monitor our applications that are
  using Cassandra (mostly with the DataStax driver) and try to get failed-node
  information from them.

 How others do the testing?

 Jirka H.





Re: Stable cassandra build for production usage

2015-03-17 Thread Ajay
Hi,

Now that 2.0.13 is out, I don't see the nodetool cleanup issue
(https://issues.apache.org/jira/browse/CASSANDRA-8718) fixed yet. The
bug shows priority Minor. Is anybody facing this issue?

Thanks
Ajay

On Thu, Mar 12, 2015 at 11:41 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Mar 12, 2015 at 10:50 AM, Ajay ajay.ga...@gmail.com wrote:

 Please suggest what is the best option in this for production deployment
 in EC2 given that we are deploying Cassandra cluster for the 1st time (so
 likely that we add more data centers/nodes and schema changes in the
 initial few months)


 Voting for 2.0.13 is in process. I'd wait for that. But I don't need
 OpsCenter.

 =Rob




Re: Stable cassandra build for production usage

2015-03-17 Thread Ajay
Yes we see https://issues.apache.org/jira/browse/CASSANDRA-8716 in our
testing

Thanks
Ajay

On Tue, Mar 17, 2015 at 3:20 PM, Marcus Eriksson krum...@gmail.com wrote:

 Do you see the segfault or do you see
 https://issues.apache.org/jira/browse/CASSANDRA-8716 ?

 On Tue, Mar 17, 2015 at 10:34 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 Now that 2.0.13 is out, I don't see nodetool cleanup issue(
 https://issues.apache.org/jira/browse/CASSANDRA-8718) been fixed yet.
 The bug show priority Minor. Anybody facing this issue?.

 Thanks
 Ajay

 On Thu, Mar 12, 2015 at 11:41 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Thu, Mar 12, 2015 at 10:50 AM, Ajay ajay.ga...@gmail.com wrote:

 Please suggest what is the best option in this for production
 deployment in EC2 given that we are deploying Cassandra cluster for the 1st
 time (so likely that we add more data centers/nodes and schema changes in
 the initial few months)


 Voting for 2.0.13 is in process. I'd wait for that. But I don't need
 OpsCenter.

 =Rob







Re: Adding a Cassandra node using OpsCenter

2015-03-12 Thread Ajay
Is there a separate forum for Opscenter?

Thanks
Ajay
On 11-Mar-2015 4:16 pm, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 While adding a Cassandra node using OpsCenter (which is recommended), the
 versions of Cassandra (Datastax community edition) shows only 2.0.9 and not
 later versions in 2.0.x. Is there a reason behind it? 2.0.9 is recommended
 than 2.0.11?

 Thanks
 Ajay



Re: Stable cassandra build for production usage

2015-03-12 Thread Ajay
Hi,

We did our research using version 2.0.11. While preparing for the
production deployment, we found the following issues:

1) 2.0.12 has nodetool cleanup issue -
https://issues.apache.org/jira/browse/CASSANDRA-8718
2) 2.0.11 has nodetool issue -
https://issues.apache.org/jira/browse/CASSANDRA-8548
3) OpsCenter 5.1.0 supports only - 2.0.9 and not later 2.0.x -
https://issues.apache.org/jira/browse/CASSANDRA-8072
4) 2.0.9 has schema refresh issue -
https://issues.apache.org/jira/browse/CASSANDRA-7734

Please suggest what is the best option in this for production deployment in
EC2 given that we are deploying Cassandra cluster for the 1st time (so
likely that we add more data centers/nodes and schema changes in the
initial few months)

Thanks
Ajay

On Thu, Jan 1, 2015 at 9:49 PM, Neha Trivedi nehajtriv...@gmail.com wrote:

 Use 2.0.11 for production

 On Wed, Dec 31, 2014 at 11:50 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Wed, Dec 31, 2014 at 8:38 AM, Ajay ajay.ga...@gmail.com wrote:

 For my research and learning I am using Cassandra 2.1.2. But I see
 couple of mail threads going on issues in 2.1.2. So what is the stable or
 popular build for production in Cassandra 2.x series.

 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 =Rob





Re: Steps to do after schema changes

2015-03-12 Thread Ajay
Thanks Mark.

-
Ajay
On 12-Mar-2015 11:08 pm, Mark Reddy mark.l.re...@gmail.com wrote:

 It's always good to run nodetool describecluster after a schema change,
 this will show you all the nodes in your cluster and what schema version
 they have. If they have different versions you have a schema disagreement
 and should follow this guide to resolution:
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_handle_schema_disagree_t.html

 Regards,
 Mark

 On 12 March 2015 at 05:47, Phil Yang ud1...@gmail.com wrote:

 Usually, you have nothing to do. Changes will be synced to every nodes
 automatically.

 2015-03-12 13:21 GMT+08:00 Ajay ajay.ga...@gmail.com:

 Hi,

 Are there any steps to do (like nodetool or restart node) or any
 precautions after schema changes are done in a column family say adding a
 new column or modifying any table properties?

 Thanks
 Ajay




 --
 Thanks,
 Phil Yang





Re: Adding a Cassandra node using OpsCenter

2015-03-12 Thread Ajay
Thanks Nick.

Does it mean that only adding a new node with 2.0.10 or later is a
problem? Can a node added manually be monitored from OpsCenter?

Thanks
Ajay
On 12-Mar-2015 10:19 pm, Nick Bailey n...@datastax.com wrote:

 There isn't an OpsCenter specific mailing list no.

 To answer your question, the reason OpsCenter provisioning doesn't support
 2.0.10 and 2.0.11 is due to
 https://issues.apache.org/jira/browse/CASSANDRA-8072.

 That bug unfortunately prevents OpsCenter provisioning from working
 correctly, but isn't serious outside of provisioning. OpsCenter may be able
 to come up with a workaround but at the moment those versions are
 unsupported. Sorry for inconvenience.

 -Nick

 On Thu, Mar 12, 2015 at 9:18 AM, Ajay ajay.ga...@gmail.com wrote:

 Is there a separate forum for Opscenter?

 Thanks
 Ajay
 On 11-Mar-2015 4:16 pm, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 While adding a Cassandra node using OpsCenter (which is recommended),
 the versions of Cassandra (Datastax community edition) shows only 2.0.9 and
 not later versions in 2.0.x. Is there a reason behind it? 2.0.9 is
 recommended than 2.0.11?

 Thanks
 Ajay





Adding a Cassandra node using OpsCenter

2015-03-11 Thread Ajay
Hi,

While adding a Cassandra node using OpsCenter (which is recommended), the
list of Cassandra versions (DataStax Community edition) shows only 2.0.9 and no
later 2.0.x versions. Is there a reason behind this? Is 2.0.9 recommended
over 2.0.11?

Thanks
Ajay


Steps to do after schema changes

2015-03-11 Thread Ajay
Hi,

Are there any steps to take (like running nodetool or restarting nodes) or any
precautions after schema changes are made to a column family, say adding a
new column or modifying table properties?

Thanks
Ajay


Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
I have a column family with 15 columns: a timestamp, a timeuuid, a few text
fields, and the rest int fields. If I calculate the size of each column name
and its value, and divide 5 KB (the recommended max size for a batch) by that
total, I get a result of 12. Is that correct? Am I missing something?

Thanks
Ajay
On 02-Mar-2015 12:13 pm, Ankush Goyal ank...@gmail.com wrote:

 Hi Ajay,

 I would suggest, looking at the approximate size of individual elements in
 the batch, and based on that compute max size (chunk size).

 Its not really a straightforward calculation, so I would further suggest
 making that chunk size a runtime parameter that you can tweak and play
 around with until you reach stable state.

 On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:

 Hi,

 I am looking at a way to compute the optimal batch size in the client
 side similar to the below mentioned bug in the server side (generic as we
 are exposing REST APIs for Cassandra, the column family and the data are
 different each request).

 https://issues.apache.org/jira/browse/CASSANDRA-6487
 https://www.google.com/url?q=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCASSANDRA-6487sa=Dsntz=1usg=AFQjCNGOSliZnS1idXqTHXIr7aNfEN3mMg

 How do we compute(approximately using ColumnDefintions or ColumnMetadata)
 the size of a row of a column family from the client side using Cassandra
 Java driver?

 Thanks
 Ajay

  To unsubscribe from this group and stop receiving emails from it, send an
 email to java-driver-user+unsubscr...@lists.datastax.com.



Re: Optimal Batch size (Unlogged) for Java driver

2015-03-02 Thread Ajay
Hi Ankush,

We are already using Prepared statement and our case is a time series data
as well.

Thanks
Ajay
On 02-Mar-2015 10:00 pm, Ankush Goyal ank...@gmail.com wrote:

 Ajay,

 First of all, I would recommend using PreparedStatements, so you only
 would be sending the variable bound arguments over the wire. Second, I
 think that 5kb limit for WARN is too restrictive, and you could tune that
 on cassandra server side. I think if all you have is 15 columns (as long as
 their values are sanitized and do not go over certain limits), it should be
 fine to send all of them over at the same time. Chunking is necessary, when
 you have time-series type data (for writes) OR you might be reading a lot
 of data via IN query.
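
A hedged sketch of that runtime-tunable chunking idea: split bound statements into
UNLOGGED batches under a configurable byte budget. The size estimator is a
caller-supplied callback, since the exact wire size depends on your schema; this is
an illustration, not a driver facility.

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import java.util.List;

class BatchChunker {
    interface SizeEstimator { int estimatedBytes(BoundStatement st); }

    static void executeChunked(Session session, List<BoundStatement> statements,
                               SizeEstimator estimator, int maxBytesPerBatch) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        int bytes = 0;
        for (BoundStatement st : statements) {
            int size = estimator.estimatedBytes(st);
            if (bytes > 0 && bytes + size > maxBytesPerBatch) {
                session.execute(batch);                    // flush the current chunk
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
                bytes = 0;
            }
            batch.add(st);
            bytes += size;
        }
        if (bytes > 0)
            session.execute(batch);                        // flush the tail
    }
}

The maxBytesPerBatch value can then be exposed as a configuration property and tuned at
runtime, as suggested above.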

 On Monday, March 2, 2015 at 7:55:18 AM UTC-8, Ajay Garga wrote:

 I have a column family with 15 columns where there are timestamp,
 timeuuid,  few text fields and rest int  fields.  If I calculate the size
 of its column name  and it's value and divide 5kb (recommended max size for
 batch) with the value,  I get result as 12. Is it correct?. Am I missing
 something?

 Thanks
 Ajay
 On 02-Mar-2015 12:13 pm, Ankush Goyal ank...@gmail.com wrote:

 Hi Ajay,

 I would suggest, looking at the approximate size of individual elements
 in the batch, and based on that compute max size (chunk size).

 Its not really a straightforward calculation, so I would further suggest
 making that chunk size a runtime parameter that you can tweak and play
 around with until you reach stable state.

 On Sunday, March 1, 2015 at 10:06:55 PM UTC-8, Ajay Garga wrote:

 Hi,

 I am looking at a way to compute the optimal batch size in the client
 side similar to the below mentioned bug in the server side (generic as we
 are exposing REST APIs for Cassandra, the column family and the data are
 different each request).

 https://issues.apache.org/jira/browse/CASSANDRA-6487
 https://www.google.com/url?q=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCASSANDRA-6487sa=Dsntz=1usg=AFQjCNGOSliZnS1idXqTHXIr7aNfEN3mMg

 How do we compute(approximately using ColumnDefintions or
 ColumnMetadata) the size of a row of a column family from the client side
 using Cassandra Java driver?

 Thanks
 Ajay

  To unsubscribe from this group and stop receiving emails from it, send
 an email to java-driver-us...@lists.datastax.com.

  To unsubscribe from this group and stop receiving emails from it, send
 an email to java-driver-user+unsubscr...@lists.datastax.com.



Optimal Batch size (Unlogged) for Java driver

2015-03-01 Thread Ajay
Hi,

I am looking at a way to compute the optimal batch size in the client side
similar to the below mentioned bug in the server side (generic as we are
exposing REST APIs for Cassandra, the column family and the data are
different each request).

https://issues.apache.org/jira/browse/CASSANDRA-6487

How do we compute(approximately using ColumnDefintions or ColumnMetadata)
the size of a row of a column family from the client side using Cassandra
Java driver?

Thanks
Ajay


Re: Caching the PreparedStatement (Java driver)

2015-02-28 Thread Ajay
Hi,

My earlier question was whether it is safe to cache PreparedStatements
(using the Java driver) on the client side, which Olivier confirmed.

Now the question is: do we really need to cache the PreparedStatement on the
client side?

Lets take a scenario as below:

1) Client fires a REST query SELECT * from Test where Pk = val1;
2) REST service prepares a statement SELECT * from Test where Pk = ?
3) Executes the PreparedStatement by setting the values.
4) Assume we don't cache the PreparedStatement
5) Client fires another REST query SELECT * from Test where Pk = val2;
6) REST service prepares a statement SELECT * from Test where Pk = ?
7) Executes the PreparedStatement by setting the values.

In this case, is there any benefit of using the PreparedStatement?

From the Java driver code, Session.prepare(query) doesn't check
whether a similar query was prepared earlier or not. It directly calls the
server, passing the query. The return from the server is a PreparedId. Does
the server maintain a cache of prepared queries, or does it still perform
all the steps to prepare a query if the client asks to prepare the same
query more than once (using the same Session and Cluster instance, which I
think doesn't matter)?

Thanks
Ajay


On Sat, Feb 28, 2015 at 9:17 AM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Olivier.

 Most of the REST query calls would come from other applications to
 write/read to/from Cassandra which means most queries from an application
 would be same (same column families but different  values).

 Thanks
 Ajay
 On 28-Feb-2015 6:05 am, Olivier Michallat 
 olivier.michal...@datastax.com wrote:

 Hi Ajay,

 Yes, it is safe to hold a reference to PreparedStatement instances in
 your client code. If you always run the same pre-defined statements, you
 can store them as fields in your resource classes.

 If your statements are dynamically generated (for example, inserting
 different subsets of the columns depending on what was provided in the REST
 payload), your caching approach is valid. When you evict a
 PreparedStatement from your cache, the driver will also remove the
 corresponding id from its internal cache. If you re-prepare it later it
 might still be in the Cassandra-side cache, but that is not a problem.

 One caveat: you should be reasonably confident that your prepared
 statements will be reused. If your query strings are always different,
 preparing will bring no advantage.

 --

 Olivier Michallat

 Driver  tools engineer, DataStax

 On Fri, Feb 27, 2015 at 7:04 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 We are building REST APIs for Cassandra using the Cassandra Java Driver.

 So as per the below guidlines from the documentation, we are caching the
 Cluster instance (per cluster) and the Session instance (per keyspace) as
 they are multi thread safe.

 http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html

 As the Cluster and Session instance(s) are cached in the application
 already and also as the PreparedStatement provide better performance, we
 thought to build the PreparedStatement for REST query implicitly (as REST
 calls are stateless) and cache the PreparedStatemen. Whenever a REST query
 is invoked, we look for a PreparedStatement in the cache and create and put
 it in the cache if it doesn't exists. (The cache is a in-memory fixed size
 LRU based).

 Is a safe approach to cache PreparedStatement in the client side?.
 Looking at the Java driver code, the Cluster class stores the
 PreparedStatements as a weak reference (to rebuild when a node is down or
 a  new node added).

 Thanks
 Ajay

 To unsubscribe from this group and stop receiving emails from it, send
 an email to java-driver-user+unsubscr...@lists.datastax.com.


  To unsubscribe from this group and stop receiving emails from it, send
 an email to java-driver-user+unsubscr...@lists.datastax.com.




Caching the PreparedStatement (Java driver)

2015-02-27 Thread Ajay
Hi,

We are building REST APIs for Cassandra using the Cassandra Java Driver.

So, as per the guidelines below from the documentation, we are caching the
Cluster instance (per cluster) and the Session instance (per keyspace) as
they are thread-safe.
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/fourSimpleRules.html

As the Cluster and Session instance(s) are cached in the application
already, and as PreparedStatements provide better performance, we
thought to build the PreparedStatement for each REST query implicitly (as REST
calls are stateless) and cache the PreparedStatement. Whenever a REST query
is invoked, we look for a PreparedStatement in the cache, and create and put
it in the cache if it doesn't exist. (The cache is an in-memory, fixed-size,
LRU-based cache.)

Is it a safe approach to cache PreparedStatements on the client side? Looking
at the Java driver code, the Cluster class stores the PreparedStatements as
weak references (to rebuild them when a node is down or a new node is added).

Thanks
Ajay
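
A minimal sketch of such a client-side cache, assuming a simple synchronized LRU built
on LinkedHashMap; a cache miss costs one prepare round trip, and re-preparing a
statement that is still in the Cassandra-side cache is harmless:

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.LinkedHashMap;
import java.util.Map;

class PreparedStatementCache {
    private final Session session;
    private final Map<String, PreparedStatement> cache;

    PreparedStatementCache(final Session session, final int maxEntries) {
        this.session = session;
        this.cache = new LinkedHashMap<String, PreparedStatement>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, PreparedStatement> eldest) {
                return size() > maxEntries;        // evict the least recently used entry
            }
        };
    }

    synchronized PreparedStatement get(String cql) {
        PreparedStatement ps = cache.get(cql);
        if (ps == null) {
            ps = session.prepare(cql);             // one round trip on a cache miss
            cache.put(cql, ps);
        }
        return ps;
    }
}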


Re: Pagination support on Java Driver Query API

2015-02-13 Thread Ajay
The syntax suggested by Ondrej is not working in some case in 2.0.11 and
logged an issue for the same.

https://issues.apache.org/jira/browse/CASSANDRA-8797

Thanks
Ajay
On Feb 12, 2015 11:01 PM, Bulat Shakirzyanov 
bulat.shakirzya...@datastax.com wrote:

 Fixed my Mail.app settings so you can see my actual name, sorry.

 On Feb 12, 2015, at 8:55 AM, DataStax bulat.shakirzya...@datastax.com
 wrote:

 Hello,

 As was mentioned earlier, the Java driver doesn’t actually perform
 pagination.

 Instead, it uses cassandra native protocol to set page size of the result
 set. (
 https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730
 )
  When Cassandra sends the result back to the Java driver, it includes a
  binary token.
 This token represents paging state. To fetch the next page, the driver
 re-executes the same
 statement with original page size and paging state attached. If there is
 another page available,
 Cassandra responds with a new paging state that can be used to fetch it.

 You could also try reporting this issue on the Cassandra user mailing list.
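
A hedged sketch of what that looks like with the Java driver, assuming a driver version
that exposes PagingState (2.0.10+/2.1.2+); the query and the REST plumbing around it are
illustrative only:

import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

class PageFetcher {
    /** Returns the paging state to hand back to the REST client, or null on the last page. */
    static String fetchOnePage(Session session, String clientPagingState, int pageSize) {
        Statement stmt = new SimpleStatement(
                "SELECT * FROM clicks WHERE adId = 1 AND hour = '2015-01-07 11'");
        stmt.setFetchSize(pageSize);                               // page size, not a row limit
        if (clientPagingState != null)
            stmt.setPagingState(PagingState.fromString(clientPagingState));

        ResultSet rs = session.execute(stmt);
        PagingState next = rs.getExecutionInfo().getPagingState();  // null when no more pages
        int inThisPage = rs.getAvailableWithoutFetching();           // rows of this page only
        for (int i = 0; i < inThisPage; i++) {
            Row row = rs.one();
            // ... serialize row into the REST response ...
        }
        return (next == null) ? null : next.toString();
    }
}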

 On Feb 12, 2015, at 8:35 AM, Eric Stevens migh...@gmail.com wrote:

 I don't know what the shape of the page state data is deep inside the
 JavaDriver, I've actually tried to dig into that in the past and understand
 it to see if I could reproduce it as a general purpose any-query kind of
 thing.  I gave up before I fully understood it, but I think it's actually a
 handle to an in-memory state maintained by the coordinator, which is only
 maintained for the lifetime of the statement (i.e. it's not stateless
 paging). That would make it a bad candidate for stateless paging scenarios
 such as REST requests where a typical setup would load balance across HTTP
 hosts, never mind across coordinators.

 It shouldn't be too much work to abstract this basic idea for manual
 paging into a general purpose class that takes List[ClusteringKeyDef[T,
 O:Ordering]], and can produce a connection agnostic PageState from a
 ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.



 Also RE: possibly multiple queries to satisfy a page - yes, that's
 unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.

 On Thu, Feb 12, 2015 at 8:13 AM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Eric. I figured out the same but didn't get time to put it on the
 mail. Thanks.

 But it is highly tied up to how data is stored internally in Cassandra.
 Basically how partition keys are used to distribute (less likely to change.
 We are not directly dependence on the partition algo) and clustering keys
 are used to sort the data with in a partition( multi level sorting and
 henceforth the restrictions on the ORDER BY clause) which I think can
 change likely down the lane in Cassandra 3.x or 4.x in an different way for
 some better storage or retrieval.

 Thats said I am hesitant to implement this client side logic for
 pagination for a) 2+ queries might need more than one query to Cassandra.
 b)  tied up implementation to Cassandra internal storage details which can
 change(though not often). c) in our case, we are building REST Apis which
 will be deployed Tomcat clusters. Hence whatever we cache to support
 pagination, need to be cached in a distributed way for failover support.

 It (pagination support) is best done at the server side like ROWNUM in
 SQL or better done in Java driver to hide the internal details and can be
 optimized better as server sends the paging state with the driver.

 Thanks
 Ajay
 On Feb 12, 2015 8:22 PM, Eric Stevens migh...@gmail.com wrote:

 Your page state then needs to track the last ck1 and last ck2 you saw.
 Pages 2+ will end up needing to be up to two queries if the first query
 doesn't fill the page size.

 CREATE TABLE foo (
   partitionkey int,
   ck1 int,
   ck2 int,
   col1 int,
   col2 int,
   PRIMARY KEY ((partitionkey), ck1, ck2)
 ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);

 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

 If you're pulling the whole of partition 1 and your page size is 2, your
 first page looks like:

 *PAGE 1*

 SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
  partitionkey | ck1 | ck2 | col1 | col2
 --+-+-+--+--
 1 |   1 |   3 |3 |3
 1 |   1 |   2 |2 |2

 You got enough rows to satisfy the page, Your page state is taken from
 the last row: (ck1=1, ck2=2)


 *PAGE 2*
 Notice that you have a page state, and add some limiting clauses on the
 statement:

 SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ajay
Thanks Eric. I figured out the same but didn't get time to put it on the
mail. Thanks.

But it is highly tied to how data is stored internally in Cassandra:
basically, how partition keys are used to distribute data (less likely to change,
and we do not depend directly on the partitioning algorithm) and how clustering keys
are used to sort the data within a partition (multi-level sorting, and
hence the restrictions on the ORDER BY clause), which I think could well
change down the line in Cassandra 3.x or 4.x in a different way for
better storage or retrieval.

That said, I am hesitant to implement this client-side logic for pagination
because a) pages 2+ might need more than one query to Cassandra, b) it ties the
implementation to Cassandra internal storage details which can
change (though not often), and c) in our case we are building REST APIs which
will be deployed on Tomcat clusters, so whatever we cache to support
pagination needs to be cached in a distributed way for failover support.

Pagination support is best done on the server side (like ROWNUM in SQL),
or better in the Java driver to hide the internal details, where it can be
optimized better since the server sends the paging state to the driver.

Thanks
Ajay
On Feb 12, 2015 8:22 PM, Eric Stevens migh...@gmail.com wrote:

 Your page state then needs to track the last ck1 and last ck2 you saw.
 Pages 2+ will end up needing to be up to two queries if the first query
 doesn't fill the page size.

 CREATE TABLE foo (
   partitionkey int,
   ck1 int,
   ck2 int,
   col1 int,
   col2 int,
   PRIMARY KEY ((partitionkey), ck1, ck2)
 ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);

 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
 INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

 If you're pulling the whole of partition 1 and your page size is 2, your
 first page looks like:

 *PAGE 1*

 SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
  partitionkey | ck1 | ck2 | col1 | col2
 --+-+-+--+--
 1 |   1 |   3 |3 |3
 1 |   1 |   2 |2 |2

 You got enough rows to satisfy the page, Your page state is taken from the
 last row: (ck1=1, ck2=2)


 *PAGE 2*
 Notice that you have a page state, and add some limiting clauses on the
 statement:

  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
  partitionkey | ck1 | ck2 | col1 | col2
 --+-+-+--+--
 1 |   1 |   1 |1 |1

 Oops, we didn't get enough rows to satisfy the page limit, so we need to
 continue on, we just need one more:

  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
  partitionkey | ck1 | ck2 | col1 | col2
 --+-+-+--+--
 1 |   2 |   3 |6 |6

 We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
 3).


 *PAGE 3*

  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
  partitionkey | ck1 | ck2 | col1 | col2
 --+-+-+--+--
 1 |   2 |   2 |5 |5
 1 |   2 |   1 |4 |4

 Great, we satisfied this page with only one query, page state: (ck1 = 2,
 ck2 = 1).


 *PAGE 4*

  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
 (0 rows)

  Oops, our initial query was on the boundary of ck1, but this looks like
  any other time that the initial query returns < pageSize rows, we just move
  on to the next page:

  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
 (0 rows)

 Aha, we've exhausted ck1 as well, so there are no more pages, page 3
 actually pulled the last possible value; page 4 is empty, and we're all
 done.  Generally speaking you know you're done when your first clustering
 key is the only non-equality operator in the statement, and you got no rows
 back.
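
A rough Java sketch of the same two-step logic (an illustration, not Eric's code), using
the foo table above; the caller stores the ck1/ck2 of the last returned row as its page
state:

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.List;

class ClusteringKeyPager {
    /** One page after the given state; pass lastCk1 = null for the first page. */
    static List<Row> nextPage(Session session, int partitionKey,
                              Integer lastCk1, Integer lastCk2, int pageSize) {
        List<Row> page = new ArrayList<Row>();
        if (lastCk1 == null) {
            page.addAll(session.execute(
                    "SELECT * FROM foo WHERE partitionkey = ? LIMIT " + pageSize,
                    partitionKey).all());
        } else {
            // Finish the current ck1; ck2 is stored DESC, so "later" rows have ck2 < lastCk2.
            page.addAll(session.execute(
                    "SELECT * FROM foo WHERE partitionkey = ? AND ck1 = ? AND ck2 < ? LIMIT "
                            + pageSize,
                    partitionKey, lastCk1, lastCk2).all());
            // If that did not fill the page, continue with the next ck1 values.
            if (page.size() < pageSize) {
                page.addAll(session.execute(
                        "SELECT * FROM foo WHERE partitionkey = ? AND ck1 > ? LIMIT "
                                + (pageSize - page.size()),
                        partitionKey, lastCk1).all());
            }
        }
        return page;  // the caller records ck1/ck2 of the last row as the new page state
    }
}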






 On Wed, Feb 11, 2015 at 10:55 AM, Ajay ajay.ga...@gmail.com wrote:

 Basically I am trying different queries with your approach.

 One such query is like

 Select * from mycf where condition on partition key order by ck1 asc, ck2
 desc where ck1 and ck2 are clustering keys in that order.

 Here how do we achieve pagination support?

 Thanks
 Ajay
 On Feb 11, 2015 11:16 PM, Ajay ajay.ga...@gmail.com wrote:


 Hi Eric,

 Thanks for your reply.

  I am using Cassandra 2.0.11, and in that I cannot append a condition like
  last clustering key column > value of the last row in the previous batch.
  It fails with "Preceding column is either not restricted or by a non-EQ relation".
  It means I need to specify an equality condition for all preceding clustering
  key columns. With this I cannot get the pagination correct.

 Thanks

Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Hi Eric,

Thanks for your reply.

I am using Cassandra 2.0.11, and in that I cannot append a condition like last
clustering key column > value of the last row in the previous batch. It
fails with "Preceding column is either not restricted or by a non-EQ relation". It
means I need to specify an equality condition for all preceding clustering key
columns. With this I cannot get the pagination correct.

Thanks
Ajay
 I can't believe that everyone read & process all rows at once (without
pagination).

Probably not too many people try to read all rows in a table as a single
rolling operation with a standard client driver.  But those who do would
use token() to keep track of where they are and be able to resume with that
as well.

But it sounds like you're talking about paginating a subset of data -
larger than you want to process as a unit, but prefiltered by some other
criteria which prevents you from being able to rely on token().  For this
there is no general purpose solution, but it typically involves you
maintaining your own paging state, typically keeping track of the last
partitioning and clustering key seen, and using that to construct your next
query.

For example, we have client queries which can span several partitioning
keys.  We make sure that the List of partition keys generated by a given
client query List(Pq) is deterministic, then our paging state is the index
offset of the final Pq in the response, plus the value of the final
clustering column.  A query coming in with a paging state attached to it
starts the next set of queries from the provided Pq offset where
clusteringKey > the provided value.

So if you can just track partition key offset (if spanning multiple
partitions), and clustering key offset, you can construct your next query
from those instead.

On Tue, Feb 10, 2015 at 6:58 PM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Alex.

  But is there any workaround possible? I can't believe that everyone read
  & process all rows at once (without pagination).

 Thanks
 Ajay
 On Feb 10, 2015 11:46 PM, Alex Popescu al...@datastax.com wrote:


 On Tue, Feb 10, 2015 at 4:59 AM, Ajay ajay.ga...@gmail.com wrote:

 1) Java driver implicitly support Pagination in the ResultSet (using
 Iterator) which can be controlled through FetchSize. But it is limited in a
 way that we cannot skip or go previous. The FetchState is not exposed.


 Cassandra doesn't support skipping so this is not really a limitation of
 the driver.


 --

 [:-a)

 Alex Popescu
 Sen. Product Manager @ DataStax
 @al3xandru

 To unsubscribe from this group and stop receiving emails from it, send an
 email to java-driver-user+unsubscr...@lists.datastax.com.




Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Basically I am trying different queries with your approach.

One such query is like

Select * from mycf where condition on partition key order by ck1 asc, ck2
desc where ck1 and ck2 are clustering keys in that order.

Here how do we achieve pagination support?

Thanks
Ajay
On Feb 11, 2015 11:16 PM, Ajay ajay.ga...@gmail.com wrote:


 Hi Eric,

 Thanks for your reply.

  I am using Cassandra 2.0.11, and in that I cannot append a condition like
  last clustering key column > value of the last row in the previous batch.
  It fails with "Preceding column is either not restricted or by a non-EQ relation".
  It means I need to specify an equality condition for all preceding clustering
  key columns. With this I cannot get the pagination correct.

 Thanks
 Ajay
   I can't believe that everyone read & process all rows at once (without
  pagination).

 Probably not too many people try to read all rows in a table as a single
 rolling operation with a standard client driver.  But those who do would
 use token() to keep track of where they are and be able to resume with that
 as well.

 But it sounds like you're talking about paginating a subset of data -
 larger than you want to process as a unit, but prefiltered by some other
 criteria which prevents you from being able to rely on token().  For this
 there is no general purpose solution, but it typically involves you
 maintaining your own paging state, typically keeping track of the last
 partitioning and clustering key seen, and using that to construct your next
 query.

 For example, we have client queries which can span several partitioning
 keys.  We make sure that the List of partition keys generated by a given
 client query List(Pq) is deterministic, then our paging state is the
 index offset of the final Pq in the response, plus the value of the final
 clustering column.  A query coming in with a paging state attached to it
  starts the next set of queries from the provided Pq offset where
  clusteringKey > the provided value.

 So if you can just track partition key offset (if spanning multiple
 partitions), and clustering key offset, you can construct your next query
 from those instead.

 On Tue, Feb 10, 2015 at 6:58 PM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Alex.

  But is there any workaround possible? I can't believe that everyone read
  & process all rows at once (without pagination).

 Thanks
 Ajay
 On Feb 10, 2015 11:46 PM, Alex Popescu al...@datastax.com wrote:


 On Tue, Feb 10, 2015 at 4:59 AM, Ajay ajay.ga...@gmail.com wrote:

 1) Java driver implicitly support Pagination in the ResultSet (using
 Iterator) which can be controlled through FetchSize. But it is limited in a
 way that we cannot skip or go previous. The FetchState is not exposed.


 Cassandra doesn't support skipping so this is not really a limitation of
 the driver.


 --

 [:-a)

 Alex Popescu
 Sen. Product Manager @ DataStax
 @al3xandru

 To unsubscribe from this group and stop receiving emails from it, send
 an email to java-driver-user+unsubscr...@lists.datastax.com.





Re: Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Thanks Alex.

But is there any workaround possible? I can't believe that everyone read &
process all rows at once (without pagination).

Thanks
Ajay
On Feb 10, 2015 11:46 PM, Alex Popescu al...@datastax.com wrote:


 On Tue, Feb 10, 2015 at 4:59 AM, Ajay ajay.ga...@gmail.com wrote:

 1) Java driver implicitly support Pagination in the ResultSet (using
 Iterator) which can be controlled through FetchSize. But it is limited in a
 way that we cannot skip or go previous. The FetchState is not exposed.


 Cassandra doesn't support skipping so this is not really a limitation of
 the driver.


 --

 [:-a)

 Alex Popescu
 Sen. Product Manager @ DataStax
 @al3xandru

 To unsubscribe from this group and stop receiving emails from it, send an
 email to java-driver-user+unsubscr...@lists.datastax.com.



Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Hi,

I am working on exposing the Cassandra Query APIs(Java Driver) as REST APIs
for our internal project.

To support Pagination, I looked at the Cassandra documentation, Source code
and other forums.
What I mean by pagination support is like below:

1) Client fires query to REST server
2) Server prepares the statement, caches the query and returns a query id
(unique id)
3) Given the query id, offset and limit, the server returns the set of rows
matching that offset and limit, and also returns the offset of the last returned row.
4) Client makes subsequent calls to the server with the offset returned by
the server until all rows are returned. In case a call fails or times
out, the client makes the call again.

Below are the details I found:

1) The Java driver implicitly supports pagination in the ResultSet (using an
Iterator), which can be controlled through the fetch size. But it is limited in
that we cannot skip ahead or go back. The fetch state is not exposed.

2) Using token() function on the clustering keys of the last returned row,
we can skip the returned rows and using the LIMIT keyword, we can limit the
number of rows. But the problem I see is that the token() function cannot
be used if the query contains ORDER BY clause.

Is there any other way to achieve the pagination support?

Thanks
Ajay


Re: Performance difference between Regular Statement Vs PreparedStatement

2015-01-29 Thread Ajay
Thanks Eric. I didn't know the point about the token aware routing.

But with points 2 and 3 I didn't notice much improvement with prepared
statements. I have 2 Cassandra nodes running in VirtualBox VMs on the same
machine, and the test client running on the same machine.

Thanks
Ajay
Prepared statements can take advantage of token aware routing which IIRC
non-prepared statements cannot in the DS Java Driver, so as your cluster
grows you reduce the overhead of statement coordination (assuming you use
token aware routing).  There should also be less data to transfer for
shipping the query (the CQL portion is shipped once during the prepare
stage, and only the data is shipped on subsequent executions).  You'll also
save the cluster the overhead of repeatedly parsing your CQL statements.
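
A small sketch of wiring that up with the driver's 2.1-era API; the contact point,
keyspace and table names are example values:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

class TokenAwareSetup {
    static Session connect() {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")   // example contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
        return cluster.connect("mykeyspace");
    }

    static void writeOne(Session session) {
        PreparedStatement insert = session.prepare("INSERT INTO mytable (pk, col1) VALUES (?, ?)");
        session.execute(insert.bind(42, 7));   // routed to a replica that owns pk = 42
    }
}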

On Wed, Jan 28, 2015 at 11:50 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi All,

I tried both insert and select queries (using QueryBuilder) as regular
statements and as PreparedStatements in multithreaded code, running each query
10k to 50k times. But I don't see any visible improvement using the
PreparedStatement. What could be the reason?

 Note : I am using the same Session object in multiple threads.

 Cassandra version : 2.0.11
 Driver version : 2.1.4

 Thanks
 Ajay



Re: User audit in Cassandra

2015-01-09 Thread Ajay
Thanks Tyler Hobbs.


We need to capture which queries a user ran in a session and how long each
took (we don't need the query plan or anything like that). Is that possible? With
the Authenticator we can capture only the session creation, right?

Thanks
Ajay


On Sat, Jan 10, 2015 at 6:07 AM, Tyler Hobbs ty...@datastax.com wrote:

 system_traces is for query tracing, which is for diagnosing performance
 problems, not logging activity.

 Cassandra is designed to allow you to write your own Authenticator pretty
 easily.  You can just subclass PasswordAuthenticator and add logging where
 desired.  Compile that into a jar, put it in the lib/ directory for
 Cassandra, and change cassandra.yaml to use that class.
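
 A rough sketch of that idea. The authenticate(Map) signature below is assumed from the
 Cassandra 2.0-era IAuthenticator interface and varies between versions, so treat this as
 an outline rather than drop-in code:

 import java.util.Map;
 import org.apache.cassandra.auth.AuthenticatedUser;
 import org.apache.cassandra.auth.PasswordAuthenticator;
 import org.apache.cassandra.exceptions.AuthenticationException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 public class AuditingPasswordAuthenticator extends PasswordAuthenticator {
     private static final Logger logger =
             LoggerFactory.getLogger(AuditingPasswordAuthenticator.class);

     // NOTE: signature assumed from the 2.0-era interface; verify against your version.
     @Override
     public AuthenticatedUser authenticate(Map<String, String> credentials)
             throws AuthenticationException {
         AuthenticatedUser user = super.authenticate(credentials);
         logger.info("Successful login for user {}", user.getName());  // session creation only
         return user;
     }
 }

 Package the class into a jar, drop it into Cassandra's lib/ directory, and point the
 authenticator setting in cassandra.yaml at the fully qualified class name, as described
 above. The class and package names here are made up.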

 On Thu, Jan 8, 2015 at 6:34 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 Is there a way to enable user audit or trace if we have enabled
 PasswordAuthenticator in cassandra.yaml and set up the users as well. I
 noticed there are keyspaces system_auth and system_trace. But there is no
 way to find out which user initiated which session. Is there anyway to find
 out?. Also is it recommended to enable system_trace in production or to
 know how many sessions started by a user?

 Thanks
 Ajay




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Cassandra primary key design to cater range query

2015-01-09 Thread Ajay
Hi,

I read somewhere that the order of columns in the clustering key matters.
Please correct me if I am wrong.

For example,

PRIMARY KEY((prodgroup), status, productid).

Then the query below cannot be run:

select * from product where prodgroup='xyz' and prodid > 0

But this query can be run:

select * from product where prodgroup='xyz' and prodid > 0 and status = 0

It means all the preceding clustering key columns have to be restricted in
the query. So with that, if you want to query "Get details of a specific
product (either active or inactive)", you might need to reorder the columns,
i.e. PRIMARY KEY((prodgroup), productid, status).

Thanks
Ajay


On Sat, Jan 10, 2015 at 6:03 AM, Tyler Hobbs ty...@datastax.com wrote:

 Your proposed model for the table to handle the last query looks good, so
 I would stick with that.

 On Mon, Jan 5, 2015 at 5:45 AM, Nagesh nageswara.r...@gmail.com wrote:

 Hi All,

 I have designed a column family

 prodgroup text, prodid int, status int, , PRIMARY KEY ((prodgroup),
 prodid, status)

 The data model is to cater

- Get list of products from the product group
- get list of products for a given range of ids
- Get details of a specific product
- Update status of the product acive/inactive
 - Get list of products that are active or inactive (select * from
 product where prodgroup='xyz' and prodid > 0 and status = 0)

  The design works fine, except for the last query. Cassandra does not allow
  querying on status unless I fix the product id. I think defining a super
  column family which has the key PRIMARY KEY((prodgroup), status,
  productid) should work. I would like to get expert advice on other
 alternatives.
 --
 Thanks,
 Nageswara Rao.V

 *The LORD reigns*




 --
 Tyler Hobbs
 DataStax http://datastax.com/



User audit in Cassandra

2015-01-08 Thread Ajay
Hi,

Is there a way to enable user audit or trace if we have enabled
PasswordAuthenticator in cassandra.yaml and set up the users as well. I
noticed there are keyspaces system_auth and system_trace. But there is no
way to find out which user initiated which session. Is there anyway to find
out?. Also is it recommended to enable system_trace in production or to
know how many sessions started by a user?

Thanks
Ajay


Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Hi,

I have a column family as below:

(Wide row design)
CREATE TABLE clicks (hour text,adId int,itemId int,time timeuuid,PRIMARY
KEY((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);

Now, to query for a given Ad Id and a specific 3-hour range, say 2015-01-07 11 to
2015-01-07 14, how do I use the token function in CQL?

Thanks
Ajay


Re: Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Thanks.

Basically there are two access patterns:
1) For the last 1 hour (or more, if the last batch failed for some reason), get the
clicks data for all Ads. But that seems not possible, as the Ad Id is part of
the partition key.
2) For the last 1 hour (or more, if the last batch failed for some reason), get the
clicks data for a specific Ad Id (or for a few of them).

How do we support 1 and 2 with the same data model? (I chose Ad Id +
hour as the partition key to avoid hotspots.)

Thanks
Ajay


On Wed, Jan 7, 2015 at 6:34 PM, Sylvain Lebresne sylv...@datastax.com
wrote:

 On Wed, Jan 7, 2015 at 10:18 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I have a column family as below:

 (Wide row design)
 CREATE TABLE clicks (hour text,adId int,itemId int,time timeuuid,PRIMARY
 KEY((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);

 Now to query for a given Ad Id and specific 3 hours say 2015-01-07 11 to
 2015-01-07 14, how do I use the token function in the CQL.


 From that description, it doesn't appear to me that you need the token
 function. Just do 3 queries for each hour, each queries being something
 along the lines of
   SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...

  For completeness' sake, I should note that you could do that with a single
  query by using an IN on the hour column, but it's actually not a better
  solution in that case (provided you submit the 3 queries in an asynchronous
  fashion, at least), for the reasons explained here:
  https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7

 --
 Sylvain
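
 For completeness, the per-hour queries suggested above would look roughly
 like this against the clicks table (the adId value is just an example;
 issue them asynchronously from the client):

     SELECT * FROM clicks WHERE adId = 1 AND hour = '2015-01-07 11';
     SELECT * FROM clicks WHERE adId = 1 AND hour = '2015-01-07 12';
     SELECT * FROM clicks WHERE adId = 1 AND hour = '2015-01-07 13';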





Re: Keyspace uppercase name issues

2015-01-07 Thread Ajay
We noticed the same issue. From cassandra-cli, it allows using an upper-case
or mixed-case keyspace name, but from cqlsh it auto-converts to lower
case.

Thanks
Ajay

On Wed, Jan 7, 2015 at 9:44 PM, Harel Gliksman harelg...@gmail.com wrote:

 Hi,

 We have a Cassandra cluster with keyspaces that were created using the
 thrift API, and their names contain upper-case letters.
 We are trying to use the new DataStax driver (version 2.1.4, Maven's
 latest) but are encountering some problems due to upper-case handling.

 Datastax provide this guidance on how to handle lower-upper cases:

 http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html

 However, there seems to be something confusing in the API.

 Attached a small java code that reproduces the problem.

 Many thanks,
 Harel.
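
 For reference, the behaviour described on that page: unquoted identifiers
 are folded to lower case in CQL, while double-quoted identifiers keep
 their case (and must then always be quoted). A minimal illustration, with
 an example keyspace name:

     -- unquoted: folded to lower case, creates/matches keyspace mykeyspace
     CREATE KEYSPACE MyKeyspace
         WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

     -- quoted: case is preserved, so this is a different keyspace named
     -- MyKeyspace, and it must be quoted in every later statement
     CREATE KEYSPACE "MyKeyspace"
         WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
     USE "MyKeyspace";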



Re: Cassandra nodes in VirtualBox

2015-01-05 Thread Ajay
Neha,

This is just for a trial setup. Anyway, thanks for the suggestion (more
than 1 seed node).

I figured out the problem: Node2 had an incorrect cluster name. The error
seems misleading though.

Thanks
Ajay Garga



On Mon, Jan 5, 2015 at 4:21 PM, Neha Trivedi nehajtriv...@gmail.com wrote:

 Hi Ajay,
 1. You should have at least 2 seed nodes, as that will help when Node1 (the
 only seed node) is down.
 2. Check that you are using the internal IP address in listen_address and
 rpc_address.




 On Mon, Jan 5, 2015 at 2:07 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I did the Cassandra cluster set up as below:

 Node 1 : Seed Node
 Node 2
 Node 3
 Node 4

 All 4 nodes are Virtual Box VMs with Ubuntu 14.10. I have set the
 listen_address, rpc_address as the inet address with SimpleSnitch.

  When I start Node2 after Node1 is started, I get
  java.lang.RuntimeException: Unable to gossip with any seeds.

 What could be the reason?

 Thanks
 Ajay
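
 Given the root cause Ajay found above (a cluster name mismatch on Node2),
 one quick way to see which cluster a running node believes it belongs to
 is to query the node-local system table from cqlsh; the value must match
 cluster_name in cassandra.yaml on every node. A minimal check:

     SELECT cluster_name, release_version FROM system.local;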





Cassandra nodes in VirtualBox

2015-01-05 Thread Ajay
Hi,

I did the Cassandra cluster set up as below:

Node 1 : Seed Node
Node 2
Node 3
Node 4

All 4 nodes are Virtual Box VMs with Ubuntu 14.10. I have set the
listen_address, rpc_address as the inet address with SimpleSnitch.

When I start Node2 after Node1 is started, I get
java.lang.RuntimeException: Unable to gossip with any seeds.

What could be the reason?

Thanks
Ajay


Re: User click count

2014-12-31 Thread Ajay
Thanks Eric.

Happy New Year 2015 to all Cassandra developers and users :). This group
seems to be the most active of the Apache big data projects.

Will come back with more questions :)

Thanks
Ajay
On Dec 31, 2014 8:02 PM, Eric Stevens migh...@gmail.com wrote:

 You can totally avoid the impact of tombstones by rotating your partition
 key in the exact counts table, and only deleting whole partitions once
 you've counted them.  Once you've counted them you never have cause to read
 that partition key again.

 You can totally store the final counts in Cassandra as a standard
 (non-counter) column, and you can even use counters to keep track of the
 time slices which haven't been formally counted yet so that you can get
 reasonably accurate information about time slices that haven't been trued
 up yet.

 This is basically what's called a Lambda architecture - use efficient real
 time processing to get pretty close to accurate values when real time
 performance matters, then use a cleanup process to get perfectly accurate
 values when you can afford non-real-time processing times, and store that
 final computation so that you can continue to access it quickly.
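
 A minimal CQL sketch of what such a split could look like (the table names
 and the (adid, hour) bucketing are illustrative assumptions, not something
 prescribed in this thread):

     -- raw, exact events; one partition per (ad, hour) bucket, so a bucket
     -- that has been counted can be removed with a single partition delete
     CREATE TABLE clicks_raw (
         adid  int,
         hour  text,
         click timeuuid,
         PRIMARY KEY ((adid, hour), click)
     );

     -- trued-up totals written by the batch job, stored as a plain column
     CREATE TABLE clicks_rollup (
         adid   int,
         hour   text,
         clicks bigint,
         PRIMARY KEY ((adid), hour)
     );

     -- once the batch job has counted the (1, '2015-01-07 11') bucket:
     DELETE FROM clicks_raw WHERE adid = 1 AND hour = '2015-01-07 11';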

  is there any technical reason behind it (just out of curiosity)?

 Distributed counting is a fundamentally hard problem if you wish to do so
 in a manner that avoids bottlenecks (i.e. not distributed) and also
 provides for perfect accuracy.  There's plenty of research in this area,
 and there isn't a single algorithm that provides for all the properties we
 would hope for.  Instead there are different algorithms that make different
 tradeoffs.

 The way that Cassandra's counters can fail is that most operations in
 Cassandra are idempotent - if we're not sure whether an update has been
 applied correctly or not, we can simply apply it again, because it's safe
 to do twice.  Counters are not idempotent.  If you try to increment a
 counter, and you're not certain whether the increment was successful or
 not, it is *not* safe to try again (if it was successful the previous
 time, you've now incremented twice when it should have been once).

 Most of the time counters are reasonable and accurate, but in failure
 scenarios you may get some changes applied more than once, or not at all.
 With that in mind, you might find that being perfectly accurate most of the
 time, and being within a fraction of a percent the other times is
 acceptable.  If so, counters are your friend, and if not, a more complex
 lambda style approach as we've been advocating here is best.

 On Tue, Dec 30, 2014 at 10:54 PM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Janne and Rob.

 The idea is like this : To store the User clicks on Cassandra and a
 scheduler to count/aggregate the  clicks per link or ad
 hourly/daily/monthly and store in My SQL (or may be in Cassandra itself).
 Since tombstones will be deleted only after some days (as per
 configuration), could the subsequent queries to count the rows get affected
 (I mean say thousands of tombstones will affect the performance of the
 query) ?

 Secondly as I understand from this mail thread, the counter is not
 correct for this use case, is there any technical reason behind it (just
 out of curiosity)?

 Thanks
 Ajay

 On Tue, Dec 30, 2014 at 10:37 PM, Janne Jalkanen 
 janne.jalka...@ecyrd.com wrote:


 Hi!

 Yes, since all the writes for a partition (or row if you speak Thrift)
 always go to the same replicas, you will need to design to avoid hotspots -
 a pure day row will cause all the writes for a single day to go to the same
 replicas, so those nodes will have to work really hard for a day, and then
 the next day it’s again hard work for some other nodes.  If you have an
 user id there in front, then it would distribute better.

 For tombstone purposes think of your access patterns; if you have a
 date-based system, it probably does not matter since you will scan those
 UUIDs once, and then they will be tombstoned away.  It’s cleaner if you can
 delete the entire row with a single command, but as long as you never read
 it again, I don’t think this matters much.

 The real problems with wide rows come with compaction, and you shouldn’t
 have much problems with compaction because this is an append-only row, so
 it should be fine as a fairly wide row.  Make some back-of-the-envelope
 calculations and if it looks like you’re going to be hitting tens of
 millions of columns per day, then store per hour.

 One important thing: in order not to lose clicks, always use timeuuids
 instead of timestamps (or else two clicks coming in for the same id would
 overwrite itself and count as one).

 /Janne

 On 30 Dec 2014, at 06:28, Ajay ajay.ga...@gmail.com wrote:

 Thanks Janne, Alain and Eric.

 Now say I go with counters (hourly, daily, monthly) and also store UUID
 as below:

 user Id : /mm/dd as row key and dynamic columns for each click with
 column key as timestamp and value as empty. Periodically count the columns
 and rows

Stable cassandra build for production usage

2014-12-31 Thread Ajay
Hi All,

For my research and learning I am using Cassandra 2.1.2, but I see a couple
of mail threads about issues in 2.1.2. So what is the stable or popular
build for production in the Cassandra 2.x series?

Thanks
Ajay


Re: User click count

2014-12-30 Thread Ajay
Thanks Janne and Rob.

The idea is like this: store the user clicks in Cassandra, and have a
scheduler count/aggregate the clicks per link or ad hourly/daily/monthly
and store the result in MySQL (or maybe in Cassandra itself). Since
tombstones are deleted only after some days (as per configuration), could
the subsequent queries that count the rows be affected (I mean, would
thousands of tombstones affect the performance of the query)?

Secondly, as I understand from this mail thread, the counter is not correct
for this use case; is there any technical reason behind that (just out of
curiosity)?

Thanks
Ajay

On Tue, Dec 30, 2014 at 10:37 PM, Janne Jalkanen janne.jalka...@ecyrd.com
wrote:


 Hi!

 Yes, since all the writes for a partition (or row if you speak Thrift)
 always go to the same replicas, you will need to design to avoid hotspots -
 a pure day row will cause all the writes for a single day to go to the same
 replicas, so those nodes will have to work really hard for a day, and then
 the next day it’s again hard work for some other nodes.  If you have an
 user id there in front, then it would distribute better.

 For tombstone purposes think of your access patterns; if you have a
 date-based system, it probably does not matter since you will scan those
 UUIDs once, and then they will be tombstoned away.  It’s cleaner if you can
 delete the entire row with a single command, but as long as you never read
 it again, I don’t think this matters much.

 The real problems with wide rows come with compaction, and you shouldn’t
 have much problems with compaction because this is an append-only row, so
 it should be fine as a fairly wide row.  Make some back-of-the-envelope
 calculations and if it looks like you’re going to be hitting tens of
 millions of columns per day, then store per hour.

 One important thing: in order not to lose clicks, always use timeuuids
 instead of timestamps (or else two clicks coming in for the same id would
 overwrite itself and count as one).

 /Janne

 On 30 Dec 2014, at 06:28, Ajay ajay.ga...@gmail.com wrote:

 Thanks Janne, Alain and Eric.

 Now say I go with counters (hourly, daily, monthly) and also store UUID as
 below:

 user Id : /mm/dd as row key and dynamic columns for each click with
 column key as timestamp and value as empty. Periodically count the columns
 and rows and correct the counters. Now in this case, there will be one row
 per day but as many columns as user click.

 Other way is to store row per hour
 user id : /mm/dd/hh as row key and dynamic columns for each click with
 column key as timestamp and value as empty.

 Is there any difference (in performance or any known issues) between more
 rows Vs more columns as Cassandra deletes them through tombstones (say by
 default 20 days).

 Thanks
 Ajay

 On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens migh...@gmail.com wrote:

  If the counters get incorrect, it could't be corrected

 You'd have to store something that allowed you to correct it.  For
 example, the TimeUUID approach to keep true counts, which are slow to read
 but accurate, and a background process that trues up your counter columns
 periodically.

 On Mon, Dec 29, 2014 at 7:05 AM, Ajay ajay.ga...@gmail.com wrote:

 Thanks for the clarification.

 In my case, Cassandra is the only storage. If the counters get
 incorrect, it could't be corrected. For that if we store raw data, we can
 as well go that approach. But the granularity has to be as seconds level as
 more than one user can click the same link. So the data will be huge with
 more writes and more rows to count for reads right?

 Thanks
 Ajay


 On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ arodr...@gmail.com
 wrote:

 Hi Ajay,

 Here is a good explanation you might want to read.


 http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

 Though we use counters for 3 years now, we used them from start C* 0.8
 and we are happy with them. Limits I can see in both ways are:

 Counters:

  - accuracy indeed (Tend to be small in our use case < 5% - when the
  business allow 10%, so fair enough for us) + we recount them through a
 batch processing tool (spark / hadoop - Kind of lambda architecture). So
 our real-time stats are inaccurate and after a few minutes or hours we have
 the real value.
 - Read-Before-Write model, which is an anti-pattern. Makes you use more
 machine due to the pressure involved, affordable for us too.

 Raw data (counted)

 - Space used (can become quite impressive very fast, depending on your
 business) !
 - Time to answer a request (we expose the data to customer, they don't
 want to wait 10 sec for Cassandra to read 1 000 000 + columns)
 - Performances in o(n) (linear) instead of o(1) (constant). Customer
 won't always understand that for you it is harder to read 1 than 1 000 000,
 since it should be reading 1 number in both case, and your interface will
 have very unstable read time.

 Pick the best solution

User click count

2014-12-29 Thread Ajay
Hi,

Is it better to use a counter for the user click count than creating a new
row as user id : timestamp and counting the rows?

Basically we want to track the user clicks and use the same for
hourly/daily/monthly reports.

Thanks
Ajay
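
One way to sketch the counter-based variant being asked about here, in CQL
(table and column names are illustrative; this keeps one counter per user
per hour):

    CREATE TABLE user_clicks_hourly (
        userid text,
        hour   timestamp,
        clicks counter,
        PRIMARY KEY ((userid), hour)
    );

    UPDATE user_clicks_hourly SET clicks = clicks + 1
        WHERE userid = 'u1' AND hour = '2014-12-29 13:00:00';

    SELECT hour, clicks FROM user_clicks_hourly WHERE userid = 'u1';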


Re: User click count

2014-12-29 Thread Ajay
Hi,

So you mean to say counters are not accurate? (It is highly likely that
multiple parallel threads will try to increment the counter as users click
the links.)

Thanks
Ajay


On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen janne.jalka...@ecyrd.com
wrote:


 Hi!

 It’s really a tradeoff between accurate and fast and your read access
 patterns; if you need it to be fairly fast, use counters by all means, but
 accept the fact that they will (especially in older versions of cassandra
 or adverse network conditions) drift off from the true click count.  If you
 need accurate, use a timeuuid and count the rows (this is fairly safe for
 replays too).  However, if using timeuuids your storage will need lots of
 space; and your reads will be slow if the click counts are huge (because
 Cassandra will need to read every item).  Using counters makes it easy to
 just grab a slice of the time series data and shove it to a client for
 visualization.

 You could of course do a hybrid system; use timeuuids and then
 periodically count and add the result to a regular column, and then remove
 the columns.  Note that you might want to optimize this so that you don’t
 end up with a lot of tombstones, e.g. by bucketing the writes so that you
 can delete everything with just a single partition delete.

 At Thinglink some of the more important counters that we use are backed up
 by the actual data. So for speed purposes we use always counters for reads,
 but there’s a repair process that fixes the counter value if we suspect it
 starts drifting off the real data too much.  (You might be able to tell
 that we’ve been using counters for quite some time :-P)

 /Janne

 On 29 Dec 2014, at 13:00, Ajay ajay.ga...@gmail.com wrote:

  Hi,
 
  Is it better to use Counter to User click count than maintaining
 creating new row as user id : timestamp and count it.
 
  Basically we want to track the user clicks and use the same for
 hourly/daily/monthly report.
 
  Thanks
  Ajay
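
A minimal sketch of the timeuuid-based alternative Janne describes above,
bucketed so that a rolled-up bucket can be removed with a single partition
delete (the names and the per-day bucket are illustrative assumptions):

    CREATE TABLE user_clicks_raw (
        userid text,
        day    text,
        click  timeuuid,
        PRIMARY KEY ((userid, day), click)
    );

    INSERT INTO user_clicks_raw (userid, day, click)
        VALUES ('u1', '2014-12-29', now());

    -- exact count for the bucket, then drop the whole partition once rolled up
    SELECT COUNT(*) FROM user_clicks_raw WHERE userid = 'u1' AND day = '2014-12-29';
    DELETE FROM user_clicks_raw WHERE userid = 'u1' AND day = '2014-12-29';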




Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification.

In my case, Cassandra is the only storage. If the counters become incorrect,
they cannot be corrected. In that case, if we store raw data, we may as well
go with that approach. But the granularity has to be at the seconds level, as
more than one user can click the same link. So the data will be huge, with
more writes and more rows to count for reads, right?

Thanks
Ajay


On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi Ajay,

 Here is a good explanation you might want to read.


 http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

 Though we use counters for 3 years now, we used them from start C* 0.8 and
 we are happy with them. Limits I can see in both ways are:

 Counters:

 - accuracy indeed (Tend to be small in our use case < 5% - when the
 business allow 10%, so fair enough for us) + we recount them through a
 batch processing tool (spark / hadoop - Kind of lambda architecture). So
 our real-time stats are inaccurate and after a few minutes or hours we have
 the real value.
 - Read-Before-Write model, which is an anti-pattern. Makes you use more
 machine due to the pressure involved, affordable for us too.

 Raw data (counted)

 - Space used (can become quite impressive very fast, depending on your
 business) !
 - Time to answer a request (we expose the data to customer, they don't
 want to wait 10 sec for Cassandra to read 1 000 000 + columns)
 - Performances in o(n) (linear) instead of o(1) (constant). Customer won't
 always understand that for you it is harder to read 1 than 1 000 000, since
 it should be reading 1 number in both case, and your interface will have
 very unstable read time.

 Pick the best solution (or combination) for your use case. Those
 disadvantages lists are not exhaustive, just things that came to my mind
 right now.

 C*heers

 Alain

 2014-12-29 13:33 GMT+01:00 Ajay ajay.ga...@gmail.com:

 Hi,

 So you mean to say counters are not accurate? (It is highly likely that
 multiple parallel threads trying to increment the counter as users click
 the links).

 Thanks
 Ajay


 On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen janne.jalka...@ecyrd.com
  wrote:


 Hi!

 It’s really a tradeoff between accurate and fast and your read access
 patterns; if you need it to be fairly fast, use counters by all means, but
 accept the fact that they will (especially in older versions of cassandra
 or adverse network conditions) drift off from the true click count.  If you
 need accurate, use a timeuuid and count the rows (this is fairly safe for
 replays too).  However, if using timeuuids your storage will need lots of
 space; and your reads will be slow if the click counts are huge (because
 Cassandra will need to read every item).  Using counters makes it easy to
 just grab a slice of the time series data and shove it to a client for
 visualization.

 You could of course do a hybrid system; use timeuuids and then
 periodically count and add the result to a regular column, and then remove
 the columns.  Note that you might want to optimize this so that you don’t
 end up with a lot of tombstones, e.g. by bucketing the writes so that you
 can delete everything with just a single partition delete.

 At Thinglink some of the more important counters that we use are backed
 up by the actual data. So for speed purposes we use always counters for
 reads, but there’s a repair process that fixes the counter value if we
 suspect it starts drifting off the real data too much.  (You might be able
 to tell that we’ve been using counters for quite some time :-P)

 /Janne

 On 29 Dec 2014, at 13:00, Ajay ajay.ga...@gmail.com wrote:

  Hi,
 
  Is it better to use Counter to User click count than maintaining
 creating new row as user id : timestamp and count it.
 
  Basically we want to track the user clicks and use the same for
 hourly/daily/monthly report.
 
  Thanks
  Ajay






Re: Counter Column

2014-12-27 Thread Ajay
Thanks.

I went through some articles which mentioned that the client has to pass the
timestamp for inserts and updates. Is there any way we can avoid that and
have Cassandra assume the current time of the server?

Thanks
Ajay
On Dec 26, 2014 10:50 PM, Eric Stevens migh...@gmail.com wrote:

 Timestamps are timezone independent.  This is a property of timestamps,
 not a property of Cassandra. A given moment is the same timestamp
 everywhere in the world.  To display this in a human readable form, you
 then need to know what timezone you're attempting to represent the
 timestamp as, this is the information necessary to convert it to local time.

 On Fri, Dec 26, 2014 at 2:05 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 If the nodes of Cassandra ring are in different timezone, could it affect
 the counter column as it depends on the timestamp?

 Thanks
 Ajay
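
 Regarding the question at the top of this thread: an explicit timestamp is
 only needed when you ask for one with USING TIMESTAMP; if it is omitted,
 the write time is assigned for you (by the coordinator, or by the driver in
 newer driver versions). A minimal sketch, assuming a table
 test.users(id int PRIMARY KEY, name text):

     -- timestamp assigned automatically
     INSERT INTO test.users (id, name) VALUES (1, 'Anna');

     -- explicit, client-supplied timestamp (microseconds since the epoch)
     INSERT INTO test.users (id, name) VALUES (1, 'Anna')
         USING TIMESTAMP 1419590000000000;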




Counter Column

2014-12-26 Thread Ajay
Hi,

If the nodes of the Cassandra ring are in different timezones, could that
affect the counter column, as it depends on the timestamp?

Thanks
Ajay


Throughput Vs Latency

2014-12-25 Thread Ajay
Hi,

I am new to NoSQL (and Cassandra). As I go through a few articles on
Cassandra, they say Cassandra achieves the highest throughput among various
NoSQL solutions, but at the cost of high read and write latency. I have a
basic question here - (if my understanding is right) latency means the time
taken to accept input, process it and respond back. If latency is high, how
come the throughput is high?

Thanks
Ajay


Re: Throughput Vs Latency

2014-12-25 Thread Ajay
Thanks Thomas for the clarification.

If I use a consistency level of QUORUM for reads and writes, the latency
would affect the throughput, right?

Thanks
Ajay

On Fri, Dec 26, 2014 at 11:15 AM, Job Thomas j...@suntecgroup.com wrote:

  Hi,

 First of all, the write latency of Cassandra is not high (the read latency is).

 The high throughput is achieved through distributed reads and writes.

 Your doubt ("If latency is more, how come the throughput is high?") is
 somewhat right if you require high consistency for both reads and writes.

 You still get distributed abilities since it is not a master/slave
 architecture (like HBase).

 If your consistency level is lower, then some of the replica nodes are
 free and will be used for other reads/writes. [Think of a multithreaded
 application.]

  Thanks & Regards
 Job M Thomas
 Platform & Technology
 Mob : 7560885748

 --
 *From:* Ajay [mailto:ajay.ga...@gmail.com]
 *Sent:* Fri 12/26/2014 10:46 AM
 *To:* user@cassandra.apache.org
 *Subject:* Throughput Vs Latency

   Hi,

 I am new to No SQL (and Cassandra). As I am going through few articles on
 Cassandra, it says Cassandra achieves highest throughput among various No
 SQL solutions but at the cost of high  read and write latency. I have a
 basic question here - (If my understanding is right) Latency means the time
 taken to accept input, process and respond back. If Latency is more how
 come the Throughput is high?

 Thanks
 Ajay



Re: Throughput Vs Latency

2014-12-25 Thread Ajay
Hi Thomas,

I am a little confused when you say multithreaded client. We don't
explicitly invoke reads on multiple servers (for replicated data) from the
client code, so how does a multithreaded client fix this?

Thanks
Ajay


On Fri, Dec 26, 2014 at 12:08 PM, Job Thomas j...@suntecgroup.com wrote:

 Hi Ajay,

 My understanding is this: if you have a cluster of 3 nodes with a
 replication factor of 3, then latency plays a bigger role in throughput.

 If the cluster size is 6 with a replication factor of 3 and you are using
 a multithreaded client, then the latency remains the same and you will get
 better throughput (not because of 6 nodes alone, but because of 6 nodes and
 multiple threads).

 Thanks & Regards
 Job M Thomas
 Platform & Technology
 Mob : 7560885748

 

 From: Ajay [mailto:ajay.ga...@gmail.com]
 Sent: Fri 12/26/2014 11:57 AM
 To: user@cassandra.apache.org
 Subject: Re: Throughput Vs Latency


 Thanks Thomas for the clarification.


 If I use the Consistency level of QUORUM for Read and Write, the Latency
 would affect the Throughput right?


 Thanks

 Ajay


 On Fri, Dec 26, 2014 at 11:15 AM, Job Thomas j...@suntecgroup.com wrote:


 Hi,

 First of all,the write latency of cassandra is not high(Read is
 high).

 The high throughput is achieved through distributes read and write.

 Your doubt ( If Latency is more how come the Throughput is high )
 is some what right if you put high consistency to both read and write.

 You will get distributed abilities since it is not Master/Slave
 architecture(Like HBase).

  If  your consistency is lesser,then some nodes out of all replica
 nodes are free and will be used for another read/write . [ Think you are
 using multithreaded
 application ]

 Thanks & Regards
 Job M Thomas
 Platform & Technology
 Mob : 7560885748

 

 From: Ajay [mailto:ajay.ga...@gmail.com]
 Sent: Fri 12/26/2014 10:46 AM
 To: user@cassandra.apache.org
 Subject: Throughput Vs Latency


 Hi,


 I am new to No SQL (and Cassandra). As I am going through few
 articles on Cassandra, it says Cassandra achieves highest throughput among
 various No SQL solutions but at the cost of high  read and write latency. I
 have a basic question here - (If my understanding is right) Latency means
 the time taken to accept input, process and respond back. If Latency is
 more how come the Throughput is high?


 Thanks

 Ajay






Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Thanks Ryan and Peter for the suggestions.

Our requirement (we are an ecommerce company), at a higher level, is to
build a data warehouse as a platform or service (for different product
teams to consume), as below:

Datawarehouse as a platform/service
 |
Spark SQL
 |
Spark in memory computation engine (We were considering Drill/Flink but
Spark is better mature and in production)
 |
Cassandra/HBase (Yet to be decided. Aggregated views + data
directly written to this. So 40%-50% writes, 50-60% reads)
 |
Streaming processing (Spark Streaming or Storm. Yet to be decided.
Spark streaming is relatively new)
|
 My SQL/Mongo/Real Time data

Since we are planning to build it as a service, we cannot consider a
particular data access pattern.

Thanks
Ajay


On Thu, Dec 18, 2014 at 7:00 PM, Peter Lin wool...@gmail.com wrote:


 for the record I think spark is good and I'm glad we have options.

 my point wasn't to bad mouth spark. I'm not comparing spark to storm at
 all, so I think there's some confusion here. I'm thinking of espers,
 streambase, and other stream processing products. My point is to think
 about the problems that needs to be solved before picking a solution. Like
 everyone else, I've been guilty of this in the past, so it's not propaganda
 for or against any specific product.

  I've seen customers use IBM InfoSphere Streams when something like storm
 or spark would work, but I've also seen cases where open source doesn't
 provide equivalent functionality. If spark meets the needs, then either
 hbase or cassandra will probably work fine. The bigger question is what
 patterns do you use in the architecture? Do you store the data first before
 doing analysis? Is the data noisy and needs filtering before persistence?
 What kinds of patterns/queries and operations are needed?

 having worked on trading systems and other real-time use cases, not all
 stream processing is the same.

 On Thu, Dec 18, 2014 at 8:18 AM, Ryan Svihla rsvi...@datastax.com wrote:

 I'll decline to continue the commentary on spark, as again this probably
 belongs on another list, other than to say, microbatches is an intentional
 design tradeoff that has notable benefits for the same use cases you're
 referring too, and that while you may disagree with those tradeoffs, it's a
 bit harsh to dismiss as basic something that was chosen and provides some
 improvements over say..the Storm model.

 On Thu, Dec 18, 2014 at 7:13 AM, Peter Lin wool...@gmail.com wrote:


  some of the most common types of use cases in stream processing are
  sliding windows based on time or count. Based on my understanding of spark
 architecture and spark streaming, it does not provide the same
 functionality. One can fake it by setting spark streaming to really small
 micro-batches, but that's not the same.

 if the use case fits that model, than using spark is fine. For other
 kinds of use cases, spark may not be a good fit. Some people store all
 events before analyzing it, which works for some use cases. While other
 uses cases like trading systems, store before analysis isn't feasible or
 practical. Other use cases like command control also don't fit store before
 analysis model.

 Try to avoid putting the cart infront of the horse. Picking a tool
 before you have a clear understanding of the problem is a good recipe for
 disaster

 On Thu, Dec 18, 2014 at 8:04 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 Since Ajay is already using spark the Spark Cassandra Connector really
 gets them where they want to be pretty easily
 https://github.com/datastax/spark-cassandra-connector (joins, etc).

 As far as spark streaming having basic support I'd challenge that
 assertion (namely Storm has a number of problems with delivery guarantees
 that Spark basically solves), however, this isn't a Spark mailing list, and
 perhaps this conversation is better had there.

 If the question Is Cassandra used in real time analytics cases with
 Spark? the answer is absolutely yes (and Storm for that matter). If the
 question is Can you do your analytics queries on Cassandra while you have
 Spark sitting there doing nothing? then of course the answer is no, but
 that'd be a bizzare question, they already have Spark in use.

 On Thu, Dec 18, 2014 at 6:52 AM, Peter Lin wool...@gmail.com wrote:

 that depends on what you mean by real-time analytics.

 For things like continuous data streams, neither are appropriate
 platforms for doing analytics. They're good for storing the results (aka
 output) of the streaming analytics. I would suggest before you decide
 cassandra vs hbase, first figure out exactly what kind of analytics you
 need to do. Start with prototyping and look at what kind of queries and
 patterns you need to support.

 neither hbase or cassandra are good for complex patterns that do joins
 or cross joins (aka mdx), so using either one you have

Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Hi Peter,

You are right. The idea is to directly query the data from NoSQL, in our
case via Spark SQL on Spark (as Spark largely supports
Mongo/Cassandra/HBase/Hadoop). As you said, the business users still need
to query using Spark SQL. We are already using NoSQL BI tools like Pentaho
(which also plans to support Spark SQL soon). The idea is to abstract the
business users from the storage solutions (more than one: Cassandra/HBase &
Mongo).

Thanks
Ajay

On Thu, Dec 18, 2014 at 8:01 PM, Peter Lin wool...@gmail.com wrote:


 by data warehouse, what kind do you mean?

 is it the traditional warehouse where people create multi-dimensional
 cubes?
 or is it the newer class of UI tools that makes it easier for users to
 explore data and the warehouse is mostly a denormalized (ie flattened)
 format of the OLTP?
 or is it a combination of both?

 from my experience, the biggest challenge of data warehousing isn't
 storing the data. It's making it easy to explore for adhoc mdx-like
 queries. In the old days, the DBA's would define the cubes, write the ETL
 routines and let the data load for days/weeks. In the new nosql model, you
 can avoid the cube + ETL phase, but discovering the data and understanding
 the format still requires a developer.

 getting the data into an user friendly format like a cube with Spark
 still requires a developer. I find that business users hate to go to the
 developer, because we tend to ask what's the functional specs? Most of
 the time business users don't know, they just want to explore. At that
 point, the storage engine largely doesn't matter to the end user. It
 matters to the developers, but business users don't care.

 based on the description, I would watch out for how many aggregated views
 the platform creates. search the mailing list to see past discussions on
 the maximum recommended number of column families.

 where classic data warehouse caused lots of pain is creating cubes. Any
 general solution attempting to replace/supplement existing products needs
 to make it easy and trivial to define adhoc cubes and then query against
 it. There are existing products that already connect to a few nosql
 databases for data exploration. hope that helps

 peter



 On Thu, Dec 18, 2014 at 9:01 AM, Ajay ajay.ga...@gmail.com wrote:

 Thanks Ryan and Peter for the suggestions.

 Our requirement(an ecommerce company) at a higher level is to build a
 Datawarehouse as a platform or service(for different product teams to
 consume) as below:

 Datawarehouse as a platform/service
  |
 Spark SQL
  |
 Spark in memory computation engine (We were considering Drill/Flink but
 Spark is better mature and in production)
  |
 Cassandra/HBase (Yet to be decided. Aggregated views + data
 directly written to this. So 40%-50% writes, 50-60% reads)
  |
 Streaming processing (Spark Streaming or Storm. Yet to be
 decided. Spark streaming is relatively new)
 |
  My SQL/Mongo/Real Time data

 Since we are planning to build it as a service, we cannot consider a
 particular data access pattern.

 Thanks
 Ajay


 On Thu, Dec 18, 2014 at 7:00 PM, Peter Lin wool...@gmail.com wrote:


 for the record I think spark is good and I'm glad we have options.

 my point wasn't to bad mouth spark. I'm not comparing spark to storm at
 all, so I think there's some confusion here. I'm thinking of espers,
 streambase, and other stream processing products. My point is to think
 about the problems that needs to be solved before picking a solution. Like
 everyone else, I've been guilty of this in the past, so it's not propaganda
 for or against any specific product.

 I've seen customers user IBM infosphere streams when something like
 storm or spark would work, but I've also seen cases where open source
 doesn't provide equivalent functionality. If spark meets the needs, then
 either hbase or cassandra will probably work fine. The bigger question is
 what patterns do you use in the architecture? Do you store the data first
 before doing analysis? Is the data noisy and needs filtering before
 persistence? What kinds of patterns/queries and operations are needed?

 having worked on trading systems and other real-time use cases, not all
 stream processing is the same.

 On Thu, Dec 18, 2014 at 8:18 AM, Ryan Svihla rsvi...@datastax.com
 wrote:

 I'll decline to continue the commentary on spark, as again this
 probably belongs on another list, other than to say, microbatches is an
 intentional design tradeoff that has notable benefits for the same use
 cases you're referring too, and that while you may disagree with those
 tradeoffs, it's a bit harsh to dismiss as basic something that was chosen
 and provides some improvements over say..the Storm model.

 On Thu, Dec 18, 2014 at 7:13 AM, Peter Lin wool...@gmail.com wrote:


 some of the most common types of use cases in stream processing is
 sliding

Cassandra for Analytics?

2014-12-17 Thread Ajay
Hi,

Can Cassandra be used for, or is it a good fit for, real-time analytics? I
went through a couple of benchmarks comparing Cassandra vs HBase (most of
them done 3 years ago), and they mentioned that Cassandra is designed for
intensive writes and has higher read latency than HBase. In our case, we
will have writes and reads (but reads will be more, say 40% writes and 60%
reads). We are planning to use Spark as the in-memory computation engine.

Thanks
Ajay


Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Ajay
Hi,

To test Spark SQL vs CQL performance on Cassandra, I did the following:

1) Cassandra standalone server (1 server in a cluster)
2) Spark Master and 1 Worker
Both running on a ThinkPad laptop with 4 cores and 8GB RAM.
3) Wrote Spark SQL code using the Cassandra-Spark driver from Cassandra
(JavaApiDemo.java. Run with spark://127.0.0.1:7077 127.0.0.1)
4) Wrote CQL code using the Java driver from Cassandra
(CassandraJavaApiDemo.java)
In both cases, I create 1 million rows and query for 1.

Observation:
1) It takes less than 10 milliseconds using CQL (SELECT * FROM users WHERE
name='Anna')
2) It takes around 0.6 seconds using Spark (either SELECT * FROM users WHERE
name='Anna' or javaFunctions(sc).cassandraTable("test", "people",
mapRowTo(Person.class)).where("name=?", "Anna")).

Please let me know if I am missing something in the Spark configuration or
the Cassandra-Spark driver.

Thanks
Ajay Garga
package com.datastax.demo;

import java.text.SimpleDateFormat;
import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class CassandraJavaApiDemo {
	private static SimpleDateFormat format = new SimpleDateFormat(
			"HH:mm:ss.SSS");

	public static void main(String[] args) {
		Cluster cluster = null;
		Session session = null;

		try {
			cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
			session = cluster.connect();

			session.execute("DROP KEYSPACE IF EXISTS test2");
			session.execute("CREATE KEYSPACE test2 WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}");
			session.execute("CREATE TABLE test2.users (id INT, name TEXT, birth_date  TIMESTAMP, PRIMARY KEY (id) )");
			session.execute("CREATE INDEX people_name_idx2 ON test2.users(name)");

			session = cluster.connect("test2");
			Statement insert = null;
			for (int i = 0; i < 100; i++) {
				insert = QueryBuilder.insertInto("users").value("id", i)
						.value("name", "Anna" + i)
						.value("birth_date", new Date());
				session.execute(insert);
			}

			long start = System.currentTimeMillis();
			Statement scan = new SimpleStatement(
					"SELECT * FROM users WHERE name='Anna0';");
			scan.enableTracing();
			ResultSet results = session.execute(scan);
			for (Row row : results) {
				System.out.format("%d %s\n", row.getInt("id"),
						row.getString("name"));
			}
			long end = System.currentTimeMillis();
			System.out.println(" Time Taken " + (end - start));
			ExecutionInfo executionInfo = results.getExecutionInfo();
			QueryTrace queryTrace = executionInfo.getQueryTrace();

			System.out.printf("%-38s | %-12s | %-10s | %-12s\n", "activity",
					"timestamp", "source", "source_elapsed");
			System.out
					.println("---+--++--");
			for (QueryTrace.Event event : queryTrace.getEvents()) {
				System.out.printf("%38s | %12s | %10s | %12s\n",
		event.getDescription(),
		millis2Date(event.getTimestamp()), event.getSource(),
		event.getSourceElapsedMicros());
			}

		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (session != null) {
session.close();
			}
			if (cluster != null) {
cluster.close();
			}
		}
	}

	private static Object millis2Date(long timestamp) {
		return format.format(timestamp);
	}
}
package com.datastax.spark.connector.demo;

import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;
import com.datastax.spark.connector.japi.CassandraRow;
import com.google.common.base.Objects;

import org.apache.hadoop.util.StringUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.SchemaRDD;
import org.apache.spark.sql.cassandra.CassandraSQLContext;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

/**
 * This Spark application demonstrates how to use Spark Cassandra Connector with
 * Java.
 * <p/>
 * In order to run it, you will need to run Cassandra database, and create the
 * following keyspace, table and secondary index:
 * <p/>
 * 
 * <pre>
 * CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
 * 
 * CREATE TABLE test.people (
 *  id  INT,
 *  name        TEXT,
 *  birth_date  TIMESTAMP,
 *  PRIMARY KEY (id