Re: Snapshot verification

2017-10-31 Thread Varun Gupta
We use the COPY command to generate a file from both the source and the
destination. After that you can use a diff tool.
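
A minimal sketch of that approach (host, keyspace, and file names are
hypothetical):

    # export the same table from both clusters
    cqlsh source-host -e "COPY myks.mytable TO '/tmp/source.csv';"
    cqlsh dest-host -e "COPY myks.mytable TO '/tmp/dest.csv';"
    # COPY does not guarantee row order, so sort before diffing
    sort -o /tmp/source.csv /tmp/source.csv
    sort -o /tmp/dest.csv /tmp/dest.csv
    diff /tmp/source.csv /tmp/dest.csv && echo "tables match"
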
On Mon, Oct 30, 2017 at 10:11 PM Pradeep Chhetri 
wrote:

> Hi,
>
> We are taking daily snapshots for backing up our cassandra data and then
> use our backups to restore in a different environment. I would like to
> verify that the data is consistent and that all the data present at the time
> the backup was taken was actually restored.
>
> Currently I just count the number of rows in each table. Was wondering if
> there is any inbuilt way to accomplish this.
>
> Thank you.
> Pradeep
>


Re: Best approach to prepare to shutdown a cassandra node

2017-10-19 Thread Varun Gupta
Does nodetool stopdaemon implicitly drain too, or should we invoke drain and
then stopdaemon?
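
In case it does not, a conservative sequence that drains explicitly first (a
sketch, assuming nodetool is on the PATH):

    # flush memtables and stop accepting writes
    nodetool drain
    # then stop the daemon
    nodetool stopdaemon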

On Mon, Oct 16, 2017 at 4:54 AM, Simon Fontana Oscarsson <
simon.fontana.oscars...@ericsson.com> wrote:

> Looking at the code in trunk, the stopdaemon command invokes the
> CassandraDaemon.stop() function, which does a graceful shutdown by stopping
> the jmxServer and draining the node via the shutdown hook.
>
> /Simon
>
>
> On 2017-10-13 20:42, Javier Canillas wrote:
>
> As far as I know, the nodetool stopdaemon is doing a "kill -9".
>
> Or did it change?
>
> 2017-10-12 23:49 GMT-03:00 Anshu Vajpayee :
>
>> Why are you killing it when we have nodetool stopdaemon?
>>
>> On Fri, Oct 13, 2017 at 1:49 AM, Javier Canillas <
>> javier.canil...@gmail.com> wrote:
>>
>>> That's what I thought.
>>>
>>> Thanks!
>>>
>>> 2017-10-12 14:26 GMT-03:00 Hannu Kröger :
>>>
 Hi,

 Drain should be enough.  It stops accepting writes and after that
 cassandra can be safely shut down.

 Hannu

 On 12 October 2017 at 20:24:41, Javier Canillas (
 javier.canil...@gmail.com) wrote:

 Hello everyone,

 I have been working with Cassandra for some time, but every time I need to
 shut down a node (for any reason, like upgrading the version or moving the
 instance to another host) I see several errors on the client applications
 (yes, I'm using the official java driver).

 By the way, I'm starting C* as a stand-alone process, and the C* version is
 3.11.0.

 The way I have implemented the shutdown process is something like the
 following:

 # Drain all information from commitlog into sstables
 bin/nodetool drain

 cassandra_pid=`ps -ef | grep "java.*apache-cassandra" | grep -v "grep" | awk '{print $2}'`
 if [ ! -z "$cassandra_pid" ] && [ "$cassandra_pid" -ne "1" ]; then
     echo "Asking Cassandra to shutdown (nodetool drain doesn't stop cassandra)"
     kill $cassandra_pid

     echo -n "+ Checking it is down. "
     counter=10
     while [ "$counter" -ne 0 ] && kill -0 $cassandra_pid > /dev/null 2>&1
     do
         echo -n ". "
         ((counter--))
         sleep 1s
     done
     echo ""
     if ! kill -0 $cassandra_pid > /dev/null 2>&1; then
         echo "+ It's down."
     else
         echo "- Killing Cassandra."
         kill -9 $cassandra_pid
     fi
 else
     echo "Care: there was a problem finding the Cassandra PID"
 fi

 Should I add the following lines at the beginning?

 echo "shutting down cassandra gracefully with: nodetool disablegossip"
 $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablegossip
 echo "shutting down cassandra gracefully with: nodetool disablebinary"
 $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablebinary
 echo "shutting down cassandra gracefully with: nodetool disablethrift"
 $CASSANDRA_HOME/$CASSANDRA_APP/bin/nodetool disablethrift

 The shutdown log is the following:

 WARN  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,343 StorageService.java:321 - Stopping gossip by operator request
 INFO  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,344 Gossiper.java:1532 - Announcing shutdown
 INFO  [RMI TCP Connection(10)-127.0.0.1] 2017-10-12 14:20:52,355 StorageService.java:2268 - Node /10.254.169.36 state jump to shutdown
 INFO  [RMI TCP Connection(12)-127.0.0.1] 2017-10-12 14:20:56,141 Server.java:176 - Stop listening for CQL clients
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,472 StorageService.java:1442 - DRAINING: starting drain process
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,474 HintsService.java:220 - Paused hints dispatch
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,477 Gossiper.java:1532 - Announcing shutdown
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:20:59,480 StorageService.java:2268 - Node /127.0.0.1 state jump to shutdown
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:01,483 MessagingService.java:984 - Waiting for messaging service to quiesce
 INFO  [ACCEPT-/192.168.6.174] 2017-10-12 14:21:01,485 MessagingService.java:1338 - MessagingService has terminated the accept() thread
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,095 HintsService.java:220 - Paused hints dispatch
 INFO  [RMI TCP Connection(16)-127.0.0.1] 2017-10-12 14:21:02,111 StorageService.java:1442 

Re: Corrupted commit log prevents Cassandra start

2017-07-08 Thread Varun Gupta
If you already have a regular cadence of repair, then you can set
"commit_failure_policy" to ignore in cassandra.yaml, so that the C* process
does not crash on a corrupt commit log.
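
For reference, the relevant cassandra.yaml entry looks like this (a sketch;
only sensible when regular repairs already cover any mutations lost from the
skipped segments):

    # cassandra.yaml
    # ignore: log commit log failures and continue instead of exiting
    commit_failure_policy: ignore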

On Fri, Jul 7, 2017 at 2:10 AM, Hannu Kröger  wrote:

> Hello,
>
> yes, that’s what we do when things like this happen.
>
> My thinking is just that when commit log is corrupted, you cannot really
> do anything else but exactly those steps. Delete corrupted file and run
> repair after starting. At least I haven’t heard of any tools for salvaging
> commit log sections.
>
> Current behaviour gives DBA control over when to do those things and of
> course DBA realizes this way that things didn’t go ok but that’s about it.
> There is no alternative way of healing the system or anything.
>
> Hannu
>
> On 7 July 2017 at 12:03:06, benjamin roth (brs...@gmail.com) wrote:
>
> Hi Hannu,
>
> I remember there have been discussions about this in the past. Most
> probably there is already a JIRA for this.
> I roughly remember a consense like that:
> - Default behaviour should remain
> - It should be configurable to the needs and preferences of the DBA
> - It should at least spit out errors in the logs
>
> ... of course it would be even better to have the underlying issue fixed
> that commit logs should not be corrupt but I remember that this is not so
> easy due to some "architectural implications" of Cassandra. IIRC Ed
> Capriolo posted something related to that some months ago.
>
> For a quick fix, I'd recommend:
> - Delete the affected log file
> - Start the node
> - Run a full-range (not -pr) repair on that node
>
> 2017-07-07 10:57 GMT+02:00 Hannu Kröger :
>
>> Hello,
>>
>> We had a test server crashing for some reason (not related to Cassandra
>> probably) and now when trying to start cassandra, it gives following error:
>>
>> ERROR [main] 2017-07-06 09:29:56,140 JVMStabilityInspector.java:82 -
>> Exiting due to error while processing commit log during initialization.
>> org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException:
>> Mutation checksum failure at 24240116 in Next section at 24239690 in
>> CommitLog-6-1498576271195.log
>> at 
>> org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:332)
>> [apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.commitlog.CommitLogReader.readCommit
>> LogSegment(CommitLogReader.java:201) [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:84)
>> [apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFi
>> les(CommitLogReplayer.java:140) [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:326)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
>> [apache-cassandra-3.10.jar:3.10]
>>
>> Shouldn’t Cassandra tolerate this situation?
>>
>> Of course we can delete commit logs and life goes on. But isn’t this a
>> bug or something?
>>
>> Hannu
>>
>>
>


Re: "nodetool repair -dc"

2017-07-08 Thread Varun Gupta
I do not see the need to run repair, as long as the cluster was in a healthy
state while the new nodes were being added.
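
If you do want to restrict a repair to one datacenter, the form is shown below
(a sketch; the DC name is a placeholder). Note that without -pr each run only
covers the token ranges replicated by the node it runs on, so it is typically
executed on every node in that DC:

    # repair only among replicas in the given datacenter
    nodetool repair -dc <dc_name>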

On Fri, Jul 7, 2017 at 8:37 AM, vasu gunja  wrote:

> Hi ,
>
> I have a question regarding "nodetool repair -dc" option. recently we
> added multiple nodes to one DC center, we want to perform repair only on
> current DC.
>
> Here is my question.
>
> Do we need to perform "nodetool repair -dc" on all nodes belongs to that
> DC ?
> or only one node of that DC?
>
>
>
> thanks,
> V
>


[IMPORTANT UPDATE]: PLEASE DO NOT UPDATE SCHEMA

2017-06-27 Thread Varun Gupta
Hi Cassandra-Users,

C* 3.0.13 RELEASE HAS A CORNER CASE BUG ON SCHEMA UPDATE, WHICH CORRUPTS
THE DATA. PLEASE DO NOT UPDATE SCHEMA. OUR TEAM IS WORKING ON FIXING THE
ISSUE!

Thanks, Varun


Re: Convert single node C* to cluster (rebalancing problem)

2017-06-15 Thread Varun Gupta
Akhil,

As per the blog, nodetool status shows the data size for node1 even for token
ranges it does not own. Isn't this a bug in Cassandra?

Yes, the data will still be present on disk, but that should be reflected in
nodetool status.
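
For the "call out specific keyspaces" check discussed below, ownership can be
viewed per keyspace (the keyspace name here is the one from this thread):

    # Owns (effective) is only meaningful relative to a keyspace's replication
    nodetool status orion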

On Thu, Jun 15, 2017 at 6:17 PM, Akhil Mehra  wrote:

> Hi,
>
> I put together a blog explaining possible reasons for an unbalanced
> Cassandra nodes.
>
> http://abiasforaction.net/unbalanced-cassandra-cluster/
>
> Let me know if you have any questions.
>
> Cheers,
> Akhil
>
>
> On Thu, Jun 15, 2017 at 5:54 PM, Affan Syed  wrote:
>
>> John,
>>
>> I am a co-worker with Junaid -- he is out sick, so just wanted to confirm
>> that one of your shots in the dark is correct. This is a RF of 1x
>>
>> "CREATE KEYSPACE orion WITH replication = {'class': 'SimpleStrategy',
>> 'replication_factor': '1'}  AND durable_writes = true;"
>>
>> However, how does the RF affect the redistribution of key/data?
>>
>> Affan
>>
>> On Wed, Jun 14, 2017 at 1:16 AM, John Hughes 
>> wrote:
>>
>>> OP, I was just looking at your original numbers and I have some
>>> questions:
>>>
>>> 270GB on one node and 414KB on the other, but something close to 50/50
>>> on "Owns(effective)".
>>> What replication factor are your keyspaces set up with? 1x or 2x or ??
>>>
>>> I would say you are seeing 50/50 because the tokens are allocated
>>> 50/50(others on the list please correct what are for me really just
>>> assumptions), but I would hazard a guess that your replication factor
>>> is still 1x, so it isn't moving anything around. Or your keyspace
>>> replication is incorrect and isn't being distributed (I have had issues with
>>> the AWSMultiRegionSnitch and not getting the region correct: us-east vs
>>> us-east-1). It doesn't throw an error, but it doesn't work very well either
>>> =)
>>>
>>> Can you do a 'describe keyspace XXX' and show the first line(the CREATE
>>> KEYSPACE line).
>>>
>>> Mind you, these are all just shots in the dark from here.
>>>
>>> Cheers,
>>>
>>>
>>> On Tue, Jun 13, 2017 at 3:13 AM Junaid Nasir  wrote:
>>>
 Is the OP expecting a perfect 50%/50% split?


 best result I got was 240gb/30gb split, which I think is not properly
 balanced.


> Also, what are your outputs when you call out specific keyspaces? Do
> the numbers get more even?


 I don't know what you mean by *call out specific keyspaces*; can you
 please explain that a bit?


 If your schema is not modelled correctly you can easily end up unevenly
> distributed data.


 I think that is the problem. The initial 270gb of data might not be modeled
 correctly. I have run a lot of tests on the 270gb dataset, including
 downsizing it to 5gb, and they all resulted in the same uneven distribution.
 I also tested a dummy dataset of 2gb, which was balanced evenly. Coming from
 relational databases, I didn't give much thought to data modeling. Can anyone
 please point me to some resources regarding this problem?

 On Tue, Jun 13, 2017 at 3:24 AM, Akhil Mehra 
 wrote:

> Great point John.
>
> The OP should also note that data distribution also depends on your
> schema and incoming data profile.
>
> If your schema is not modelled correctly you can easily end up
> unevenly distributed data.
>
> Cheers,
> Akhil
>
> On Tue, Jun 13, 2017 at 3:36 AM, John Hughes 
> wrote:
>
>> Is the OP expecting a perfect 50%/50% split? That, to my experience,
>> is not going to happen, it is almost always shifted from a fraction of a
>> percent to a couple percent.
>>
>> Datacenter: eu-west
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens   Owns (effective)  Host ID                               Rack
>> UN  XX.XX.XX.XX  22.71 GiB  256      47.6%             57dafdde-2f62-467c-a8ff-c91e712f89c9  1c
>> UN  XX.XX.XX.XX  17.17 GiB  256      51.3%             d2a65c51-087d-48de-ae1f-a41142eb148d  1b
>> UN  XX.XX.XX.XX  26.15 GiB  256      52.4%             acf5dd34-5b81-4e5b-b7be-85a7fccd8e1c  1c
>> UN  XX.XX.XX.XX  16.64 GiB  256      50.2%             6c8842dd-a966-467c-a7bc-bd6269ce3e7e  1a
>> UN  XX.XX.XX.XX  24.39 GiB  256      49.8%             fd92525d-edf2-4974-8bc5-a350a8831dfa  1a
>> UN  XX.XX.XX.XX  23.8 GiB   256      48.7%             bdc597c0-718c-4ef6-b3ef-7785110a9923  1b
>>
>> Though maybe part of what you are experiencing can be cleared up by
>> repair/compaction/cleanup. Also, what are your outputs when you call out
>> specific keyspaces? Do the numbers get more even?
>>
>> Cheers,
>>
>> On Mon, Jun 12, 2017 at 5:22 AM Akhil Mehra 
>> wrote:
>>
>>> auto_bootstrap is true by default. Ensure its set 

Re: [Cassandra] Ignoring interval time

2017-05-30 Thread Varun Gupta
Can you please check the Cassandra stats to see whether the cluster is under
too much load? This log message is the symptom, not the root cause.
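
A couple of standard health checks (a sketch; tablestats is cfstats on 2.x):

    # pending or blocked tasks indicate an overloaded node
    nodetool tpstats
    # per-table latencies and sstable counts
    nodetool tablestats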

On Tue, May 30, 2017 at 2:33 AM, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi All,
>
>
>
> Please let me know why this debug log is coming:
>
>
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:31,496 FailureDetector.java:456 -
> Ignoring interval time of 2000686406 for /XXX.XX.XXX.204
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:34,497 FailureDetector.java:456 -
> Ignoring interval time of 2349724693 for /XXX.XX.XXX.207
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:34,497 FailureDetector.java:456 -
> Ignoring interval time of 2000655389 for /XXX.XX.XXX.206
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:34,497 FailureDetector.java:456 -
> Ignoring interval time of 2000721304 for /XXX.XX.XXX.201
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:34,497 FailureDetector.java:456 -
> Ignoring interval time of 2000770809 for /XXX.XX.XXX.202
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:34,497 FailureDetector.java:456 -
> Ignoring interval time of 2000825217 for /XXX.XX.XXX.209
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:35,449 FailureDetector.java:456 -
> Ignoring interval time of 2953167747 for /XXX.XX.XXX.205
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:37,497 FailureDetector.java:456 -
> Ignoring interval time of 2047662469 for /XXX.XX.XXX.205
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:37,497 FailureDetector.java:456 -
> Ignoring interval time of 2000717144 for /XXX.XX.XXX.207
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:37,497 FailureDetector.java:456 -
> Ignoring interval time of 2000780785 for /XXX.XX.XXX.201
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:38,497 FailureDetector.java:456 -
> Ignoring interval time of 2000113606 for /XXX.XX.XXX.209
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:39,121 FailureDetector.java:456 -
> Ignoring interval time of 2334491585 for /XXX.XX.XXX.204
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:39,497 FailureDetector.java:456 -
> Ignoring interval time of 2000209788 for /XXX.XX.XXX.207
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:39,497 FailureDetector.java:456 -
> Ignoring interval time of 2000226568 for /XXX.XX.XXX.208
>
> DEBUG [GossipStage:1] 2017-05-30 15:01:42,178 FailureDetector.java:456 -
> Ignoring interval time of 2390977968 for /XXX.XX.XXX.204
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91-805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>


Re: Restarting nodes and reported load

2017-05-30 Thread Varun Gupta
Can you please check whether you have incremental backup enabled and whether
snapshots are occupying the space?

Run the nodetool clearsnapshot command.
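
For example (a sketch; clearsnapshot with no arguments removes all snapshots,
so list them first):

    # see which snapshots exist and how much space they hold
    nodetool listsnapshots
    # remove all snapshots, or pass -t <tag> for a specific one
    nodetool clearsnapshot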

On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol 
wrote:

> It's 3-4TB per node, and by load rises, I'm talking about load as reported
> by nodetool status.
>
>
>
> On May 30 2017, at 10:25 am, daemeon reiydelle 
> wrote:
>
>> When you say "the load rises ... ", could you clarify what you mean by
>> "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
>> in neither case would that be relevant to transient or persisted disk. Am I
>> missing something?
>>
>>
>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli 
>> wrote:
>>
>> 3-4 TB per node or in total?
>>
>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol 
>> wrote:
>>
>> I should also mention that I am running cassandra 3.10 on the cluster
>>
>>
>>
>> On May 29 2017, at 9:43 am, Daniel Steuernol 
>> wrote:
>>
>> The cluster is running with RF=3, right now each node is storing about
>> 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61
>> GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
>> volumes with 10k iops. I guess this brings up the question of what's a good
>> marker to decide on whether to increase disk space vs provisioning a new
>> node?
>>
>>
>> On May 29 2017, at 9:35 am, tommaso barbugli 
>> wrote:
>>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. Whats the RF, how much
>> data do you store per node and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel
>


Re: How to know when repair repaired something?

2017-05-30 Thread Varun Gupta
I am missing the point: why do you want to re-trigger the process after
repair? Repair will sync the data correctly.

On Mon, May 29, 2017 at 8:07 AM, Jan Algermissen  wrote:

> Hi,
>
> is it possible to extract from repair logs the writetime of the writes
> that needed to be repaired?
>
> I have some processes I would like to re-trigger from a time point if
> repair found problems.
>
> Is that useful? Possible?
>
> Jan
>
>
>


Re: Is it safe to upgrade 2.2.6 to 3.0.13?

2017-05-19 Thread Varun Gupta
We upgraded from 2.2.5 to 3.0.11 and it works fine. I would suggest not going 
with 3.0.13; we are seeing some issues with schema mismatch due to which we had 
to roll back to 3.0.11.

Thanks,
Varun

> On May 19, 2017, at 7:43 AM, Stefano Ortolani  wrote:
> 
> Here (https://github.com/apache/cassandra/blob/cassandra-3.0/NEWS.txt) is 
> stated that the minimum supported version for the 2.2.X branch is 2.2.2.
> 
>> On Fri, May 19, 2017 at 2:16 PM, Nicolas Guyomar  
>> wrote:
>> Hi Xihui,
>> 
>> I was looking for this documentation also, but I believe datastax removed 
>> it, and it is not available yet on the apache website
>> 
>> As far as I remember, an intermediate version was needed if the C* version was < 2.1.7.
>> 
>> You should be safe starting from 2.2.6, but testing the upgrade on a 
>> dedicated platform is always a good idea.
>> 
>> Nicolas
>> 
>>> On 19 May 2017 at 09:02, Xihui He  wrote:
>>> Hi All,
>>> 
>>> We are planning to upgrade our production cluster to 3.x, but I can't find 
>>> the upgrade guide anymore.
>>> Can I upgrade to 3.0.13 from 2.2.6 directly? Is an interim version necessary?
>>> 
>>> Thanks,
>>> Xihui
>> 
> 


Re: Cassandra Server 3.10 unable to Start after crash - commitlog needs to be removed

2017-05-19 Thread Varun Gupta
Yes, the bugs need to be fixed, but as a workaround in a dev environment you can 
set the commit failure policy in cassandra.yaml so that a corrupted commit log 
file does not block startup.


Thanks,
Varun

> On May 19, 2017, at 11:31 AM, Jeff Jirsa  wrote:
> 
> 
> 
>> On 2017-05-19 08:13 (-0700), Haris Altaf  wrote: 
>> Hi All,
>> I am using Cassandra 3.10 for my project and whenever my local windows
>> system, which is my development environment, crashes then cassandra server
>> is unable to start. I have to delete commitlog directory after every system
>> crash. This is actually annoying, and what's the purpose of the commitlog if
>> it itself gets corrupted? I have uploaded the entire dump of Cassandra Server
>> (along with logs, commitlogs, data, configs etc) at the link below. Kindly
>> share its solution. I believe it needs to be fixed.
>> 
> 
> You need to share the exact stack trace. In cassandra 3.0+, we became much 
> less tolerant of surprises in commitlog state - perhaps a bit too aggressive, 
> failing to start in many cases when only minor things were wrong. We've 
> recently fixed a handful of these, but they may not be released yet for the 
> version you're using. 
> 
> 
> 
> 




Re: Nodes stopping

2017-05-11 Thread Varun Gupta
Maybe this article helps you.

http://stackoverflow.com/questions/26285133/who-sends-a-sigkill-to-my-process-mysteriously-on-ubuntu-server
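
In the same spirit as that link, one common culprit is the kernel OOM killer;
a quick check on Linux (a sketch):

    # did the kernel kill the JVM for memory pressure?
    dmesg | grep -i -E "killed process|out of memory"
    # on systemd hosts the kernel journal keeps the same record
    journalctl -k | grep -i "killed process"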

On Thu, May 11, 2017 at 1:52 PM, Daniel Steuernol <dan...@sendwithus.com>
wrote:

> There is nothing in the system log about it being drained or shutdown, I'm
> not sure how else it would be pre-empted. No one else on the team is on the
> servers and I haven't been shutting them down. There also is no java memory
> dump on the server either. It appears that the process just died.
>
>
>
> On May 11 2017, at 1:36 pm, Varun Gupta <var...@uber.com> wrote:
>
>>
>> What do you mean by "no obvious error in the logs"? Do you see that the node
>> was drained or shut down? Are you sure no other process is calling nodetool
>> drain or shutdown, or pre-empting the Cassandra process?
>>
>> On Thu, May 11, 2017 at 1:30 PM, Daniel Steuernol <dan...@sendwithus.com>
>> wrote:
>>
>>
>> I have a 6 node cassandra cluster running, and frequently a node will go
>> down with no obvious error in the logs. This is starting to happen quite
>> often, almost daily now. Any suggestions on how to track down what is
> causing the node to stop?
>>
>>
>>


Re: Cassandra Snapshots and directories

2017-05-11 Thread Varun Gupta
I did not completely get your question about "snapshot files are mixed with
data files and backup files".

When you call nodetool snapshot, it will create a directory named after the
snapshot (if specified) or the current timestamp, at
<data_dir>/<keyspace>/<table>/snapshots/<snapshot name>. This directory will
have all sstables, metadata files, and schema.cql (if using 3.0.9 or higher).
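
For the "filter out only the snapshots" part of the question, something like
this works (a sketch assuming the default data directory and a hypothetical
snapshot tag "mysnap"):

    # list every snapshot directory under the data path
    find /var/lib/cassandra/data -type d -name snapshots

    # archive one named snapshot across all tables, preserving keyspace/table paths
    cd /var/lib/cassandra/data
    find . -type d -path "*/snapshots/mysnap" -print0 | tar czf /tmp/mysnap.tar.gz --null -T -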


On Thu, May 11, 2017 at 2:37 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> Hi,
> I am going through this guide to do backup/restore of cassandra data to a
> new cluster:
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#task_ds_cmf_11r_gk
>
> When creating a snapshot I get the snapshot files mixed in with the normal
> data files and backup files, so it's all over the place and very hard
> (especially with lots of tables per keyspace) to transfer ONLY the snapshot.
> (Mostly since there is a snapshot directory per table..)
>
> Am I missing something or is there some arcane shell command that filters
> out only the snapshots?
> Because this way it's much easier to just backup the whole data directory.
>
> greetings Daniel
>


Re: Nodes stopping

2017-05-11 Thread Varun Gupta
What do you mean by "no obvious error in the logs"? Do you see that the node
was drained or shut down? Are you sure no other process is calling nodetool
drain or shutdown, or pre-empting the Cassandra process?

On Thu, May 11, 2017 at 1:30 PM, Daniel Steuernol 
wrote:

>
> I have a 6 node cassandra cluster running, and frequently a node will go
> down with no obvious error in the logs. This is starting to happen quite
> often, almost daily now. Any suggestions on how to track down what is
> causing the node to stop?


Re: repair question (-dc option)

2017-05-11 Thread Varun Gupta
If there was no node down during that period, and you are using LOCAL_QUORUM
reads and writes, then yes, the above command works.

On Thu, May 11, 2017 at 11:59 AM, Gopal, Dhruva 
wrote:

> Hi –
>
>   I have a question on running a repair after bringing up a node that was
> down (brought down gracefully) for a few days within a data center. Can we
> just run nodetool repair –dc  on a single node (within that DC –
> specifically the downed node, after it is brought online) and have that
> entire DC repaired?
>
>
>
> Regards,
>
> Dhruva
>
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>


Re: Node containing all data of the cluster

2017-05-10 Thread Varun Gupta
Hi Igor,

You can set up the cluster with the configuration below.

Replication: DC1: 3 and DC2: 1.

If you are using the DataStax Java driver, use the DC-aware load balancing
policy with DC1 as the local DC, and add the DC2 node to the driver's ignored
nodes so requests never go to it.
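
A sketch of that replication layout (keyspace name is hypothetical):

    # RF 3 in the serving DC, one full replica in the "shadow" DC
    cqlsh -e "CREATE KEYSPACE myks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 1};"

With a single node in DC2, that node then holds a full copy of the data, which
is what makes the periodic snapshot useful.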

Thanks,
Varun

On Wed, May 10, 2017 at 1:21 PM, Igor Leão  wrote:

> Hey everyone,
>
> Imagine a have Cassandra cluster with 4 nodes.
>
> Is it possible to have a separate node which would not receive requests
> but would be in sync with the rest of the cluster? Ideally this super node
> would have all data of the cluster.
>
> I want to take a snapshot of this node from time to time in order to
> reproduce scenarios that are happening in production.
>
> Thanks in advance!
>
>
>
>
>
>
>
>


Sanity checks to run post restore data?

2016-11-30 Thread Varun Gupta
Hi,

We are periodically backing up sstables, and we need to learn what sanity
checks should be performed after restoring them.

Thanks,
Varun