Re: Rollback Cassandra after 1 node upgrade

2020-09-04 Thread Aakash Pandhi
How much data is there to restore and repair on that node?

Sincerely,

Aakash Pandhi
 

On Friday, September 4, 2020, 11:08:56 PM CDT, manish khandelwal 
 wrote:  
 
 3.11.2 to 2.1.16
On Sat, Sep 5, 2020 at 9:27 AM Surbhi Gupta  wrote:

Hi Manish,
Please provide both versions.
Thanks,
Surbhi
On Fri, Sep 4, 2020 at 8:55 PM manish khandelwal  
wrote:

Hi,
We have been forced into rolling back our Cassandra cluster after a single-node
upgrade. The node was upgraded 10 days ago. We have a backup of the old data.
Strategy one, which we are considering:
1. Roll back to the old binaries and configuration.
2. Restore the old data from backup.
3. Run repair.
Another strategy is to bootstrap the node as new after changing the binaries.
Which of the strategies is best?
Regards,
Manish
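For reference, a minimal sketch of the second strategy on the affected node. This is
an illustration only: the service name, data paths, and the replace_address flag are
assumptions to verify against your own install before trying anything in production.

nodetool drain                       # flush memtables and stop accepting writes
sudo service cassandra stop
# reinstall the old 2.1.16 binaries and configuration, then clear the node's data
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
# first start only: rejoin under the same IP and re-stream data from the replicas
# (add to cassandra-env.sh, and remove it once the node has rejoined)
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<node_ip>"
sudo service cassandra start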
 

  

Re: Cassandra Delete vs Update

2020-05-23 Thread Aakash Pandhi
Laxmikant,
You mentioned that you need to filter records based on status='pending' in
option-1, but I don't see that filtering done in that option. You are setting
status to 'processed' when the partition key is matched for the table. The
delete (option-2) will remove the whole row from the records_by_date table, if
that's what you want.
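For reference, the read that option-1 implies would be something like this (a
sketch; the status filter happens client-side):

SELECT id, status FROM ks.records_by_date WHERE date = 'date1';
-- then keep only the rows whose status is still 'pending' in the application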
Regards,
Aakash Pandhi
 

On Saturday, May 23, 2020, 09:09:48 AM CDT, Laxmikant Upadhyay 
 wrote:  
 
Hi All,
I have a query regarding Cassandra data modelling. I have created two tables:
1. CREATE TABLE ks.records_by_id (id uuid PRIMARY KEY, status text, details text);
2. CREATE TABLE ks.records_by_date (date date, id uuid, status text, PRIMARY KEY (date, id));

I need to fetch records by date and then process each of them. Which of the
following options will be better when the record is processed?

Option-1:
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = <id>;
UPDATE ks.records_by_date SET status = 'processed' WHERE id = <id> AND date = 'date1';
APPLY BATCH;

Option-2:
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = <id>;
DELETE FROM ks.records_by_date WHERE id = <id> AND date = 'date1';
APPLY BATCH;

Option-1 will not create tombstones, but I need to filter the records based on
status='pending' at the application layer for each date. Option-2 will create
tombstones (though the number of tombstones in a partition will be limited),
but it will not require application-side filtering.

I think that we should avoid tombstones, especially row-level ones, so we
should go with option-1. Kindly advise on the above, or suggest any better
approach.

-- 

regards,
Laxmikant Upadhyay

Re: Issues, understanding how CQL works

2020-04-22 Thread Aakash Pandhi
Marc,
cqlsh offers a command called CAPTURE, which can save the output of a query to
a file. Maybe you can use that option to save all the values you need in that
file, to see all the signalids or whichever columns you need. The file may grow
big depending on your dataset, and I am not sure what limit is imposed on file
size, but if you are selecting 1 or 2 columns it should be fine, I assume.
Here is the syntax:

CAPTURE '<file>'

CAPTURE appends query results to a file.
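A minimal usage sketch in cqlsh (the file path is an example; results go to the
file instead of the screen until CAPTURE OFF):

cqlsh> CAPTURE '/tmp/signalids.txt';
cqlsh> SELECT signalid, monthyear, insertdate FROM tagdata.central;
cqlsh> CAPTURE OFF;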

Sincerely,

Aakash Pandhi
 

On Wednesday, April 22, 2020, 08:38:38 AM CDT, Durity, Sean R 
 wrote:  
 
I thought this might be a single-time use case request. I think my first
approach would be to use something like dsbulk to unload the data and then
reload it into a table designed for the query you want to do (as long as you
have adequate disk space). I think like a DBA/admin first. Dsbulk creates CSV
files, so you could move that data to any kind of database, if you choose.
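A rough sketch of the unload step (the contact point, keyspace/table, and
output directory are examples to adapt):

dsbulk unload -h 127.0.0.1 -k tagdata -t central -url /data/export/central

dsbulk writes the result as CSV files under the given directory, which you can
then reload into the new table with dsbulk load.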

An alternative approach would be to use a driver that supports paging (I think 
this would be most of them) and write a program to walk the data set and output 
what you need in whatever format you need.

Or, since this is a single node scenario, you could try sstable2json to export 
the sstables (files on disk) into JSON, if that is a more workable format for 
you.

Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: Marc Richter 
Sent: Wednesday, April 22, 2020 6:22 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Issues, understanding how CQL works

Hi Jeff,

thank you for your exhaustive and thorough answer!
Also, a very big "Thank you!" to all the other repliers; I hope you
understand that I am summarizing all your feedback in this single answer.

From what I understand from your answers, Cassandra seems to be
optimized to store (and read) data in exactly the way the data
structure has been designed for. That makes it very inflexible, but as
a trade-off allows it to do that single job very effectively.

I also understand, the more I dig into Cassandra, that the team I am
supporting is using Cassandra somewhat wrongly; for example, they have
only one node and so use neither the load-balancing nor the
redundancy capabilities Cassandra offers.
Thus, a maybe-relevant side note: all the data resides on just one
single node; maybe that info is important, because we know on which
node the data is (I know that Cassandra internally applies the same
hashing voodoo as if there were 1k nodes, but maybe this is important
anyway).

Anyway: I do not really care if a query or the effort to find this
information is sub-optimal or very "expensive" in terms of efficiency
or system load, since this isn't something that I need to extract on a
regular basis, but only once. Because of that, it doesn't need to be
optimal or efficient; I also do not care if it blocks the node for
several hours, since Cassandra is only working on this single request.
I really need this info (most recent "insertdate") only once.
Considering this, is there a way to do that?
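For reference, a one-off full scan can be expressed directly in CQL, assuming
Cassandra 2.2 or newer for aggregate functions (a sketch; the coordinator reads
every partition, so a much larger client-side timeout is needed and the query
may still time out server-side):

cqlsh --request-timeout=3600
cqlsh> SELECT MAX(insertdate) FROM tagdata.central;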

 > Because you didnt provide a signalid and monthyear, it doesn't know
 > which machine in your cluster to use to start the query.

I know this already; thanks for confirming that I got this right! But
what do I do if I do not know all the "signalid"s? How can I learn them?

Is it maybe possible to get a full list of all "signalid"s? Or is it
possible to "re-arrange" the data in the cluster, or something like
that, to enable me to learn what the most recent "insertdate" is?
I really do not care if I need to do some expensive copy-all-data
move, but I do not know what is possible and how to do it.
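For reference, the partition keys themselves can be enumerated without reading
the row data (a sketch against the table definition quoted below):

SELECT DISTINCT signalid, monthyear FROM tagdata.central;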

Best regards,
Marc Richter

On 21.04.20 19:20, Jeff Jirsa wrote:
>
>
> On Tue, Apr 21, 2020 at 6:20 AM Marc Richter <m...@marc-richter.info> wrote:
>
>    Hi everyone,
>
>    I'm very new to Cassandra. I have, however, some experience with SQL.
>
>
> The biggest thing to remember is that Cassandra is designed to scale out
> to massive clusters - like thousands of instances. To do that, you can't
> assume it's ever ok to read all of the data, because that doesn't scale.
> So cassandra takes shortcuts / optimizations to make it possible to
> ADDRESS all of that data, but not SCAN it.
>
>
>    I need to extract some information from a Cassandra database that has
>    the following table definition:
>
>    CREATE TABLE tagdata.central (
>    signalid int,
>    monthyear int,
>    fromtime bigint,
>    totime bigint,
>    avg decimal,
>    insertdate bigint,
>    max decimal,
>    min decimal,
>    readings text,
>    PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
>    )
>
>
> What your primary key REALLY MEANS is:
>
> The database on re

Repair and NodeSync

2020-04-02 Thread Aakash Pandhi
Hi All,
I am reviewing our data sync procedures for improvement, so I need your input
on NodeSync.
Are there any cons to implementing NodeSync instead of repair? Is NodeSync the
future direction for cluster-wide data sync?
Thank You,
Aakash

Re: Handling Long running Cassandra Rebuild Process

2020-03-28 Thread Aakash Pandhi
A simple way to do this is to measure the dataset size of the source DC and the
new DC (the one you are rebuilding) every hour or so and make sure the new DC's
dataset size is catching up. Not very precise, but it helps me. We recently
rebuilt a DC and watched it that way. Another idea is to poll system.log for
errors related to streams and send notifications to yourself.
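A rough sketch of both ideas (the log path is an example; adjust for your
install):

nodetool status        # compare the Load column of the new DC's nodes with the source DC
grep -i 'stream' /var/log/cassandra/system.log | grep -iE 'error|fail'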
Sincerely,

Aakash Pandhi
 

On Friday, March 27, 2020, 11:54:41 PM CDT, Jai Bheemsen Rao Dhanwada 
 wrote:  
 
netstats only gives the active streams; for example, if the rebuild fails
because of a network issue or something, there is no trace of it.
Regarding nohup: I am trying to create an API.

On Friday, March 27, 2020, Erick Ramirez  wrote:

If you run nodetool netstats, you will be able to see the status of the node:
it will show either "building" or "normal" if it has completed. While it's
building, it will also show you the active streams that are in progress.
Typically, most admins nohup it or at least redirect the output to a log file
so you still have visibility if you lose your [SSH] session. That's also
another thing to consider. Cheers!
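A minimal sketch of that (the DC name and log file are examples):

nohup nodetool rebuild -- <source_dc_name> > rebuild.log 2>&1 &
nodetool netstats      # from another session, to check streaming progress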
GOT QUESTIONS? Apache Cassandra experts from the community and DataStax have
answers! Share your expertise on https://community.datastax.com/.


  

Re: Hints replays very slow in one DC

2020-02-26 Thread Aakash Pandhi
You may check the hinted handoff throttle rate on the node and adjust it if
needed: nodetool gethintedhandoffthrottlekb shows it, and you may also set it
with nodetool sethintedhandoffthrottlekb.
I would also check disk stats where the hints are stored, with either sar or
iostat.
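A minimal sketch (the throttle value, in KB per second, is an example; the
default is 1024, and higher means faster replay):

nodetool sethintedhandoffthrottlekb 2048
iostat -x 5            # watch the device holding the hints directory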
Sincerely,

Aakash Pandhi
 

On Wednesday, February 26, 2020, 04:36:16 PM CST, Laxmikant Upadhyay 
 wrote:  
 
Is dc1 a simple standby DC, or do you run some operations (e.g. compute for
analysis) on it? Have you found the root cause of the OOM? Do you see any
specific Cassandra operation (e.g. repair) causing the OOM? One tip: try
upgrading to 3.11.6, as lots of bugs have been fixed since 3.11.0.
On Wed, Feb 26, 2020, 9:53 PM Krish Donald  wrote:

Nodes are going down due to Out of Memory, and we are using a 31GB heap size in
DC1; however, DC2 (which serves the traffic) has a 16GB heap. We had to
increase the heap in DC1 because DC1 nodes were going down due to the Out of
Memory issue, but DC2 nodes never went down.

We also noticed messages like the one below in system.log:
FailureDetector.java:288 - Not marking nodes down due to local pause of
9532654114 > 5000000000



On Tue, Feb 25, 2020 at 9:43 PM Erick Ramirez  
wrote:

What's the reason for nodes going down? Is it because the cluster is 
overloaded? Hints will get handed off periodically when nodes come back to life 
but if they happen to go down again or become unresponsive (for whatever 
reason), the handoff will be delayed until the next cycle. I think it's every 5 
minutes but don't quote me.
Hinted MV updates can be problematic, so that is a symptom, but with the
limited info I'm not sure it's the cause of the slow handoffs. Cheers!





  

Re: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-19 Thread Aakash Pandhi
John,
COPY is not recommended for more than 2 million rows, so COPY is ruled out in
your case for those 30 tables you mentioned.

Sincerely,

Aakash Pandhi
 

On Wednesday, February 19, 2020, 02:26:15 PM CST, Amanda Moran 
 wrote:  
 
Hi there,
DataStax recently released their bulk loader to the OSS community.

I would take a look and at least try it out:
https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkAbout.html

Good luck! 

Amanda 

On Wed, Feb 19, 2020 at 12:10 PM JOHN, BIBIN  wrote:


Thanks for the response. We need to export into a flat file and send it to
another analytical application. There are 137 tables, and 30 of them have
300M+ records, so "COPY TO" is taking a lot of time.

 

Thank you

Bibin John

 

From: Aakash Pandhi  
Sent: Wednesday, February 19, 2020 12:51 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

 

John,

 

Greetings, 

 

Is the requirement to just export data from a table and stage it somewhere, or
to export it and load it into another cluster/table?

 

sstableloader is a utility which can help you, as it is designed for bulk
loading.

Sincerely,

Aakash Pandhi

 

 

On Wednesday, February 19, 2020, 10:13:32 AM PST, JOHN, BIBIN  
wrote:

 

 

Team,

We have a requirement to bulk export data from Cassandra on a daily basis. The
table contains close to 600M records and the cluster has 12 nodes. What is the
best approach to do this?

 

 

Thanks

Bibin John

  

Re: Mechanism to Bulk Export from Cassandra on daily Basis

2020-02-19 Thread Aakash Pandhi
John,
Greetings,
Is the requirement to just export data from a table and stage it somewhere, or
to export it and load it into another cluster/table? sstableloader is a utility
which can help you, as it is designed for bulk loading.
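A rough usage sketch (the contact point and path are examples; the source
directory must end in a <keyspace>/<table>/ layout containing the sstables):

sstableloader -d 10.0.0.1 /backups/mykeyspace/mytable/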
Sincerely,

Aakash Pandhi
 

On Wednesday, February 19, 2020, 10:13:32 AM PST, JOHN, BIBIN 
 wrote:  
 
  
Team,
 
We have a requirement to bulk export data from Cassandra on a daily basis. The
table contains close to 600M records and the cluster has 12 nodes. What is the
best approach to do this?
 
  
 
  
 
Thanks
 
Bibin John