Re: Rollback Cassandra after 1 node upgrade
How much data is there to restore and repair on that node?

Sincerely,
Aakash Pandhi

On Friday, September 4, 2020, 11:08:56 PM CDT, manish khandelwal wrote:
> 3.11.2 to 2.1.16

On Sat, Sep 5, 2020 at 9:27 AM Surbhi Gupta wrote:
> Hi Manish, Please provide both versions. Thanks, Surbhi

On Fri, Sep 4, 2020 at 8:55 PM manish khandelwal wrote:
> Hi,
> We have been forced into rolling back our Cassandra after a one-node upgrade. The node was upgraded 10 days ago. We have a backup of the old data.
> Strategy one, which we are considering:
> 1. Roll back to the old binaries and configuration.
> 2. Restore the old data from backup.
> 3. Run repair.
> The other strategy is to bootstrap the node as new after changing the binaries.
> Which of the strategies is best?
> Regards,
> Manish
Re: Cassandra Delete vs Update
Laxmikant,

You mentioned that you need to filter records based on status='pending' in option-1, but I don't see that filtering being done in that option. You are setting status to 'processed' when the partition key matches for the table. The delete (option-2) will completely remove the whole partition from the records_by_date table, if that is what you want.

Regards,
Aakash Pandhi

On Saturday, May 23, 2020, 09:09:48 AM CDT, Laxmikant Upadhyay wrote:
> Hi All,
> I have a query regarding Cassandra data modelling. I have created two tables:
>
> 1. CREATE TABLE ks.records_by_id ( id uuid PRIMARY KEY, status text, details text );
> 2. CREATE TABLE ks.records_by_date ( date date, id uuid, status text, PRIMARY KEY (date, id) );
>
> I need to fetch records by date and then process each of them. Which of the following options will be better when a record is processed?
>
> Option-1:
> BEGIN BATCH
>   UPDATE ks.records_by_id SET status = 'processed' WHERE id = <id>;
>   UPDATE ks.records_by_date SET status = 'processed' WHERE id = <id> AND date = 'date1';
> APPLY BATCH;
>
> Option-2:
> BEGIN BATCH
>   UPDATE ks.records_by_id SET status = 'processed' WHERE id = <id>;
>   DELETE FROM ks.records_by_date WHERE id = <id> AND date = 'date1';
> APPLY BATCH;
>
> Option-1 will not create tombstones, but I need to filter the records based on status='pending' at the application layer for each date. Option-2 will create tombstones (although the number of tombstones per partition will be limited), but it will not require application-side filtering.
> I think that we should avoid tombstones, especially row-level ones, so we should go with option-1. Kindly advise on the above, or suggest any other better approach.
> --
> regards,
> Laxmikant Upadhyay
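For what it's worth, the read-path difference between the two options can be sketched in CQL (assuming the schema above; the date value is just an example):

```cql
-- Option-1: read the whole date partition, then keep only
-- status = 'pending' rows in the application
SELECT id, status FROM ks.records_by_date WHERE date = '2020-05-23';

-- Option-2: every row still present is unprocessed, so no app-side filter
SELECT id FROM ks.records_by_date WHERE date = '2020-05-23';
```

Note that under Option-2 the read still has to skip the row tombstones inside the partition until they are compacted away after gc_grace_seconds, so the tombstone count per date partition remains a factor for wide dates.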
Re: Issues, understanding how CQL works
Marc,

In DSE, CQL offers an option called CAPTURE, which can save the output of a query to a specified file. Maybe you can use that option to save all the values you need in that file, to see all signalids or whichever columns you need. The file may grow big depending on your dataset, and I am not sure what limit is imposed on file size, but if you are selecting only 1 or 2 columns it should be fine, I assume. The syntax is: CAPTURE '<file>' appends query results to a file; CAPTURE OFF stops capturing.

Sincerely,
Aakash Pandhi

On Wednesday, April 22, 2020, 08:38:38 AM CDT, Durity, Sean R wrote:
> I thought this might be a single-time use case request. My first approach would be to use something like dsbulk to unload the data and then reload it into a table designed for the query you want to do (as long as you have adequate disk space). I think like a DBA/admin first. Dsbulk creates csv files, so you could move that data to any kind of database, if you chose.
> An alternative approach would be to use a driver that supports paging (I think this would be most of them) and write a program to walk the data set and output what you need in whatever format you need.
> Or, since this is a single-node scenario, you could try sstable2json to export the sstables (files on disk) into JSON, if that is a more workable format for you.
> Sean Durity – Staff Systems Engineer, Cassandra

-----Original Message-----
From: Marc Richter
Sent: Wednesday, April 22, 2020 6:22 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Issues, understanding how CQL works

Hi Jeff,

thank you for your exhaustive and verbose answer! Also, a very big "Thank you!" to all the other repliers; I hope you understand that I am summarizing all your feedback in this single answer.

From what I understand from your answers, Cassandra seems to be optimized to store (and read) data in exactly the way the data structure has been designed for. That makes it very inflexible, but as a trade-off allows it to do that single job very effectively.
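A minimal cqlsh session using CAPTURE might look like this (the file path is just an example, and system.local stands in for whatever query you actually need):

```cql
-- send all subsequent query output to a file (appends)
CAPTURE '/tmp/query_output.txt';
SELECT * FROM system.local;
-- stop capturing
CAPTURE OFF;
```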
I also understand, the more I dig into Cassandra, that the team I am supporting is using Cassandra kind of wrong; for example, they have only one node, and so use neither the load-balancing nor the redundancy capabilities Cassandra offers.

Thus, a maybe relevant side note: all the data resides on just one single node; maybe that info is important, because we know on which node the data is (I know that Cassandra internally applies the same hashing voodoo as if there were 1k nodes, but maybe this is important anyway).

Anyway: I do not really care if a query or effort to find this information is sub-optimal or very "expensive" in terms of efficiency or system load, since this isn't something that I need to extract on a regular basis, but only once. Because of that, it doesn't need to be optimal or efficient; I also do not care if it blocks the node for several hours, since Cassandra is only working on this single request. I really need this info (the most recent "insertdate") only once. Is there, considering this, a way to do that?

> Because you didn't provide a signalid and monthyear, it doesn't know
> which machine in your cluster to use to start the query.

I knew this already; thanks for confirming that I got it right! But what do I do if I do not know all "signalid"s? How do I learn them? Is it maybe possible to get a full list of all "signalid"s? Or is it possible to "re-arrange" the data in the cluster, or something else that enables me to learn what the most recent "insertdate" is? I really do not care if I need to do some expensive copy-all-data move, but I do not know what is possible and how to do it.

Best regards,
Marc Richter

On 21.04.20 19:20, Jeff Jirsa wrote:
> On Tue, Apr 21, 2020 at 6:20 AM Marc Richter <m...@marc-richter.info> wrote:
>
> Hi everyone,
>
> I'm very new to Cassandra. I have, however, some experience with SQL.
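On the "how do I learn all signalids" question: CQL can list the partition keys themselves with SELECT DISTINCT, which may be the cheapest starting point on a single node. A sketch (the literal values in the second query are hypothetical):

```cql
-- list every (signalid, monthyear) partition in the table
SELECT DISTINCT signalid, monthyear FROM tagdata.central;

-- then, for each partition found, fetch its newest insertdate
SELECT max(insertdate) FROM tagdata.central
WHERE signalid = 4711 AND monthyear = 202004;
```

The first statement still touches every partition and the second reads every row of one partition, so on a large single node this will be slow, but it avoids pulling all columns across the wire.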
> The biggest thing to remember is that Cassandra is designed to scale out
> to massive clusters - like thousands of instances. To do that, you can't
> assume it's ever ok to read all of the data, because that doesn't scale.
> So cassandra takes shortcuts / optimizations to make it possible to
> ADDRESS all of that data, but not SCAN it.
>
>> I need to extract some information from a Cassandra database that has
>> the following table definition:
>>
>> CREATE TABLE tagdata.central (
>>     signalid int,
>>     monthyear int,
>>     fromtime bigint,
>>     totime bigint,
>>     avg decimal,
>>     insertdate bigint,
>>     max decimal,
>>     min decimal,
>>     readings text,
>>     PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
>> )
>
> What your primary key REALLY MEANS is:
>
> The database on re
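Jeff's point about what the primary key means can be made concrete with a couple of queries (hypothetical values):

```cql
-- Fine: the full partition key is given, so Cassandra knows exactly
-- which node and which partition to read
SELECT * FROM tagdata.central
WHERE signalid = 4711 AND monthyear = 202004;

-- Rejected without ALLOW FILTERING: a clustering-column predicate
-- with no partition key would force a scan of all partitions
SELECT * FROM tagdata.central WHERE fromtime > 0;
```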
Repair and NodeSync
Hi All,

I am reviewing our data sync procedures in order to improve them, so I need your input on NodeSync. Are there any cons to implementing NodeSync instead of Repair? Is NodeSync the future direction for cluster-wide data sync?

Thank you,
Aakash
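For context: NodeSync is a DataStax Enterprise feature (it is not in open-source Apache Cassandra), and as far as I recall from the DSE docs it is enabled per table rather than cluster-wide; a sketch with a hypothetical table name:

```cql
-- DSE only: let NodeSync continuously repair this table in the background
ALTER TABLE ks.my_table WITH nodesync = { 'enabled' : 'true' };
```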
Re: Handling Long running Cassandra Rebuild Process
A simple way to do this is to measure the dataset size of the source DC and the new DC (the one you are rebuilding) every hour or so, and make sure the new DC's dataset size is catching up. Not very precise, but it helps me; we recently rebuilt a DC and watched it that way. Another idea is to poll system.log for errors related to streaming and send a notification to yourself.

Sincerely,
Aakash Pandhi

On Friday, March 27, 2020, 11:54:41 PM CDT, Jai Bheemsen Rao Dhanwada wrote:
> netstats only gives the active streams; for example, if the rebuild fails because of a network issue or something, there is no trace of it. Regarding nohup: I am trying to create an API.

On Friday, March 27, 2020, Erick Ramirez wrote:
> If you run nodetool netstats, you will be able to see the status of the node, which will be either "building" or "normal" if it completed. While it's building, it will also show you the active streams that are in progress. Typically, most admins nohup it or at least redirect the output to a log file so you still have visibility when you lose your [SSH] session. That's also another thing to consider. Cheers!
> GOT QUESTIONS? Apache Cassandra experts from the community and DataStax have answers! Share your expertise on https://community.datastax.com/.
Re: Hints replays very slow in one DC
You can check the hinted handoff throttle rate on the node and adjust it if needed: nodetool gethintedhandoffthrottlekb to read it, and nodetool sethintedhandoffthrottlekb to set it. I would also check disk stats for the device where hints are stored, using either sar or iostat.

Sincerely,
Aakash Pandhi

On Wednesday, February 26, 2020, 04:36:16 PM CST, Laxmikant Upadhyay wrote:
> Is dc1 a simple standby DC, or do you run some operations (e.g. compute for analysis) on it? Have you found the root cause of the OOM? Do you see any specific Cassandra operation (e.g. repair) causing the OOM? One tip: try upgrading to 3.11.6, as lots of bugs have been fixed since 3.11.0.

On Wed, Feb 26, 2020, 9:53 PM Krish Donald wrote:
> Nodes are going down due to Out of Memory, and we are using a 31GB heap size in DC1; however, DC2 (which serves the traffic) has a 16GB heap. The reason we had to increase the heap in DC1 is that DC1 nodes were going down due to the Out of Memory issue, but DC2 nodes never went down. We also noticed messages like the following in system.log:
> FailureDetector.java:288 - Not marking nodes down due to local pause of 9532654114 > 50

On Tue, Feb 25, 2020 at 9:43 PM Erick Ramirez wrote:
> What's the reason for nodes going down? Is it because the cluster is overloaded? Hints will get handed off periodically when nodes come back to life, but if they happen to go down again or become unresponsive (for whatever reason), the handoff will be delayed until the next cycle. I think it's every 5 minutes, but don't quote me. Hinted MV updates can be problematic, so it is a symptom, but with limited info I'm not sure that it's the cause of the slow handoffs. Cheers!
Re: Mechanism to Bulk Export from Cassandra on daily Basis
John,

COPY is not recommended for more than 2 million rows, so COPY is ruled out in your case for those 30 tables you mentioned.

Sincerely,
Aakash Pandhi

On Wednesday, February 19, 2020, 02:26:15 PM CST, Amanda Moran wrote:
> Hi there - DataStax recently released their bulk loader to the OSS community. I would take a look and at least try it out: https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkAbout.html
> Good luck! Amanda

On Wed, Feb 19, 2020 at 12:10 PM JOHN, BIBIN wrote:
> Thanks for the response. We need to export into a flat file and send it to another analytical application. There are 137 tables, and 30 of them have 300M+ records, so "COPY TO" is taking a lot of time.
> Thank you,
> Bibin John

From: Aakash Pandhi
Sent: Wednesday, February 19, 2020 12:51 PM
To: user@cassandra.apache.org
Subject: Re: Mechanism to Bulk Export from Cassandra on daily Basis

John,

Greetings. Is the requirement just to export data from the table and stage it somewhere, or to export it and load it into another cluster/table? sstableloader is a utility which can help you, as it is designed for bulk loading.

Sincerely,
Aakash Pandhi

On Wednesday, February 19, 2020, 10:13:32 AM PST, JOHN, BIBIN wrote:
> Team, We have a requirement to bulk export data from Cassandra on a daily basis. The table contains close to 600M records and the cluster has 12 nodes. What is the best approach to do this?
> Thanks,
> Bibin John
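For the smaller tables, cqlsh COPY can still be reasonable; a sketch with hypothetical keyspace, table, and column names (for the 300M+ row tables, dsbulk or an sstable-based export is the safer route):

```cql
COPY ks.small_table (id, status, details)
  TO '/tmp/small_table.csv'
  WITH HEADER = TRUE AND PAGESIZE = 1000;
```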