Dropping down replication factor
Hi folks, hopefully a quick one: we are running a 12-node cluster (2.1.15) in AWS with Ec2Snitch. It's all in one region but spread across 3 availability zones. It was nicely balanced with 4 nodes in each. But after a couple of failures and subsequent provisions to the wrong AZ, we now have a cluster with:

5 nodes in AZ A
5 nodes in AZ B
2 nodes in AZ C

Not sure why, but when adding a third node in AZ C it fails to stream after getting all the way to completion, with no apparent error in the logs. I've looked at a couple of bugs referring to scrubbing and possible OOM bugs due to metadata writing at the end of streaming (sorry, don't have the ticket handy). I'm worried I might not be able to do much with these nodes since disk space usage is high and they are under a lot of load given how few of them there are for this rack.

Rather than troubleshoot this further, what I was thinking about doing was:

- drop the replication factor on our keyspace to two - hopefully this would reduce load on these two remaining nodes
- run repairs/cleanup across the cluster
- then shoot these two nodes in the 'c' rack
- run repairs/cleanup across the cluster

Would this work with minimal/no disruption? Should I update their "rack" beforehand or after? What else am I not thinking about? My main goal at the moment is to get the cluster back to a clean, consistent state that allows nodes to properly bootstrap.

Thanks for your help in advance.
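The steps above could look roughly like the following. This is a hedged sketch, not a vetted procedure: the keyspace name `my_ks` and the datacenter name `us-east` are placeholders (with Ec2Snitch the datacenter name is the AWS region), and `nodetool decommission` is assumed as the clean way to "shoot" the rack-C nodes.

```shell
# 1. Drop RF from 3 to 2 on the keyspace (names are placeholders)
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-east': 2};"

# 2. Repair and clean up, running on one node at a time
nodetool repair -pr my_ks
nodetool cleanup my_ks

# 3. Retire a rack-C node (run on each node being removed)
nodetool decommission

# 4. Repair/cleanup across the cluster again after the topology change
nodetool repair -pr my_ks
nodetool cleanup my_ks
```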
Re: Dropping down replication factor
Thanks for replying Jeff. Responses below.

On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa wrote:
> Answers inline
>
> [original message quoted in full; snipped]
>
> You'll definitely have higher load on az C instances with rf=3 in this ratio.
>
> Streaming should still work - are you sure it's not busy doing something? Like building a secondary index or similar? A jstack thread dump would be useful, or at least nodetool tpstats.

Only other thing might be a backup. We do incrementals every hour and snapshots every 24h; they are shipped to S3, then links are cleaned up.
The error I get on the node I'm trying to add to rack C is:

ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.15.jar:2.1.15]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.15.jar:2.1.15]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_112]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_112]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_112]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
WARN  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582 Gossiper.java:1462 - No local state or state is in silent shutdown, not announcing
Re: Dropping down replication factor
Nothing in the logs on the node that it was streaming from. However, I think I found the issue on the other node in the C rack:

ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354 StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5] Streaming error occurred
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /ephemeral/cassandra/data/...

I did a 'cat /var/log/cassandra/system.log | grep Corrupt' and it seems it's a single Index.db file, and nothing on the other node. I think nodetool scrub or offline sstablescrub might be in order, but with the current load I'm not sure I can take it offline for very long. Thanks again for the help.

On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa wrote:
> Compaction is backed up – that may be normal write load (because of the rack imbalance), or it may be a secondary index build. Hard to say for sure. 'nodetool compactionstats' if you're able to provide it. The jstack probably not necessary, streaming is being marked as failed and it's turning itself off. Not sure why streaming is marked as failing, though - anything on the sending sides?
>
> [earlier messages quoted in full; snipped]
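The two scrub options mentioned could be run as below. This is a hedged sketch: `my_ks`/`my_table` are placeholders for whichever table owns the corrupt Index.db file.

```shell
# Online scrub -- node stays up while the table's sstables are rewritten:
nodetool scrub my_ks my_table

# Offline scrub -- stop Cassandra first; slower, but can salvage rows
# the online scrub cannot:
sstablescrub my_ks my_table
```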
Re: Dropping down replication factor
Hi Jeff, I ran the scrub online and that didn't help. I went ahead and stopped the node, deleted all the corrupted data files (*.db files) and planned on running a repair when it came back online.

Unrelated, I believe, but now another CF is corrupted!

org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: (/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db): corruption detected, chunk at 101500 of length 26523398.

A few days ago when troubleshooting this I did change the OpsCenter keyspace RF from 3 to 2, since I thought that would help reduce load. Did that cause this corruption?

Running 'nodetool scrub OpsCenter rollups300' on that node now.

And now I also see this when running nodetool status:

"Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless"

What to do? I still can't stream to this new node because of this corruption, and disk space is getting low on these nodes...
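Two quick checks for the situation above, as a hedged sketch (log paths follow this cluster's earlier messages; the keyspace name `my_ks` is a placeholder):

```shell
# Inventory corruption reports across the logs on each node:
grep -h 'Corrupt' /var/log/cassandra/system.log* | sort -u

# The "effective ownership information is meaningless" note is expected
# once keyspaces differ in RF; pass a specific keyspace to get
# meaningful ownership numbers again:
nodetool status my_ks
```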
Re: Dropping down replication factor
Do you think with the setup I've described I'd be OK doing that now to recover this node? The node died trying to run the scrub; I've restarted it, but I'm not sure it's going to get past a scrub/repair, which is why I deleted the other files as a brute-force method. I think I might have to do the same here and then kick off a repair if I can't just replace it? Doing the repair on the node that had the corrupt data deleted should be OK?

On Sun, Aug 13, 2017 at 10:29 AM Jeff Jirsa wrote:
> Running repairs when you have corrupt sstables can spread the corruption.
>
> In 2.1.15, corruption is almost certainly from something like a bad disk or bad RAM.
>
> One way to deal with corruption is to stop the node and replace it (with -Dcassandra.replace_address) so you restream data from neighbors. The challenge here is making sure you have a healthy replica for streaming.
>
> Please make sure you have backups and snapshots if you have corruption popping up.
>
> If you're using vnodes, once you get rid of the corruption you may consider adding another C node with fewer vnodes to try to get it joined faster with less data.
>
> --
> Jeff Jirsa
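Jeff's replace_address approach could look like this hedged sketch (the IP `10.0.0.1` and the `cassandra-env.sh` path are placeholders; adjust to your install):

```shell
# On a fresh replacement node with the same config and rack, before the
# first start, point it at the dead node's address:
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.1"' \
    >> /etc/cassandra/cassandra-env.sh

# Then start Cassandra; it takes over the dead node's tokens and
# restreams that data from the surviving replicas. Remove the option
# after the node finishes joining so it isn't applied on later restarts.
```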
Re: Dropping down replication factor
Thanks Kurt. We had one sstable from a CF of ours. I am actually running a repair on that CF now and then plan to try and join the additional nodes as you suggest. I deleted the corrupt OpsCenter sstables as well but will not bother repairing that before adding capacity. Been keeping an eye across all nodes for corrupt exceptions - so far no new occurrences. Thanks again.
-B

On Sun, Aug 13, 2017 at 17:52 kurt greaves wrote:
> On 14 Aug. 2017 00:59, "Brian Spindler" wrote:
>> Do you think with the setup I've described I'd be ok doing that now to recover this node?
>>
>> The node died trying to run the scrub; I've restarted it but I'm not sure it's going to get past a scrub/repair, this is why I deleted the other files as a brute force method. I think I might have to do the same here and then kick off a repair if I can't just replace it?
>
> Is it just the opscenter keyspace that has corrupt sstables? If so I wouldn't worry about repairing too much. If you can get that third node in C to join, I'd say your best bet is to just do that until you have enough nodes in C. Dropping and increasing RF is pretty risky on a live system.
>
> It sounds to me like you stand a good chance of getting the new nodes in C to join, so I'd pursue that before trying anything more complicated.
>
>> Doing the repair on the node that had the corrupt data deleted should be ok?
>
> Yes, as long as you also deleted corrupt SSTables on any other nodes that had them.
Nodes just dying with OOM
Hi guys, our cluster - around 18 nodes - just started having nodes die, and when restarting them they are dying with OOM. How can we handle this? I've tried adding a couple of extra gigs on these machines to help, but it's not working. Help! -B
Re: Nodes just dying with OOM
Sorry about that. We eventually found that one column family had some large/corrupt data that was causing the OOMs. Luckily it was a pretty ephemeral data set and we were able to just truncate it. However, it was a guess based on some log messages about reading a large number of tombstones on that column family. I think we should review this column family's design so it doesn't generate so many tombstones? Could that be the cause? What else would you recommend? Thank you in advance.

On Fri, Oct 6, 2017 at 6:33 AM Brian Spindler wrote:
> Hi guys, our cluster - around 18 nodes - just started having nodes die, and when restarting them they are dying with OOM. How can we handle this? I've tried adding a couple of extra gigs on these machines to help, but it's not working.
>
> Help!
> -B
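The tombstone guardrails that produce those log messages, plus the emergency fix used here, as a hedged sketch (config path and `my_ks`/`my_table` names are placeholders; the values shown are the 2.1 defaults):

```shell
# The tombstone-overwhelm guardrails live in cassandra.yaml; reads that
# cross the warn threshold log the tombstone warnings that pointed at
# this table, and the failure threshold aborts the read to protect heap:
grep -E 'tombstone_(warn|failure)_threshold' /etc/cassandra/cassandra.yaml
#   tombstone_warn_threshold: 1000      (default)
#   tombstone_failure_threshold: 100000 (default)

# The emergency fix used here: truncating the ephemeral table
cqlsh -e "TRUNCATE my_ks.my_table;"
```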
Re: Nodes just dying with OOM
Hi Alain, thanks for getting back to me. I will read through those articles. The truncate did solve the problem. I am using Cassandra 2.1.15. I'll look at cfstats in more detail; we've got some charting from JVM metrics, yeah. We're migrating from i2.xl (32GB RAM, local SSD) to m4.xl (16GB, gp2) so we have a mix there; the Cassandra JVM heap is set to 10GB. When I did the truncate, Cassandra did create a snapshot, which I'm hoping to copy over to a developer's machine to find the offending row(s). If it is just huge rows, that's probably more of an application leak. Is 'Compacted partition maximum bytes:' from cfstats the right thing to look at?

Thanks again,
-B

On Fri, Oct 6, 2017 at 10:40 AM Alain RODRIGUEZ wrote:
> Hello Brian.
>
> Sorry to hear, looks like a lot of troubles.
>
>> I think we should review this column family design so it doesn't generate so many tombstones? Could that be the cause?
>
> It could be indeed, did truncating solve the issue?
>
> There are some nicer approaches you can try to handle tombstones correctly depending on your use case. I wrote a post and presented a talk about this last year; I hope you'll find what you are looking for.
>
> http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
> https://www.youtube.com/watch?v=lReTEcnzl7Y
>
>> What else would you recommend?
>
> Well, we don't have much information to guess. But I will try to give you relevant clues with what you gave us so far:
>
>> that one column family had some large/corrupt data and causing OOM's
>
> Are you using Cassandra 3.0.x (x < 14)? You might be facing a bug in Cassandra corrupting data after schema changes (https://issues.apache.org/jira/browse/CASSANDRA-13004).
>
> You can check large partitions using 'nodetool cfstats' or using monitoring and the corresponding metric (per table / columnfamily).
>
> Other than that, what is the memory available, the heap size, and the GC type and options in use? Do you see some GC pauses in the logs, or do you control this value through a chart using JVM metrics?
>
> C*heers,
> -------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
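The cfstats check discussed above, as a hedged sketch (`my_ks`/`my_table` and the snapshot path are placeholders):

```shell
# 'Compacted partition maximum bytes' is the per-table high-water mark
# for partition size -- a quick first check for oversized rows:
nodetool cfstats my_ks.my_table | grep -i 'Compacted partition'

# To hunt the offending partition offline on a dev machine, the 2.1-era
# sstable2json tool can dump a suspect SSTable from the snapshot:
sstable2json /path/to/snapshot/my_ks-my_table-ka-1234-Data.db | head
```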
need to reclaim space with TWCS
Hi, I have several column families using TWCS and it's great. Unfortunately we seem to have missed the great advice in Alex's article here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about setting the appropriately aggressive tombstone settings, and now we have lots of timestamp overlaps and disk space to reclaim.

I am trying to figure out the best way out of this. Lots of the SSTables with overlapping timestamps in newer SSTables have a droppable-tombstone ratio of 0.895143957 or similar, very close to the 0.90 at which the full sstable will drop, afaik.

I'm thinking to do the following immediately:

Set unchecked_tombstone_compaction = true
Set tombstone_compaction_interval == TTL + gc_grace_seconds
Set dclocal_read_repair_chance = 0.0 (currently 0.1)

If I do this, can I expect TWCS/C* to reclaim the space from those SSTables with ~0.89 droppable tombstones? Or do I (can I?) manually delete these files, and will C* just ignore the overlapping data and treat it as tombstoned?

What else should/could be done?

Thank you in advance for your advice,

__
Brian Spindler
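The proposed settings could be applied as below. A hedged CQL sketch only: `my_ks`/`my_table` are placeholders, on 2.1 TWCS is an externally-built jar so the `'class'` value is whatever TWCS class the table already uses, and the interval value is a placeholder for the actual TTL + gc_grace_seconds in seconds.

```shell
# Tombstone subproperties go inside the compaction map; the read-repair
# chance is a separate table option:
cqlsh -e "
ALTER TABLE my_ks.my_table WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_compaction_interval': '90000'
} AND dclocal_read_repair_chance = 0.0;"
```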
Re: need to reclaim space with TWCS
I probably should have mentioned our setup: we're on Cassandra version 2.1.15.
Re: need to reclaim space with TWCS
Hi Alexander, thanks for your response! I'll give it a shot.

On Sat, Jan 20, 2018 at 10:22 AM Alexander Dejanovski <a...@thelastpickle.com> wrote:
> Hi Brian,
>
> You should definitely set unchecked_tombstone_compaction to true and set the interval to the default of 1 day. Use a tombstone_threshold of 0.6, for example, and see how that works. Tombstones will get purged depending on your partitioning, as their partition needs to be fully contained within a single sstable.
>
> Deleting the sstables by hand is theoretically possible but should be kept as a last-resort option if you're running out of space.
>
> Cheers,
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
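Alexander's recommendation as a hedged CQL fragment (table name and TWCS class are placeholders for whatever the table already uses; 86400 seconds is the default 1-day interval he suggests keeping):

```shell
cqlsh -e "
ALTER TABLE my_ks.my_table WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_compaction_interval': '86400',
    'tombstone_threshold': '0.6'
};"
```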
Re: need to reclaim space with TWCS
Hi Alexander, after re-reading https://issues.apache.org/jira/browse/CASSANDRA-13418 it seems you would recommend leaving dclocal_read_repair at maybe 10% - is that true?

Also, has this been patched into 2.1?
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176

Cheers,
-B
Re: need to reclaim space with TWCS
Got it. Thanks again.

> On Jan 20, 2018, at 11:17 AM, Alexander Dejanovski wrote:
>
> I would turn background read repair off on the table to mitigate the overlap issue, but you'll still have foreground read repair if you use quorum reads anyway.
>
> So set dclocal_read_repair_chance to 0.0.
>
> The commit you're referring to has been merged in 3.11.1; 2.1 no longer receives patches.
>
> --
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
Re: Cassandra Repair Duration.
Hi Karthick, repairs can be tricky.

You can (and probably should) run repairs as a part of routine maintenance, and absolutely if you lose a node in a bad way. If you decommission a node, for example, no "extra" repair is needed.

If you are using TWCS, you should probably not run repairs on those column families.

We use a combination of scripts and locks to run repairs across an 18-node cluster one node at a time; it typically takes around 2-3 days, so we run it once a week.

The great folks at TLP have put together http://cassandra-reaper.io/ which makes managing repairs even easier and probably more performant since, as I understand it, it uses subrange repairs.

Good luck,
-B

> On Jan 24, 2018, at 4:57 AM, Karthick V wrote:
>
> Periodically I have been running a full repair before the GC grace period, as mentioned in the best practices. Initially all went well, but as the data size increased the repair duration increased drastically; we are also facing query timeouts during that time, and we tried incremental repair but hit some OOM issues.
>
> After running a repair process for more than 80 hours we have ended up with the question:
>
> Why can't we run a repair process only when a Cassandra node has had downtime?
>
> Say there is no downtime during a GC grace period. Do we still face inconsistency among nodes? If yes, doesn't hinted handoff handle that?
>
> Cluster info: two datacenters with 8 machines each, 1 TB disks, C* v2.1.13, around 420 GB of data each.
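A minimal sketch of the one-node-at-a-time loop described above. Host names and the keyspace are assumptions, and a real setup would add the locking and failure handling mentioned; the nodetool binary is a parameter so the helper can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical rolling-repair loop: one node at a time, primary-range only,
# so each token range is repaired once rather than once per replica.

run_rolling_repair() {
  local nodetool="$1" keyspace="$2"; shift 2
  local host
  for host in "$@"; do
    # -pr = primary range; repairing per-keyspace lets you skip TWCS tables
    "$nodetool" -h "$host" repair -pr "$keyspace" || return 1
  done
}

# Example against a live cluster (hosts are assumptions):
#   run_rolling_repair nodetool my_ks 10.0.0.1 10.0.0.2 10.0.0.3
```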
Re: TWCS not deleting expired sstables
I would start here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Specifically, the "Hints and repairs" and "Timestamp overlap" sections might be of use.

-B

> On Jan 25, 2018, at 11:05 AM, Thakrar, Jayesh wrote:
>
> Wondering if I can get some pointers to what's happening here and why sstables that I think should be expired are not being dropped.
>
> Here's the table's compaction property - note we also set "unchecked_tombstone_compaction" to true.
>
> compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 'max_threshold': '4', 'min_threshold': '4', 'unchecked_tombstone_compaction': 'true'}
>
> We insert data with timestamp and TTL programmatically.
>
> Here's one set of sstables that I expect to be removed:
>
> $ ls -lt *Data.db | tail -5
> -rw-r--r--. 1 vchadoop vchadoop 31245097312 Sep 20 17:16 mc-1308-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop 31524316252 Sep 19 14:27 mc-1187-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop 21405216502 Sep 18 14:14 mc-1070-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop 13609890747 Sep 13 20:53 mc-178-big-Data.db
>
> $ date +%s
> 1516895877
>
> $ date
> Thu Jan 25 15:58:00 UTC 2018
>
> $ sstablemetadata $PWD/mc-130-big-Data.db | head -20
> SSTable: /ae/disk1/data/ae/raw_logs_by_user-f58b9960980311e79ac26928246f09c1/mc-130-big
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Bloom Filter FP chance: 0.01
> Minimum timestamp: 14966028
> Maximum timestamp: 14980788
> SSTable min local deletion time: 1507924954
> SSTable max local deletion time: 1509400832
> Compressor: org.apache.cassandra.io.compress.LZ4Compressor
> Compression ratio: 0.17430158132352797
> TTL min: 2630598
> TTL max: 4086188
> First token: -9177441867697829836 (key=823134638755651936)
> Last token: 9155171035305804798 (key=395118640769012487)
> minClustringValues: [-1, da, 3, 1498082382078, -9223371818124305448, -9223371652504795402, -1]
> maxClustringValues: [61818, tpt, 325, 149660280, -4611686088173246790, 9223372014135560885, 1]
> Estimated droppable tombstones: 1.1983492967652476
> SSTable Level: 0
> Repaired at: 0
> Replay positions covered: {CommitLogPosition(segmentId=1505171071629, position=7157684)=CommitLogPosition(segmentId=1505171075152, position=6263269)}
> totalColumnsSet: 111047277
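To spot sstables near the droppable-tombstone threshold, the "Estimated droppable tombstones" line of the `sstablemetadata` output shown above can be parsed with a small helper. A sketch; the 0.9 threshold in the example is an assumption matching the discussion above.

```shell
# Extract the droppable-tombstone estimate from sstablemetadata output (stdin).
droppable_ratio() {
  awk -F': ' '/Estimated droppable tombstones/ { print $2 }'
}

# Exit 0 when RATIO > THRESHOLD (awk does the floating-point comparison).
exceeds_threshold() {
  awk -v r="$1" -v t="$2" 'BEGIN { exit !(r > t) }'
}

# Example against a live node (file name is hypothetical):
#   ratio=$(sstablemetadata mc-130-big-Data.db | droppable_ratio)
#   exceeds_threshold "$ratio" 0.9 && echo "mostly droppable tombstones"
```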
Re: Cassandra Repair Duration.
It's all here: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesWhen.html

-B

On Thu, Jan 25, 2018 at 6:08 AM Karthick V wrote:
> "You can (and probably should) run repairs as a part of routine maintenance."
>
> Can you explain any use case for why we need this?
Node won't start
Hi guys, I've got a 2.1.15 node that will not start, it seems. It hangs on "Opening system.size_estimates". Sometimes that can take a while, but I've let it run for 90m and nothing. Should I move this sstable out of the way to let it start? Will it rebuild/refresh the size estimates if I remove that folder?

thanks
-B
Re: Node won't start
Thanks Alex. That's exactly what I ended up doing - it did take maybe 45m to come back up though :(

-B

Sent from my iPhone

> On Feb 3, 2018, at 9:03 AM, Alexander Dejanovski wrote:
>
> Hi Brian,
>
> I just tested this on a CCM cluster and the node started without problem. It flushed some new SSTables a short while after.
>
> I honestly do not know the specifics of how size_estimates is used, but if it prevented a node from restarting I'd definitely remove the sstables to get it back up.
>
> Cheers,
>
> --
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
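For anyone hitting the same hang, a sketch of the workaround discussed in this thread. The data-directory layout is an assumption, and the node must be stopped before moving anything; Cassandra repopulates size_estimates from flushes after restart.

```shell
# Move the system.size_estimates sstables aside before a restart.
# The table directory carries a UUID suffix, hence the glob.
move_size_estimates() {
  local data_dir="$1" backup_dir="$2"
  mkdir -p "$backup_dir"
  mv "$data_dir"/system/size_estimates-* "$backup_dir"/
}

# Example (paths are assumptions):
#   sudo service cassandra stop
#   move_size_estimates /var/lib/cassandra/data /var/lib/cassandra/size_estimates.bak
#   sudo service cassandra start
```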
TWCS Compaction backed up
Hey guys, quick question:

I've got a v2.1 Cassandra cluster: 12 nodes on AWS i3.2xl, commit log on one drive, data on NVMe. It was working very well; it's a time-series DB and has been accumulating data for about 4 weeks.

The nodes have increased in load and compaction seems to be falling behind. I used to get about one file per day for this column family, roughly a ~30GB Data.db file per day. I am now getting hundreds per day at 1MB - 50MB.

How do I recover from this? I can scale out to give some breathing room, but will it go back and compact the old days into nicely packed files for the day?

I tried raising compaction throughput from 256 to 1000 and it seemed to make things worse for the CPU; it's configured on i3.2xl with 8 compaction threads.

-B

Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to get rid of old tombstones. However, running repairs in 2.1 on TWCS column families causes a very large spike in sstable counts due to anti-compaction, which causes a lot of disruption. Is there any other way?
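One way to quantify "falling behind" is the pending-task count from `nodetool compactionstats`. A hedged helper sketch; the nodetool command is a parameter so it can be stubbed, and the unthrottle example assumes your disks can absorb it.

```shell
# Print the number of pending compaction tasks, parsed from
# `nodetool compactionstats` output ("pending tasks: N").
compaction_backlog() {
  local nodetool="${1:-nodetool}"
  "$nodetool" compactionstats | awk '/pending tasks/ { print $3 }'
}

# Example against a live node:
#   compaction_backlog nodetool
#   nodetool setcompactionthroughput 0   # 0 = unthrottled, if I/O allows
```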
Re: TWCS Compaction backed up
Hi Jonathan, both I believe.

The window size is 1 day; full settings:

AND compaction = {'timestamp_resolution': 'MILLISECONDS', 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}

nodetool tpstats

Pool Name                  Active Pending Completed    Blocked All time blocked
MutationStage              0      0       68582241832  0       0
ReadStage                  0      0       209566303    0       0
RequestResponseStage       0      0       44680860850  0       0
ReadRepairStage            0      0       24562722     0       0
CounterMutationStage       0      0       0            0       0
MiscStage                  0      0       0            0       0
HintedHandoff              1      1       203          0       0
GossipStage                0      0       8471784      0       0
CacheCleanupExecutor       0      0       122          0       0
InternalResponseStage      0      0       552125       0       0
CommitLogArchiver          0      0       0            0       0
CompactionExecutor         8      42      1433715      0       0
ValidationExecutor         0      0       2521         0       0
MigrationStage             0      0       527549       0       0
AntiEntropyStage           0      0       7697         0       0
PendingRangeCalculator     0      0       17           0       0
Sampler                    0      0       0            0       0
MemtableFlushWriter        0      0       116966       0       0
MemtablePostFlush          0      0       209103       0       0
MemtableReclaimMemory      0      0       116966       0       0
Native-Transport-Requests  1      0       1715937778   0       176262

Message type      Dropped
READ              2
RANGE_SLICE       0
_TRACE            0
MUTATION          4390
COUNTER_MUTATION  0
BINARY            0
REQUEST_RESPONSE  1882
PAGED_RANGE       0
READ_REPAIR       0

On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad wrote:
> What's your window size?
>
> When you say backed up, how are you measuring that? Are there pending tasks, or do you just see more files than you expect?
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
Re: TWCS Compaction backed up
Hi Jeff, mostly lots of little files: there will be 4-5 that are 1-1.5GB or so, and then many at 5-50MB and many at 40-50MB each.

Re incremental repair: yes, one of my engineers started an incremental repair on this column family that we had to abort. In fact, the node the repair was initiated on ran out of disk space, and we ended up replacing that node like a dead node.

Oddly, the new node is experiencing this issue as well.

-B

On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa wrote:
> You could toggle off the tombstone compaction to see if that helps, but that should be lower priority than normal compactions.
>
> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>
> Do you do normal deletes? Did you try to run incremental repair?
>
> --
> Jeff Jirsa
Re: TWCS Compaction backed up
Everything is ttl'd.

I suppose I could use sstablemetadata to see the repaired bit. Could I just set that to unrepaired somehow, and would that fix it?

Thanks!

> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa wrote:
>
> May be worth seeing if any of the sstables got promoted to repaired - if so, they're not eligible for compaction with unrepaired sstables, and that could explain some higher counts.
>
> Do you actually do deletes, or is everything ttl'd?
>
> --
> Jeff Jirsa
Re: TWCS Compaction backed up
Hi, I spot-checked a couple of the files that were ~200MB and they mostly had "Repaired at: 0", so maybe that's not it?

-B
Re: TWCS Compaction backed up
In fact, all of them say "Repaired at: 0".
Re: TWCS Compaction backed up
Hi Jeff/Jon et al, here is what I'm thinking to do to clean up; please let me know what you think.

This is precisely my problem, I believe: http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

With this, I have a lot of wasted space due to a bad incremental repair. So I am thinking to abandon incremental repairs by:

- Setting all repairedAt values to 0 on any/all *Data.db SSTables
- Using either range_repair.py or Reaper to run subrange repairs

Will this clean everything up?
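The first step of the plan above (resetting repairedAt to 0) can be done with the `sstablerepairedset` tool that ships with Cassandra. A sketch only: the paths are hypothetical, the node must be stopped while the tool runs, and the tool binary is a parameter so the helper can be exercised with a stub.

```shell
# Mark every Data.db under a table directory as unrepaired (repairedAt = 0).
mark_unrepaired() {
  local tool="$1" table_dir="$2"
  find "$table_dir" -name '*Data.db' -print0 \
    | xargs -0 -r "$tool" --really-set --is-unrepaired
}

# Example (path is an assumption; stop the node first):
#   mark_unrepaired sstablerepairedset /var/lib/cassandra/data/my_ks/my_cf-uuid
```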
Re: Upgrade from 2.1 to 3.11
Ma, did you try what Mohamadreza suggested? Having such a large heap means there is a lot of garbage to collect during full GC.

On Tue, Aug 28, 2018 at 4:31 AM Pradeep Chhetri wrote:
> You may want to try upgrading to 3.11.3 instead, which has some memory leak fixes.
>
> On Tue, Aug 28, 2018 at 9:59 AM, Mun Dega wrote:
>> I am surprised that no one else ran into any issues with this version. GC can't catch up fast enough and there is constant full GC taking place.
>>
>> The result? Unresponsive nodes making the entire cluster unusable.
>>
>> Any insight on this issue from anyone that is using this version would be appreciated.
>>
>> Ma
>>
>> On Fri, Aug 24, 2018, 04:30 Mohamadreza Rostami <mohamadrezarosta...@gmail.com> wrote:
>>> You have a very large heap; it spends most of its CPU time in the GC stage. You should set the heap to at most 12GB and enable the row cache, and your cluster should become faster.
>>>
>>> On Friday, 24 August 2018, Mun Dega wrote:
>>>> 120G data, 28G heap out of 48 on the system, 9-node cluster, RF 3.
>>>>
>>>> On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami wrote:
>>>>> Hi,
>>>>> How much data do you have? How much RAM do your servers have? How big is your heap?
>>>>>
>>>>> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster. The process went OK, including upgradesstables, but we started to experience high read/write latency, occasional OOMs, and long GC pauses afterwards.
>>>>>>
>>>>>> For the same cluster on 2.1, we didn't have any issues like this. We also kept server specs, heap, etc. the same post-upgrade.
>>>>>>
>>>>>> Has anyone else had similar issues going to 3.11, and what major changes could cause such a setback in the new version?
>>>>>>
>>>>>> Ma Dega
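A hedged example of the smaller-heap suggestion above, expressed as cassandra-env.sh overrides. The 12G figure is the one suggested in-thread; the new-gen size and the row-cache value are rule-of-thumb assumptions, not recommendations from this thread.

```shell
# cassandra-env.sh overrides: cap the heap as suggested above.
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="3G"   # often ~1/4 of MAX_HEAP_SIZE when using CMS

# And in cassandra.yaml (hypothetical size), to enable the row cache:
#   row_cache_size_in_mb: 256
```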
upgrading from 2.x TWCS to 3.x TWCS
Hi all, we're planning an upgrade from 2.1.15 -> 3.11.3, and currently we have several column families configured with the TWCS class 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'; with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'.

Is that a safe operation? Will Cassandra even start if a column family has a compaction strategy defined with a classname it cannot resolve? How do we deal with different node versions and different class names during the upgrade of the binaries throughout the cluster?

Thank you for any guidance,
Brian
Re: upgrading from 2.x TWCS to 3.x TWCS
[image: image.png]

On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa wrote:
> Easiest approach is to build the 3.11 jar from my repo, upgrade, then ALTER table to use the official TWCS (org.apache.cassandra) jar
>
> Sorry for the headache. I hope I have a 3.11 branch for you.
>
> --
> Jeff Jirsa
Re: upgrading from 2.x TWCS to 3.x TWCS
Nevermind, I spoke too quickly. I can change the Cassandra version in the pom.xml and recompile, thanks!

On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa wrote:
> Easiest approach is to build the 3.11 jar from my repo, upgrade, then ALTER table to use the official TWCS (org.apache.cassandra) jar
>
> Sorry for the headache. I hope I have a 3.11 branch for you.
Re: upgrading from 2.x TWCS to 3.x TWCS
You are right, it won't even compile:

[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[110,90] cannot find symbol
  symbol:   method getOverlappingSSTables(org.apache.cassandra.db.lifecycle.SSTableSet,java.util.Set)
  location: variable cfs of type org.apache.cassandra.db.ColumnFamilyStore
[ERROR] /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[150,99] cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[ERROR] /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[330,59] cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[ERROR] /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/SizeTieredCompactionStrategy.java:[104,67] cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[INFO] 4 errors

On Fri, Nov 2, 2018 at 2:52 PM Jeff Jirsa wrote:
> There’s a chance it will fail to work - possible method signatures changed between 3.0 and 3.11. Try it in a test cluster before prod.
Re: upgrading from 2.x TWCS to 3.x TWCS
I hope I can do as you suggest and leapfrog to 3.11 rather than two-stepping it from 3.7 -> 3.11.

Just having TWCS has saved me lots of hassle, so it's all good; thanks for all you do for our community.

-B

On Fri, Nov 2, 2018 at 3:54 PM Jeff Jirsa wrote:
> I'm sincerely sorry for the hassle, but for various reasons it's unlikely I'll update my repo (at least me, personally). One fix is to grab the actual java classes from the apache repo, pull them in, fix the package names, and compile (essentially making your own 3.11 branch).
>
> I suppose you could also disable compaction, switch to something else (stcs), do the upgrade, then alter it back to the official TWCS. Whether or not this is viable depends on how quickly you write, and how long it'll take you to upgrade.
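Jeff's fallback workaround can be sketched as a short CQL sequence (keyspace/table name and window settings are hypothetical):

```sql
-- 1. Before upgrading, fall back to a strategy every version can resolve,
--    with compaction paused so sstables don't get mixed across windows:
ALTER TABLE my_ks.events
  WITH compaction = { 'class': 'SizeTieredCompactionStrategy', 'enabled': 'false' };

-- 2. Upgrade the binaries on all nodes to 3.11.x.

-- 3. Switch back to the official TWCS and re-enable compaction:
ALTER TABLE my_ks.events
  WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'enabled': 'true'
  };
```

As Jeff notes, whether this is viable depends on write rate: while compaction is disabled, flushed sstables accumulate and will need to be caught up afterwards.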
Re: upgrading from 2.x TWCS to 3.x TWCS
That wasn't horrible at all. After testing, provided all goes well, I can submit this back to the main TWCS repo if you think it's worth it. Either way, do you mind briefly reviewing it for obvious mistakes?

https://github.com/bspindler/twcs/commit/7ba388dbf41b1c9dc1b70661ad69273b258139da

Thanks!

On Fri, Nov 2, 2018 at 7:24 PM Brian Spindler wrote:
> I hope I can do as you suggest and leapfrog to 3.11 rather than two-stepping it from 3.7 -> 3.11.
>
> Just having TWCS has saved me lots of hassle so it's all good, thanks for all you do for our community.
>
> -B

--
-Brian
Cassandra lucene secondary indexes
Hi all, we recently started using the cassandra-lucene secondary index support that Instaclustr recently assumed ownership of (thank you, btw!).

We are experiencing a strange issue where adding/removing nodes fails: the joining node is left hung with a "Secondary index build" compaction that never completes.

We're running v3.11.3 of Cassandra and the plugin. Has anyone experienced this before? It's a relatively small cluster, ~6 nodes, in our user acceptance environment, so not a lot of load either.

Thanks!
--
-Brian
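For context, a minimal example of the kind of index involved; keyspace, table, and field names are hypothetical, but the class name is the one the Stratio-lineage plugin registers:

```sql
-- A row-based Lucene index on a hypothetical events table
CREATE CUSTOM INDEX IF NOT EXISTS events_lucene_idx
  ON my_ks.events ()
  USING 'com.stratio.cassandra.lucene.Index'
  WITH OPTIONS = {
    'refresh_seconds': '1',
    'schema': '{ fields: { message: { type: "text" } } }'
  };
```

On the hung joining node, `nodetool compactionstats` should show whether the "Secondary index build" task is actually making progress, and after a failed build `nodetool rebuild_index my_ks events events_lucene_idx` can retry it without re-bootstrapping.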