Dropping down replication factor

2017-08-12 Thread brian.spindler
Hi folks, hopefully a quick one:

We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's all in 
one region but spread across 3 availability zones.  It was nicely balanced with 
4 nodes in each.

But with a couple of failures and subsequent provisions to the wrong AZ, we now 
have a cluster with: 

5 nodes in az A
5 nodes in az B
2 nodes in az C

Not sure why, but when adding a third node in AZ C, it fails to stream: it gets 
all the way to completion and there is no apparent error in the logs.  I've looked 
at a couple of bugs referring to scrubbing and possible OOM bugs due to 
metadata writing at the end of streaming (sorry, don't have the ticket handy).  I'm 
worried I might not be able to do much with the existing C nodes, since their disk 
space usage is high and they are under a lot of load given the small number of 
them for this rack.

Rather than troubleshoot this further, what I was thinking about doing was:
- drop the replication factor on our keyspace to two
- hopefully this would reduce load on these two remaining nodes 
- run repairs/cleanup across the cluster 
- then shoot these two nodes in the 'c' rack
- run repairs/cleanup across the cluster
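
For the first step, I assume it's just something like this (keyspace name and the
DC name our Ec2Snitch reports are placeholders):

   ALTER KEYSPACE my_keyspace
     WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};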

Would this work with minimal/no disruption? 
Should I update their "rack" beforehand or after?
What else am I not thinking about? 

My main goal atm is to get back to where the cluster is in a clean consistent 
state that allows nodes to properly bootstrap.

Thanks for your help in advance.



Re: Dropping down replication factor

2017-08-12 Thread Brian Spindler
Thanks for replying Jeff.

Responses below.

On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:

> Answers inline
>
> --
> Jeff Jirsa
>
>
> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
> >
> > Hi folks, hopefully a quick one:
> >
> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's
> all in one region but spread across 3 availability zones.  It was nicely
> balanced with 4 nodes in each.
> >
> > But with a couple of failures and subsequent provisions to the wrong az
> we now have a cluster with :
> >
> > 5 nodes in az A
> > 5 nodes in az B
> > 2 nodes in az C
> >
> > Not sure why, but when adding a third node in AZ C it fails to stream
> after getting all the way to completion and no apparent error in logs.
> I've looked at a couple of bugs referring to scrubbing and possible OOM
> bugs due to metadata writing at end of streaming (sorry don't have ticket
> handy).  I'm worried I might not be able to do much with these since the
> disk space usage is high and they are under a lot of load given the small
> number of them for this rack.
>
> You'll definitely have higher load on az C instances with rf=3 in this
> ratio


> Streaming should still work - are you sure it's not busy doing something?
> Like building secondary index or similar? jstack thread dump would be
> useful, or at least nodetool tpstats
>
Only other thing might be a backup.  We do incrementals x1hr and snapshots
x24h; they are shipped to s3 then links are cleaned up.  The error I get on
the node I'm trying to add to rack C is:

ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception
encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
at
org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391)
[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655)
[apache-cassandra-2.1.15.jar:2.1.15]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
~[guava-16.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:209)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:185)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:413)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:700)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:661)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179)
~[apache-cassandra-2.1.15.jar:2.1.15]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_112]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[na:1.8.0_112]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_112]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[na:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_112]
WARN  [StorageServiceShutdownHook] 2017-08-12 23:54:51,582
Gossiper.java:1462 - No local state or state is in silent shutdown, not
announcing

Re: Dropping down replication factor

2017-08-12 Thread Brian Spindler
Nothing in the logs on the node that it was streaming from.

however, I think I found the issue on the other node in the C rack:

ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354
StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5]
Streaming error occurred
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
/ephemeral/cassandra/data/...

I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems it's
a single Index.db file and nothing on the other node.

I think nodetool scrub or offline sstablescrub might be in order but with
the current load I'm not sure I can take it offline for very long.
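
If I go that route, I believe it's just the following (keyspace/table are
placeholders for the affected CF):

   # online, per node
   nodetool scrub my_keyspace my_cf

   # or offline, with the node stopped
   sstablescrub my_keyspace my_cf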

Thanks again for the help.


On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:

> Compaction is backed up – that may be normal write load (because of the
> rack imbalance), or it may be a secondary index build. Hard to say for
> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack
> probably not necessary, streaming is being marked as failed and it’s
> turning itself off. Not sure why streaming is marked as failing, though,
> anything on the sending sides?
>
>
>
>
>
> From: Brian Spindler 
> Reply-To: 
> Date: Saturday, August 12, 2017 at 6:34 PM
> To: 
> Subject: Re: Dropping down replication factor
>
> Thanks for replying Jeff.
>
> Responses below.
>
> On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
>
>> Answers inline
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
>> >
>> > Hi folks, hopefully a quick one:
>> >
>> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's
>> all in one region but spread across 3 availability zones.  It was nicely
>> balanced with 4 nodes in each.
>> >
>> > But with a couple of failures and subsequent provisions to the wrong az
>> we now have a cluster with :
>> >
>> > 5 nodes in az A
>> > 5 nodes in az B
>> > 2 nodes in az C
>> >
>> > Not sure why, but when adding a third node in AZ C it fails to stream
>> after getting all the way to completion and no apparent error in logs.
>> I've looked at a couple of bugs referring to scrubbing and possible OOM
>> bugs due to metadata writing at end of streaming (sorry don't have ticket
>> handy).  I'm worried I might not be able to do much with these since the
>> disk space usage is high and they are under a lot of load given the small
>> number of them for this rack.
>>
>> You'll definitely have higher load on az C instances with rf=3 in this
>> ratio
>
>
>> Streaming should still work - are you sure it's not busy doing something?
>> Like building secondary index or similar? jstack thread dump would be
>> useful, or at least nodetool tpstats
>>
>> Only other thing might be a backup.  We do incrementals x1hr and
> snapshots x24h; they are shipped to s3 then links are cleaned up.  The
> error I get on the node I'm trying to add to rack C is:
>
> ERROR [main] 2017-08-12 23:54:51,546 CassandraDaemon.java:583 - Exception
> encountered during startup
> java.lang.RuntimeException: Error during boostrap: Stream failed
> at
> org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:87)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:391)
> [apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566)
> [apache-cassandra-2.1.15.jar:2.1.15]
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655)
> [apache-cassandra-2.1.15.jar:2.1.15]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> ~[guava-16.0.jar:na]
> at

Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Hi Jeff, I ran the scrub online and that didn't help.  I went ahead and
stopped the node, deleted all the corrupted data files --*.db
files and planned on running a repair when it came back online.

Unrelated I believe, now another CF is corrupted!

org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
Caused by: org.apache.cassandra.io.compress.CorruptBlockException:
(/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db):
corruption detected, chunk at 101500 of length 26523398.

A few days ago, when troubleshooting this, I did change the OpsCenter keyspace
RF from 3 to 2 since I thought that would help reduce load.  Did that cause
this corruption?

running *'nodetool scrub OpsCenter rollups300'* on that node now

And now I also see this when running nodetool status:

*"Note: Non-system keyspaces don't have the same replication settings,
effective ownership information is meaningless"*

What to do?

I still can't stream to this new node cause of this corruption.  Disk space
is getting low on these nodes ...

On Sat, Aug 12, 2017 at 9:51 PM Brian Spindler 
wrote:

> nothing in logs on the node that it was streaming from.
>
> however, I think I found the issue on the other node in the C rack:
>
> ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354
> StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5]
> Streaming error occurred
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
> /ephemeral/cassandra/data/...
>
> I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems
> it's a single Index.db file and nothing on the other node.
>
> I think nodetool scrub or offline sstablescrub might be in order but with
> the current load I'm not sure I can take it offline for very long.
>
> Thanks again for the help.
>
>
> On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:
>
>> Compaction is backed up – that may be normal write load (because of the
>> rack imbalance), or it may be a secondary index build. Hard to say for
>> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack
>> probably not necessary, streaming is being marked as failed and it’s
>> turning itself off. Not sure why streaming is marked as failing, though,
>> anything on the sending sides?
>>
>>
>>
>>
>>
>> From: Brian Spindler 
>> Reply-To: 
>> Date: Saturday, August 12, 2017 at 6:34 PM
>> To: 
>> Subject: Re: Dropping down replication factor
>>
>> Thanks for replying Jeff.
>>
>> Responses below.
>>
>> On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
>>
>>> Answers inline
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
>>> >
>>> > Hi folks, hopefully a quick one:
>>> >
>>> > We are running a 12 node cluster (2.1.15) in AWS with Ec2Snitch.  It's
>>> all in one region but spread across 3 availability zones.  It was nicely
>>> balanced with 4 nodes in each.
>>> >
>>> > But with a couple of failures and subsequent provisions to the wrong
>>> az we now have a cluster with :
>>> >
>>> > 5 nodes in az A
>>> > 5 nodes in az B
>>> > 2 nodes in az C
>>> >
>>> > Not sure why, but when adding a third node in AZ C it fails to stream
>>> after getting all the way to completion and no apparent error in logs.
>>> I've looked at a couple of bugs referring to scrubbing and possible OOM
>>> bugs due to metadata writing at end of streaming (sorry don't have ticket
>>> handy).  I'm worried I might not be able to do much with these since the
>>> disk space usage is high and they are under a lot of load given the small
>>> number of them for this rack.
>>>
>>> You'll definitely have higher load on az C instances with rf=3 in this
>>> ratio
>>
>>
>>> Streaming should still work - are you sure it's not busy doing
>>> something? Like building secondary index or similar? jstack thread dump
>>> would be useful, or at least nodetool tpstats
>>>
>>> Only other thing might be a backup.  We do incrementals x1hr and
>> snapshots x24h; they are shipped to s3 then links are cleaned up.  The
>> error I get on the node I'm trying to add to rack C is:
>>
>> ERROR [main] 2017-08-12 2

Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Do you think with the setup I've described I'd be ok doing that now to
recover this node?

The node died trying to run the scrub; I've restarted it, but I'm not sure
it's going to get past a scrub/repair, which is why I deleted the other
files as a brute-force method.  I think I might have to do the same here
and then kick off a repair, if I can't just replace it?

Doing the repair on the node that had the corrupt data deleted should be
ok?
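
To make sure I have the replace procedure right: on a fresh node I'd set the dead
node's IP (placeholder below) before first start, e.g. in cassandra-env.sh:

   JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"

and let it stream from the healthy replicas.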

On Sun, Aug 13, 2017 at 10:29 AM Jeff Jirsa  wrote:

> Running repairs when you have corrupt sstables can spread the corruption
>
> In 2.1.15, corruption is almost certainly from something like a bad disk
> or bad RAM
>
> One way to deal with corruption is to stop the node and replace it (with
> -Dcassandra.replace_address) so you restream data from neighbors. The
> challenge here is making sure you have a healthy replica for streaming
>
> Please make sure you have backups and snapshots if you have corruption
> popping up
>
> If you're using vnodes, once you get rid of the corruption you may
> consider adding another c node with fewer vnodes to try to get it joined
> faster with less data.
>
>
> --
> Jeff Jirsa
>
>
> On Aug 13, 2017, at 7:11 AM, Brian Spindler 
> wrote:
>
> Hi Jeff, I ran the scrub online and that didn't help.  I went ahead and
> stopped the node, deleted all the corrupted data files --*.db
> files and planned on running a repair when it came back online.
>
> Unrelated I believe, now another CF is corrupted!
>
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
> /ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db
> Caused by: org.apache.cassandra.io.compress.CorruptBlockException:
> (/ephemeral/cassandra/data/OpsCenter/rollups300-45c85324387b35238d056678f8fa8b0f/OpsCenter-rollups300-ka-100672-Data.db):
> corruption detected, chunk at 101500 of length 26523398.
>
> Few days ago when troubleshooting this I did change the OpsCenter keyspace
> RF == 2 from 3 since I thought that would help reduce load.  Did that cause
> this corruption?
>
> running *'nodetool scrub OpsCenter rollups300'* on that node now
>
> And now I also see this when running nodetool status:
>
> *"Note: Non-system keyspaces don't have the same replication settings,
> effective ownership information is meaningless"*
>
> What to do?
>
> I still can't stream to this new node cause of this corruption.  Disk
> space is getting low on these nodes ...
>
> On Sat, Aug 12, 2017 at 9:51 PM Brian Spindler 
> wrote:
>
>> nothing in logs on the node that it was streaming from.
>>
>> however, I think I found the issue on the other node in the C rack:
>>
>> ERROR [STREAM-IN-/10.40.17.114] 2017-08-12 16:48:53,354
>> StreamSession.java:512 - [Stream #08957970-7f7e-11e7-b2a2-a31e21b877e5]
>> Streaming error occurred
>> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted:
>> /ephemeral/cassandra/data/...
>>
>> I did a 'cat /var/log/cassandra/system.log|grep Corrupt'  and it seems
>> it's a single Index.db file and nothing on the other node.
>>
>> I think nodetool scrub or offline sstablescrub might be in order but with
>> the current load I'm not sure I can take it offline for very long.
>>
>> Thanks again for the help.
>>
>>
>> On Sat, Aug 12, 2017 at 9:38 PM Jeffrey Jirsa  wrote:
>>
>>> Compaction is backed up – that may be normal write load (because of the
>>> rack imbalance), or it may be a secondary index build. Hard to say for
>>> sure. ‘nodetool compactionstats’ if you’re able to provide it. The jstack
>>> probably not necessary, streaming is being marked as failed and it’s
>>> turning itself off. Not sure why streaming is marked as failing, though,
>>> anything on the sending sides?
>>>
>>>
>>>
>>>
>>>
>>> From: Brian Spindler 
>>> Reply-To: 
>>> Date: Saturday, August 12, 2017 at 6:34 PM
>>> To: 
>>> Subject: Re: Dropping down replication factor
>>>
>>> Thanks for replying Jeff.
>>>
>>> Responses below.
>>>
>>> On Sat, Aug 12, 2017 at 8:33 PM Jeff Jirsa  wrote:
>>>
>>>> Answers inline
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> > On Aug 12, 2017, at 2:58 PM, brian.spind...@gmail.com wrote:
>>>> >
>>>> > Hi folks, hopefully a quick one:
>>>> >
>>>> > We are running a 12 node clu

Re: Dropping down replication factor

2017-08-13 Thread Brian Spindler
Thanks Kurt.

We had one corrupt sstable from a CF of ours.  I am actually running a repair on
that CF now and then plan to try and join the additional nodes as you
suggest.  I deleted the corrupt OpsCenter sstables as well but will not
bother repairing that before adding capacity.

Been keeping an eye across all nodes for corrupt exceptions - so far no new
occurrences.

Thanks again.

-B

On Sun, Aug 13, 2017 at 17:52 kurt greaves  wrote:

>
>
> On 14 Aug. 2017 00:59, "Brian Spindler"  wrote:
>
> Do you think with the setup I've described I'd be ok doing that now to
> recover this node?
>
> The node died trying to run the scrub; I've restarted it but I'm not sure
> it's going to get past a scrub/repair, this is why I deleted the other
> files as a brute force method.  I think I might have to do the same here
> and then kick off a repair if I can't just replace it?
>
> is it just opscenter keyspace that has corrupt sstables? if so I wouldn't
> worry about repairing too much. If you can get that third node in C to join
> I'd say your best bet is to just do that until you have enough nodes in C.
> Dropping and increasing RF is pretty risky on a live system.
>
> It sounds to me like you stand a good chance of getting the new nodes in C
> to join so I'd pursue that before trying anything more complicated
>
>
> Doing the repair on the node that had the corrupt data deleted should be
> ok?
>
> Yes. as long as you also deleted corrupt SSTables on any other nodes that
> had them.
>
-- 
-Brian


Nodes just dying with OOM

2017-10-06 Thread Brian Spindler
Hi guys, our cluster - around 18 nodes - just started having nodes die, and
when restarting them they are dying with OOM.  How can we handle this?
 I've tried adding a couple extra gigs on these machines to help, but it's
not helping.

Help!
-B


Re: Nodes just dying with OOM

2017-10-06 Thread Brian Spindler
Sorry about that.  We eventually found that one column family had some
large/corrupt data that was causing OOMs.

Luckily it was a pretty ephemeral data set and we were able to just
truncate it.  However, it was a guess based on some log messages about
reading a large number of tombstones on that column family.  I think we
should review this column family's design so it doesn't generate so many
tombstones?  Could that be the cause?  What else would you recommend?
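
For context, the messages we saw look like the warnings driven by these
cassandra.yaml settings (the values below are, as far as I recall, the 2.1
defaults):

   # cassandra.yaml
   tombstone_warn_threshold: 1000       # warn when a single read scans this many tombstones
   tombstone_failure_threshold: 100000  # abort the read beyond this many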

Thank you in advance.

On Fri, Oct 6, 2017 at 6:33 AM Brian Spindler 
wrote:

> Hi guys, our cluster - around 18 nodes - just starting having nodes die
> and when restarting them they are dying with OOM.  How can we handle this?
>  I've tried adding a couple extra gigs on these machines to help but it's
> not.
>
> Help!
> -B
>
>


Re: Nodes just dying with OOM

2017-10-06 Thread Brian Spindler
Hi Alain, thanks for getting back to me.  I will read through those
articles.

The truncate did solve the problem.
I am using Cassandra 2.1.15
I'll look at cfstats in more detail, we've got some charting from JVM
metrics yeah.
We're migrating from i2.xl (32GB RAM, local SSD) to m4.xl (16GB, gp2) so we
have a mix there; the Cassandra JVM heap is set to 10GB.
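
(i.e. in cassandra-env.sh, roughly as below - the new-gen size is just an
illustrative value, not something I've tuned carefully:)

   # cassandra-env.sh
   MAX_HEAP_SIZE="10G"
   HEAP_NEWSIZE="2G"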

When I did a truncate, Cassandra did create a snapshot which I'm hoping to
copy over to a developer's machine and find the offending row(s).  If it is
just huge rows, that's probably more of an application leak.

Is 'Compacted partition maximum bytes:' from cfstats the right thing to
look at?
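
i.e. something like this, with our keyspace/table substituted in:

   nodetool cfstats my_keyspace.my_cf | grep -E 'Compacted partition (maximum|mean) bytes'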

Thanks again,
-B

On Fri, Oct 6, 2017 at 10:40 AM Alain RODRIGUEZ  wrote:

> Hello Brian.
>
> Sorry to hear, looks like a lot of troubles.
>
> I think we should review this column family design so it doesn't generate
>> so many tombstones?  Could that be the cause?
>
>
> It could be indeed. Did truncating solve the issue?
>
> There are some nicer approaches you can try to handle tombstones correctly
> depending on your use case. I wrote a post and presented a talk about this
> last year; I hope you'll find what you are looking for.
>
> http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
> https://www.youtube.com/watch?v=lReTEcnzl7Y
>
>  What else would you recommend?
>
>
> Well we don't have much information to guess. But I will try to give you
> relevant clues with what you gave us so far:
>
> that one column family had some large/corrupt data and causing OOM's
>>
>
> Are you using Cassandra 3.0.x (x < 14)? You might be facing a bug in
> Cassandra corrupting data after schema changes (
> https://issues.apache.org/jira/browse/CASSANDRA-13004).
>
> You can check large partitions using 'nodetool cfstats' or using monitoring
> and the corresponding metric (per table / column family)
>
> Other than that what is the memory available, the heap size and GC type
> and options in use. Do you see some GC pauses in the logs or do you control
> this value through a chart using JVM metrics?
>
> C*heers,
>
> -------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2017-10-06 14:48 GMT+01:00 Brian Spindler :
>
>> Sorry about that.  We eventually found that one column family had some
>> large/corrupt data and causing OOM's
>>
>> Luckily it was a pretty ephemeral data set and we were able to just
>> truncate it.  However, it was a guess based on some log messages about
>> reading a large number of tombstones on that column families.  I think we
>> should review this column family design so it doesn't generate so many
>> tombstones?  Could that be the cause?  What else would you recommend?
>>
>> Thank you in advance.
>>
>> On Fri, Oct 6, 2017 at 6:33 AM Brian Spindler 
>> wrote:
>>
>>> Hi guys, our cluster - around 18 nodes - just starting having nodes die
>>> and when restarting them they are dying with OOM.  How can we handle this?
>>>  I've tried adding a couple extra gigs on these machines to help but it's
>>> not.
>>>
>>> Help!
>>> -B
>>>
>>>
>


need to reclaim space with TWCS

2018-01-20 Thread Brian Spindler
Hi, I have several column families using TWCS and it’s great.
Unfortunately we seem to have missed the great advice in Alex’s article
here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about
setting the appropriate aggressive tombstone settings and now we have lots
of timestamp overlaps and disk space to reclaim.



I am trying to figure out the best way out of this. Lots of the SSTables that
overlap timestamps with newer SSTables have droppable tombstone ratios like
0.895143957 or something similar, very close to the 0.90 at which the full
sstable will drop, AFAIK.



I’m thinking to do the following immediately:



Set *unchecked_tombstone_compaction = true*

Set* tombstone_compaction_interval == TTL + gc_grace_seconds*

Set* dclocal_read_repair_chance = 0.0 (currently 0.1)*
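
Concretely, I assume that translates to something like the below (table name is a
placeholder, 864000 is just an example value for TTL + gc_grace_seconds, and the
class/window settings simply mirror what the table already has - on 2.1 we run the
standalone TWCS build):

   ALTER TABLE my_keyspace.my_cf
     WITH compaction = {
       'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
       'compaction_window_unit': 'DAYS',
       'compaction_window_size': '1',
       'unchecked_tombstone_compaction': 'true',
       'tombstone_compaction_interval': '864000'}
     AND dclocal_read_repair_chance = 0.0;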



If I do this, can I expect TWCS/C* to reclaim the space from those SSTables
with 0.89* droppable tombstones?   Or do I (can I?) manually delete these
files and will c* just ignore the overlapping data and treat as tombstoned?




What else should/could be done?



Thank you in advance for your advice,



*__*

*Brian Spindler *


Re: need to reclaim space with TWCS

2018-01-20 Thread Brian Spindler
I probably should have mentioned our setup: we’re on Cassandra version
2.1.15.


On Sat, Jan 20, 2018 at 9:33 AM Brian Spindler 
wrote:

> Hi, I have several column families using TWCS and it’s great.
> Unfortunately we seem to have missed the great advice in Alex’s article
> here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about
> setting the appropriate aggressive tombstone settings and now we have lots
> of timestamp overlaps and disk space to reclaim.
>
>
>
> I am trying to figure the best way out of this. Lots of the SSTables with
> overlapping timestamps in newer SSTables have droppable tombstones at like
> 0.895143957 or something similar, very close to 0.90 where the full sstable
> will drop afaik.
>
>
>
> I’m thinking to do the following immediately:
>
>
>
> Set *unchecked_tombstone_compaction = true*
>
> Set* tombstone_compaction_interval == TTL + gc_grace_seconds*
>
> Set* dclocal_read_repair_chance = 0.0 (currently 0.1)*
>
>
>
> If I do this, can I expect TWCS/C* to reclaim the space from those
> SSTables with 0.89* droppable tombstones?   Or do I (can I?) manually
> delete these files and will c* just ignore the overlapping data and treat
> as tombstoned?
>
>
>
> What else should/could be done?
>
>
>
> Thank you in advance for your advice,
>
>
>
> *__*
>
> *Brian Spindler *
>
>
>
>
>


Re: need to reclaim space with TWCS

2018-01-20 Thread Brian Spindler
Hi Alexander,  Thanks for your response!  I'll give it a shot.

On Sat, Jan 20, 2018 at 10:22 AM Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Brian,
>
> You should definitely set unchecked_tombstone_compaction to true and set
> the interval to the default of 1 day. Use a tombstone_threshold of 0.6 for
> example and see how that works.
> Tombstones will get purged depending on your partitioning as their
> partition needs to be fully contained within a single sstable.
>
> Deleting the sstables by hand is theoretically possible but should be kept
> as a last resort option if you're running out of space.
>
> Cheers,
>
> Le sam. 20 janv. 2018 à 15:41, Brian Spindler 
> a écrit :
>
>> I probably should have mentioned our setup: we’re on Cassandra version
>> 2.1.15.
>>
>>
>> On Sat, Jan 20, 2018 at 9:33 AM Brian Spindler 
>> wrote:
>>
>>> Hi, I have several column families using TWCS and it’s great.
>>> Unfortunately we seem to have missed the great advice in Alex’s article
>>> here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about
>>> setting the appropriate aggressive tombstone settings and now we have lots
>>> of timestamp overlaps and disk space to reclaim.
>>>
>>>
>>>
>>> I am trying to figure the best way out of this. Lots of the SSTables
>>> with overlapping timestamps in newer SSTables have droppable tombstones at
>>> like 0.895143957 or something similar, very close to 0.90 where the full
>>> sstable will drop afaik.
>>>
>>>
>>>
>>> I’m thinking to do the following immediately:
>>>
>>>
>>>
>>> Set *unchecked_tombstone_compaction = true*
>>>
>>> Set* tombstone_compaction_interval == TTL + gc_grace_seconds*
>>>
>>> Set* dclocal_read_repair_chance = 0.0 (currently 0.1)*
>>>
>>>
>>>
>>> If I do this, can I expect TWCS/C* to reclaim the space from those
>>> SSTables with 0.89* droppable tombstones?   Or do I (can I?) manually
>>> delete these files and will c* just ignore the overlapping data and treat
>>> as tombstoned?
>>>
>>>
>>>
>>> What else should/could be done?
>>>
>>>
>>>
>>> Thank you in advance for your advice,
>>>
>>>
>>>
>>> *__*
>>>
>>> *Brian Spindler *
>>>
>>>
>>>
>>>
>>>
>> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: need to reclaim space with TWCS

2018-01-20 Thread Brian Spindler
Hi Alexander, after re-reading
https://issues.apache.org/jira/browse/CASSANDRA-13418 it seems you would
recommend leaving dclocal_read_repair at maybe 10%; is that true?

Also, has this been patched to 2.1?
https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176


Cheers,

-B


On Sat, Jan 20, 2018 at 10:49 AM Brian Spindler 
wrote:

> Hi Alexander,  Thanks for your response!  I'll give it a shot.
>
> On Sat, Jan 20, 2018 at 10:22 AM Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Brian,
>>
>> You should definitely set unchecked_tombstone_compaction to true and set
>> the interval to the default of 1 day. Use a tombstone_threshold of 0.6 for
>> example and see how that works.
>> Tombstones will get purged depending on your partitioning as their
>> partition needs to be fully contained within a single sstable.
>>
>> Deleting the sstables by hand is theoretically possible but should be
>> kept as a last resort option if you're running out of space.
>>
>> Cheers,
>>
>> Le sam. 20 janv. 2018 à 15:41, Brian Spindler 
>> a écrit :
>>
>>> I probably should have mentioned our setup: we’re on Cassandra version
>>> 2.1.15.
>>>
>>>
>>> On Sat, Jan 20, 2018 at 9:33 AM Brian Spindler 
>>> wrote:
>>>
>>>> Hi, I have several column families using TWCS and it’s great.
>>>> Unfortunately we seem to have missed the great advice in Alex’s article
>>>> here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about
>>>> setting the appropriate aggressive tombstone settings and now we have lots
>>>> of timestamp overlaps and disk space to reclaim.
>>>>
>>>>
>>>>
>>>> I am trying to figure the best way out of this. Lots of the SSTables
>>>> with overlapping timestamps in newer SSTables have droppable tombstones at
>>>> like 0.895143957 or something similar, very close to 0.90 where the full
>>>> sstable will drop afaik.
>>>>
>>>>
>>>>
>>>> I’m thinking to do the following immediately:
>>>>
>>>>
>>>>
>>>> Set *unchecked_tombstone_compaction = true*
>>>>
>>>> Set* tombstone_compaction_interval == TTL + gc_grace_seconds*
>>>>
>>>> Set* dclocal_read_repair_chance = 0.0 (currently 0.1)*
>>>>
>>>>
>>>>
>>>> If I do this, can I expect TWCS/C* to reclaim the space from those
>>>> SSTables with 0.89* droppable tombstones?   Or do I (can I?) manually
>>>> delete these files and will c* just ignore the overlapping data and treat
>>>> as tombstoned?
>>>>
>>>>
>>>>
>>>> What else should/could be done?
>>>>
>>>>
>>>>
>>>> Thank you in advance for your advice,
>>>>
>>>>
>>>>
>>>> *__*
>>>>
>>>> *Brian Spindler *
>>>>
>>>>
>>>>
>>>>
>>>>
>>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>


Re: need to reclaim space with TWCS

2018-01-20 Thread brian.spindler
Got it.  Thanks again. 

> On Jan 20, 2018, at 11:17 AM, Alexander Dejanovski  
> wrote:
> 
> I would turn background read repair off on the table to improve the overlap 
> issue, but you'll still have foreground read repair if you use quorum reads 
> anyway.
> 
> So put dclocal_... to 0.0.
> 
> The commit you're referring to has been merged in 3.11.1, as 2.1 isn't 
> patched anymore.
> 
> 
>> Le sam. 20 janv. 2018 à 16:55, Brian Spindler  a 
>> écrit :
>> Hi Alexander, after re-reading this 
>> https://issues.apache.org/jira/browse/CASSANDRA-13418 it seems you would 
>> recommend leaving dclocal_read_repair at maybe 10%  is that true?  
>> 
>> Also, has this been patched to 2.1?  
>> https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176
>>  
>> 
>> Cheers, 
>> 
>> -B
>> 
>> 
>>> On Sat, Jan 20, 2018 at 10:49 AM Brian Spindler  
>>> wrote:
>>> Hi Alexander,  Thanks for your response!  I'll give it a shot.
>>> 
>>>> On Sat, Jan 20, 2018 at 10:22 AM Alexander Dejanovski 
>>>>  wrote:
>>>> Hi Brian,
>>>> 
>>>> You should definitely set unchecked_tombstone_compaction to true and set 
>>>> the interval to the default of 1 day. Use a tombstone_threshold of 0.6 for 
>>>> example and see how that works.
>>>> Tombstones will get purged depending on your partitioning as their 
>>>> partition needs to be fully contained within a single sstable.
>>>> 
>>>> Deleting the sstables by hand is theoretically possible but should be kept 
>>>> as a last resort option if you're running out of space.
>>>> 
>>>> Cheers,
>>>> 
>>>> 
>>>>> Le sam. 20 janv. 2018 à 15:41, Brian Spindler  
>>>>> a écrit :
>>>>> I probably should have mentioned our setup: we’re on Cassandra version 
>>>>> 2.1.15.
>>>>> 
>>>>> 
>>>>>> On Sat, Jan 20, 2018 at 9:33 AM Brian Spindler 
>>>>>>  wrote:
>>>>>> Hi, I have several column families using TWCS and it’s great.  
>>>>>> Unfortunately we seem to have missed the great advice in Alex’s article 
>>>>>> here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html about 
>>>>>> setting the appropriate aggressive tombstone settings and now we have 
>>>>>> lots of timestamp overlaps and disk space to reclaim. 
>>>>>>  
>>>>>> I am trying to figure the best way out of this. Lots of the SSTables 
>>>>>> with overlapping timestamps in newer SSTables have droppable tombstones 
>>>>>> at like 0.895143957 or something similar, very close to 0.90 where the 
>>>>>> full sstable will drop afaik.  
>>>>>>  
>>>>>> I’m thinking to do the following immediately:
>>>>>>  
>>>>>> Set unchecked_tombstone_compaction = true
>>>>>> Set tombstone_compaction_interval == TTL + gc_grace_seconds
>>>>>> Set dclocal_read_repair_chance = 0.0 (currently 0.1)
>>>>>>  
>>>>>> If I do this, can I expect TWCS/C* to reclaim the space from those 
>>>>>> SSTables with 0.89* droppable tombstones?   Or do I (can I?) manually 
>>>>>> delete these files and will c* just ignore the overlapping data and 
>>>>>> treat as tombstoned?  
>>>>>>  
>>>>>> What else should/could be done? 
>>>>>>  
>>>>>> Thank you in advance for your advice,
>>>>>>  
>>>>>> __
>>>>>> Brian Spindler 
>>>>>>  
>>>>>>  
>>>> 
>>>> -- 
>>>> -
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>> 
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
> 
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com


Re: Cassandra Repair Duration.

2018-01-24 Thread brian.spindler
Hi Karthick, repairs can be tricky.  

You can (and probably should) run repairs as part of routine maintenance.  And 
of course absolutely if you lose a node in a bad way.  If you decommission a 
node, for example, no “extra” repair is needed. 

If you are using TWCS you should probably not run repairs on those CFs.

We have a combination of scripts and locks to run repairs across an 18-node 
cluster one node at a time; it typically takes around 2-3 days, so we run it once 
a week.  
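
The per-node step is basically just this, with our keyspace substituted in:

   nodetool repair -pr my_keyspace   # primary-range repair, one node at a time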

The great folks at TLP have put together http://cassandra-reaper.io/ which 
makes managing repairs even easier and probably more performant since, as I 
understand it, it uses range repairs. 

Good luck,
-B

> On Jan 24, 2018, at 4:57 AM, Karthick V  wrote:
> 
> Periodically I have been running a full repair process before the GC grace period, as 
> mentioned in the best practices. Initially, all went well, but as the data size 
> increased, repair duration has increased drastically and we are also facing 
> query timeouts during that time, and we have tried incremental repair and are facing 
> some OOM issues.
> 
> After running a repair process for more than 80 hours we have ended up with 
> the question:
> 
> why can't we run a repair process if and only if a Cassandra node has had 
> downtime? 
> 
> Say there is no downtime during a GC grace period: do we still face 
> inconsistency among nodes? If yes, then doesn't hinted handoff handle those? 
> 
> Cluster Info: Having two DataCenter with 8 machines each with a disk size of 
> 1TB, C* v_2.1.13  and having around 420GB data each.
> 
>> On Wed, Jan 24, 2018 at 2:46 PM, Karthick V  wrote:
>> Hi,
>> 
>> 
>> 
>> 
>> 
>> 
> 


Re: TWCS not deleting expired sstables

2018-01-28 Thread brian.spindler
I would start here:  http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Specifically the “Hints and repairs” and “Timestamp overlap” sections might be 
of use.  

-B

> On Jan 25, 2018, at 11:05 AM, Thakrar, Jayesh  
> wrote:
> 
> Wondering if I can get some pointers to what's happening here and why 
> sstables that I think should be expired are not being dropped.
>  
> Here's the table's compaction property - note also set 
> "unchecked_tombstone_compaction" to true.
>  
> compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 
> 'max_threshold': '4', 'min_threshold': '4', 'unchecked_tombstone_compaction': 
> 'true'}
>  
> We insert data with timestamp and TTL programmatically.
>  
> Here's one set of sstable that I expect to be removed:
>  
> $ ls -lt *Data.db | tail -5
> -rw-r--r--. 1 vchadoop vchadoop  31245097312 Sep 20 17:16 mc-1308-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop  31524316252 Sep 19 14:27 mc-1187-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop  21405216502 Sep 18 14:14 mc-1070-big-Data.db
> -rw-r--r--. 1 vchadoop vchadoop  13609890747 Sep 13 20:53 mc-178-big-Data.db
>  
> $ date +%s
> 1516895877
>  
> $ date 
> Thu Jan 25 15:58:00 UTC 2018
>  
> $ sstablemetadata $PWD/mc-130-big-Data.db | head -20
> SSTable: 
> /ae/disk1/data/ae/raw_logs_by_user-f58b9960980311e79ac26928246f09c1/mc-130-big
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Bloom Filter FP chance: 0.01
> Minimum timestamp: 14966028
> Maximum timestamp: 14980788
> SSTable min local deletion time: 1507924954
> SSTable max local deletion time: 1509400832
> Compressor: org.apache.cassandra.io.compress.LZ4Compressor
> Compression ratio: 0.17430158132352797
> TTL min: 2630598
> TTL max: 4086188
> First token: -9177441867697829836 (key=823134638755651936)
> Last token: 9155171035305804798 (key=395118640769012487)
> minClustringValues: [-1, da, 3, 1498082382078, -9223371818124305448, 
> -9223371652504795402, -1]
> maxClustringValues: [61818, tpt, 325, 149660280, -4611686088173246790, 
> 9223372014135560885, 1]
> Estimated droppable tombstones: 1.1983492967652476
> SSTable Level: 0
> Repaired at: 0
> Replay positions covered: {CommitLogPosition(segmentId=1505171071629, 
> position=7157684)=CommitLogPosition(segmentId=1505171075152, 
> position=6263269)}
> totalColumnsSet: 111047277
>  


Re: Cassandra Repair Duration.

2018-01-28 Thread Brian Spindler
It's all here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesWhen.html


-B


On Thu, Jan 25, 2018 at 6:08 AM Karthick V  wrote:

> *You can (and probably should) run repairs as apart of routine
>> maintenance.*
>
>
>  Can u explain any use case for why do we need this?
>
>
>
>
>
> On Wed, Jan 24, 2018 at 5:35 PM,  wrote:
>
>> Hi Karthick, repairs can be tricky.
>>
>> You can (and probably should) run repairs as apart of routine
>> maintenance.  And of course absolutely if you lose a node in a bad way.  If
>> you decommission a node for example, no “extra” repair needed.
>>
>> If you are using TWCS you should probably not run repairs on those cf.
>>
>> We have a combination of scripts and locks to run repairs across an 18
>> node cluster 1 node at a time, typically takes around 2-3days and so we run
>> it once a week.
>>
>> The great folks at tlp have put together http://cassandra-reaper.io/ which
>> makes managing repairs even easier and probably more performant since as I
>> understand, it used range repairs.
>>
>> Good luck,
>> -B
>>
>>
>> On Jan 24, 2018, at 4:57 AM, Karthick V  wrote:
>>
>> Periodically I have been running Full repair process befor GC Grace
>> period as mentioned in the best practices.Initially, all went well but as
>> the data size increases Repair duration has increased drastically and we
>> are also facing Query timeouts during that time and we have tried
>> incremental repair facing some OOM issues.
>>
>> After running a repair process for more than 80 Hours we have ended up
>> with the question
>>
>> why can't we run a repair process if and only if a Cassandra node got a
>> downtime?
>>
>> Say if there is no downtime during a GC grace period Do we still face
>> Inconsistency among nodes? if yes, then doesn't Hinted Handoff handle
>> those?
>>
>> Cluster Info: Having two DataCenter with 8 machines each with a disk size
>> of 1TB, C* v_2.1.13  and having around 420GB data each.
>>
>> On Wed, Jan 24, 2018 at 2:46 PM, Karthick V 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Node won't start

2018-02-03 Thread Brian Spindler
Hi guys, I've got a 2.1.15 node that will not start, it seems.  It hangs on
Opening system.size_estimates.  Sometimes that can take a while, but I've let
it run for 90m and nothing.  Should I move this sstable out of the way to
let it start?  Will it rebuild/refresh size estimates if I remove that
folder?

thanks
-B


Re: Node won't start

2018-02-03 Thread brian.spindler
Thanks Alex.  That’s exactly what I ended up doing - it did take maybe 45m to 
come back up though :(
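
For anyone hitting the same thing, the steps were roughly the following (the data
path is an assumption based on a default install; the directory suffix will differ):

   sudo service cassandra stop
   mkdir -p /var/tmp/size_estimates_backup
   # move the size_estimates sstables aside rather than deleting them outright
   mv /var/lib/cassandra/data/system/size_estimates-*/* /var/tmp/size_estimates_backup/
   sudo service cassandra start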

-B



Sent from my iPhone
> On Feb 3, 2018, at 9:03 AM, Alexander Dejanovski  
> wrote:
> 
> Hi Brian,
> 
> I just tested this on a CCM cluster and the node started without problem. It 
> flushed some new SSTables a short while after.
> 
> I honestly do not know the specifics of how size_estimates is used, but if it 
> prevented a node from restarting I'd definitely remove the sstables to get it 
> back up.
> 
> Cheers,
> 
>> On Sat, Feb 3, 2018 at 1:53 PM Brian Spindler  
>> wrote:
>> Hi guys, I've got a 2.1.15 node that will not start it seems.  Hangs on 
>> Opening system.size_estimates.  Sometimes it can take a while but I've let 
>> it run for 90m and nothing.  Should I move this sstable out of the way to 
>> let it start?  will it rebuild/refresh size estimates if I remove that 
>> folder?  
>> 
>> thanks
>> -B
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com


TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hey guys, quick question:

I've got a v2.1 Cassandra cluster, 12 nodes on AWS i3.2xl, commit log on
one drive, data on NVMe.  That was working very well; it's a time-series DB and has
been accumulating data for about 4 weeks.

The nodes have increased in load and compaction seems to be falling
behind.  I used to get about 1 file per day for this column family, roughly one
~30GB Data.db file per day.  I am now getting hundreds per day at 1MB -
50MB.

How to recover from this?

I can scale out to give some breathing room but will it go back and compact
the old days into nicely packed files for the day?

I tried setting compaction throughput to 1000 from 256 and it seemed to
make things worse for the CPU; it's configured on i3.2xl with 8 compaction
threads.
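
For reference, this is what I was toggling and watching (standard nodetool,
nothing exotic):

   nodetool setcompactionthroughput 1000   # tried this, then went back to 256
   nodetool getcompactionthroughput
   nodetool compactionstats                # pending tasks keep growing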

-B

Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to
get rid of old tombstones; however, running repairs in 2.1 on TWCS column
families causes a very large spike in sstable counts due to anti-compaction,
which causes a lot of disruption.  Is there any other way?


Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi Jonathan, both I believe.

The window size is 1 day, full settings:
AND compaction = {'timestamp_resolution': 'MILLISECONDS',
'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
'tombstone_threshold': '0.2', 'class':
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}


nodetool tpstats

Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 068582241832 0
   0
ReadStage 0 0  209566303 0
   0
RequestResponseStage  0 044680860850 0
   0
ReadRepairStage   0 0   24562722 0
   0
CounterMutationStage  0 0  0 0
   0
MiscStage 0 0  0 0
   0
HintedHandoff 1 1203 0
   0
GossipStage   0 08471784 0
   0
CacheCleanupExecutor  0 0122 0
   0
InternalResponseStage 0 0 552125 0
   0
CommitLogArchiver 0 0  0 0
   0
CompactionExecutor8421433715 0
   0
ValidationExecutor0 0   2521 0
   0
MigrationStage0 0 527549 0
   0
AntiEntropyStage  0 0   7697 0
   0
PendingRangeCalculator0 0 17 0
   0
Sampler   0 0  0 0
   0
MemtableFlushWriter   0 0 116966 0
   0
MemtablePostFlush 0 0 209103 0
   0
MemtableReclaimMemory 0 0 116966 0
   0
Native-Transport-Requests 1 0 1715937778 0
  176262

Message type   Dropped
READ 2
RANGE_SLICE  0
_TRACE   0
MUTATION  4390
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE  1882
PAGED_RANGE  0
READ_REPAIR  0


On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:

> What's your window size?
>
> When you say backed up, how are you measuring that?  Are there pending
> tasks or do you just see more files than you expect?
>
> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
> wrote:
>
>> Hey guys, quick question:
>>
>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
>> one drive, data on nvme.  That was working very well, it's a ts db and has
>> been accumulating data for about 4weeks.
>>
>> The nodes have increased in load and compaction seems to be falling
>> behind.  I used to get about 1 file per day for this column family, about
>> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
>> 50mb.
>>
>> How to recover from this?
>>
>> I can scale out to give some breathing room but will it go back and
>> compact the old days into nicely packed files for the day?
>>
>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
>> threads.
>>
>> -B
>>
>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>> to get rid of old tombstones, however running repairs in 2.1 on TWCS column
>> families causes a very large spike in sstable counts due to anti-compaction
>> which causes a lot of disruption, is there any other way?
>>
>>
>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi Jeff, mostly lots of little files, like there will be 4-5 that are
1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.

Re incremental repair; yes, one of my engineers started an incremental
repair on this column family that we had to abort.  In fact, the node that
the repair was initiated on ran out of disk space and we ended up replacing
that node like a dead node.

Oddly the new node is experiencing this issue as well.

-B


On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:

> You could toggle off the tombstone compaction to see if that helps, but
> that should be lower priority than normal compactions
>
> Are the lots-of-little-files from memtable flushes or
> repair/anticompaction?
>
> Do you do normal deletes? Did you try to run Incremental repair?
>
> --
> Jeff Jirsa
>
>
> On Aug 7, 2018, at 5:00 PM, Brian Spindler 
> wrote:
>
> Hi Jonathan, both I believe.
>
> The window size is 1 day, full settings:
> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
> 'tombstone_threshold': '0.2', 'class':
> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>
>
> nodetool tpstats
>
> Pool NameActive   Pending  Completed   Blocked
> All time blocked
> MutationStage 0 068582241832 0
>  0
> ReadStage 0 0  209566303 0
>  0
> RequestResponseStage  0 044680860850 0
>  0
> ReadRepairStage   0 0   24562722 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 1 1203 0
>  0
> GossipStage   0 08471784 0
>  0
> CacheCleanupExecutor  0 0122 0
>  0
> InternalResponseStage 0 0 552125 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor8421433715 0
>  0
> ValidationExecutor0 0   2521 0
>  0
> MigrationStage0 0 527549 0
>  0
> AntiEntropyStage  0 0   7697 0
>  0
> PendingRangeCalculator0 0 17 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 116966 0
>  0
> MemtablePostFlush 0 0 209103 0
>  0
> MemtableReclaimMemory 0 0 116966 0
>  0
> Native-Transport-Requests 1 0 1715937778 0
> 176262
>
> Message type   Dropped
> READ 2
> RANGE_SLICE  0
> _TRACE   0
> MUTATION  4390
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE  1882
> PAGED_RANGE  0
> READ_REPAIR  0
>
>
> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:
>
>> What's your window size?
>>
>> When you say backed up, how are you measuring that?  Are there pending
>> tasks or do you just see more files than you expect?
>>
>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
>> wrote:
>>
>>> Hey guys, quick question:
>>>
>>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
>>> one drive, data on nvme.  That was working very well, it's a ts db and has
>>> been accumulating data for about 4weeks.
>>>
>>> The nodes have increased in load and compaction seems to be falling
>>> behind.  I used to get about 1 file per day for this column family, about
>>> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
>>> 50mb.
>>>
>>> How to recover from this?
>>>
>&

Re: TWCS Compaction backed up

2018-08-07 Thread brian.spindler
Everything is ttl’d 

I suppose I could use sstablemetadata to see the repaired bit; could I just set 
that to unrepaired somehow, and would that fix it? 
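
i.e. check with sstablemetadata and, if needed, flip it back with the
sstablerepairedset tool while the node is stopped (the path below is a placeholder):

   sstablemetadata /path/to/MyKeyspace-my_cf-ka-1234-Data.db | grep 'Repaired at'

   # with the node stopped:
   sstablerepairedset --really-set --is-unrepaired /path/to/MyKeyspace-my_cf-ka-1234-Data.db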

Thanks!

> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa  wrote:
> 
> May be worth seeing if any of the sstables got promoted to repaired - if so 
> they’re not eligible for compaction with unrepaired sstables and that could 
> explain some higher counts
> 
> Do you actually do deletes or is everything ttl’d?
>  
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler  wrote:
>> 
>> Hi Jeff, mostly lots of little files, like there will be 4-5 that are 
>> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.   
>> 
>> Re incremental repair; Yes one of my engineers started an incremental repair 
>> on this column family that we had to abort.  In fact, the node that the 
>> repair was initiated on ran out of disk space and we ended replacing that 
>> node like a dead node.   
>> 
>> Oddly the new node is experiencing this issue as well.  
>> 
>> -B
>> 
>> 
>>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>>> You could toggle off the tombstone compaction to see if that helps, but 
>>> that should be lower priority than normal compactions
>>> 
>>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>>> 
>>> Do you do normal deletes? Did you try to run Incremental repair?  
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler  
>>>> wrote:
>>>> 
>>>> Hi Jonathan, both I believe.  
>>>> 
>>>> The window size is 1 day, full settings: 
>>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS', 
>>>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1', 
>>>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': 
>>>> '86400', 'tombstone_threshold': '0.2', 'class': 
>>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'} 
>>>> 
>>>> 
>>>> nodetool tpstats 
>>>> 
>>>> Pool NameActive   Pending  Completed   Blocked  
>>>> All time blocked
>>>> MutationStage 0 068582241832 0 
>>>> 0
>>>> ReadStage 0 0  209566303 0 
>>>> 0
>>>> RequestResponseStage  0 044680860850 0 
>>>> 0
>>>> ReadRepairStage   0 0   24562722 0 
>>>> 0
>>>> CounterMutationStage  0 0  0 0 
>>>> 0
>>>> MiscStage 0 0  0 0 
>>>> 0
>>>> HintedHandoff 1 1203 0 
>>>> 0
>>>> GossipStage   0 08471784 0 
>>>> 0
>>>> CacheCleanupExecutor  0 0122 0 
>>>> 0
>>>> InternalResponseStage 0 0 552125 0 
>>>> 0
>>>> CommitLogArchiver 0 0  0 0 
>>>> 0
>>>> CompactionExecutor8421433715 0 
>>>> 0
>>>> ValidationExecutor0 0   2521 0 
>>>> 0
>>>> MigrationStage0 0 527549 0 
>>>> 0
>>>> AntiEntropyStage  0 0   7697 0 
>>>> 0
>>>> PendingRangeCalculator0 0 17 0 
>>>> 0
>>>> Sampler   0 0  0 0 
>>>> 0
>>>> MemtableFlushWriter   0 0 116966     0 
>>>> 0
>>>> MemtablePostFlush 0 0 209103 0 
>>>>  

Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi, I spot checked a couple of the files that were ~200MB and they mostly
had "Repaired at: 0" so maybe that's not it?

-B


On Tue, Aug 7, 2018 at 8:16 PM  wrote:

> Everything is ttl’d
>
> I suppose I could use sstablemeta to see the repaired bit, could I just
> set that to unrepaired somehow and that would fix?
>
> Thanks!
>
> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa  wrote:
>
> May be worth seeing if any of the sstables got promoted to repaired - if
> so they’re not eligible for compaction with unrepaired sstables and that
> could explain some higher counts
>
> Do you actually do deletes or is everything ttl’d?
>
>
> --
> Jeff Jirsa
>
>
> On Aug 7, 2018, at 5:09 PM, Brian Spindler 
> wrote:
>
> Hi Jeff, mostly lots of little files, like there will be 4-5 that are
> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.
>
> Re incremental repair; Yes one of my engineers started an incremental
> repair on this column family that we had to abort.  In fact, the node that
> the repair was initiated on ran out of disk space and we ended replacing
> that node like a dead node.
>
> Oddly the new node is experiencing this issue as well.
>
> -B
>
>
> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>
>> You could toggle off the tombstone compaction to see if that helps, but
>> that should be lower priority than normal compactions
>>
>> Are the lots-of-little-files from memtable flushes or
>> repair/anticompaction?
>>
>> Do you do normal deletes? Did you try to run Incremental repair?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 7, 2018, at 5:00 PM, Brian Spindler 
>> wrote:
>>
>> Hi Jonathan, both I believe.
>>
>> The window size is 1 day, full settings:
>> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>> 'tombstone_threshold': '0.2', 'class':
>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>
>>
>> nodetool tpstats
>>
>> Pool NameActive   Pending  Completed   Blocked
>> All time blocked
>> MutationStage 0 068582241832 0
>>  0
>> ReadStage 0 0  209566303 0
>>  0
>> RequestResponseStage  0 044680860850 0
>>  0
>> ReadRepairStage   0 0   24562722 0
>>  0
>> CounterMutationStage  0 0  0 0
>>  0
>> MiscStage 0 0  0 0
>>  0
>> HintedHandoff 1 1203 0
>>  0
>> GossipStage   0 08471784 0
>>  0
>> CacheCleanupExecutor  0 0122 0
>>  0
>> InternalResponseStage 0 0 552125 0
>>  0
>> CommitLogArchiver 0 0  0 0
>>  0
>> CompactionExecutor8421433715 0
>>  0
>> ValidationExecutor0 0   2521 0
>>  0
>> MigrationStage0 0 527549 0
>>  0
>> AntiEntropyStage  0 0   7697 0
>>  0
>> PendingRangeCalculator0 0 17 0
>>  0
>> Sampler   0 0  0 0
>>  0
>> MemtableFlushWriter   0 0 116966 0
>>  0
>> MemtablePostFlush 0 0 209103 0
>>  0
>> MemtableReclaimMemory 0 0 116966 0
>>  0
>> Native-Transport-Requests     1 0 1715937778 0
>> 176262
>>
>> Message type   Dropped
>> READ 2
>> RANGE_SLICE  0
>> _TRACE   0
>> MUTATION  4390
>> COUNTER_MUTATI

Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
In fact all of them say Repaired at: 0.

On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler 
wrote:

> Hi, I spot checked a couple of the files that were ~200MB and they mostly
> had "Repaired at: 0" so maybe that's not it?
>
> -B
>
>
> On Tue, Aug 7, 2018 at 8:16 PM  wrote:
>
>> Everything is ttl’d
>>
>> I suppose I could use sstablemeta to see the repaired bit, could I just
>> set that to unrepaired somehow and that would fix?
>>
>> Thanks!
>>
>> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa  wrote:
>>
>> May be worth seeing if any of the sstables got promoted to repaired - if
>> so they’re not eligible for compaction with unrepaired sstables and that
>> could explain some higher counts
>>
>> Do you actually do deletes or is everything ttl’d?
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler 
>> wrote:
>>
>> Hi Jeff, mostly lots of little files, like there will be 4-5 that are
>> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.
>>
>> Re incremental repair; Yes one of my engineers started an incremental
>> repair on this column family that we had to abort.  In fact, the node that
>> the repair was initiated on ran out of disk space and we ended up replacing
>> that node like a dead node.
>>
>> Oddly the new node is experiencing this issue as well.
>>
>> -B
>>
>>
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>>
>>> You could toggle off the tombstone compaction to see if that helps, but
>>> that should be lower priority than normal compactions
>>>
>>> Are the lots-of-little-files from memtable flushes or
>>> repair/anticompaction?
>>>
>>> Do you do normal deletes? Did you try to run Incremental repair?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler 
>>> wrote:
>>>
>>> Hi Jonathan, both I believe.
>>>
>>> The window size is 1 day, full settings:
>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>>> 'tombstone_threshold': '0.2', 'class':
>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>
>>>
>>> nodetool tpstats
>>>
>>> Pool NameActive   Pending  Completed   Blocked
>>> All time blocked
>>> MutationStage 0 068582241832 0
>>>0
>>> ReadStage 0 0  209566303 0
>>>0
>>> RequestResponseStage  0 044680860850 0
>>>0
>>> ReadRepairStage   0 0   24562722 0
>>>0
>>> CounterMutationStage  0 0  0 0
>>>0
>>> MiscStage 0 0  0 0
>>>0
>>> HintedHandoff 1 1203 0
>>>0
>>> GossipStage   0 08471784 0
>>>0
>>> CacheCleanupExecutor  0 0122 0
>>>0
>>> InternalResponseStage 0 0 552125 0
>>>0
>>> CommitLogArchiver 0 0  0 0
>>>0
>>> CompactionExecutor8421433715 0
>>>0
>>> ValidationExecutor0 0   2521 0
>>>0
>>> MigrationStage0 0 527549 0
>>>0
>>> AntiEntropyStage  0 0   7697 0
>>>0
>>> PendingRangeCalculator0 0 17 0
>>>0
>>> Sampler   0 0  0 0
>>>0
>>> MemtableFlushWriter   0     0 116966  

Re: TWCS Compaction backed up

2018-08-08 Thread Brian Spindler
Hi Jeff/Jon et al, here is what I'm thinking of doing to clean up; please let
me know what you think.

This is precisely my problem I believe:
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

Based on that, I have a lot of wasted space due to a bad incremental repair, so
I am thinking of abandoning incremental repairs by:
- setting all repairedAt values to 0 on any/all *Data.db SSTables (rough
  sketch below)
- running sub-range repairs with either range_repair.py or Reaper

Will this clean everything up?
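
For the repairedAt reset, this is roughly what I have in mind per node
(untested sketch; paths, service commands and my_ks/my_table are placeholders,
and I believe sstablerepairedset should run with the node stopped):

nodetool drain && sudo service cassandra stop
# mark every SSTable of the table as unrepaired
find /var/lib/cassandra/data/my_ks/my_table-*/ -name '*Data.db' \
  | xargs sstablerepairedset --really-set --is-unrepaired
sudo service cassandra start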


On Tue, Aug 7, 2018 at 9:18 PM Brian Spindler 
wrote:

> In fact all of them say Repaired at: 0.
>
> On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler 
> wrote:
>
>> Hi, I spot checked a couple of the files that were ~200MB and they mostly
>> had "Repaired at: 0" so maybe that's not it?
>>
>> -B
>>
>>
>> On Tue, Aug 7, 2018 at 8:16 PM  wrote:
>>
>>> Everything is ttl’d
>>>
>>> I suppose I could use sstablemeta to see the repaired bit, could I just
>>> set that to unrepaired somehow and that would fix?
>>>
>>> Thanks!
>>>
>>> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa  wrote:
>>>
>>> May be worth seeing if any of the sstables got promoted to repaired - if
>>> so they’re not eligible for compaction with unrepaired sstables and that
>>> could explain some higher counts
>>>
>>> Do you actually do deletes or is everything ttl’d?
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Aug 7, 2018, at 5:09 PM, Brian Spindler 
>>> wrote:
>>>
>>> Hi Jeff, mostly lots of little files, like there will be 4-5 that are
>>> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.
>>>
>>> Re incremental repair; Yes one of my engineers started an incremental
>>> repair on this column family that we had to abort.  In fact, the node that
>>> the repair was initiated on ran out of disk space and we ended up replacing
>>> that node like a dead node.
>>>
>>> Oddly the new node is experiencing this issue as well.
>>>
>>> -B
>>>
>>>
>>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>>>
>>>> You could toggle off the tombstone compaction to see if that helps, but
>>>> that should be lower priority than normal compactions
>>>>
>>>> Are the lots-of-little-files from memtable flushes or
>>>> repair/anticompaction?
>>>>
>>>> Do you do normal deletes? Did you try to run Incremental repair?
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler 
>>>> wrote:
>>>>
>>>> Hi Jonathan, both I believe.
>>>>
>>>> The window size is 1 day, full settings:
>>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
>>>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
>>>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
>>>> 'tombstone_threshold': '0.2', 'class':
>>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>>>>
>>>>
>>>> nodetool tpstats
>>>>
>>>> Pool NameActive   Pending  Completed   Blocked
>>>> All time blocked
>>>> MutationStage 0 068582241832 0
>>>>0
>>>> ReadStage 0 0  209566303 0
>>>>0
>>>> RequestResponseStage  0 044680860850 0
>>>>0
>>>> ReadRepairStage   0 0   24562722 0
>>>>0
>>>> CounterMutationStage  0 0  0 0
>>>>0
>>>> MiscStage 0 0  0 0
>>>>0
>>>> HintedHandoff 1 1203 0
>>>>0
>>>> GossipStage   0 08471784 0
>>>>0
>>>> CacheCleanupExecutor  0 0122 0
>>>>0
>>>> Interna

Re: Upgrade from 2.1 to 3.11

2018-08-28 Thread Brian Spindler
Ma, did you try what Mohamadreza suggested?  Having such a large heap means
you are getting a ton of stuff that needs full GC.
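
For reference, my rough reading of his suggestion (sketch only; file locations
assume a package install, and on 3.x the equivalent -Xms/-Xmx settings live in
jvm.options):

# cassandra-env.sh: cap the heap well below the 28G you're running now
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="800M"    # the env script suggests ~100MB per physical core
# cassandra.yaml: the row cache is off (0) by default; size it to your hot rows
# row_cache_size_in_mb: 256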

On Tue, Aug 28, 2018 at 4:31 AM Pradeep Chhetri 
wrote:

> You may want to try upgrading to 3.11.3 instead which has some memory
> leaks fixes.
>
> On Tue, Aug 28, 2018 at 9:59 AM, Mun Dega  wrote:
>
>> I am surprised that no one else ran into any issues with this version.
>> GC can't catch up fast enough and there is constant Full GC taking place.
>>
>> The result? Unresponsive nodes, making the entire cluster unusable.
>>
>> Any insight on this issue from anyone that is using this version would be
>> appreciated.
>>
>> Ma
>>
>> On Fri, Aug 24, 2018, 04:30 Mohamadreza Rostami <
>> mohamadrezarosta...@gmail.com> wrote:
>>
>>> You have a very large heap; it takes most of the CPU time in GC. You
>>> should set the heap to at most 12GB and enable the row cache so your
>>> cluster becomes faster.
>>>
>>> On Friday, 24 August 2018, Mun Dega  wrote:
>>>
 120G data
 28G heap out of 48 on system
 9 node cluster, RF3


 On Thu, Aug 23, 2018, 17:19 Mohamadreza Rostami <
 mohamadrezarosta...@gmail.com> wrote:

> Hi,
> How much data do you have? How much RAM do your servers have? How large
> is your heap?
> On Thu, Aug 23, 2018 at 10:14 PM Mun Dega  wrote:
>
>> Hello,
>>
>> We recently upgraded from Cassandra 2.1 to 3.11.2 on one cluster.
>> The process went OK, including upgradesstables, but we started to
>> experience high latency for r/w, occasional OOMs and long GC pauses after.
>>
>> For the same cluster with 2.1, we didn't have any issues like this.  We
>> also kept server specs, heap, etc. all the same post-upgrade.
>>
>> Has anyone else had similar issues going to 3.11, and what major changes
>> could cause such a setback in the new version?
>>
>> Ma Dega
>>
>
>


upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
Hi all, we're planning an upgrade from 2.1.5->3.11.3. Currently we have
several column families configured with the TWCS class
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy', and
with 3.11.3 we need to set it to the built-in 'TimeWindowCompactionStrategy'.

Is that a safe operation?  Will Cassandra even start if a column family
has a compaction strategy defined with a classname it cannot resolve?
How do we deal with mixed-version nodes and different class names while
the binaries are being upgraded across the cluster?
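
For context, this is roughly how I'm enumerating the affected tables on 2.1
(sketch; I believe the pre-3.0 schema table still exposes the strategy class
directly):

cqlsh -e "SELECT keyspace_name, columnfamily_name, compaction_strategy_class FROM system.schema_columnfamilies;" | grep jeffjirsa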

Thank you for any guidance,
Brian


Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
[image: image.png]

On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa  wrote:

> Easiest approach is to build the 3.11 jar from my repo, upgrade, then
> ALTER table to use the official TWCS (org.apache.cassandra) jar
>
> Sorry for the headache. I hope I have a 3.11 branch for you.
>
>
> --
> Jeff Jirsa
>
>
> On Nov 2, 2018, at 11:28 AM, Brian Spindler 
> wrote:
>
> Hi all, we're planning an upgrade from 2.1.5->3.11.3 and currently we have
> several column families configured with twcs class
> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' and
> with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'
>
> Is that a safe operation?  Will cassandra even start if the column family
> has a compaction strategy defined with a classname it cannot resolve?
> How to deal with different versioned nodes and different class names
> during the upgrade of the binaries throughout the cluster?
>
> Thank you for any guidance,
> Brian
>
>


Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
Never mind, I spoke too quickly.  I can change the Cassandra version in the
pom.xml and recompile, thanks!
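
That is, something along these lines (untested; the exact version property
name depends on that pom):

# point the cassandra-all dependency in pom.xml at 3.11.3, then rebuild the jar
mvn clean package -DskipTests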

On Fri, Nov 2, 2018 at 2:38 PM Brian Spindler 
wrote:

> [image: image.png]
>
>
> On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa  wrote:
>
>> Easiest approach is to build the 3.11 jar from my repo, upgrade, then
>> ALTER table to use the official TWCS (org.apache.cassandra) jar
>>
>> Sorry for the headache. I hope I have a 3.11 branch for you.
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Nov 2, 2018, at 11:28 AM, Brian Spindler 
>> wrote:
>>
>> Hi all, we're planning an upgrade from 2.1.5->3.11.3 and currently we
>> have several column families configured with twcs class
>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' and
>> with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'
>>
>> Is that a safe operation?  Will cassandra even start if the column family
>> has a compaction strategy defined with a classname it cannot resolve?
>> How to deal with different versioned nodes and different class names
>> during the upgrade of the binaries throughout the cluster?
>>
>> Thank you for any guidance,
>> Brian
>>
>>


Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
you are right, it won't even compile:

[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR]
/Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[110,90]
cannot find symbol
  symbol:   method
getOverlappingSSTables(org.apache.cassandra.db.lifecycle.SSTableSet,java.util.Set)
  location: variable cfs of type org.apache.cassandra.db.ColumnFamilyStore
[ERROR]
/Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[150,99]
cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[ERROR]
/Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[330,59]
cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[ERROR]
/Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/SizeTieredCompactionStrategy.java:[104,67]
cannot find symbol
  symbol:   class SizeComparator
  location: class org.apache.cassandra.io.sstable.format.SSTableReader
[INFO] 4 errors



On Fri, Nov 2, 2018 at 2:52 PM Jeff Jirsa  wrote:

> There’s a chance it will fail to work - possible method signatures changed
> between 3.0 and 3.11. Try it in a test cluster before prod
>
>
> --
> Jeff Jirsa
>
>
> On Nov 2, 2018, at 11:49 AM, Brian Spindler 
> wrote:
>
> Never mind, I spoke too quickly.  I can change the Cassandra version in the
> pom.xml and recompile, thanks!
>
> On Fri, Nov 2, 2018 at 2:38 PM Brian Spindler 
> wrote:
>
>> 
>>
>>
>> On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa  wrote:
>>
>>> Easiest approach is to build the 3.11 jar from my repo, upgrade, then
>>> ALTER table to use the official TWCS (org.apache.cassandra) jar
>>>
>>> Sorry for the headache. I hope I have a 3.11 branch for you.
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Nov 2, 2018, at 11:28 AM, Brian Spindler 
>>> wrote:
>>>
>>> Hi all, we're planning an upgrade from 2.1.5->3.11.3 and currently we
>>> have several column families configured with twcs class
>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' and
>>> with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'
>>>
>>> Is that a safe operation?  Will cassandra even start if the column
>>> family has a compaction strategy defined with a classname it cannot
>>> resolve?
>>> How to deal with different versioned nodes and different class names
>>> during the upgrade of the binaries throughout the cluster?
>>>
>>> Thank you for any guidance,
>>> Brian
>>>
>>>


Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
I hope I can do as you suggest and leapfrog to 3.11 rather than two-stepping
it from 3.7->3.11.
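
If the custom jar doesn't work out and I have to fall back to the interim-STCS
route, I'm picturing roughly this per table (untested sketch; my_ks.my_table
and the window settings stand in for our real ones):

# park the table on STCS with compaction paused, upgrade, then switch back
nodetool disableautocompaction my_ks my_table
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};"
# ... upgrade all nodes to 3.11.3 and run: nodetool upgradesstables my_ks my_table ...
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '1'};"
nodetool enableautocompaction my_ks my_table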

Just having TWCS has saved me lots of hassle so it’s all good, thanks for
all you do for our community.

-B

On Fri, Nov 2, 2018 at 3:54 PM Jeff Jirsa  wrote:

> I'm sincerely sorry for the hassle, but for various reasons beyond, it's
> unlikely I'll update my repo (at least me, personally). One fix is likely
> to grab the actual java classes from the apache repo, pull them in and fix
> the package names, and compile (essentially making your own 3.11 branch).
>
> I suppose you could also disable compaction, switch to something else
> (stcs), do the upgrade, then alter it back to the official TWCS. Whether or
> not this is viable depends on how quickly you write, and how long it'll
> take you to upgrade.
>
>
>
>
>
> On Fri, Nov 2, 2018 at 11:57 AM Brian Spindler 
> wrote:
>
>> you are right, it won't even compile:
>>
>> [INFO] -
>> [ERROR] COMPILATION ERROR :
>> [INFO] -
>> [ERROR]
>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[110,90]
>> cannot find symbol
>>   symbol:   method
>> getOverlappingSSTables(org.apache.cassandra.db.lifecycle.SSTableSet,java.util.Set)
>>   location: variable cfs of type org.apache.cassandra.db.ColumnFamilyStore
>> [ERROR]
>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[150,99]
>> cannot find symbol
>>   symbol:   class SizeComparator
>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>> [ERROR]
>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[330,59]
>> cannot find symbol
>>   symbol:   class SizeComparator
>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>> [ERROR]
>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/SizeTieredCompactionStrategy.java:[104,67]
>> cannot find symbol
>>   symbol:   class SizeComparator
>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>> [INFO] 4 errors
>>
>>
>>
>> On Fri, Nov 2, 2018 at 2:52 PM Jeff Jirsa  wrote:
>>
>>> There’s a chance it will fail to work - possible method signatures
>>> changed between 3.0 and 3.11. Try it in a test cluster before prod
>>>
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Nov 2, 2018, at 11:49 AM, Brian Spindler 
>>> wrote:
>>>
>>> Never mind, I spoke too quickly.  I can change the Cassandra version in the
>>> pom.xml and recompile, thanks!
>>>
>>> On Fri, Nov 2, 2018 at 2:38 PM Brian Spindler 
>>> wrote:
>>>
>>>> 
>>>>
>>>>
>>>> On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa  wrote:
>>>>
>>>>> Easiest approach is to build the 3.11 jar from my repo, upgrade, then
>>>>> ALTER table to use the official TWCS (org.apache.cassandra) jar
>>>>>
>>>>> Sorry for the headache. I hope I have a 3.11 branch for you.
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Nov 2, 2018, at 11:28 AM, Brian Spindler 
>>>>> wrote:
>>>>>
>>>>> Hi all, we're planning an upgrade from 2.1.5->3.11.3 and currently we
>>>>> have several column families configured with twcs class
>>>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' and
>>>>> with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'
>>>>>
>>>>> Is that a safe operation?  Will cassandra even start if the column
>>>>> family has a compaction strategy defined with a classname it cannot
>>>>> resolve?
>>>>> How to deal with different versioned nodes and different class names
>>>>> during the upgrade of the binaries throughout the cluster?
>>>>>
>>>>> Thank you for any guidance,
>>>>> Brian
>>>>>
>>>>> --
-Brian


Re: upgrading from 2.x TWCS to 3.x TWCS

2018-11-02 Thread Brian Spindler
That wasn't horrible at all.  After testing, provided all goes well I can
submit this back to the main TWCS repo if you think it's worth it.

Either way do you mind just reviewing briefly for obvious mistakes?

https://github.com/bspindler/twcs/commit/7ba388dbf41b1c9dc1b70661ad69273b258139da

Thanks!

On Fri, Nov 2, 2018 at 7:24 PM Brian Spindler 
wrote:

> I hope I can do as you suggest and leapfrog to 3.11 rather than two
> stepping it from 3.7->3.11
>
> Just having TWCS has saved me lots of hassle so it’s all good, thanks for
> all you do for our community.
>
> -B
>
> On Fri, Nov 2, 2018 at 3:54 PM Jeff Jirsa  wrote:
>
>> I'm sincerely sorry for the hassle, but for various reasons beyond, it's
>> unlikely I'll update my repo (at least me, personally). One fix is likely
>> to grab the actual java classes from the apache repo, pull them in and fix
>> the package names, and compile (essentially making your own 3.11 branch).
>>
>> I suppose you could also disable compaction, switch to something else
>> (stcs), do the upgrade, then alter it back to the official TWCS. Whether or
>> not this is viable depends on how quickly you write, and how long it'll
>> take you to upgrade.
>>
>>
>>
>>
>>
>> On Fri, Nov 2, 2018 at 11:57 AM Brian Spindler 
>> wrote:
>>
>>> you are right, it won't even compile:
>>>
>>> [INFO] -
>>> [ERROR] COMPILATION ERROR :
>>> [INFO] -
>>> [ERROR]
>>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[110,90]
>>> cannot find symbol
>>>   symbol:   method
>>> getOverlappingSSTables(org.apache.cassandra.db.lifecycle.SSTableSet,java.util.Set)
>>>   location: variable cfs of type
>>> org.apache.cassandra.db.ColumnFamilyStore
>>> [ERROR]
>>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[150,99]
>>> cannot find symbol
>>>   symbol:   class SizeComparator
>>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>>> [ERROR]
>>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/TimeWindowCompactionStrategy.java:[330,59]
>>> cannot find symbol
>>>   symbol:   class SizeComparator
>>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>>> [ERROR]
>>> /Users/bspindler/src/github/twcs/src/main/java/com/jeffjirsa/cassandra/db/compaction/SizeTieredCompactionStrategy.java:[104,67]
>>> cannot find symbol
>>>   symbol:   class SizeComparator
>>>   location: class org.apache.cassandra.io.sstable.format.SSTableReader
>>> [INFO] 4 errors
>>>
>>>
>>>
>>> On Fri, Nov 2, 2018 at 2:52 PM Jeff Jirsa  wrote:
>>>
>>>> There’s a chance it will fail to work - possible method signatures
>>>> changed between 3.0 and 3.11. Try it in a test cluster before prod
>>>>
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Nov 2, 2018, at 11:49 AM, Brian Spindler 
>>>> wrote:
>>>>
>>>> Never mind, I spoke too quickly.  I can change the Cassandra version in the
>>>> pom.xml and recompile, thanks!
>>>>
>>>> On Fri, Nov 2, 2018 at 2:38 PM Brian Spindler 
>>>> wrote:
>>>>
>>>>> 
>>>>>
>>>>>
>>>>> On Fri, Nov 2, 2018 at 2:34 PM Jeff Jirsa  wrote:
>>>>>
>>>>>> Easiest approach is to build the 3.11 jar from my repo, upgrade, then
>>>>>> ALTER table to use the official TWCS (org.apache.cassandra) jar
>>>>>>
>>>>>> Sorry for the headache. I hope I have a 3.11 branch for you.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Jirsa
>>>>>>
>>>>>>
>>>>>> On Nov 2, 2018, at 11:28 AM, Brian Spindler 
>>>>>> wrote:
>>>>>>
>>>>>> Hi all, we're planning an upgrade from 2.1.5->3.11.3 and currently we
>>>>>> have several column families configured with twcs class
>>>>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' and
>>>>>> with 3.11.3 we need to set it to 'TimeWindowCompactionStrategy'
>>>>>>
>>>>>> Is that a safe operation?  Will cassandra even start if the column
>>>>>> family has a compaction strategy defined with a classname it cannot
>>>>>> resolve?
>>>>>> How to deal with different versioned nodes and different class names
>>>>>> during the upgrade of the binaries throughout the cluster?
>>>>>>
>>>>>> Thank you for any guidance,
>>>>>> Brian
>>>>>>
>>>>>> --
> -Brian
>


Cassandra lucene secondary indexes

2018-12-12 Thread Brian Spindler
Hi all, we recently started using the cassandra-lucene secondary index
support that Instaclustr assumed ownership of (thank you, btw!).

We are experiencing a strange issue where adding/removing nodes fails: the
joining node is left hanging on a "Secondary index build" compaction that
never completes.

We're running v3.11.3 of Cassandra and of the plugin; has anyone experienced
this before?

It's a relatively small cluster (~6 nodes) in our user acceptance
environment, so there's not a lot of load either.
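
In case it helps narrow it down, this is roughly what I've been looking at on
the joining node (log path assumes a package install):

nodetool compactionstats -H    # the "Secondary index build" task just sits there
nodetool netstats              # check whether any streams are still open
grep -i error /var/log/cassandra/system.log | tail -n 50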

Thanks!

-- 
-Brian