Re: update cassandra.yaml file on number of cluster nodes

2021-10-18 Thread vytenis silgalis
Yep, also use Ansible with configs living in git here.
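For anyone who wants a concrete starting point, here's a minimal sketch of that
rolling-update loop in plain shell (hostnames, paths, and wait times are made-up
placeholders; in practice you'd express the same thing as an Ansible playbook
with `serial: 1`, as Bowen describes below):

for host in node1.example.com node2.example.com node3.example.com; do
  scp cassandra.yaml "$host":/etc/cassandra/cassandra.yaml   # push the new config
  ssh "$host" 'sudo systemctl restart cassandra'
  # wait for the node to report itself Up/Normal again before moving on
  until ssh "$host" 'nodetool status 2>/dev/null | grep -q "^UN.*$(hostname -i)"'; do
    sleep 10
  done
  sleep 300   # settle time before the next node
done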

On Fri, Oct 15, 2021 at 5:19 PM Bowen Song  wrote:

> We have Cassandra on bare-metal servers, and we manage our servers via
> Ansible. In this use case, we create an Ansible playbook that updates the
> servers one by one: change the cassandra.yaml file, restart Cassandra,
> wait for it to finish restarting, and then wait a few minutes before
> moving on to the next server.
> On 15/10/2021 22:42, ZAIDI, ASAD wrote:
>
> Hello Folks,
>
> Can you guys please suggest a tool or approach to update the cassandra.yaml
> file in a multi-DC environment with a large number of nodes efficiently?
>
> Thank you.
>
> Asad
>


Re: Schema collision results in multiple data directories per table

2021-10-13 Thread vytenis silgalis
Did you run the `alter keyspace` command on the original dc1 nodes or on the
new dc2 nodes?
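
Also, whenever you run a schema change like that across DCs, it's worth
confirming that every node reports a single schema version before and after,
e.g.:

nodetool describecluster
# under "Schema versions:" you want exactly one UUID with all node IPs listed
# under it; more than one UUID means the schema hasn't converged yet

That won't explain the column family ID mismatch on its own, but it does tell
you when it's safe to proceed with the next change.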

On Wed, Oct 13, 2021 at 8:15 AM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi Tom,
>
> while I am not completely sure what might cause your issue, I just
> want to highlight that schema agreement was overhauled quite a lot in
> 4.0 (1), so your issue may be related to what that ticket was trying to
> fix.
>
> Regards
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-15158
>
> On Fri, 1 Oct 2021 at 18:43, Tom Offermann 
> wrote:
> >
> > When adding a datacenter to a keyspace (following the Last Pickle [Data
> Center Switch][lp] playbook), I ran into a "Configuration exception merging
> remote schema" error. The nodes in one datacenter didn't converge to the
> new schema version, and after restarting them, I saw the symptoms described
> in this Datastax article on [Fixing a table schema collision][ds], where
> there were two data directories for each table in the keyspace on the nodes
> that didn't converge. I followed the recovery steps in the Datastax article
> to move the data from the older directories to the new directories, ran
> `nodetool refresh`, and that fixed the problem.
> >
> > [lp]: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
> > [ds]:
> https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/useCreateTableCollisionFix.html
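> >
> > For reference, the recovery amounted to roughly the following on each affected
> > node (keyspace, table, and the table-ID directory names here are placeholders;
> > the real ones come from the data directory itself, per the article):
> >
> > cd /var/lib/cassandra/data/my_keyspace
> > # copy the contents of the old table-ID directory into the live one
> > cp my_table-<old_table_id>/* my_table-<new_table_id>/
> > nodetool refresh my_keyspace my_table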
> >
> > While the Datastax article was a great help in recovering, I'm
> left wondering *why* this happened. If anyone can shed some light on that,
> or offer advice on how I can avoid getting into this situation in the future,
> I would be most appreciative. I'll describe the steps I took in more detail
> in the thread.
> >
> > ## Steps
> >
> > 1. The day before, I had added the second datacenter ('dc2') to the
> system_traces, system_distributed, and system_auth keyspaces and ran
> `nodetool rebuild` for each of the 3 keyspaces. All of that went smoothly
> with no issues.
> >
> > 2. For a large keyspace, I added the second datacenter ('dc2') with an
> `ALTER KEYSPACE foo WITH replication = {'class': 'NetworkTopologyStrategy',
> 'dc1': '2', 'dc2': '3'};` statement. Immediately, I saw this error in the
> log:
> > ```
> > "ERROR 16:45:47 Exception in thread Thread[MigrationStage:1,5,main]"
> > "org.apache.cassandra.exceptions.ConfigurationException: Column
> family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> > "\tat
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.config.Schema.updateTable(Schema.java:687)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1482)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1438)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1407)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1384)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.service.MigrationManager$1.runMayThrow(MigrationManager.java:594)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_232]"
> > "\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_232]"
> > "\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[na:1.8.0_232]"
> > "\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_232]"
> > "\tat
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_232]"
> > ```
> >
> > I also saw this:
> > ```
> > "ERROR 16:46:48 Configuration exception merging remote schema"
> > "org.apache.cassandra.exceptions.ConfigurationException: Column
> family ID mismatch (found 8ad72660-f629-11eb-a217-e1a09d8bc60c; expected
> 20739eb0-d92e-11e6-b42f-e7eb6f21c481)"
> > "\tat
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:949)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:903)
> ~[apache-cassandra-3.11.5.jar:3.11.5]"
> > "\tat
> 

Re: Unable to Gossip

2021-09-10 Thread vytenis silgalis
Hmm, are the ports open on the `new` server?

Looks like it can connect to other nodes but other nodes can't connect to
it.
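
A quick way to check that from one of the existing nodes (7000 is the
internode/gossip port here; adjust if you've changed storage_port):

nc -vz 172.16.100.44 7000

If that connects from the new node's own shell but not from the other nodes,
it's a firewall / security-group problem rather than a Cassandra one.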

-Vy

On Fri, Sep 10, 2021 at 10:20 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Good idea.
> There are two seed nodes:
> I see this on one (note 172.16.100.44 is the new node):
>
> DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
> INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810 InboundConnectionInitiator.java:464 - /172.16.100.44:7000(/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82 messaging connection established, version = 12, framing = LZ4, encryption = unencrypted
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 15137813 ms
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:65 - Pool Name                       Active   Pending   Completed   Blocked   All Time Blocked
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - ReadStage                            0         0     4729810         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - CompactionExecutor                   0         0      384171         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - MutationStage                        0         0    14540487         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtableReclaimMemory                0         0         316         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - PendingRangeCalculator               0         0          11         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - GossipStage                          0         0     1126031         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - SecondaryIndexManagement             0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - HintsDispatcher                      0         0          15         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - Native-Transport-Requests            0         0    13286230         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - RequestResponseStage                 0         0    15724485         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtableFlushWriter                  0         0         298         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0         0         0         316         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtablePostFlush                    0         0         336         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - Sampler                              0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - ValidationExecutor                   0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - ViewBuildExecutor                    0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - CacheCleanupExecutor                 0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:79 - CompactionManager                    0         0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:91 - MessagingService                   n/a       0/0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:101 - Cache Type                      Size      Capacity   KeysToSave
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:103 - KeyCache                    75539240     104857600          all
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:109 - RowCache                           0             0          all
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:116 - Table                       Memtable ops,data
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.columns                     0,0
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.types                       0,0
> INFO  

Re: [!!Mass Mail]Re: Service Failed but cassandra runs

2021-08-20 Thread vytenis silgalis
The process line looks like it's missing all the Cassandra jars. What command
is `service cassandra start` actually running?
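
A couple of quick things to look at (paths are the stock packaged locations;
adjust if yours differ):

cat /var/run/cassandra/cassandra.pid      # this is the file systemd failed to parse
ps -ef | grep [c]assandra                 # full java command line; check whether the
                                          # cassandra jars / classpath actually appear

If the pid file is empty or contains junk, that would explain the "Numerical
result out of range" error from systemd.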

On Thu, Aug 19, 2021 at 8:18 AM FERON Matthieu  wrote:

> Thanks for your help. I've looked at the Docker logs for the container but
> didn't find anything helpful.
> --
> *De :* Jim Shaw 
> *Envoyé :* jeudi 19 août 2021 00:25:07
> *À :* user@cassandra.apache.org
> *Objet :* [!!Mass Mail]Re: Service Failed but cassandra runs
>
> You start C* from a docker command, right? Check the docker logs, you may see
> some helpful info there.
>
> On Wed, Aug 18, 2021 at 8:58 AM FERON Matthieu 
> wrote:
>
>> Hello you all,
>>
>>
>> I'm trying to set up Cassandra in a CentOS 7 Docker container.
>>
>> When I start the service, it says Failed, but I see the process in memory.
>>
>> When I look in /var/run/cassandra for cassandra.pid, it's not there.
>>
>> I've looked on the web and tried all the fixes I found, but none work.
>>
>> It's the 3.11.6.1 version (it's mandatory, I don't have a choice).
>>
>> The JVM is provided by java-1.8.0-openjdk-1.8.0.262.b10-1.
>>
>> I've activated debug level but don't find any ERROR lines.
>>
>>
>> Here are the status logs
>>
>> [root@NYTHIVED01 cassandra]# service cassandra status
>> ● cassandra.service - LSB: distributed storage system for structured data
>>Loaded: loaded (/etc/rc.d/init.d/cassandra; bad; vendor preset:
>> disabled)
>>Active: failed (Result: protocol) since Wed 2021-08-18 10:33:32 UTC;
>> 29s ago
>>  Docs: man:systemd-sysv-generator(8)
>>   Process: 8047 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited,
>> status=0/SUCCESS)
>>
>> Aug 18 10:33:31 NYTHIVED01 systemd[1]: Starting LSB: distributed storage
>> system for structured data...
>> Aug 18 10:33:31 NYTHIVED01 su[8057]: (to cassandra) root on none
>> Aug 18 10:33:32 NYTHIVED01 cassandra[8047]: Starting Cassandra: OK
>> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Failed to parse PID from file
>> /var/run/cassandra/cassandra.pid: Numerical result out of range
>> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Failed to start LSB: distributed
>> storage system for structured data.
>> Aug 18 10:33:32 NYTHIVED01 systemd[1]: Unit cassandra.service entered
>> failed state.
>> Aug 18 10:33:32 NYTHIVED01 systemd[1]: cassandra.service failed.
>>
>> Here's the process line:
>> cassand+  7773     1  1 10:28 ?        00:00:14 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64/jre/bin/java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError
>>
>> Thank you for your help
>>
>>


Re: How to remove tombstones in a levelled compaction table in Cassandra 2.1.16?

2021-07-06 Thread vytenis silgalis
You might want to take a look at the `unchecked_tombstone_compaction` table
setting. The best way to see if this is affecting you is to look at the
sstablemetadata for the sstables and see if your droppable tombstone ratio is
higher than the configured tombstone_threshold ratio (0.2 by default) for the
table.

For example, the table has a tombstone_threshold of 0.2 but you see sstables
OLDER than 10 days still above that ratio (LCS has a tombstone compaction
interval of 10 days; it won't run a tombstone compaction until an sstable is at
least 10 days old).

> sstablemetadata example-ka-1233-Data.db | grep droppable
Estimated droppable tombstones: 1.0
^ this is an extreme example, but anything greater than 0.2 on a 10+ day old
sstable is a problem.

By default the unchecked_tombstone_compaction setting is false, which will
lead to tombstones staying around if a partition spans multiple sstables
(which may happen with LCS over a long period).

Try setting `unchecked_tombstone_compaction` to true. Note that when you
first enable this, IF any sstables are above the tombstone_threshold setting
for that table they will be compacted, and this may cause extra load on the cluster.
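
Something along these lines (keyspace/table names are placeholders; carry over
whatever other compaction sub-options you already have, since this map replaces
the existing one):

ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'unchecked_tombstone_compaction': 'true',
                     'tombstone_threshold': '0.2'};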

Vytenis
... always do your own research and verify what people say. :)

On Mon, Jul 5, 2021 at 10:11 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Thanks Kane for the suggestion.
>
> Regards
> Manish
>
> On Tue, Jul 6, 2021 at 6:19 AM Kane Wilson  wrote:
>
>>
>>> In one of our LCS tables, auto compaction was disabled. Now, after years of
>>> running, range queries using spark-cassandra-connector are failing. The
>>> Cassandra version is 2.1.16.
>>>
>>> I suspect that due to autocompaction being disabled, lots of tombstones were
>>> created, and now reads over those are causing issues and queries are
>>> getting timed out. Am I right in my thinking? What is a possible way to
>>> get out of this?
>>>
>>> I thought of using major compaction, but for LCS that was only introduced in
>>> Cassandra 2.2. Also, user defined compactions don't work on LCS tables.
>>>
>>>
>>>
>>> Regards
>>>
>>> Manish Khandelwal
>>>
>>
>> If it's tombstones specifically you'll be able to see errors in the logs
>> regarding passing the tombstone limit. However, disabling compactions could
>> cause lots of problems (especially over years). I wouldn't be surprised if
>> your reads are slow purely because of the number of SSTables you're hitting
>> on each read. Given you've been running without compactions for so long you
>> might want to look at just switching to STCS and re-enabling compactions.
>> Note this should be done with care, as it could cause performance/storage
>> issues.
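>>
>> Roughly, that switch would be (keyspace/table names are placeholders):
>>
>> ALTER TABLE my_keyspace.my_table
>>   WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
>> nodetool enableautocompaction my_keyspace my_table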
>>
>> Cheers,
>> Kane
>>
>> --
>> raft.so - Cassandra consulting, support, and managed services
>>
>


Re: How to make Cassandra flush CommitLog files more frequently?

2021-05-05 Thread vytenis silgalis
I believe you could set your tables to flush to disk at specific intervals
(memtable_flush_period_in_ms); note that you'd have to set this for all
tables (not just the CDC-enabled tables) to ensure that commitlog files are
flushed to the cdc_raw directory. Or, as Dhanunjaya noted, you could just
periodically run `nodetool flush` to flush all the tables in one go.
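
For example, something like this per table (table name is a placeholder;
60000 ms = 1 minute, and every forced flush produces an sstable, so pick the
longest interval your latency requirements allow):

ALTER TABLE my_keyspace.my_table WITH memtable_flush_period_in_ms = 60000;

You'd repeat that for every table, per the note above, or just cron `nodetool
flush` on each node at the interval you need.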

Vytenis

On Tue, May 4, 2021 at 11:04 PM Dhanunjaya Tokala <
dhanunjayatok...@gmail.com> wrote:

> One way to flush the commitlog is to run `nodetool flush`
> on the Cassandra nodes.
>
> On Tue, May 4, 2021 at 3:58 PM Bingqin Zhou  wrote:
>
>> Hi Kane,
>>
>> Thank you for the insights!
>>
>> Reducing the total space on its own will help, however definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>
>>
>> If we decrease *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb*,
>> is it potentially going to cause more compaction activity as well?
>>
>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>
>>
>> Actually, the design and initiation of our service is based on the
>> understanding that the CDC feature in Cassandra can be used for streaming
>> data changes out of Cassandra with low latency. If this is not the case, may
>> I ask what the purpose and the intended use case for the CDC feature in
>> Cassandra is, please?
>>
>> Thank you so much!
>> Bingqin Zhou
>>
>> On Mon, May 3, 2021 at 5:00 PM Kane Wilson  wrote:
>>
>>> (removing dev)
>>>
>>> commitlog_segment_size_in_mb isn't going to help; in fact, you probably
>>> don't want to modify this, as it'll reduce the maximum size of your
>>> mutations.
>>> Reducing the total space on its own will help, however definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>>
>>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>>
>>> raft.so - Cassandra consulting, support, and managed services
>>>
>>>
>>> On Tue, May 4, 2021 at 4:16 AM Bingqin Zhou  wrote:
>>>
 Hi,

 We're working with the CDC feature to develop an agent that streams
 changes in Cassandra into Kafka. However, the CDC feature doesn't work
 well for us so far because CommitLog files are rarely flushed into the cdc_raw
 directory, and the frequency can be as low as once every few months.

 Is there any suggested and feasible way to make Cassandra flush CommitLog
 files more frequently, please?

 We're thinking about decreasing *commitlog_segment_size_in_mb* from 32
 to 16, and decreasing *commitlog_total_space_in_mb* from 8192 to 160.
 Does this sound like a reasonable approach? Is there any concern or
 anything we need to be warned about before trying this, please?

 Thank you!

 Bingqin Zhou

>>>


Re: tablehistogram shows high sstables

2021-04-30 Thread vytenis silgalis
17ms read latency for the 50th percentile is actually pretty high latency
in my experience; I prefer to see the 75th percentile read latency be
around 1-2ms. Of course, it depends on your use case and what your
performance objectives are.
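
If you want to see whether those 24 sstables are really being hit, `nodetool
tablestats <keyspace>.<table>` (`nodetool cfstats` on older versions) is worth
a look next to the histograms, e.g. (names are placeholders):

nodetool tablestats my_keyspace.my_table | grep -Ei 'bloom|sstable'

A high bloom filter false-positive ratio there would at least tell you the
filters aren't saving you from touching most of those sstables.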

On Thu, Apr 29, 2021 at 7:05 AM Kane Wilson  wrote:

> It does imply the SSTables are being read - how big is your data size and
> how much memory on the nodes? It's certainly possible to get low latencies
> despite many SSTables, but I'd expect small read sizes paired with a lot of
> memory.
>
>
> raft.so - Cassandra consulting, support, managed services
>
> On Thu., 29 Apr. 2021, 08:44 Ayub M,  wrote:
>
>> The table has 24 sstables with size tiered compaction. When I run
>> nodetool tablehistograms, I see the 99th percentile of queries showing
>> 24 as the number of sstables. But the read latency is very low. My
>> understanding of the tablehistograms sstable column is that it shows how
>> many sstables were read to complete the query. If so, reading 24 sstables
>> should take some time, maybe at least a couple of seconds. Am I missing
>> something here? Does checking against indexes/bloom filters count towards
>> the sstable counter as well?
>>
>> Percentile  SSTables     Write Latency      Read Latency     Partition Size     Cell Count
>>                               (micros)          (micros)            (bytes)
>> 50%            24.00             17.08          17436.92                310              6
>> 75%            24.00             24.60          20924.30                446              6
>> 95%            24.00             42.51          62479.63                770             10
>> 98%            24.00             51.01          74975.55               1597             17
>> 99%            24.00             61.21          74975.55               3311             24
>> Min            18.00              2.30           4866.32                 87              0
>> Max            24.00            943.13          89970.66             545791          17084
>>
>>