Re: Impact of enabling authentication on performance

2020-06-01 Thread Jeff Jirsa
Set the auth caches to a long validity.

Don’t go crazy with the RF of system_auth.

Drop the bcrypt rounds if you see massive CPU spikes during reconnect storms.
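The first and third points can be illustrated with two back-of-envelope models. This is a hypothetical sketch, not Cassandra's implementation. `ValidityCache`, `backend_reads`, `relative_bcrypt_work` and the role name `app_user` are made up for illustration. The cache is a plain expire-after-load cache that counts how often the backing store (system_auth, in Cassandra's case) would be read. Cassandra 3.11 exposes `roles_validity_in_ms`, `permissions_validity_in_ms` and `credentials_validity_in_ms` in cassandra.yaml, alongside refresh intervals, so its real caches behave differently in detail. The bcrypt part relies only on the fact that the cost parameter is log2 of the round count, so each step down halves the hashing work per authentication. During a reconnect storm, total hashing work scales with connections × 2^rounds. Because the cost is embedded in each stored hash, lowering it only affects passwords hashed afterwards. We believe 3.x reads the value for new hashes from the `cassandra.auth_bcrypt_gensalt_log2_rounds` system property (default 10), but check the source for your version.

```python
"""Two back-of-envelope models (hypothetical, not Cassandra code)."""
from typing import Callable, Dict, Tuple


class ValidityCache:
    """Hypothetical stand-in for an auth cache: entries expire validity_ms after loading."""

    def __init__(self, validity_ms: int, loader: Callable[[str], str]) -> None:
        self.validity_ms = validity_ms
        self.loader = loader
        self.entries: Dict[str, Tuple[str, int]] = {}
        self.loads = 0  # reads that reached the backing store

    def get(self, key: str, now_ms: int) -> str:
        cached = self.entries.get(key)
        if cached is not None and now_ms - cached[1] < self.validity_ms:
            return cached[0]  # still valid: no backing-store read
        value = self.loader(key)
        self.loads += 1
        self.entries[key] = (value, now_ms)
        return value


def backend_reads(validity_ms: int, requests: int, interval_ms: int) -> int:
    """Backing-store reads when one (hypothetical) role is checked every interval_ms."""
    cache = ValidityCache(validity_ms, loader=lambda role: f"permissions-of-{role}")
    for i in range(requests):
        cache.get("app_user", now_ms=i * interval_ms)
    return cache.loads


def relative_bcrypt_work(log2_rounds: int, baseline: int = 10) -> float:
    """Hashing work per authentication relative to a cost factor of `baseline`."""
    return 2.0 ** (log2_rounds - baseline)


if __name__ == "__main__":
    # Hypothetical load: one request every 100 ms for 60 s.
    print(backend_reads(2_000, 600, 100))    # 30
    print(backend_reads(600_000, 600, 100))  # 1
    for rounds in (10, 8, 6):
        print(rounds, relative_bcrypt_work(rounds))
```

Under this model, a longer validity turns one backing-store read every few seconds into one per validity period. Dropping two bcrypt rounds cuts per-connection hashing work to a quarter.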






Cassandra crashes when using offheap_objects for memtable_allocation_type

2020-06-01 Thread onmstester onmstester
I just changed these properties to increase the size of flushed files (to 
decrease the number of compactions):

memtable_allocation_type from heap_buffers to offheap_objects

memtable_offheap_space_in_mb: from default (2048) to 8192


All other memtable/compaction/commitlog settings use their default values.


After a few hours, some of the nodes stopped applying mutations (dropped 
mutations increased) and pending flushes also increased. The nodes were still 
up and running, but only a single CPU core was at 100% usage (the other cores 
were at 0%). The other nodes in the cluster marked the node as DN. I could not 
access port 7199 (JMX) and could not create a thread dump, even with jstack -F.



Restarting the Cassandra service fixes the problem, but after a while some 
other node goes DN.



Am I missing some configuration? What should I change in the default Cassandra 
configuration to maximize write throughput on a single node/cluster in a 
write-heavy scenario for this data model:

Data model is a single table:

  create table test(
    partition_key text,
    clustering_key text,
    rows set<…>,
    primary key ((partition_key, clustering_key))
  );

vCPU: 12

Memory: 32GB

Node data size: 2TB
Apache Cassandra 3.11.2

JVM heap size: 16GB, CMS, 1GB newgen
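A rough per-node memory budget helps frame the question. This sketch uses only figures from the post: 32GB RAM, a 16GB heap, and memtable_offheap_space_in_mb raised from 2048 to 8192. It assumes, as we understand the 3.11 settings, that `heap_buffers` keeps memtable data inside the JVM heap, while the `offheap_*` allocation types draw on the off-heap limit. `ram_left_mib` is a hypothetical helper. It does not model other off-heap consumers such as bloom filters, index summaries, compression metadata, thread stacks, or direct buffers, and it is not a diagnosis of the hang.

```python
def ram_left_mib(total_ram_mib: int, heap_mib: int,
                 allocation_type: str, offheap_memtable_mib: int) -> int:
    """RAM not claimed by the JVM heap or off-heap memtables, in MiB (hypothetical helper).

    Assumption: heap_buffers keeps memtables on heap, so the off-heap memtable
    limit only claims RAM for offheap_buffers / offheap_objects. The OS page
    cache and every other off-heap consumer must fit in the returned amount.
    """
    offheap = offheap_memtable_mib if allocation_type.startswith("offheap") else 0
    return total_ram_mib - heap_mib - offheap


if __name__ == "__main__":
    total, heap = 32 * 1024, 16 * 1024  # 32GB RAM, 16GB heap (from the post)
    before = ram_left_mib(total, heap, "heap_buffers", 2048)
    after = ram_left_mib(total, heap, "offheap_objects", 8192)
    print(f"heap_buffers:    {before} MiB left")  # 16384 MiB
    print(f"offheap_objects: {after} MiB left")   # 8192 MiB
```

With these inputs, the switch halves the RAM left for the page cache and everything else off-heap on a node holding 2TB of data.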




Impact of enabling authentication on performance

2020-06-01 Thread Gil Ganz
Hi
I have a production 3.11.6 cluster on which I might want to enable
authentication, and I'm trying to understand what the performance impact
will be, if any.
I understand each use case might be different; I'm trying to find out whether
there is a common percentage by which people usually see their performance
drop, or whether someone has looked into this.
Gil


Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Thanks Erick,

I see the tasks below being run most of the time. I don't quite understand
what exactly these scheduled tasks are for. Is there a way to reduce the
boot-up time, or do I have to live with this delay?

$ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" debug.log* | wc -l
3249
$ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" debug.log* | wc -l
6293
$ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc -l
6308
$ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" debug.log* | wc -l
3249
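The zgrep counts above cover every startup recorded in all rotated debug logs. One way to see how much of this work happens in a single slow window is to count only the lines between "No gossip backlog; proceeding" and "Netty using native Epoll event loop", once per restart. This is a sketch. `count_per_startup` and `NEEDLES` are hypothetical names, and the needle strings are copied from the commands above. It assumes debug.log also contains the INFO lines that bracket the window, which we believe is the default 3.11 logback behaviour.

```python
"""Count startup-related DEBUG messages per restart window (a sketch)."""
from collections import Counter
from typing import Iterable, List

START = "Gossiper.java:1723 - No gossip backlog; proceeding"
END = "NativeTransportService.java:70 - Netty using native Epoll event loop"
NEEDLES = {
    "recreate_strategy": "CompactionStrategyManager.java:380 - Recreating compaction strategy",
    "refresh_boundaries": "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for",
    "got_local_ranges": "DiskBoundaryManager.java:92 - Got local ranges",
    "update_boundaries": "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries",
}


def count_per_startup(lines: Iterable[str]) -> List[Counter]:
    """Return one Counter per START..END window; lines outside any window are ignored."""
    windows: List[Counter] = []
    current = None
    for line in lines:
        if START in line:
            current = Counter()      # a new startup begins
        elif END in line and current is not None:
            windows.append(current)  # this startup reached the native transport
            current = None
        elif current is not None:
            for name, needle in NEEDLES.items():
                if needle in line:
                    current[name] += 1
    return windows


if __name__ == "__main__":
    # Hypothetical sample built from the message prefixes above, not real output.
    sample = [
        f"DEBUG [main] {NEEDLES['refresh_boundaries']} ks.t0",  # before the window: ignored
        f"INFO  [main] {START}",
        f"DEBUG [main] {NEEDLES['recreate_strategy']}",
        f"DEBUG [main] {NEEDLES['refresh_boundaries']} ks.t1",
        f"DEBUG [main] {NEEDLES['refresh_boundaries']} ks.t1",
        f"INFO  [main] {END}",
    ]
    for i, counts in enumerate(count_per_startup(sample)):
        print(i, dict(counts))
```

Against a real file, pass an open handle, for example `count_per_startup(open("debug.log", errors="replace"))`. Rotated archives would first need decompressing with `gzip` or `zipfile`, depending on how your logback config rotates them.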








Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Erick Ramirez
There are quite a lot of steps that take place during the startup sequence
between these 2 lines:


> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop
For the most part, it's taken up by CompactionStrategyManager and
DiskBoundaryManager. If you check debug.log, you'll see that it's mostly
updating disk boundaries. The length of time it takes is proportional to
the number of tables in the cluster.

Have a look at this section [1] of CassandraDaemon if you're interested in
the details of the startup sequence. Cheers!

[1]
https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435


Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Reid Pinchback
The thing to look for in GC logs would be signs that you’re bouncing against 
your memory limits and spending a lot of time in full GC collections.

I’m not sure at what phase it kicks in, but there is definitely the potential 
for memory issues when you have large column families (large in the number of 
columns, I mean), and your mention that the situation gets worse in proportion 
to the number of tables brought GC to mind. I’m not sure about the proportion 
of nodes; I think there are thread counts that increase with the number of 
nodes, and more threads can also add to GC load, particularly with G1GC.

I’m speculating a bit on possible causes, but basically the idea was to look 
for GC load during those 3 minutes: if you see it, then you’re not hunting for 
a timeout setting or anything like that, you’re hunting for a resource 
allocation tuning.
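As a concrete version of "look for GC load during those 3 minutes", the sketch below sums JVM stop-the-world time whose gc.log date stamp falls inside the startup window from the system.log lines in this thread. It makes several assumptions. It assumes a JDK 8 gc.log written with -XX:+PrintGCDateStamps and -XX:+PrintGCApplicationStoppedTime, with lines shaped like the hypothetical samples below. It also assumes gc.log and system.log use the same time zone, because the zone offset on each gc.log line is ignored. `stopped_seconds` is a hypothetical helper, and the sample lines are illustrative, not real output.

```python
"""Sum JVM stop-the-world time inside the slow startup window (a sketch)."""
import re
from datetime import datetime
from typing import Iterable

# Assumed JDK 8 line shape: "<date stamp>: <uptime>: Total time for which
# application threads were stopped: <seconds> seconds, ..."
STOPPED = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\S*: [\d.]+: "
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)


def stopped_seconds(lines: Iterable[str], start: datetime, end: datetime) -> float:
    """Total stopped time from matching lines whose date stamp falls in [start, end]."""
    total = 0.0
    for line in lines:
        m = STOPPED.match(line)
        if m is None:
            continue  # not a stopped-time line
        stamp = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if start <= stamp <= end:
            total += float(m.group(2))
    return total


if __name__ == "__main__":
    # Window from the system.log lines quoted in this thread.
    start = datetime(2020, 5, 31, 23, 51, 15, 555000)
    end = datetime(2020, 5, 31, 23, 54, 6, 867000)
    sample = [  # hypothetical gc.log lines
        "2020-05-31T23:50:59.000+0000: 10.000: Total time for which application threads were stopped: 0.5000000 seconds",
        "2020-05-31T23:52:00.123+0000: 71.123: Total time for which application threads were stopped: 0.2500000 seconds",
        "2020-05-31T23:53:30.456+0000: 161.456: Total time for which application threads were stopped: 1.5000000 seconds",
    ]
    paused = stopped_seconds(sample, start, end)
    window = (end - start).total_seconds()
    print(f"stopped {paused:.2f} s of {window:.1f} s ({100 * paused / window:.1f}%)")
```

A large stopped fraction would point towards the resource-allocation tuning described above. A small one would suggest the time goes elsewhere.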



Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Is there anything specific to look for in the GC logs?
By the way, this delay always happens whenever I bootstrap the node or
restart a C* process.

I don't believe it's a GC issue. A correction to my initial question: it's
not just bootstrap; every restart of the C* process causes this.



Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Reid Pinchback
That gap seems like a long time. Have you checked the GC logs around that timeframe?



Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Hello Team,

When I bootstrap/restart a Cassandra node, there is a delay between gossip
settling and the port opening. Can someone please explain where this delay
is configured and whether it can be changed? I don't see any information in
the logs.

In my case there is a ~3 minute delay, and it increases as I increase the
number of tables, nodes, and DCs.

INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop
INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version:
[netty-buffer=netty-buffer-4.0.44.Final.452812a,
netty-codec=netty-codec-4.0.44.Final.452812a,
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
netty-common=netty-common-4.0.44.Final.452812a,
netty-handler=netty-handler-4.0.44.Final.452812a,
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
netty-transport=netty-transport-4.0.44.Final.452812a,
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)...
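The size of the gap can be read directly off the two bracketing log lines. This short sketch parses the timestamp prefix of 3.11-style system.log lines, assuming the `LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS` layout shown above, and reports the time between two marker messages. `timestamp` and `gap_seconds` are hypothetical helpers.

```python
"""Measure the delay between two startup log messages (a sketch)."""
import re
from datetime import datetime
from typing import Iterable

# Assumed prefix layout: "INFO  [main] 2020-05-31 23:51:15,555 ..."
PREFIX = re.compile(r"^\w+\s+\[[^\]]*\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def timestamp(line: str) -> datetime:
    m = PREFIX.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def gap_seconds(lines: Iterable[str], start_marker: str, end_marker: str) -> float:
    """Seconds from the first start_marker line to the next end_marker line."""
    start = end = None
    for line in lines:
        if start is None and start_marker in line:
            start = timestamp(line)
        elif start is not None and end_marker in line:
            end = timestamp(line)
            break
    if start is None or end is None:
        raise ValueError("markers not found in order")
    return (end - start).total_seconds()


if __name__ == "__main__":
    excerpt = [  # copied from the log above
        "INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding",
        "INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop",
    ]
    gap = gap_seconds(excerpt, "No gossip backlog", "Netty using native Epoll")
    print(f"gap: {gap:.3f} s (~{gap / 60:.1f} min)")  # gap: 171.312 s (~2.9 min)
```

Running the same function over each restart's system.log would show whether the gap really grows with the number of tables.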


Also, during this 3-minute delay I lose all my metrics from the C* nodes
(basically, the metrics are not returned within 10s).

Can someone please help me understand the delay here?

Cassandra Version: 3.11.3
Metrics: Using telegraf to collect metrics.