Elevated response times from all nodes in a data center at the same time.

2019-10-14 Thread Bill Walters
Hi Everyone,

Need some suggestions regarding a peculiar issue we have been facing in our
production cluster for the last couple of days.

Here are our Production environment details.

AWS Regions: us-east-1 and us-west-2. Deployed over 3 availability zones in
each region.
No. of Nodes: 24
Data Centers: 4 (6 nodes in each data center; 2 OLTP data centers for APIs
and 2 OLAP data centers for analytics and batch loads)
Instance Type: r5.8xlarge
Average Node Size: 182 GB
Workload: read heavy
Read TPS: 22k
Cassandra version: 3.0.15
Java version: JDK 8 (8u181)
EBS Volumes: gp2, 1 TB, 3000 IOPS

1. We have been running in production for more than a year and our
experience with Cassandra has been great. We have had little hiccups here and
there, but nothing severe.

2. But for the past couple of days we have seen a behavior where the p99
latency in our AWS us-east-1 region OLTP data center suddenly starts
rising from 2 ms to 200 ms. It starts with one node, where we see the 99th
percentile read request latency in DataStax OpsCenter start increasing,
and it spreads immediately to the other nodes in the data center.
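
(For anyone reproducing this, a quick way to cross-check the same latencies from the
node side, independent of OpsCenter; the keyspace/table names below are placeholders:)

# Coordinator-level read/write latency percentiles on this node
nodetool proxyhistograms

# Local read latency for a specific table (replace with a real keyspace/table)
nodetool tablehistograms my_keyspace my_table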

3. We do not see any read request timeouts or exceptions in our API
Splunk logs; only the p99 and average latencies go up suddenly.

4. We have investigated CPU usage, disk I/O, memory usage and network
parameters for the nodes during this period, and we do not see any
sudden surge in any of them.

5. We set up a client using WhiteListPolicy to send queries to each of the 6
nodes individually to understand which one is bad, but we see all of them
responding with very high latency. It does not happen during our peak traffic
period, but rather sometime in the night.

6. We checked the system.log files on our nodes, took a thread dump and
checked for any rogue processes running on the nodes that might be stealing
CPU, but we found nothing.
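
(A rough sketch of checks along these lines, including GC pause logging, which is
worth a look; the PID and log path are illustrative:)

# Long GC pauses show up as GCInspector lines; StatusLogger dumps follow backed-up stages
grep -E 'GCInspector|StatusLogger' /var/log/cassandra/system.log | tail -50

# GC statistics since the previous invocation
nodetool gcstats

# Thread dump for offline inspection (PID is illustrative)
jstack <cassandra_pid> > /tmp/cassandra_threads.txt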

7. We even checked the write requests coming in during this time, and we
do not see any large batch operations happening.

8. Initially we tried restarting the nodes to see if the issue could be
mitigated, but it kept happening, and we had to fail over API traffic to the
us-west-2 region OLTP data center. After a couple of hours we failed back
and everything seems to be working.

We are baffled by this behavior; the only correlation we can find is "Native
requests pending" building up in our task queues when this happens.
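
(A quick way to watch that queue directly on a node while the spike is happening,
assuming nodetool is on the PATH:)

# Pending/Blocked columns per thread pool; Native-Transport-Requests is the CQL request pool,
# and dropped message counters are printed at the bottom of the same output
watch -n 5 nodetool tpstats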

Please let us know your suggestions on how to debug this issue. Has anyone
experienced an issue like this before? (We have had issues where one node starts
acting badly due to bad EBS volume I/O read and write times, but all nodes
experiencing the issue at the same time is very peculiar.)

Thank You,
Bill Walters.


Cassandra node join problem

2019-10-14 Thread Sergio Bilello
Problem:
The Cassandra node does not work even after a restart, throwing this exception:
WARN  [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]

The CPU load goes to 50 and the node becomes unresponsive.
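
(A rough way to map the CPU spike back to Java threads; the PID and TID are illustrative:)

# Per-thread CPU usage of the Cassandra process
top -b -H -n 1 -p <cassandra_pid> | head -40

# Thread dump; a hot thread's TID, converted to hex, matches nid=0x<hex> in the dump
jstack <cassandra_pid> > /tmp/threads.txt
printf '%x\n' <hot_tid>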

Node configuration:
OS: Linux  4.16.13-1.el7.elrepo.x86_64 #1 SMP Wed May 30 14:31:51 EDT 2018 
x86_64 x86_64 x86_64 GNU/Linux

This is a working node that does not have the recommended settings, but it is
working and it is one of the first nodes in the cluster:
cat /proc/23935/limits
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             122422       122422       processes
Max open files            65536        65536        files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       122422       122422       signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us


I tried to bootstrap a new node to join the existing cluster.
The disk space used is around 400 GB of SSD out of the 885 GB available.

On my first attempt, the node failed and was restarted over and over by
systemd, which does not honor the limits configuration specified, and it threw:

Caused by: java.nio.file.FileSystemException: /mnt/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-52-big-Index.db: Too many open files
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_161]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_161]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_161]
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[na:1.8.0_161]
    at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[na:1.8.0_161]
    at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[na:1.8.0_161]
    at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:104) ~[apache-cassandra-3.11.4.jar:3.11.4]
    ... 20 common frames omitted

I fixed the above by stopping Cassandra, cleaning the commitlog, saved_caches,
hints and data directories, restarting it, getting the PID and running the two
commands below:

sudo prlimit -n1048576 -p <pid>
sudo prlimit -u32768 -p <pid>

because at the beginning the node didn't even join the cluster; it was
reported as UJ (Up/Joining).
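
(For what it's worth, a sketch of making those limits stick across restarts when
Cassandra runs under systemd, rather than patching the live process with prlimit;
the unit name "cassandra" is an assumption:)

# Drop-in override so systemd applies the limits itself
# (pam limits.d files are ignored for services; unit name "cassandra" is an assumption)
sudo mkdir -p /etc/systemd/system/cassandra.service.d
sudo tee /etc/systemd/system/cassandra.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=1048576
LimitNPROC=32768
EOF
sudo systemctl daemon-reload
sudo systemctl restart cassandra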

After fixing the max open files problem, the node passed from status
Up/Joining (UJ) to Up/Normal (UN).
The node joined the cluster, but after a while it started to throw:

WARN  [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]


I compared 

Oversized Read Repair Mutations

2019-10-14 Thread Isaac Reath (BLOOMBERG/ 731 LEX)
Hi Cassandra users,

Recently on some of our production clusters we have run into the following 
error:

2019-10-11 15:14:46,803 DataResolver.java:507 - Encountered an oversized (x/y) 
read repair mutation for table. 

This is described in this JIRA:
https://issues.apache.org/jira/browse/CASSANDRA-13975.

Assuming we don't want to drop the read repair write for consistency purposes,
is the best solution here to increase the commitlog segment size / max mutation
size to be in line with our maximum partition size? Is there any plan to change
the read repair code path so that it chunks the partition update into mutations
smaller than the max mutation size?
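
(For reference, a quick check of the two settings involved; the path assumes a
package install, and when max_mutation_size_in_kb is unset it defaults to half of
commitlog_segment_size_in_mb, which itself defaults to 32 MB:)

# Current values; commented-out lines mean the defaults apply
grep -nE '^[# ]*(commitlog_segment_size_in_mb|max_mutation_size_in_kb)' /etc/cassandra/cassandra.yaml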

Thanks in advance,

Isaac