[ https://issues.apache.org/jira/browse/CASSANDRA-19696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kan Maung updated CASSANDRA-19696:
----------------------------------
    Component/s: Cluster/Gossip

> Observed large number of Inbound / Outbound connection disconnect / reconnects in log
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19696
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Kan Maung
>            Priority: Normal
>
> We are seeing hundreds of InboundConnection established / closed messages on
> several of our clusters running Apache Cassandra 4.0.10. Looking at
> 'nodetool tpstats', it seems gossip is close to the timeout value. It
> affects both the LargeMessage and UrgentMessage connections.
> Gossiper uses MessagingService to send messages from the source to the
> destination over an OutboundConnection. Handling depends on the message
> type: LARGE_MESSAGES are enqueued in a separate thread pool, while
> URGENT_MESSAGES are delivered with Verb.Priority.P0.
> In the example below, the disconnect happens just 20 seconds after the
> connection was established. These two nodes are in the same datacenter, so
> there should be no geographical latency between them. This cluster (111) has
> a very standard cassandra.yaml for our environment.
>
> 127.10.20.88 cassandra.log:
> 2024-05-13 02:06:13,805 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 ip_address=127.10.20.88 InboundConnectionInitiator.java:529 - /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-e039a471 messaging connection established, version = 12, framing = CRC, encryption = encrypted(...)
> 2024-05-13 02:06:32,201 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.20.88 OutboundConnection.java:1059 - /127.10.20.88:7000->/169.73.115.189:7000-LARGE_MESSAGES-70634968 channel closed by provider
>
> 127.10.30.171 log:
> 2024-05-13 02:05:00,300 [INFO ] [Messaging-EventLoop-3-2] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/169.102.147.87:7000-LARGE_MESSAGES-4b3ea69f channel closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
> 2024-05-13 02:05:46,892 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1059 - /127.10.30.171:7000->/127.10.20.88:7000-URGENT_MESSAGES-8fd0dbf2 channel closed by provider
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
> 2024-05-13 02:06:13,804 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:37404)->/127.10.20.88:7000-URGENT_MESSAGES-155d9869 successfully connected, version = 12, framing = CRC, encryption = encrypted(...)
> 2024-05-13 02:06:24,281 [INFO ] [Messaging-EventLoop-3-4] cluster_id=111 ip_address=127.10.30.171 OutboundConnection.java:1153 - /127.10.30.171:7000(/127.10.30.171:50046)->/169.73.137.223:7000-LARGE_MESSAGES-04b51284 successfully connected, version = 12, framing = LZ4, encryption = encrypted(...)
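
To put numbers on the disconnect / reconnect churn described above, the log entries can be tallied per peer and connection type. The following is a minimal Python sketch, not part of Cassandra or the original report. It assumes each log entry sits on a single physical line in the format shown in the excerpts; the function names (summarize, reconnect_gaps) and the regular expression are illustrative, and the three event phrases it looks for are the only ones that appear in the excerpts.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the excerpts in this report:
#   <timestamp> ... <from>[(<local>)]-><to>-<TYPE>_MESSAGES-<id> <event text>
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r".*?(?P<src>/[\d.]+:\d+)(?:\(/[\d.]+:\d+\))?->(?P<dst>/[\d.]+:\d+)"
    r"-(?P<kind>[A-Z]+_MESSAGES)-(?P<conn>[0-9a-f]+)\s+(?P<event>.+)$"
)


def classify(event):
    """Map the event text to 'closed', 'opened', or None (unrecognized)."""
    if "channel closed" in event:
        return "closed"
    if "successfully connected" in event or "connection established" in event:
        return "opened"
    return None


def summarize(lines):
    """Count opened/closed events per (destination, connection type)."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # exception lines and unrelated entries are skipped
        state = classify(m.group("event"))
        if state:
            counts[(m.group("dst"), m.group("kind"), state)] += 1
    return counts


def reconnect_gaps(lines):
    """Seconds between a close and the next open for the same destination/type."""
    last_close = {}
    gaps = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        key = (m.group("dst"), m.group("kind"))
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        state = classify(m.group("event"))
        if state == "closed":
            last_close[key] = ts
        elif state == "opened" and key in last_close:
            gap = (ts - last_close.pop(key)).total_seconds()
            gaps.append((key[0], key[1], gap))
    return gaps
```

Run against the 127.10.30.171 excerpt, this pairs the URGENT_MESSAGES close at 02:05:46,892 with the reconnect to /127.10.20.88:7000 at 02:06:13,804. The sketch does not distinguish inbound from outbound entries, so on a busy node the two directions would need to be separated (for example by source file name) before drawing conclusions.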