Aswin Karthik created CASSANDRA-18053:
-----------------------------------------

             Summary: Node disconnection during cassandra 4.0 upgrade from cassandra 3.11
                 Key: CASSANDRA-18053
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18053
             Project: Cassandra
          Issue Type: Bug
            Reporter: Aswin Karthik


We are running Cassandra 3.11.11 and are upgrading to 4.0.5.

The nodes use 11044 as their storage port.


Our upgrade process is the usual one:
 * Boot Cassandra 4.0.5 using the 3.11.11 data disk
 * Run upgradesstables


However, during the upgrade, a random node is occasionally unable to connect to the other nodes in the cluster. This happens very intermittently and is fixed by restarting the node.


On further diagnosis, we found that the problematic node uses port 7000 for some communication instead of the configured port 11044:

{noformat}
InboundConnectionInitiator.java:127 - Listening on address: (node-1.dev/x.x.x.x:11044), nic: eth0, encryption: optionally encrypted(openssl)
OutboundConnection.java:1150 - node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->/y.y.y.y:11044-URGENT_MESSAGES-3c193918 successfully connected, version = 12, framing = LZ4, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
{noformat}
Notice the x.x.x.x:7000 in the outbound log line, even though x.x.x.x is listening on 11044.

This gets fixed on restart.


The logs after the restart:
{noformat}
InboundConnectionInitiator.java:127 - Listening on address: (/x.x.x.x:11044), nic: eth0, encryption: optionally encrypted(openssl)
InboundConnectionInitiator.java:464 - /y.y.y.y:11044(/y.y.y.y:40656)->/x.x.x.x:11044-URGENT_MESSAGES-cade4755 messaging connection established, version = 12, framing = CRC, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
OutboundConnection.java:1150 - /x.x.x.x:11044(/x.x.x.x:53316)->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23 successfully connected, version = 12, framing = LZ4, encryption = encrypted(factory=openssl;protocol=TLSv1.2;cipher=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384)
{noformat}

Notice that the outbound connection log line shows x.x.x.x:11044 this time.
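To spot the wrong port across many nodes, the comparison between the two log samples can be automated. The following is a minimal sketch, not part of Cassandra: the class and method names are hypothetical, and it assumes the OutboundConnection log format shown in this issue (endpoint printed as host/ip:port immediately followed by the local socket in parentheses, IPv4 addresses or hostnames only). It extracts the endpoint port from each outbound connection line and flags lines where that port differs from the configured storage_port.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic helper (not a Cassandra API). Assumes the log format
 * seen in this issue, e.g.
 *   OutboundConnection.java:1150 - node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)->...
 * IPv6 endpoints are not handled.
 */
final class OutboundPortCheck {

    // Captures the port of the first endpoint after "OutboundConnection.java:NNNN - ",
    // i.e. the port right before the "(" that holds the ephemeral local socket.
    private static final Pattern OUTBOUND = Pattern.compile(
            "OutboundConnection\\.java:\\d+ - \\S*?:(\\d+)\\(");

    /** Returns the endpoint port logged for an outbound connection, or empty if the line does not match. */
    static Optional<Integer> loggedPort(String logLine) {
        Matcher m = OUTBOUND.matcher(logLine);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    /** True only for outbound connection lines whose logged endpoint port differs from storagePort. */
    static boolean isMismatch(String logLine, int storagePort) {
        return loggedPort(logLine).map(p -> p != storagePort).orElse(false);
    }

    public static void main(String[] args) {
        int storagePort = 11044; // storage_port from cassandra.yaml in this cluster
        String beforeRestart = "OutboundConnection.java:1150 - node-1.dev/x.x.x.x:7000(/x.x.x.x:50424)"
                + "->/y.y.y.y:11044-URGENT_MESSAGES-3c193918 successfully connected";
        String afterRestart = "OutboundConnection.java:1150 - /x.x.x.x:11044(/x.x.x.x:53316)"
                + "->/y.y.y.y:11044-URGENT_MESSAGES-92d99f23 successfully connected";
        System.out.println(isMismatch(beforeRestart, storagePort)); // true
        System.out.println(isMismatch(afterRestart, storagePort));  // false
    }
}
```

Inbound and other log lines are ignored, so the check can be run over a whole system.log; with storage_port 11044 it flags the first outbound sample above and not the one logged after the restart.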


The occurrence of this issue appears to be random.


This looks like a bug. Is there a fix for it, or are we missing some steps in the upgrade?


Relevant sections of cassandra.yaml, used on both the Cassandra 3.x and 4.x nodes:

{noformat}
storage_port: 11044
ssl_storage_port: 11044
server_encryption_options:
    internode_encryption: all
    keystore: ---------
    keystore_password: -------
    truststore: ---------
    truststore_password: ---------
    protocol: TLSv1.2
    algorithm: PKIX
    store_type: PKCS12
    cipher_suites:
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    require_client_auth: true
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
