[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794463#comment-17794463 ] Aldo commented on CASSANDRA-19178:
--
{quote}One way out may be to add a new node to the cluster that knows about cassandra7 and cassandra9 that can "introduce" those nodes to each other once it knows about their correct addresses. It may not even need to complete bootstrapping for this to happen.{quote}
Good to know, thanks. To be honest, my 3-node scenario is a test environment where I'm simulating the 3.x -> 4.x upgrade. The real production environment is a 5-node cluster with RF=3. So, given that I can temporarily accept a short downtime of 2 nodes out of 5, I can probably:
# select a seed node X for upgrade: the node will restart, receive a new IP, and stay out of the cluster until another node discovers its new IP and communicates with it
# select another node Y and just restart it (without upgrading), giving it the full list of all 5 nodes as seeds: this way Y will re-resolve the hostnames -> IPs of all 5 nodes (including X), thus "introducing" X back into the cluster, according to your suggestion

I will repeat steps 1-2 for all 5 nodes until everything is upgraded to 4.x. Moreover, after the first iteration of steps 1-2 on the very first node, the next iterations can be simplified: I can use your other suggestion ({{nodetool reloadseeds}}) and perform step 2 by selecting an already-upgraded 4.x node as Y and executing {{nodetool reloadseeds}} on it.
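The iteration described above can be sketched as a plain ordering function (a sketch only: node names are placeholders, and the {{nodetool reloadseeds}} shortcut applies once an already-upgraded 4.x node is available, as discussed):

```python
# Sketch of the rolling-upgrade ordering discussed above (illustrative only).
# For each node X: drain + upgrade X (it comes back with a new IP), then make
# a peer Y re-resolve X's hostname so X is re-introduced into the cluster:
# - if a 4.x node is already available, run "nodetool reloadseeds" on it;
# - otherwise restart a 3.x peer with the full node list as seeds.

def upgrade_plan(nodes):
    steps = []
    upgraded = []  # nodes already on 4.x
    for x in nodes:
        steps.append(f"drain + upgrade {x} to 4.x (node restarts with a new IP)")
        peers = [n for n in nodes if n != x]
        if upgraded:
            y = upgraded[0]
            steps.append(f"run 'nodetool reloadseeds' on {y} to re-resolve {x}'s new IP")
        else:
            y = peers[0]
            steps.append(f"restart {y} with all nodes as seeds so it re-resolves {x}")
        upgraded.append(x)
    return steps

plan = upgrade_plan(["node1", "node2", "node3", "node4", "node5"])
```

Two steps per node: the upgrade itself, plus one "introduction" action on a peer; only the very first iteration needs a full peer restart.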
> Cluster upgrade 3.x -> 4.x fails due to IP change
> -
>
> Key: CASSANDRA-19178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19178
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Aldo
> Priority: Normal
> Attachments: cassandra7.downgrade.log, cassandra7.log
>
> I have a Docker swarm cluster with 3 distinct Cassandra services (named _cassandra7_, _cassandra8_, _cassandra9_) running on 3 different servers. The 3 services are running version 3.11.16, using the official Cassandra image 3.11.16 on Docker Hub. The first service is configured just with the following environment variables:
> {code:java}
> CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7"
> CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9"
> {code}
> which in turn, at startup, modifies the _cassandra.yaml_. So for instance the _cassandra.yaml_ for the first service contains the following (and the rest is the image default):
> {code:java}
> # grep tasks /etc/cassandra/cassandra.yaml
> - seeds: "tasks.cassandra7,tasks.cassandra9"
> listen_address: tasks.cassandra7
> broadcast_address: tasks.cassandra7
> broadcast_rpc_address: tasks.cassandra7
> {code}
> The other services (8 and 9) have a similar configuration, obviously with a different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and {{tasks.cassandra9}}).
> The cluster is running smoothly and all the nodes are perfectly able to rejoin the cluster whatever event occurs, thanks to the Docker Swarm {{tasks.cassandraXXX}} "hostname": I can kill a Docker container and wait for Docker Swarm to restart it, force-update a service to force a restart, scale a service to 0 and then 1, restart an entire server, or turn off and then turn on all 3 servers. I never found an issue with any of this.
> I also just completed a full upgrade of the cluster from version 2.2.8 to 3.11.16 (simply upgrading the official Docker image associated with the services) without issues. I was also able, thanks to a 2.2.8 snapshot on each server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I finally issued a {{nodetool upgradesstables}} on all nodes, so my SSTables now have the {{me-*}} prefix.
>
> The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The procedure that I follow is very simple:
> # I start from the _cassandra7_ service (which is a seed node)
> # {{nodetool drain}}
> # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log
> # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version
> The procedure is exactly the same one I followed for the 2.2.8 --> 3.11.16 upgrade, obviously with a different version at step 4. Unfortunately the 3.x --> 4.x upgrade is not working: the _cassandra7_ service restarts and attempts to communicate with the other seed node (_cassandra9_), but the log of _cassandra7_ shows the following:
> {code:java}
> INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000)
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
> {code}
> The relevant part of the log, related to the missing internode communication, is attached as _cassandra7.log_.
> In the log of _cassandra9_ there is nothing after the abovementioned step 4, so only _cassandra7_ is saying something in the logs.
> I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is always the same. Of course when I follow steps 1-3, then restore the 3.x snapshot and finally perform step 4 using the official 3.11.16 version, node 7 restarts correctly and joins the cluster. I attached the relevant part of the log (see _cassandra7.downgrade.log_) where you can see that nodes 7 and 9 can communicate.
> I suspect this could be related to port 7000 now (with Cassandra 4.x) supporting both encrypted and unencrypted traffic. As stated previously I'm using the untouched official Cassandra images, so my whole cluster, inside the Docker Swarm, is not (and has never been) configured with encryption.
> I can also add the following: if I perform the 4 above steps also for the _cassandra9_ and _cassandra8_ services, in the end the cluster works. But this is not acceptable, because the cluster is unavailable until I finish the full upgrade of all nodes: I need to perform a step-update, one
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794434#comment-17794434 ] Aldo commented on CASSANDRA-19178:
--
{quote}Is the seed list on cassandra9 up to date with cassandra7?{quote}
Is there a way to dynamically update the seed list on a living node? In my configuration I have:
* cassandra7, just upgraded to 4.x but out of the cluster until it is able to properly communicate with the other peers
* cassandra8, running 3.x and paired with cassandra9; it doesn't know the new IP of cassandra7
* cassandra9, running 3.x and paired with cassandra8; it doesn't know the new IP of cassandra7

If I can trigger some kind of live seed-list refresh on cassandra9 or 8, this will result in what you described: cassandra7 will learn the messaging version of 8 and 9 when they communicate with it. But to do that, either 9 or 8 must be triggered to use the new cassandra7 IP. "Triggered" to me means a live trigger: it's unacceptable to restart another node (8 or 9) while 7 is already out of the cluster. Is it possible? Through JMX or something similar?
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794431#comment-17794431 ] Aldo commented on CASSANDRA-19178:
--
Unfortunately the answer is no: cassandra7 just restarted and got a brand new IP from Docker Swarm, so there is no way for cassandra9 to contact cassandra7 by itself. It is cassandra7 that, once restarted, must communicate with cassandra9. And according to the code I studied in my last comment above, this should work. But instead, in my environment, the cassandra9 answer is completely masked by the connection reset by peer.
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794427#comment-17794427 ] Aldo commented on CASSANDRA-19178:
--
I read carefully the code of _IncomingTcpConnection.java_ (3.11.16 branch). The [receiveMessages|https://github.com/apache/cassandra/blob/681b6ca103d91d940a9fecb8cd812f58dd2490d0/src/java/org/apache/cassandra/net/IncomingTcpConnection.java#L142] method seems to do two things:
# write and flush its current messaging version (11)
# throw an IOException

The IOException results in the socket being closed. On the other side, the caller is busy in _OutboundConnectionInitiator.java_ (4.1.3 branch). It *for sure* enters the [decode|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L263C27-L263C27] method and proceeds to [line 267|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L267], where it *should* decode the message, discover version 11, print {{received second handshake message from peer}} as per [line 273|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L273], and then re-contact the peer, this time with version 11. But according to my cassandra7 log snippet above, the _OutboundConnectionInitiator.decode()_ method is instead unable to execute the code at line 267, which results in an exception being thrown and caught at [line 363|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L363]. From there the [exceptionCaught|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L368] method is invoked and we can see the exception log with {{Failed to connect to peer ... Connection reset by peer}}. I wonder what is causing this kind of behavior:
# Is it good practice, in version 3.11.16, to write and flush the correct messaging version (11) and then abruptly close the socket?
# How can the caller (4.1.3) be guaranteed to receive the few bytes indicating the correct messaging version? In my environment the socket abruptly closed by the other peer seems to be "winning" over those few response bytes.
# Is there something at the netty level (some kind of system property) able to mitigate this behavior, either on the 4.1.3 node or on the 3.11.16 node?
# Is it possible that my environment (AWS servers, Docker Swarm) triggered something similar to what is documented at [line 372|https://github.com/apache/cassandra/blob/2a4cd36475de3eb47207cd88d2d472b876c6816d/src/java/org/apache/cassandra/net/OutboundConnectionInitiator.java#L372]? The comment relates to {{SslClosedEngineException}} (which is not my case), but the reference to {{io.netty.channel.unix.Errors$NativeIoException: readAddress(..)}} matches my logs.
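The race in questions 1-2 above can be sketched with plain sockets (an illustrative sketch, not Cassandra's actual wire format: the payloads and the SO_LINGER trick standing in for the 3.x abrupt close are assumptions). The server writes its "max version" reply and then closes with SO_LINGER set to (on, 0), which makes the kernel send an RST instead of a normal FIN; depending on timing and the receiving stack, the client may read the reply or may only see "Connection reset by peer":

```python
# Illustrative sketch of the handshake race discussed above -- NOT Cassandra's
# real framing. A "3.x-like" server flushes its max supported version and then
# closes abruptly (SO_LINGER(on, 0) forces an RST, standing in for the abrupt
# close after the IOException); the "4.x-like" client may or may not see the
# reply bytes before the reset arrives.
import socket
import struct
import threading
import time

def abrupt_server(port_holder, ready):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    port_holder.append(srv.getsockname()[1])
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    conn.recv(64)                     # read the incoming handshake bytes
    conn.sendall(b"max-version:11")   # write and flush our supported version...
    # ...then close with linger(on, 0): the kernel emits RST instead of FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()
    srv.close()

def initiator_outcome():
    port_holder, ready = [], threading.Event()
    t = threading.Thread(target=abrupt_server, args=(port_holder, ready))
    t.start()
    ready.wait()
    cli = socket.create_connection(("127.0.0.1", port_holder[0]))
    cli.sendall(b"initiate:request-version:12")
    time.sleep(0.3)                   # give the reset time to race the reply
    try:
        outcome = cli.recv(64)        # the reply bytes, if they survived
    except ConnectionResetError:
        outcome = "reset"             # the reset masked the reply
    cli.close()
    t.join()
    return outcome
```

Whether the flushed bytes survive the reset is exactly what questions 1-2 are about; in the reporter's environment the reset wins.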
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails due to IP change
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178:
--
Summary: Cluster upgrade 3.x -> 4.x fails due to IP change (was: Cluster upgrade 3.x -> 4.x fails with no internode encryption)
[jira] [Comment Edited] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944 ] Aldo edited comment on CASSANDRA-19178 at 12/6/23 11:06 PM:
--
I apologize in advance if reopening is not the correct behavior; please tell me if I need to open a new issue. I think I've discovered the root cause of the issue, and wonder if it's a bug or it's caused by a misconfiguration on my side. Using {{nodetool setlogginglevel org.apache.cassandra TRACE}} on both the upgraded 4.x node (cassandra7) and the running 3.x seed node (cassandra9) I was able to isolate the relevant logs.

On cassandra7:
{code:java}
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 EndpointMessagingVersions.java:67 - Assuming current protocol version for tasks.cassandra9/10.0.2.92:7000
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 OutboundConnectionInitiator.java:131 - creating outbound bootstrap to peer: (tasks.cassandra9/10.0.2.92:7000, tasks.cassandra9/10.0.2.92:7000), framing: CRC, encryption: unencrypted, requestVersion: 12
TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,411 OutboundConnectionInitiator.java:236 - starting handshake with peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000), msg = Initiate(request: 12, min: 10, max: 12, type: URGENT_MESSAGES, framing: true, from: tasks.cassandra7/10.0.2.137:7000)
INFO [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,412 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000)
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
{code}
On cassandra9:
{code:java}
TRACE [ACCEPT-tasks.cassandra9/10.0.2.92] 2023-12-06 22:16:56,411 MessagingService.java:1315 - Connection version 12 from /10.0.2.137
TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:111 - IOException reading from socket; closing
java.io.IOException: Peer-used messaging version 12 is larger than max supported 11
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:153)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98)
TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:125 - Closing socket Socket[addr=/10.0.2.137,port=45680,localport=7000] - isclosed: false
{code}
So it seems there is a mismatch on this _messaging version_. I'm trying to understand the behaviour of _EndpointMessagingVersions.java_ and _OutboundConnectionInitiator.java_ on the 4.1.x branch, and a few facts emerge:
# the internal map of _EndpointMessagingVersions_ on the just-restarted node (cassandra7) certainly doesn't include information about the existing node (cassandra9), because in my network configuration cassandra7 (or, more precisely, the tasks.cassandra7 hostname) changed IP due to the restart; so cassandra9 (the running 3.x node) cannot send its messaging version (11) to the new cassandra7 until the handshake completes
# therefore, inside _OutboundConnectionInitiator_, the messaging version for the cassandra7 -> cassandra9 handshake is assumed equal to the current one (12)
# when the 3.x node (cassandra9) detects the messaging version mismatch, it throws an IOException and closes the connection
# the 4.x node (cassandra7) just sees a connection reset by peer and seems incapable of downgrading the messaging version and retrying the handshake

I can again state that a similar upgrade path with different versions involved (2.2.8 --> 3.11.16), on the exact same architecture, involving the same Docker Swarm services and the same IP-changing behaviour, worked like a charm. So I think something changed in the source code that broke this behavior when the upgrade is 3.11.16 --> 4.1.3.
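Points 1-4 can be condensed into a small in-process simulation (illustrative pseudologic with assumed names, not Cassandra's actual classes): the initiator can only downgrade and retry if the acceptor's version reply actually reaches it before the reset.

```python
# Illustrative simulation of the handshake logic described above (NOT
# Cassandra code). The 4.x initiator assumes its own current version for an
# unknown peer; the 3.x acceptor supports at most 11 and closes the
# connection on a too-high request, after writing its own max version.

ACCEPTOR_MAX = 11   # 3.x node (cassandra9)
INITIATOR_MAX = 12  # 4.x node (cassandra7)

def acceptor(request_version):
    """3.x side: reply with own max version, then close on mismatch."""
    if request_version > ACCEPTOR_MAX:
        return {"reply": ACCEPTOR_MAX, "closed": True}  # reply... then close
    return {"reply": request_version, "closed": False}

def initiate(known_versions, peer, reply_delivered=True):
    """4.x side: assume the current version for unknown peers; retry on reply."""
    version = known_versions.get(peer, INITIATOR_MAX)
    result = acceptor(version)
    if result["closed"]:
        if not reply_delivered:
            # The reset masked the reply: nothing tells us which version to use.
            return "failed: connection reset by peer"
        # Reply survived the close: learn the peer's version and retry.
        known_versions[peer] = result["reply"]
        return initiate(known_versions, peer, reply_delivered)
    return f"connected at version {version}"

# Reply delivered: the handshake downgrades to 11 and succeeds.
ok = initiate({}, "cassandra9", reply_delivered=True)
# Reply lost to the reset (the case observed here): the handshake fails.
bad = initiate({}, "cassandra9", reply_delivered=False)
```

The `reply_delivered` flag is the whole story: when the reset wins the race, the 4.x side never learns version 11 and cannot retry.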
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178:
--
Resolution: (was: Invalid)
Status: Open (was: Resolved)
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793944#comment-17793944 ] Aldo commented on CASSANDRA-19178: -- I apologize in advance if reopening is not the correct behavior; please tell me if I need to open a new issue. I think I've discovered the root cause of the issue, and I wonder whether it's a bug or it's caused by a misconfiguration on my side. Using {{nodetool setlogginglevel org.apache.cassandra TRACE}} on both the 4.x upgraded node (cassandra7) and on the running 3.x seed node (cassandra9) I was able to isolate the relevant logs: On cassandra7: {code:java} TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 EndpointMessagingVersions.java:67 - Assuming current protocol version for tasks.cassandra9/10.0.2.92:7000 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,410 OutboundConnectionInitiator.java:131 - creating outbound bootstrap to peer: (tasks.cassandra9/10.0.2.92:7000, tasks.cassandra9/10.0.2.92:7000), framing: CRC, encryption: unencrypted, requestVersion: 12 TRACE [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,411 OutboundConnectionInitiator.java:236 - starting handshake with peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000), msg = Initiate(request: 12, min: 10, max: 12, type: URGENT_MESSAGES, framing: true, from: tasks.cassandra7/10.0.2.137:7000) INFO [Messaging-EventLoop-3-3] 2023-12-06 22:16:56,412 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.92:7000(tasks.cassandra9/10.0.2.92:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..)
failed: Connection reset by peer {code} On cassandra9: {code:java} TRACE [ACCEPT-tasks.cassandra9/10.0.2.92] 2023-12-06 22:16:56,411 MessagingService.java:1315 - Connection version 12 from /10.0.2.137 TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:111 - IOException reading from socket; closing java.io.IOException: Peer-used messaging version 12 is larger than max supported 11 at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:153) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:98) TRACE [MessagingService-Incoming-/10.0.2.137] 2023-12-06 22:16:56,412 IncomingTcpConnection.java:125 - Closing socket Socket[addr=/10.0.2.137,port=45680,localport=7000] - isclosed: false {code} So it seems there is a mismatch on this _messaging version_.
[jira] [Commented] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793934#comment-17793934 ] Aldo commented on CASSANDRA-19178: -- Thanks, I moved the question to StackExchange [here|https://dba.stackexchange.com/questions/333799/cassandra-cluster-upgrade-3-x-4-x-fails-with-internode-encryption-none].
[jira] [Updated] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
[ https://issues.apache.org/jira/browse/CASSANDRA-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated CASSANDRA-19178: - Description: I have a Docker swarm cluster with 3 distinct Cassandra services (named _cassandra7_, _cassandra8_, _cassandra9_) running on 3 different servers. The 3 services are running version 3.11.16, using the official Cassandra 3.11.16 image on Docker Hub. The first service is configured just with the following environment variables {code:java} CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} which in turn, at startup, modify the _cassandra.yaml_. So for instance the _cassandra.yaml_ for the first service contains the following (and the rest is the image default): {code:java} # grep tasks /etc/cassandra/cassandra.yaml - seeds: "tasks.cassandra7,tasks.cassandra9" listen_address: tasks.cassandra7 broadcast_address: tasks.cassandra7 broadcast_rpc_address: tasks.cassandra7 {code} The other services (8 and 9) have a similar configuration, obviously with a different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and {{tasks.cassandra9}}). The cluster is running smoothly and all the nodes are perfectly able to rejoin the cluster whatever event occurs, thanks to the Docker Swarm {{tasks.cassandraXXX}} "hostname": I can kill a Docker container and wait for Docker Swarm to restart it, force-update it in order to force a restart, scale the service to 0 and then back to 1, restart an entire server, or turn all 3 servers off and then on again. I have never found an issue with this. I also just completed a full upgrade of the cluster from version 2.2.8 to 3.11.16 (simply upgrading the official Docker image associated with the services) without issues. I was also able, thanks to a 2.2.8 snapshot on each server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I finally issued {{nodetool upgradesstables}} on all nodes, so my SSTables now have the {{me-*}} prefix. The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The procedure that I follow is very simple: # I start from the _cassandra7_ service (which is a seed node) # {{nodetool drain}} # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version The procedure is exactly the same one I followed for the 2.2.8 --> 3.11.16 upgrade, obviously with a different version at step 4. Unfortunately the 3.x --> 4.x upgrade is not working: the _cassandra7_ service restarts and attempts to communicate with the other seed node (_cassandra9_), but the log of _cassandra7_ shows the following: {code:java} INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer{code} The relevant part of the log, related to the missing internode communication, is attached as _cassandra7.log_. In the log of _cassandra9_ there is nothing after the abovementioned step #4, so only _cassandra7_ is saying something in the logs. I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is always the same. Of course when I follow steps 1..3, then restore the 3.x snapshot and finally perform step #4 using the official 3.11.16 version, node 7 restarts correctly and joins the cluster. I attached the relevant part of the log (see _cassandra7.downgrade.log_) where you can see that nodes 7 and 9 can communicate. I suspect this could be related to port 7000 now (with Cassandra 4.x) supporting both encrypted and unencrypted traffic. As stated previously, I'm using the untouched official Cassandra images, so my whole cluster, inside the Docker Swarm, is not (and has never been) configured with encryption. I can also add the following: if I perform the 4 above steps for the _cassandra9_ and _cassandra8_ services as well, in the end the cluster works. But this is not acceptable, because the cluster is unavailable until I finish the full upgrade of all nodes: I need to perform a step update, one node after the other, where only 1 node is temporarily down and the other N-1 stay up. Any idea on how to further investigate the issue? Thanks
[jira] [Created] (CASSANDRA-19178) Cluster upgrade 3.x -> 4.x fails with no internode encryption
Aldo created CASSANDRA-19178: Summary: Cluster upgrade 3.x -> 4.x fails with no internode encryption Key: CASSANDRA-19178 URL: https://issues.apache.org/jira/browse/CASSANDRA-19178 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip Reporter: Aldo Attachments: cassandra7.downgrade.log, cassandra7.log I have a Docker swarm cluster with 3 distinct Cassandra services (named _cassandra7_, _cassandra8_, _cassandra9_) running on 3 different servers. The 3 services are running version 3.11.16, using the official Cassandra 3.11.16 image on Docker Hub. The first service is configured just with the following environment variables {code:java} CASSANDRA_LISTEN_ADDRESS="tasks.cassandra7" CASSANDRA_SEEDS="tasks.cassandra7,tasks.cassandra9" {code} which in turn, at startup, modify the _cassandra.yaml_. So for instance the _cassandra.yaml_ for the first service contains the following (and the rest is the image default): {code:java} # grep tasks /etc/cassandra/cassandra.yaml - seeds: "tasks.cassandra7,tasks.cassandra9" listen_address: tasks.cassandra7 broadcast_address: tasks.cassandra7 broadcast_rpc_address: tasks.cassandra7 {code} The other services (8 and 9) have a similar configuration, obviously with a different {{CASSANDRA_LISTEN_ADDRESS}} ({{tasks.cassandra8}} and {{tasks.cassandra9}}). The cluster is running smoothly and all the nodes are perfectly able to rejoin the cluster whatever event occurs, thanks to the Docker Swarm {{tasks.cassandraXXX}} "hostname": I can kill a Docker container and wait for Docker Swarm to restart it, force-update it in order to force a restart, scale the service to 0 and then back to 1, restart an entire server, or turn all 3 servers off and then on again. I have never found an issue with this. I also just completed a full upgrade of the cluster from version 2.2.8 to 3.11.16 (simply upgrading the official Docker image associated with the services) without issues. I was also able, thanks to a 2.2.8 snapshot on each server, to perform a full downgrade to 2.2.8 and back to 3.11.16 again. I finally issued {{nodetool upgradesstables}} on all nodes, so my SSTables now have the {{me-*}} prefix. The problem I'm facing right now is the upgrade from 3.11.16 to 4.x. The procedure that I follow is very simple: # I start from the _cassandra7_ service (which is a seed node) # {{nodetool drain}} # Wait for the {{DRAINING ... DRAINED}} messages to appear in the log # Upgrade the Docker image of _cassandra7_ to the official 4.1.3 version The procedure is exactly the same one I followed for the 2.2.8 --> 3.11.16 upgrade, obviously with a different version at step 4. Unfortunately the 3.x --> 4.x upgrade is not working: the _cassandra7_ service restarts and attempts to communicate with the other seed node (_cassandra9_), but the log of _cassandra7_ shows the following: {code:java} INFO [Messaging-EventLoop-3-3] 2023-12-06 17:15:04,727 OutboundConnectionInitiator.java:390 - Failed to connect to peer tasks.cassandra9/10.0.2.196:7000(tasks.cassandra9/10.0.2.196:7000) io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer{code} The relevant part of the log, related to the missing internode communication, is attached as _cassandra7.log_. In the log of _cassandra9_ there is nothing after the abovementioned step #4, so only _cassandra7_ is saying something in the logs. I tried with multiple versions (4.0.11 but also 4.0.0) but the outcome is always the same. Of course when I follow steps 1..3, then restore the 3.x snapshot and finally perform step #4 using the official 3.11.16 version, node 7 restarts correctly and joins the cluster. I attached the relevant part of the log (see _cassandra7.downgrade.log_) where you can see that nodes 7 and 9 can communicate. I suspect this could be related to port 7000 now (with Cassandra 4.x) supporting both encrypted and unencrypted traffic. As stated previously, I'm using the untouched official Cassandra images, so my whole cluster, inside the Docker Swarm, is not (and has never been) configured with encryption. Any idea on how to further investigate the issue? Thanks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
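The handshake failure traced in the comments above comes down to a protocol-version negotiation: the 4.x node, having no stored version for the restarted peer, "assumes the current protocol version" and requests 12, which the 3.11 node (max supported 11) rejects by closing the socket. A minimal illustrative sketch (not Cassandra source; the method names are hypothetical, and the version numbers are taken from the quoted logs):

```java
// Hypothetical sketch of the mismatch seen in the logs: the 4.x initiator,
// with no stored peer version, assumes its own maximum (12), which the
// 3.11 acceptor (max supported 11) rejects.
public class HandshakeSketch {
    static final int V3_MAX_SUPPORTED = 11;    // from the cassandra9 log
    static final int V4_MIN = 10, V4_MAX = 12; // from the Initiate message in the cassandra7 log

    // 4.x side: use the stored peer version if known, else assume our own max.
    static int initialRequestVersion(Integer knownPeerVersion) {
        return (knownPeerVersion != null) ? Math.min(knownPeerVersion, V4_MAX) : V4_MAX;
    }

    // 3.x side: close the socket if the requested version exceeds its max.
    static boolean acceptorAccepts(int requestedVersion) {
        return requestedVersion <= V3_MAX_SUPPORTED;
    }

    public static void main(String[] args) {
        int requested = initialRequestVersion(null); // peer version unknown after the restart
        System.out.println("requested=" + requested + " accepted=" + acceptorAccepts(requested));
    }
}
```

Had the 4.x node known the peer was still on version 11 (i.e. a stored peer version of 11), the sketch's `initialRequestVersion` would have stayed within the 3.x node's range.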
[jira] [Created] (AIRFLOW-2755) k8s workers think DAGs are always in `/tmp/dags`
Aldo created AIRFLOW-2755: - Summary: k8s workers think DAGs are always in `/tmp/dags` Key: AIRFLOW-2755 URL: https://issues.apache.org/jira/browse/AIRFLOW-2755 Project: Apache Airflow Issue Type: Bug Components: configuration, worker Reporter: Aldo We have Airflow configured to use the `KubernetesExecutor` and run tasks in newly created pods. I tried to use the `PythonOperator` to import the python callable from a python module located in the DAGs directory, as [that should be possible|https://github.com/apache/incubator-airflow/blob/c7a472ed6b0d8a4720f57ba1140c8cf665757167/airflow/__init__.py#L42]. Airflow complained that the module was not found. After a fair amount of digging we found that the issue was that the workers have the `AIRFLOW__CORE__DAGS_FOLDER` environment variable set to `/tmp/dags`, as [you can see from the code|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/kubernetes/worker_configuration.py#L84]. Unsetting that environment variable from within the task's pod and running the task manually worked as expected. I think that this path should be configurable (I'll give it a try to add a `kubernetes.worker_dags_folder` configuration). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
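The behavior described above follows from Airflow's config precedence: an `AIRFLOW__<SECTION>__<KEY>` environment variable overrides the corresponding `airflow.cfg` entry, so the hardcoded `/tmp/dags` in the worker pod wins over whatever the config file says. A minimal sketch of that precedence (the class and method are hypothetical, not the Airflow API; only the env var name comes from the issue):

```java
import java.util.Map;

// Hypothetical sketch of the AIRFLOW__CORE__DAGS_FOLDER precedence described
// above: when the environment variable is present, it overrides the
// configured value, so the config file is never consulted.
public class DagsFolderSketch {
    static String resolveDagsFolder(Map<String, String> env, String configuredValue) {
        return env.getOrDefault("AIRFLOW__CORE__DAGS_FOLDER", configuredValue);
    }

    public static void main(String[] args) {
        // The worker pod ships with the variable hardcoded to /tmp/dags.
        Map<String, String> podEnv = Map.of("AIRFLOW__CORE__DAGS_FOLDER", "/tmp/dags");
        System.out.println(resolveDagsFolder(podEnv, "/usr/local/airflow/dags"));
    }
}
```

This is also why unsetting the variable inside the pod was enough to make the task run: resolution then falls back to the configured value.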
[jira] [Commented] (TOREE-399) Make Spark Kernel work on Windows
[ https://issues.apache.org/jira/browse/TOREE-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958656#comment-15958656 ] aldo commented on TOREE-399: Hi Jakob, I created a quick run.bat with hardcoded values %SPARK_HOME%/bin/spark-submit --class org.apache.toree.Main C:\ProgramData\jupyter\kernels\apache_toree_scala\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar This gets past the previous error, but I'm still getting errors, I guess related to some scala config; see below. Any idea? Besides the error, with the goal of creating a Windows version of run.sh, it is not clear to me how the kernel.json vars are passed to run.bat and how I can refer to them in run.bat. Any direction? > Make Spark Kernel work on Windows > - > > Key: TOREE-399 > URL: https://issues.apache.org/jira/browse/TOREE-399 > Project: TOREE > Issue Type: New Feature > Environment: Windows 7/8/10 >Reporter: aldo > > After a successful install of the Spark Kernel, the error "Failed to run > command:" occurs when we select a Scala notebook from Jupyter. > The error happens because the kernel.json runs > C:\\ProgramData\\jupyter\\kernels\\apache_toree_scala\\bin\\run.sh, which is a > bash shell script and hence cannot work on Windows. > Can you give me some direction to fix this, and I will implement it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
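On the kernel.json question raised above: Jupyter launches whatever command is in the kernel spec's "argv" array, substituting placeholders such as {connection_file} before execution, so the arguments arrive in a batch script as ordinary positional parameters. A hedged sketch of what a run.bat could look like (the jar path is copied from the comment; the use of %* to forward the substituted argv arguments is an assumption about how the kernel spec would be written, not tested Toree code):

```bat
@echo off
REM Hypothetical run.bat sketch. Jupyter substitutes placeholders like
REM {connection_file} in kernel.json's "argv" and runs the command; the
REM substituted arguments reach this script as %1, %2, ... (%* for all).
set TOREE_JAR=C:\ProgramData\jupyter\kernels\apache_toree_scala\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar
"%SPARK_HOME%\bin\spark-submit" --class org.apache.toree.Main "%TOREE_JAR%" %*
```

The kernel.json's "argv" would then reference this script instead of run.sh, keeping the "--profile {connection_file}" style arguments after the script path.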
[jira] [Comment Edited] (TOREE-399) Make Spark Kernel work on Windows
[ https://issues.apache.org/jira/browse/TOREE-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958656#comment-15958656 ] aldo edited comment on TOREE-399 at 4/6/17 9:38 AM: Hi Jakob, I created a quick run.bat with hardcoded values %SPARK_HOME%/bin/spark-submit --class org.apache.toree.Main C:\ProgramData\jupyter\kernels\apache_toree_scala\lib\toree-assembly-0.2.0.dev1-incubating-SNAPSHOT.jar This gets past the previous error, but I'm still getting errors, I guess related to some scala config; see below. Any idea? Besides the error, with the goal of creating a Windows version of run.sh, it is not clear to me how the kernel.json vars are passed to run.bat and how I can refer to them in run.bat. Any direction? 17/03/31 09:55:29 [WARN] o.a.h.u.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/03/31 09:55:30 [INFO] o.a.t.b.l.StandardComponentInitialization$$anon$1 - Connecting to spark.master local[*] [init] error: error while loading Object, Missing dependency 'object scala in compiler mirror', required by C:\Program Files\Java\jdk1.8.0_121\jre\lib\rt.jar(java/lang/Object.class) Failed to initialize compiler: object scala in compiler mirror not found. ** Note that as of 2.8 scala does not assume use of the java classpath. ** For the old behavior pass -usejavacp to scala, or if using a Settings ** object programmatically, settings.usejavacp.value = true. Failed to initialize compiler: object scala in compiler mirror not found. ** Note that as of 2.8 scala does not assume use of the java classpath. ** For the old behavior pass -usejavacp to scala, or if using a Settings ** object programmatically, settings.usejavacp.value = true.
Exception in thread "main" java.lang.NullPointerException at scala.reflect.internal.SymbolTable.exitingPhase(SymbolTable.scala:256) at scala.tools.nsc.interpreter.IMain$Request.x$20$lzycompute(IMain.scala:896) at scala.tools.nsc.interpreter.IMain$Request.x$20(IMain.scala:895) at scala.tools.nsc.interpreter.IMain$Request.headerPreamble$lzycompute(IMain.scala:895) at scala.tools.nsc.interpreter.IMain$Request.headerPreamble(IMain.scala:895) at scala.tools.nsc.interpreter.IMain$Request$Wrapper.preamble(IMain.scala:918) at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1337) at scala.tools.nsc.interpreter.IMain$CodeAssembler$$anonfun$apply$23.apply(IMain.scala:1336) at scala.tools.nsc.util.package$.stringFromWriter(package.scala:64) at scala.tools.nsc.interpreter.IMain$CodeAssembler$class.apply(IMain.scala:1336) at scala.tools.nsc.interpreter.IMain$Request$Wrapper.apply(IMain.scala:908) at scala.tools.nsc.interpreter.IMain$Request.compile$lzycompute(IMain.scala:1002) at scala.tools.nsc.interpreter.IMain$Request.compile(IMain.scala:997) at scala.tools.nsc.interpreter.IMain.compile(IMain.scala:579) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:567) at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565) at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$start$1.apply(ScalaInterpreterSpecific.scala:295) at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$$anonfun$start$1.apply(ScalaInterpreterSpecific.scala:289) at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214) at org.apache.toree.kernel.interpreter.scala.ScalaInterpreterSpecific$class.start(ScalaInterpreterSpecific.scala:289) at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.start(ScalaInterpreter.scala:44) at org.apache.toree.kernel.interpreter.scala.ScalaInterpreter.init(ScalaInterpreter.scala:87) at
org.apache.toree.boot.layer.InterpreterManager$$anonfun$initializeInterpreters$1.apply(InterpreterManager.scala:35)
[jira] [Updated] (TIKA-2248) How to set up the content encoding
[ https://issues.apache.org/jira/browse/TIKA-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aldo updated TIKA-2248: --- Priority: Trivial (was: Major) > How to set up the content encoding > -- > > Key: TIKA-2248 > URL: https://issues.apache.org/jira/browse/TIKA-2248 > Project: Tika > Issue Type: Wish >Reporter: Aldo >Priority: Trivial > > If I try to set up the content encoding with > Metadata metadata = new Metadata(); > metadata.add(Metadata.CONTENT_ENCODING, DATAFILE_CHARSET); > String parsedString = tika.parseToString(inputStream, metadata); > the metadata CONTENT_ENCODING is ignored. > How can I force Tika to use the CONTENT_ENCODING set in metadata? > Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TIKA-2248) How to set up the content encoding
Aldo created TIKA-2248: -- Summary: How to set up the content encoding Key: TIKA-2248 URL: https://issues.apache.org/jira/browse/TIKA-2248 Project: Tika Issue Type: Wish Reporter: Aldo If I try to set up the content encoding with Metadata metadata = new Metadata(); metadata.add(Metadata.CONTENT_ENCODING, DATAFILE_CHARSET); String parsedString = tika.parseToString(inputStream, metadata); the metadata CONTENT_ENCODING is ignored. How can I force Tika to use the CONTENT_ENCODING set in metadata? Thank you. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
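Why the ignored encoding hint matters can be shown with plain JDK classes, no Tika involved: decoding the same bytes with the wrong charset corrupts every non-ASCII character. A stdlib-only sketch (the `decode` helper is illustrative, not a Tika API):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Stdlib-only sketch (no Tika): the same bytes decoded with the wrong
// charset come out garbled, which is the effect of an ignored
// CONTENT_ENCODING hint when the detector guesses incorrectly.
public class EncodingSketch {
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] latin1Bytes = "café".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1)); // café
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));      // garbled: 0xE9 is not valid UTF-8 here
    }
}
```

This is why a caller that knows the file's charset wants the hint honored rather than overridden by detection.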