[ https://issues.apache.org/jira/browse/CASSANDRA-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ventsislav updated CASSANDRA-14924:
-----------------------------------

Description:

I have 3 nodes of Elassandra running in Docker containers.

The containers were created like this:

{code:bash}
# Host 10.0.0.1
docker run --name elassandra-node-1 --net=host -e CASSANDRA_SEEDS="10.0.0.1" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest

# Host 10.0.0.2
docker run --name elassandra-node-2 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest

# Host 10.0.0.3
docker run --name elassandra-node-3 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2,10.0.0.3" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
{code}

The cluster worked fine for a couple of days after it was created; both Elasticsearch and Cassandra were healthy. Now, however, all Cassandra nodes have become unreachable to each other.

nodetool status on all nodes looks like this:

{code}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.0.0.3  11.95 GiB  8       100.0%            7652f66e-194e-4886-ac10-0fc21ac8afeb  r1
DN  10.0.0.2  11.92 GiB  8       100.0%            b91fa129-1dd0-4cf8-be96-9c06b23daac6  r1
UN  10.0.0.1  11.9 GiB   8       100.0%            5c1afcff-b0aa-4985-a3cc-7f932056c08f  r1
{code}

Here UN is the current host, 10.0.0.1; it is the same on all the other nodes.

nodetool describecluster on 10.0.0.1 shows:

{code}
Cluster Information:
    Name: BD Storage
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        24fa5e55-3935-3c0e-9808-99ce502fe98d: [10.0.0.1]

        UNREACHABLE: [10.0.0.2,10.0.0.3]
{code}

When attached to the first node, it just keeps repeating these messages:

{code}
2018-12-09 07:47:32,927 WARN  [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager.setupDefaultRole(CassandraRoleManager.java:361) CassandraRoleManager skipped default role setup: some nodes were not ready
2018-12-09 07:47:32,927 INFO  [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager$4.run(CassandraRoleManager.java:400) Setup task failed with error, rescheduling
2018-12-09 07:47:32,980 INFO  [HANDSHAKE-/10.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.2
2018-12-09 07:47:32,980 INFO  [HANDSHAKE-/10.0.0.3] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.3
{code}

A while after one of the nodes is restarted:

{code}
2018-12-09 07:52:21,972 WARN  [MigrationStage:1] org.apache.cassandra.service.MigrationTask.runMayThrow(MigrationTask.java:67) Can't send schema pull request: node /10.0.0.2 is down.
{code}

Tried so far:
* restarting all containers at the same time
* restarting the containers one after another
* restarting Cassandra inside each container: service cassandra restart
* nodetool disablegossip followed by nodetool enablegossip
* nodetool repair, which fails with: Repair command #1 failed with error Endpoint not alive: /10.0.0.2

It seems that the node schemas have diverged, but I still don't understand why the nodes are marked as down to each other.
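As a next step, here is a minimal sketch of checks that could be run. It assumes the cluster uses Cassandra's default inter-node storage port 7000 (7001 if internode encryption is enabled) and that nodetool is available on the PATH inside the strapdata/elassandra containers; both assumptions may need adjusting.

{code:bash}
# From host 10.0.0.1: can the peers be reached on the storage/gossip port at all?
nc -vz 10.0.0.2 7000
nc -vz 10.0.0.3 7000

# Gossip state as the local node sees it: generation, heartbeat and the STATUS/SCHEMA
# application states for every peer. Stale peer state left over from the container
# restarts would show up here.
docker exec elassandra-node-1 nodetool gossipinfo

# Schema version per node, to confirm whether the nodes really disagree on schema
# or are merely unreachable.
docker exec elassandra-node-1 nodetool describecluster
{code}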
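Since the containers run with --net=host, it is also worth confirming what each node actually binds and advertises. The sketch below assumes the configuration lives at /etc/cassandra/cassandra.yaml inside the elassandra image; the real path may differ.

{code:bash}
# With --net=host the container shares the host network namespace, so the host's own
# socket table shows what Cassandra is listening on: 7000 (storage/gossip) and
# 9042 (CQL) should be bound on the host IP, not only on 127.0.0.1.
ss -ltn | grep -E ':(7000|9042)'

# Which addresses does this node bind and advertise, and which seeds does it use?
# (Assumed config path; adjust to wherever the image keeps cassandra.yaml.)
docker exec elassandra-node-1 grep -E 'listen_address|broadcast_address|rpc_address|- seeds' /etc/cassandra/cassandra.yaml
{code}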
> Cassandra nodes becomes unreachable to each other
> -------------------------------------------------
>
>                 Key: CASSANDRA-14924
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14924
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: ventsislav
>            Priority: Critical