[ https://issues.apache.org/jira/browse/CASSANDRA-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ventsislav updated CASSANDRA-14924:
-----------------------------------

Description:

I have 3 nodes of Elassandra running in Docker containers.

The containers were created like this:

{code:bash}
# Host 10.0.0.1
docker run --name elassandra-node-1 --net=host -e CASSANDRA_SEEDS="10.0.0.1" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest

# Host 10.0.0.2
docker run --name elassandra-node-2 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest

# Host 10.0.0.3
docker run --name elassandra-node-3 --net=host -e CASSANDRA_SEEDS="10.0.0.1,10.0.0.2,10.0.0.3" -e CASSANDRA_CLUSTER_NAME="BD Storage" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="r1" -d strapdata/elassandra:latest
{code}

The cluster worked fine for a couple of days after it was created; both Elasticsearch and Cassandra were healthy. Now, however, all Cassandra nodes have become unreachable to each other.

nodetool status on all nodes looks like this:

{code}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.0.0.3  11.95 GiB  8       100.0%            7652f66e-194e-4886-ac10-0fc21ac8afeb  r1
DN  10.0.0.2  11.92 GiB  8       100.0%            b91fa129-1dd0-4cf8-be96-9c06b23daac6  r1
UN  10.0.0.1  11.9 GiB   8       100.0%            5c1afcff-b0aa-4985-a3cc-7f932056c08f  r1
{code}

Here UN is the current host, 10.0.0.1; it is the same on all the other nodes.

nodetool describecluster on 10.0.0.1 shows:

{code}
Cluster Information:
    Name: BD Storage
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        24fa5e55-3935-3c0e-9808-99ce502fe98d: [10.0.0.1]

        UNREACHABLE: [10.0.0.2,10.0.0.3]
{code}

When attached to the first node, it just keeps repeating these messages:

{code}
2018-12-09 07:47:32,927 WARN  [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager.setupDefaultRole(CassandraRoleManager.java:361) CassandraRoleManager skipped default role setup: some nodes were not ready
2018-12-09 07:47:32,927 INFO  [OptionalTasks:1] org.apache.cassandra.auth.CassandraRoleManager$4.run(CassandraRoleManager.java:400) Setup task failed with error, rescheduling
2018-12-09 07:47:32,980 INFO  [HANDSHAKE-/10.0.0.2] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.2
2018-12-09 07:47:32,980 INFO  [HANDSHAKE-/10.0.0.3] org.apache.cassandra.net.OutboundTcpConnection.lambda$handshakeVersion$1(OutboundTcpConnection.java:561) Handshaking version with /10.0.0.3
{code}

A while after one of the nodes is restarted:

{code}
2018-12-09 07:52:21,972 WARN  [MigrationStage:1] org.apache.cassandra.service.MigrationTask.runMayThrow(MigrationTask.java:67) Can't send schema pull request: node /10.0.0.2 is down.
{code}

Tried so far:
* restarting all containers at the same time
* restarting the containers one after another
* restarting Cassandra inside each container: service cassandra restart
* nodetool disablegossip followed by nodetool enablegossip
* nodetool repair, which fails with: Repair command #1 failed with error Endpoint not alive: /10.0.0.2

It seems that the node schemas have diverged, but I still don't understand why the nodes are marked as down to each other.
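As a next step, here is a minimal sketch of checks that could be run. It assumes the cluster uses Cassandra's default inter-node storage port 7000 (7001 if internode encryption is enabled) and that nodetool is available on the PATH inside the strapdata/elassandra containers; both assumptions may need adjusting.

{code:bash}
# From host 10.0.0.1: can the peers be reached on the storage/gossip port at all?
nc -vz 10.0.0.2 7000
nc -vz 10.0.0.3 7000

# Gossip state as the local node sees it: generation, heartbeat and the STATUS/SCHEMA
# application states for every peer. Stale peer state left over from the container
# restarts would show up here.
docker exec elassandra-node-1 nodetool gossipinfo

# Schema version per node, to confirm whether the nodes really disagree on schema
# or are merely unreachable.
docker exec elassandra-node-1 nodetool describecluster
{code}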
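Since the containers run with --net=host, it is also worth confirming what each node actually binds and advertises. The sketch below assumes the configuration lives at /etc/cassandra/cassandra.yaml inside the elassandra image; the real path may differ.

{code:bash}
# With --net=host the container shares the host network namespace, so the host's own
# socket table shows what Cassandra is listening on: 7000 (storage/gossip) and
# 9042 (CQL) should be bound on the host IP, not only on 127.0.0.1.
ss -ltn | grep -E ':(7000|9042)'

# Which addresses does this node bind and advertise, and which seeds does it use?
# (Assumed config path; adjust to wherever the image keeps cassandra.yaml.)
docker exec elassandra-node-1 grep -E 'listen_address|broadcast_address|rpc_address|- seeds' /etc/cassandra/cassandra.yaml
{code}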
> Cassandra nodes becomes unreachable to each other
> -------------------------------------------------
>
>                 Key: CASSANDRA-14924
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14924
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: ventsislav
>            Priority: Critical