[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2017-09-21 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175724#comment-16175724
 ] 

Jeff Jirsa commented on CASSANDRA-8274:
---

Is this still an issue, or has it been resolved in the past 3 years?


> Node fails to rejoin cluster on EC2 if private IP is changed
> 
>
> Key: CASSANDRA-8274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
> Environment: Amazon EC2
>Reporter: Joseph Clark
>Priority: Minor
> Fix For: 3.11.x
>
>
> Nodes in Amazon AWS EC2 Classic (not a VPC) may be assigned a new private IP 
> if the node is stopped and then started again. In this case we have puppet 
> update the configured listen_address to the new private IP. However, once the 
> cassandra service starts, it is unable to communicate with the existing 
> nodes (single region) and vice versa.
> 'nodetool status' shows that each node believes that it is 'UN' and the other 
> node is 'DN'.
> 'nodetool gossipinfo' on the node that remained running shows the *old* 
> private IP listed as the 'INTERNAL_IP' of the node that was stopped and 
> restarted. 
> The situation is resolved by restarting the cassandra service on the node 
> that remained running. Once it has restarted, the INTERNAL_IP is correctly 
> updated to the new private IP. 'nodetool status' shows that both nodes are up 
> and the cluster appears to function normally.
> This appears to me to be the root cause of 
> https://issues.apache.org/jira/browse/CASSANDRA-7292. -Possibly 
> https://issues.apache.org/jira/browse/CASSANDRA-8072 as well, but I am not 
> convinced they are actually duplicates.-



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2017-09-21 Thread Joseph Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175788#comment-16175788
 ] 

Joseph Clark commented on CASSANDRA-8274:
-

Unfortunately I'm no longer set up to reproduce this issue.




[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2017-09-22 Thread Chris mildebrandt (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177493#comment-16177493
 ] 

Chris mildebrandt commented on CASSANDRA-8274:
--

I just hit this issue today with the 3.11.0 Docker image running in Kubernetes. 
I had 4 nodes in the Cassandra cluster; two members were restarted and can't 
rejoin. One seed is up and reachable from all the other containers, and one 
other member is able to join. The first exception I see is this:
{{java.lang.RuntimeException: Cache schema version 38e97a53-563b-3074-b86f-c81efa980524 does not match current schema version 1bfdabae-743e-357e-a661-93984c26bc32
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:206) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:164) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:160) [apache-cassandra-3.11.0.jar:3.11.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]}}

Then I see the one related to this issue:
{{java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1413) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:550) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:801) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]}}

Restarting the nodes didn't help. nodetool status is now reporting only two 
nodes, and nodetool gossipinfo has three "empty" entries:

{{/100.96.3.164
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.1.7
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.2.170
  generation:0
  heartbeat:0
  TOKENS: not present}}
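
For anyone triaging a similar report, here is a rough sketch of how the "empty" entries above could be picked out of saved gossipinfo text. It assumes the layout shown in this comment (an endpoint line starting with "/", followed by indented key:value lines); other Cassandra versions may print extra fields or a hostname before the slash. The function names parse_gossipinfo and empty_entries are made up for this illustration and are not part of nodetool.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {field: value}}.

    Assumes each endpoint starts a line with "/" and its fields follow
    as "key:value" lines, as in the output quoted above.
    """
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            current = line[1:]
            entries[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            entries[current][key.strip()] = value.strip()
    return entries


def empty_entries(entries):
    """Return endpoints whose generation and heartbeat are both 0."""
    return [
        endpoint
        for endpoint, fields in entries.items()
        if fields.get("generation") == "0" and fields.get("heartbeat") == "0"
    ]


# Sample copied from the gossipinfo output in this comment.
SAMPLE = """\
/100.96.3.164
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.1.7
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.2.170
  generation:0
  heartbeat:0
  TOKENS: not present
"""

if __name__ == "__main__":
    for endpoint in empty_entries(parse_gossipinfo(SAMPLE)):
        print(endpoint)
```

Run against the sample, this lists all three endpoints. It only flags the entries; it does not say why they have no state.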





[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2014-11-06 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201230#comment-14201230
 ] 

Brandon Williams commented on CASSANDRA-8274:
-

bq. This appears to me to be the root cause of 
https://issues.apache.org/jira/browse/CASSANDRA-7292. Possibly 
https://issues.apache.org/jira/browse/CASSANDRA-8072

Neither, because I've repro'd it and don't use EC2.

> Node fails to rejoin cluster on EC2 if private IP is changed
> 
>
> Key: CASSANDRA-8274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8274
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Amazon EC2
>Reporter: Joseph Clark
>
> Nodes in Amazon AWS EC2 Classic (not a VPC) may be assigned a new private IP 
> if the node is stopped and then started again. In this case we have puppet 
> update the configured listen_address to the new private IP. However, once the 
> cassandra service starts, it is unable to communicate with the existing 
> nodes (single region) and vice versa.
> 'nodetool status' shows that each node believes that it is 'UN' and the other 
> node is 'DN'.
> 'nodetool gossipinfo' on the node that remained running shows the *old* 
> private IP listed as the 'INTERNAL_IP' of the node that was stopped and 
> restarted. 
> The situation is resolved by restarting the cassandra service on the node 
> that remained running. Once it has restarted, the INTERNAL_IP is correctly 
> updated to the new private IP. 'nodetool status' shows that both nodes are up 
> and the cluster appears to function normally.
> This appears to me to be the root cause of 
> https://issues.apache.org/jira/browse/CASSANDRA-7292. Possibly 
> https://issues.apache.org/jira/browse/CASSANDRA-8072 as well, but I am not 
> convinced they are actually duplicates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2014-11-06 Thread Joseph Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201244#comment-14201244
 ] 

Joseph Clark commented on CASSANDRA-8274:
-

7292 and 8072 have different stack traces. Which did you reproduce?



[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2014-11-06 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201278#comment-14201278
 ] 

Brandon Williams commented on CASSANDRA-8274:
-

Given that the stack traces in both are exactly the same: yes :)



[jira] [Commented] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed

2014-11-06 Thread Joseph Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201345#comment-14201345
 ] 

Joseph Clark commented on CASSANDRA-8274:
-

at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
at org.apache.cassandra.service.StorageService.*prepareReplacementInfo*(StorageService.java:419)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650)

at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
at org.apache.cassandra.service.StorageService.*checkForEndpointCollision*(StorageService.java:444)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)

Slightly different :)
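
To make the comparison concrete, here is a small sketch that lines up the two traces frame by frame and reports where the method names differ, ignoring line numbers. The helper names methods and differing_frames are invented for this example, and the regular expression only covers plain "at package.Class.method(File.java:NN)" frames like the ones quoted above.

```python
import re

# Matches frames of the form "at pkg.Class.method(File.java:123)".
FRAME = re.compile(r"\bat\s+([\w.$]+)\(([\w.]+):(\d+)\)")


def methods(trace):
    """Return fully qualified method names, top frame first.

    JIRA bold markers ("*") are stripped before matching.
    """
    return [m.group(1) for m in FRAME.finditer(trace.replace("*", ""))]


def differing_frames(first, second):
    """Pair up frames by position and keep the pairs whose methods differ."""
    return [(a, b) for a, b in zip(methods(first), methods(second)) if a != b]


# The two traces quoted in this comment.
FIRST = """
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
at org.apache.cassandra.service.StorageService.*prepareReplacementInfo*(StorageService.java:419)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650)
"""

SECOND = """
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
at org.apache.cassandra.service.StorageService.*checkForEndpointCollision*(StorageService.java:444)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
"""

if __name__ == "__main__":
    for a, b in differing_frames(FIRST, SECOND):
        print(a, "vs", b)
```

On these two traces the only differing frame is the middle one: prepareReplacementInfo in the first and checkForEndpointCollision in the second.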

Even if they were the same, I believe that they are still separate issues. 7292 
and this bug both involve starting a node that the cluster *already knows 
about* as identified by the public IP. Another similarity between 7292 and this 
bug, in my case at least and likely the original reporter as well, is that the 
private IP/listen_address has been changed. As far as I can tell, 8072 occurs 
with brand new nodes that aren't replacing another node in the cluster.
