[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node
[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500557#comment-14500557 ]

John Alberts commented on CASSANDRA-7292:

[~brandon.williams] I was able to get this patch to fix my problem last night, but when I tried again today I couldn't reproduce the fix. I think the database had issues from multiple restarts, version switches, etc. I'm going to start from scratch with a new cluster, re-test, and get back to you.

Can't seed new node into ring with (public) ip of an old node

Key: CASSANDRA-7292
URL: https://issues.apache.org/jira/browse/CASSANDRA-7292
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.0.7, Ec2MultiRegionSnitch
Reporter: Juho Mäkinen
Assignee: Brandon Williams
Labels: bootstrap, gossip
Fix For: 2.0.15, 2.1.5
Attachments: 7292.txt, cassandra-replace-address.log

This bug prevents a node from bootstrapping back into the cluster with its old IP.

Scenario: a five-node EC2 cluster spread across three AZs, all in one region. I'm using Ec2MultiRegionSnitch. Nodes are reported with their public IPs (as Ec2MultiRegionSnitch requires). I simulated the loss of one node by terminating its instance, and nodetool status correctly reported that node as down.
Then I launched a new instance with the old public IP (I'm using Elastic IPs) with -Dcassandra.replace_address=IP_ADDRESS, but the new node can't join the cluster:

 INFO 07:20:43,424 Gathering node replacement information for /54.86.191.30
 INFO 07:20:43,428 Starting Messaging Service on port 9043
 INFO 07:20:43,489 Handshaking version with /54.86.171.10
 INFO 07:20:43,491 Handshaking version with /54.86.187.245
(some delay)
ERROR 07:21:14,445 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
    at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:419)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:505)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569)

It does not help if I remove the -Dcassandra.replace_address=IP_ADDRESS system property. Removing the node with nodetool removenode, with or without the cassandra.replace_address property, does not help either. I think this is because the node information is preserved in the gossip state, as seen in this output of nodetool gossipinfo:

/54.86.191.30
  INTERNAL_IP:172.16.1.231
  DC:us-east
  REMOVAL_COORDINATOR:REMOVER,d581309a-8610-40d4-ba30-cb250eda22a8
  STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664
  HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d
  RPC_ADDRESS:0.0.0.0
  NET_VERSION:7
  SCHEMA:226f9315-b4b2-32c1-bfe1-f4bb49fccfd5
  RACK:1b
  LOAD:7.075290515E9
  SEVERITY:0.0
  RELEASE_VERSION:2.0.7

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
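To look for the kind of leftover gossip state described in this report, one could scan nodetool gossipinfo output for endpoints whose STATUS still marks them as gone. The Python sketch below is a hypothetical helper (parse_gossipinfo and lingering_dead_endpoints are illustrative names, not part of Cassandra or nodetool). It assumes the text layout quoted above, and it assumes that STATUS values beginning with "removed" (after removenode) or "LEFT" (after decommission) are the dead states worth flagging.

```python
"""Hypothetical helper: flag endpoints that nodetool gossipinfo still lists
with a dead STATUS, such as "removed" after nodetool removenode."""

from typing import Dict, List, Optional

# Assumption: STATUS values starting with these prefixes mean the endpoint
# has left the ring but its gossip state is still being carried around.
DEAD_STATUS_PREFIXES = ("removed", "LEFT")


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Map each endpoint ("/a.b.c.d") to its KEY -> VALUE application states.

    Works on one-field-per-line output and on a flattened single line,
    because endpoints and KEY:VALUE pairs contain no whitespace.
    """
    states: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for token in text.split():
        if token.startswith("/"):
            # A new endpoint section begins.
            current = token
            states[current] = {}
        elif current is not None and ":" in token:
            # Split only on the first colon; values may contain commas.
            key, value = token.split(":", 1)
            states[current][key] = value
    return states


def lingering_dead_endpoints(text: str) -> List[str]:
    """Return endpoints whose STATUS starts with one of DEAD_STATUS_PREFIXES."""
    dead = []
    for endpoint, fields in parse_gossipinfo(text).items():
        if fields.get("STATUS", "").startswith(DEAD_STATUS_PREFIXES):
            dead.append(endpoint)
    return dead


if __name__ == "__main__":
    sample = """/54.86.191.30
      INTERNAL_IP:172.16.1.231
      STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664
      HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d
    """
    print(lingering_dead_endpoints(sample))  # ['/54.86.191.30']
```

Fed the gossipinfo output quoted in the report, this would flag /54.86.191.30, the address the replacement node was trying to reuse.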
[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node
[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500811#comment-14500811 ]

John Alberts commented on CASSANDRA-7292:

I was able to confirm that this patch does indeed seem to fix the problem I was having. The patch was built against tag 'cassandra-2.0.11', running on Amazon Linux 2014.03.
[~brandon.williams] Thank you for all of your help in providing a fix for this issue. Can't wait until it's in an official release package.
[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497070#comment-14497070 ]

John Alberts commented on CASSANDRA-8072:

[~brandon.williams] Absolutely. I probably won't get a chance to test this until tomorrow. Thanks for the patch.

Exception during startup: Unable to gossip with any seeds

Key: CASSANDRA-8072
URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
Project: Cassandra
Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
Fix For: 2.0.15, 2.1.5
Attachments: 8072.txt, cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2

When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster, either in EC2 or locally, an error sometimes occurs in which one of the nodes refuses to start C*. The error in /var/log/cassandra/system.log is:

ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
 INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
 INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread

This error does not always occur when provisioning a 2-node cluster, but it happens probably around half of the time, and on only one of the nodes.

I haven't been able to reproduce this error with DSC 2.0.9, and there have been no code or definition file changes in Opscenter. I can reproduce it locally with the above steps. I'm happy to test any proposed fixes, since I'm the only person able to reproduce it reliably so far.
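Both issues end in the same RuntimeException from Gossiper.doShadowRound, but the frame directly below it differs: StorageService.prepareReplacementInfo in the CASSANDRA-7292 replace_address trace, and StorageService.checkForEndpointCollision in the CASSANDRA-8072 trace. When comparing many pasted logs, a small script can pull out that caller to show which startup path failed. The Python sketch below is a hypothetical helper (frames and shadow_round_caller are illustrative names, not a Cassandra tool). It assumes standard Java frames of the form "at package.Class.method(File.java:line)", whether on separate lines or flattened onto one.

```python
import re
from typing import List, Optional

# One Java stack frame, e.g.
#   at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
# The single group captures the fully qualified method name.
FRAME_RE = re.compile(r"\bat\s+([\w.$]+)\(")


def frames(trace: str) -> List[str]:
    """Return fully qualified method names in the order they appear."""
    return FRAME_RE.findall(trace)


def shadow_round_caller(trace: str) -> Optional[str]:
    """Return the simple name of the method that called Gossiper.doShadowRound.

    Returns None when the trace has no doShadowRound frame or nothing below it.
    """
    names = frames(trace)
    for i, name in enumerate(names[:-1]):
        if name.endswith("Gossiper.doShadowRound"):
            # The next frame down the stack is the caller.
            return names[i + 1].rsplit(".", 1)[-1]
    return None


if __name__ == "__main__":
    trace = (
        "java.lang.RuntimeException: Unable to gossip with any seeds\n"
        "    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)\n"
        "    at org.apache.cassandra.service.StorageService"
        ".checkForEndpointCollision(StorageService.java:444)\n"
    )
    print(shadow_round_caller(trace))  # checkForEndpointCollision
```

The regex anchors on a whole-word "at" followed by an identifier and an opening parenthesis, so ordinary log text such as "Gathering node replacement information" is not mistaken for a frame.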
[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Alberts updated CASSANDRA-8072:
Attachment: trace_logs.tar.bz2

Logs from cassandra cluster with logging set to TRACE. This is from a new node launched and cassandra failed to start.
[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490383#comment-14490383 ]

John Alberts edited comment on CASSANDRA-8072 at 4/10/15 9:23 PM:

Logs from the Cassandra cluster with logging set to TRACE. These are from a newly launched node on which Cassandra failed to start. This is for a cluster running on EC2 using the Ec2MultiRegionSnitch. I was able to reproduce this issue on a new cluster: I decommissioned a node, shut it down, and brought up a new node with the same EIP, and startup failed.

was (Author: albertsj1):
Logs from cassandra cluster with logging set to TRACE. This is from a new node launched and cassandra failed to start.
[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Alberts updated CASSANDRA-8072:
Attachment: cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2

Wow, that was a quick response. :) Thanks for looking into this. I cleaned out the log files, enabled trace on both seed nodes and the new node that is failing, and started cassandra on the failing node. Trace logs from each node are attached.