[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node

2015-04-17 Thread John Alberts (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500557#comment-14500557 ]

John Alberts commented on CASSANDRA-7292:
-

[~brandon.williams] I was able to get this patch to fix my problem last night 
but tried again today and couldn't reproduce.  I think the db had issues from 
multiple restarts, version switches, etc.  I'm going to start from scratch with 
a new cluster, re-test, and I'll get back to you.


 Can't seed new node into ring with (public) ip of an old node
 -

 Key: CASSANDRA-7292
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7292
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.7, Ec2MultiRegionSnitch
Reporter: Juho Mäkinen
Assignee: Brandon Williams
  Labels: bootstrap, gossip
 Fix For: 2.0.15, 2.1.5

 Attachments: 7292.txt, cassandra-replace-address.log


 This bug prevents a node from rejoining the cluster via bootstrap with its old 
 IP.
 Scenario: a five-node EC2 cluster spread across three AZs, all in one region. 
 I'm using Ec2MultiRegionSnitch. Nodes are reported with their public IPs (as 
 Ec2MultiRegionSnitch requires).
 I simulated the loss of one node by terminating its instance. nodetool status 
 correctly reported that the node was down. Then I launched a new instance with 
 the old public IP (I'm using Elastic IPs) and 
 -Dcassandra.replace_address=IP_ADDRESS, but the new node can't join the 
 cluster:
  INFO 07:20:43,424 Gathering node replacement information for /54.86.191.30
  INFO 07:20:43,428 Starting Messaging Service on port 9043
  INFO 07:20:43,489 Handshaking version with /54.86.171.10
  INFO 07:20:43,491 Handshaking version with /54.86.187.245
 (some delay)
 ERROR 07:21:14,445 Exception encountered during startup
 java.lang.RuntimeException: Unable to gossip with any seeds
   at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1193)
   at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:419)
   at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:650)
   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:505)
   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:362)
   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:480)
   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:569)
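The "(some delay)" between the last handshake line and the startup exception can be measured from the log timestamps. Below is a minimal Java sketch, assuming both lines fall on the same day; the class name ShadowRoundDelay is hypothetical. It yields roughly 31 seconds, which is consistent with a shadow round that gives up after about 30 seconds. The 30-second figure is an assumption about Cassandra's default ring delay, not something stated in this report.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class ShadowRoundDelay {
    // The log lines in this report use HH:mm:ss,SSS timestamps.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds between two timestamps assumed to be on the same day. */
    static long elapsedMillis(String from, String to) {
        LocalTime start = LocalTime.parse(from, TS);
        LocalTime end = LocalTime.parse(to, TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Last "Handshaking version" line vs. the "Exception encountered during startup" line.
        long ms = elapsedMillis("07:20:43,491", "07:21:14,445");
        System.out.println("Startup failed " + ms + " ms after the last handshake"); // 30954 ms
    }
}
```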
 It does not help if I remove the -Dcassandra.replace_address=IP_ADDRESS 
 system property.
 Removing the node with nodetool removenode does not help either, with or 
 without the cassandra.replace_address property.
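The replace flag is passed to the JVM as a -D option, which makes it an ordinary Java system property; when the option is removed, the property is simply absent. A minimal Java sketch of reading such a property, assuming only the property name from the report; the class and method names are hypothetical, and this is not Cassandra's own startup code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: reads the replace-address flag like any -D system property. */
public class ReplaceAddressFlag {
    // Set on the JVM command line as -Dcassandra.replace_address=<old node IP>.
    static final String PROPERTY = "cassandra.replace_address";

    /** Returns the address being replaced, or null when the flag is absent. */
    static InetAddress replaceAddress() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return null; // flag absent: nothing to replace
        }
        try {
            // An IP literal is only checked for format; no DNS lookup is made.
            return InetAddress.getByName(value.trim());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Bad " + PROPERTY + ": " + value, e);
        }
    }

    public static void main(String[] args) {
        // Same effect as launching the JVM with -Dcassandra.replace_address=54.86.191.30
        System.setProperty(PROPERTY, "54.86.191.30");
        System.out.println("Replacing " + replaceAddress().getHostAddress());
    }
}
```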
 I think this is because the node information is preserved in the gossip info, 
 as seen in this output of nodetool gossipinfo:
 /54.86.191.30
   INTERNAL_IP:172.16.1.231
   DC:us-east
   REMOVAL_COORDINATOR:REMOVER,d581309a-8610-40d4-ba30-cb250eda22a8
   STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664
   HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d
   RPC_ADDRESS:0.0.0.0
   NET_VERSION:7
   SCHEMA:226f9315-b4b2-32c1-bfe1-f4bb49fccfd5
   RACK:1b
   LOAD:7.075290515E9
   SEVERITY:0.0
   RELEASE_VERSION:2.0.7
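Stale entries like this one can be spotted by scanning gossipinfo output for endpoints whose STATUS still starts with "removed". Below is a minimal Java sketch, assuming the layout shown above (each endpoint section starts with a "/address" line followed by KEY:value lines); other Cassandra versions may format the output differently. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GossipInfoScan {
    /** Parses gossipinfo-style text into endpoint -> (key -> value). */
    static Map<String, Map<String, String>> parse(String text) {
        Map<String, Map<String, String>> endpoints = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith("/")) {
                // Assumed section header: "/<address>".
                current = new LinkedHashMap<>();
                endpoints.put(line.substring(1), current);
            } else if (current != null) {
                // Split on the first colon only; values may contain commas.
                int colon = line.indexOf(':');
                if (colon > 0) current.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return endpoints;
    }

    /** Endpoints whose STATUS value begins with "removed". */
    static List<String> removedEndpoints(Map<String, Map<String, String>> endpoints) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : endpoints.entrySet()) {
            if (e.getValue().getOrDefault("STATUS", "").startsWith("removed")) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "/54.86.191.30",
            "  INTERNAL_IP:172.16.1.231",
            "  STATUS:removed,19311925-46b5-4fe4-928a-321e8adb731d,1401089960664",
            "  HOST_ID:19311925-46b5-4fe4-928a-321e8adb731d");
        System.out.println(removedEndpoints(parse(sample))); // [54.86.191.30]
    }
}
```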



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7292) Can't seed new node into ring with (public) ip of an old node

2015-04-17 Thread John Alberts (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500811#comment-14500811 ]

John Alberts commented on CASSANDRA-7292:
-

I was able to confirm that this patch does indeed seem to fix the problem I was 
having. The patch was built against the 'cassandra-2.0.11' tag and run on 
Amazon Linux 2014.03.
[~brandon.williams] Thank you for all of your help with providing a fix for 
this issue. I can't wait until it's in an official release package.




[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-15 Thread John Alberts (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497070#comment-14497070 ]

John Alberts commented on CASSANDRA-8072:
-

[~brandon.williams] Absolutely.  I probably won't get a chance to test this 
until tomorrow.
Thanks for the patch.


 Exception during startup: Unable to gossip with any seeds
 -

 Key: CASSANDRA-8072
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
 Fix For: 2.0.15, 2.1.5

 Attachments: 8072.txt, 
 cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
 cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
 cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
 casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2


 When OpsCenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster, 
 either in EC2 or locally, one of the nodes sometimes refuses to start C*. The 
 error in /var/log/cassandra/system.log is:
 ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
 java.lang.RuntimeException: Unable to gossip with any seeds
 at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
 at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
 at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
 at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
 at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
 at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
 at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
 at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
 This error does not always occur when provisioning a 2-node cluster, but it 
 happens roughly half of the time, and only on one of the nodes. I haven't been 
 able to reproduce this error with DSC 2.0.9, and there have been no code or 
 definition file changes in OpsCenter.
 I can reproduce it locally with the above steps. I'm happy to test any proposed 
 fixes, since I'm the only person able to reproduce it reliably so far.
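The failure in this trace can be illustrated with a toy model of a startup shadow round: the starting node sends a request to each seed and aborts with "Unable to gossip with any seeds" if no seed replies within a timeout. This is a simplified Java sketch, not Cassandra's Gossiper.doShadowRound; the seed names, the seedAnswers predicate (standing in for real network replies) and the timeout values are all hypothetical. In this model, an intermittent failure would correspond to startups on which no seed reply arrived before the deadline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class ShadowRoundModel {
    /** Returns once any seed answers; throws if none answers within timeoutMillis. */
    static void shadowRound(List<String> seeds, Predicate<String> seedAnswers, long timeoutMillis) {
        CountDownLatch firstReply = new CountDownLatch(1);
        for (String seed : seeds) {
            // Each simulated request runs asynchronously, like a one-way message.
            Thread request = new Thread(() -> {
                if (seedAnswers.test(seed)) {
                    firstReply.countDown(); // any single reply ends the round
                }
            });
            request.setDaemon(true);
            request.start();
        }
        boolean answered;
        try {
            answered = firstReply.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            answered = false;
        }
        if (!answered) {
            // Same message as the startup exception in the log above.
            throw new RuntimeException("Unable to gossip with any seeds");
        }
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("seed01", "seed02");
        shadowRound(seeds, seed -> seed.equals("seed02"), 1000);
        System.out.println("shadow round completed");
        try {
            shadowRound(seeds, seed -> false, 200); // no seed ever answers
        } catch (RuntimeException e) {
            System.out.println("startup aborted: " + e.getMessage());
        }
    }
}
```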





[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Alberts updated CASSANDRA-8072:

Attachment: trace_logs.tar.bz2

Logs from the Cassandra cluster with logging set to TRACE. These are from a 
newly launched node on which Cassandra failed to start.

 Exception during startup: Unable to gossip with any seeds
 -

 Key: CASSANDRA-8072
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
 Fix For: 2.0.15, 2.1.5

 Attachments: casandra-system-log-with-assert-patch.log, 
 trace_logs.tar.bz2




[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490383#comment-14490383 ]

John Alberts edited comment on CASSANDRA-8072 at 4/10/15 9:23 PM:
--

Logs from the Cassandra cluster with logging set to TRACE. These are from a 
newly launched node on which Cassandra failed to start.
This is for a cluster running on EC2 using Ec2MultiRegionSnitch.
I was able to reproduce this issue on a new cluster: I decommissioned a node, 
shut it down, and brought up a new node with the same EIP, which failed to start.



was (Author: albertsj1):
Logs from cassandra cluster with logging set to TRACE.  This is from a new node 
launched and cassandra failed to start.

 Exception during startup: Unable to gossip with any seeds
 -

 Key: CASSANDRA-8072
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
 Fix For: 2.0.15, 2.1.5

 Attachments: casandra-system-log-with-assert-patch.log, 
 trace_logs.tar.bz2




[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Alberts updated CASSANDRA-8072:

Attachment: cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2

Wow, that was a quick response. :)  Thanks for looking into this.
I cleaned out the log files, enabled TRACE logging on both seed nodes and on 
the new node that is failing, and started Cassandra on the failing node. Trace 
logs from each node are attached.


 Exception during startup: Unable to gossip with any seeds
 -

 Key: CASSANDRA-8072
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan Springer
Assignee: Brandon Williams
 Fix For: 2.0.15, 2.1.5

 Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
 cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
 cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
 casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2

