[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Alberts updated CASSANDRA-8072:

Attachment: trace_logs.tar.bz2

Logs from the Cassandra cluster with logging set to TRACE.  These are from a 
newly launched node on which Cassandra failed to start.

> Exception during startup: Unable to gossip with any seeds
> -
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ryan Springer
>Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: casandra-system-log-with-assert-patch.log, 
> trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster 
> in either ec2 or locally, an error occurs sometimes with one of the nodes 
> refusing to start C*.  The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but 
> probably around half of the time on only one of the nodes.  I haven't been 
> able to reproduce this error with DSC 2.0.9, and there have been no code or 
> definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed 
> fixes since I'm the only person able to reproduce reliably so far.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490383#comment-14490383
 ] 

John Alberts edited comment on CASSANDRA-8072 at 4/10/15 9:23 PM:
--

Logs from the Cassandra cluster with logging set to TRACE.  These are from a 
newly launched node on which Cassandra failed to start.
This is for a cluster running on EC2 using the ec2multiregion snitch.
I was able to reproduce this issue on a new cluster: I decommissioned a node, 
shut it down, and brought up a new node with the same EIP, which then failed 
to start.



was (Author: albertsj1):
Logs from the Cassandra cluster with logging set to TRACE.  These are from a 
newly launched node on which Cassandra failed to start.



[jira] [Commented] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490401#comment-14490401
 ] 

Brandon Williams commented on CASSANDRA-8072:
-

Thanks for the logs! So, the relevant portion of these logs is here:

{noformat}
DEBUG 20:53:50,981 Starting shadow gossip round to check for endpoint collision
 INFO 20:53:51,304 Starting Encrypted Messaging Service on SSL port 7001
 INFO 20:53:51,312 Starting Messaging Service on port 7000
 INFO 20:53:51,315 Loading settings from file:/etc/cassandra/conf/cassandra.yaml
TRACE 20:53:51,336 /54.219.189.161 sending GOSSIP_DIGEST_SYN to 1@/54.219.189.162
TRACE 20:53:51,353 /54.219.189.161 sending GOSSIP_DIGEST_SYN to 2@/54.219.189.163
DEBUG 20:53:51,354 attempting to connect to /54.219.189.162
TRACE 20:53:51,354 Assuming current protocol version for /54.219.189.162
TRACE 20:53:51,355 Filtering org.apache.cassandra.db.ColumnFamilyStore$9@496cc7de for rows matching org.apache.cassandra.db.filter.ExtendedFilter$EmptyClauseFilter@4b5e57b
DEBUG 20:53:51,359 attempting to connect to /54.219.189.163
TRACE 20:53:51,360 Assuming current protocol version for /54.219.189.163
 INFO 20:53:51,543 Handshaking version with cas-dev-dt-01-uw1-cassandra-seed02.localdomain-ext/54.219.189.163
 INFO 20:53:51,544 Handshaking version with cas-dev-dt-01-uw1-cassandra-seed01.localdomain-ext/54.219.189.162
DEBUG 20:53:51,583 Setting version 7 for cas-dev-dt-01-uw1-cassandra-seed02.localdomain-ext/54.219.189.163
DEBUG 20:53:51,586 Setting version 7 for cas-dev-dt-01-uw1-cassandra-seed01.localdomain-ext/54.219.189.162
TRACE 20:53:51,586 Upgrading OutputStream to be compressed
TRACE 20:53:51,588 Upgrading OutputStream to be compressed
TRACE 20:53:55,247 Expired 0 entries
DEBUG 20:53:55,598 GC for ConcurrentMarkSweep: 71 ms for 1 collections, 240266176 used; max is 7935623168
TRACE 20:54:00,248 Expired 0 entries
TRACE 20:54:05,248 Expired 0 entries
TRACE 20:54:10,249 Expired 0 entries
TRACE 20:54:15,249 Expired 0 entries
TRACE 20:54:20,250 Expired 0 entries
ERROR 20:54:22,360 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
{noformat}

We can see that this node sent the SYN, which caused MessagingService (MS) to 
connect to those nodes, and that it successfully negotiated the version, so we 
know that everything worked as expected up to this point.  What we don't know 
is why neither seed replied to the SYN: whether they attempted to reply at 
all, or simply never received the SYN for some reason.  Without TRACE from one 
of the seeds, we won't be able to tell.
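
For readers unfamiliar with the failure mode discussed above, here is a minimal 
sketch of a shadow round: send a SYN to every seed, then wait a bounded time 
for any ACK.  It is a simplified model and not Cassandra's actual code; the 
class name, method names and timeout parameter are assumptions made for 
illustration.  In the log above, roughly 31 seconds pass between the first SYN 
(20:53:51) and the failure (20:54:22).

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical, simplified model of a shadow gossip round (not Cassandra code).
final class ShadowRoundSketch {
    // Released as soon as any seed answers the SYN with an ACK.
    private final CountDownLatch ackReceived = new CountDownLatch(1);

    // Would be called by the message handler when an ACK arrives.
    void onAck() {
        ackReceived.countDown();
    }

    // Sends a SYN to every seed, then waits up to timeoutMillis for one ACK.
    void run(List<String> seeds, Consumer<String> sendSyn, long timeoutMillis) {
        for (String seed : seeds) {
            sendSyn.accept(seed);
        }
        boolean acked;
        try {
            acked = ackReceived.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted during shadow round", e);
        }
        if (!acked) {
            // Same message as in the reported stack trace.
            throw new RuntimeException("Unable to gossip with any seeds");
        }
    }
}
```

In this model the SYN can be delivered and the version handshake can succeed, 
yet startup still fails if no ACK finds its way back before the timeout.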


[jira] [Commented] (CASSANDRA-7557) User permissions for UDFs

2015-04-10 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490407#comment-14490407
 ] 

Tyler Hobbs commented on CASSANDRA-7557:


bq.  I've taken the lead from DROP TABLE - when IF EXISTS is used the statement 
silently succeeds, bypassing authz. When IF EXISTS is not present, we throw an 
IRE with "Unconfigured function ks.func(args)". wdyt?

That seems reasonable to me.

After looking over the tests again, I've come up with a few more things that 
would be good to test (apologies if any of these are already covered and I 
missed them):
* Granting both root/ks-level permissions _and_ individual function 
permissions, ensuring that revoking one does not affect revoking the other
* Similar to {{drop_function_and_keyspace_cleans_up_udf_permissions_test}}, 
test that dropping a keyspace drops function-level permissions for functions in 
that keyspace
* Ensure granting permissions on a builtin function (e.g. {{system.now}}) 
errors nicely.  Same for REVOKE on builtins and granting EXECUTE on 
non-function objects.
* Double granting/revoking is well-behaved (I'm not sure if it's supposed to 
error or succeed)

Also, in the {{inheritance_of_udf_permissions_test}}, shouldn't the {{GRANT 
EXECUTE}} statement be executed by the {{function_user}} role instead of 
{{cassandra}}?
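
As a rough illustration of the first and last test ideas above, here is a toy 
permission model in which keyspace-level and function-level grants are stored 
independently.  It is not Cassandra's authorization code; the class, the 
resource naming ({{ks}} and {{ks.func}}) and the choice to treat a double grant 
as a no-op are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of EXECUTE grants (not Cassandra's authz code).
final class UdfPermissionSketch {
    // Each entry is "role|resource", where resource is "ks" or "ks.func".
    private final Set<String> grants = new HashSet<>();

    // Returns false when the grant already existed (double grant is a no-op here).
    boolean grant(String role, String resource) {
        return grants.add(role + "|" + resource);
    }

    // Returns false when there was nothing to revoke.
    boolean revoke(String role, String resource) {
        return grants.remove(role + "|" + resource);
    }

    // EXECUTE is allowed if granted on the function itself or on its keyspace.
    boolean canExecute(String role, String keyspace, String function) {
        return grants.contains(role + "|" + keyspace + "." + function)
            || grants.contains(role + "|" + keyspace);
    }
}
```

With both levels granted, revoking the keyspace-level grant leaves the 
function-level grant in place, which is the independence the first bullet asks 
the tests to confirm.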

> User permissions for UDFs
> -
>
> Key: CASSANDRA-7557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7557
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Sam Tunnicliffe
>  Labels: client-impacting, cql, udf
> Fix For: 3.0
>
>
> We probably want some new permissions for user defined functions.  Most 
> RDBMSes split function permissions roughly into {{EXECUTE}} and 
> {{CREATE}}/{{ALTER}}/{{DROP}} permissions.





[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread John Alberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Alberts updated CASSANDRA-8072:

Attachment: cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2
cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2

Wow, that was a quick response. :)  Thanks for looking into this.
I cleaned out the log files, enabled TRACE on both seed nodes and on the 
failing new node, and then started Cassandra on the failing node.  Trace logs 
from each node are attached.


> Exception during startup: Unable to gossip with any seeds
> -
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ryan Springer
>Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>


[jira] [Commented] (CASSANDRA-9165) improve error pathway test coverage, including fault injection testing

2015-04-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490510#comment-14490510
 ] 

Benedict commented on CASSANDRA-9165:
-

One avenue to explore for fault injection testing is handling of OOM, both from 
native allocators and random allocation points in normal java code.
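
One way to picture the idea is an allocator wrapper that fails at random, 
seeded so that a failing run can be replayed.  This is only a sketch: the class 
name, the failure-rate parameter and the use of heap {{ByteBuffer}}s are 
assumptions, and it is not tied to any existing Cassandra allocator.

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical fault-injecting allocator for tests (not Cassandra code).
final class FaultyAllocatorSketch {
    private final Random random;
    private final double failureRate;

    // A fixed seed makes the sequence of injected failures reproducible.
    FaultyAllocatorSketch(long seed, double failureRate) {
        this.random = new Random(seed);
        this.failureRate = failureRate;
    }

    // Either allocates normally or simulates an out-of-memory condition.
    ByteBuffer allocate(int size) {
        if (random.nextDouble() < failureRate) {
            throw new OutOfMemoryError("injected allocation failure");
        }
        return ByteBuffer.allocate(size);
    }
}
```

A test would run the code under test against such an allocator and check that 
every injected failure leaves the system in a consistent state.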

> improve error pathway test coverage, including fault injection testing
> --
>
> Key: CASSANDRA-9165
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9165
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>






[jira] [Updated] (CASSANDRA-9161) Add random interleaving for flush/compaction when running CQL unit tests

2015-04-10 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9161:

Labels: retrospective_generated  (was: )

> Add random interleaving for flush/compaction when running CQL unit tests
> 
>
> Key: CASSANDRA-9161
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9161
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sylvain Lebresne
>  Labels: retrospective_generated
>
> Most CQL tests don't bother flushing, which means that they overwhelmingly 
> test the memtable path and not the sstables one. A simple way to improve on 
> that would be to make {{CQLTester}} issue flushes and compactions randomly 
> between statements.
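
A minimal sketch of the proposal, assuming a seeded random source so that a 
failing interleaving can be replayed.  It does not use the real {{CQLTester}} 
API; the class, the method and the FLUSH/COMPACT markers are hypothetical 
placeholders for whatever the test harness would actually invoke.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical planner that interleaves maintenance steps with statements.
final class RandomInterleaveSketch {
    // Returns the statements in order, with "FLUSH" and "COMPACT" markers
    // inserted after each one with the given probability.
    static List<String> interleave(List<String> statements, long seed, double probability) {
        Random random = new Random(seed);
        List<String> plan = new ArrayList<>();
        for (String statement : statements) {
            plan.add(statement);
            if (random.nextDouble() < probability) {
                plan.add("FLUSH");
            }
            if (random.nextDouble() < probability) {
                plan.add("COMPACT");
            }
        }
        return plan;
    }
}
```

Logging the seed with each test run would let a failure caused by a particular 
flush or compaction placement be reproduced exactly.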





[jira] [Updated] (CASSANDRA-9164) Randomized correctness testing of CQLSSTableWriter (and friends)

2015-04-10 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9164:

Labels: retrospective_generated  (was: )

> Randomized correctness testing of CQLSSTableWriter (and friends)
> 
>
> Key: CASSANDRA-9164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9164
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Priority: Minor
>  Labels: retrospective_generated
>






[jira] [Updated] (CASSANDRA-9163) Randomized subsystem correctness testing

2015-04-10 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9163:

Labels: retrospective_generated  (was: )

> Randomized subsystem correctness testing
> 
>
> Key: CASSANDRA-9163
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9163
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: retrospective_generated
>
> Whilst we are aiming to introduce a test harness that will be validated by 
> stress and test non-conflicting subsystem interactions, many bugs could be 
> caught with isolated randomized testing of a given component, or pair of 
> components, and more complex and rigorous testing is possible since in this 
> case we _do_ expect the subsystems to interact with each other, and by 
> isolating the components we can test more iterations more rapidly. Subsystems 
> that would certainly benefit: commitlog/memtables, cqlsstablewriter(s), 
> gossip...
> This is a catch-all placeholder ticket to bridge CASSANDRA-9042 with 
> specific tickets for each subsystem.





[jira] [Updated] (CASSANDRA-9165) improve error pathway test coverage, including fault injection testing

2015-04-10 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9165:

Labels: retrospective_generated  (was: )

> improve error pathway test coverage, including fault injection testing
> --
>
> Key: CASSANDRA-9165
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9165
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: retrospective_generated
>






[jira] [Updated] (CASSANDRA-9162) Randomized correctness testing for commit log subsystem, and memtable interaction

2015-04-10 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9162:

Labels: retrospective_generated  (was: )

> Randomized correctness testing for commit log subsystem, and memtable 
> interaction
> -
>
> Key: CASSANDRA-9162
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9162
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>  Labels: retrospective_generated
>






[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-8072:

Attachment: 8072.txt

Now we're getting somewhere.  It starts here, after the seed receives the dead 
state for the decommissioned node:

{noformat}
DEBUG [GossipStage:1] 2015-04-10 22:05:10,147 ReconnectableSnitchHelper.java 
(line 70) Intiated reconnect to an Internal IP /10.2.1.139 for the 
/54.219.189.161
{noformat}

Later, the seed receives the SYN and tries to send the ACK, but it tries to 
send over the previous internal IP:

{noformat}
DEBUG [ACCEPT-/10.2.0.71] 2015-04-10 22:06:45,576 MessagingService.java (line 917) Connection version 7 from /54.219.189.161
DEBUG [Thread-11] 2015-04-10 22:06:45,621 MessagingService.java (line 780) Setting version 7 for /54.219.189.161
DEBUG [Thread-11] 2015-04-10 22:06:45,621 IncomingTcpConnection.java (line 107) Set version for /54.219.189.161 to 7 (will use 7)
TRACE [GossipStage:1] 2015-04-10 22:06:45,658 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,660 Gossiper.java (line 768) local heartbeat version 179776 greater than 0 for /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,666 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.219.189.161
TRACE [GossipStage:1] 2015-04-10 22:06:45,666 MessagingService.java (line 660) /54.219.189.162 sending GOSSIP_DIGEST_ACK to 399@/54.219.189.161
DEBUG [WRITE-/54.219.189.161] 2015-04-10 22:06:45,666 OutboundTcpConnection.java (line 290) attempting to connect to /10.2.1.139
{noformat}

It seems like the 'new' 161 isn't binding this IP, which is fine depending on 
your circumstance, but at least one problem we have is we shouldn't be sending 
the onJoin event for a dead state which triggers the initial reconnect.  I 
can't think of any reason we'd want to send that event upon discovery of any 
dead state, so patch to only send it for live states.

That said, I don't think this is the original cause, because when I've seen it 
I wasn't using INTERNAL_IP nor a reconnecting snitch.

> Exception during startup: Unable to gossip with any seeds
> -
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Ryan Springer
>Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: 8072.txt, 
> cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>


[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-8072:

Attachment: (was: 8072.txt)



[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-8072:

Attachment: 8072.txt

bq. I can't think of any reason we'd want to send that event upon discovery of 
any dead state, so patch to only send it for live states.

Actually, I can.  In the case of a removal/decom during a partition, this is 
the only way the other side will remove it when the partition heals.  Patch to 
instead filter dead states in ReconnectableSnitchHelper.
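
To make the idea concrete, here is a rough sketch of what filtering dead states 
in a reconnect helper could look like.  It is not the attached patch and not 
the real ReconnectableSnitchHelper; the {{PeerState}} type, its fields and the 
method names are assumptions.  The addresses are the ones from the seed's log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a reconnect helper that ignores dead states.
final class ReconnectFilterSketch {
    // Stand-in for the gossip state a seed holds about a peer.
    static final class PeerState {
        final String publicIp;
        final String internalIp;
        final boolean dead; // true for a removed or decommissioned node

        PeerState(String publicIp, String internalIp, boolean dead) {
            this.publicIp = publicIp;
            this.internalIp = internalIp;
            this.dead = dead;
        }
    }

    // Records each reconnect the helper would initiate.
    final List<String> reconnects = new ArrayList<>();

    // Still receives the event for dead states, but only acts on live ones.
    void onJoin(PeerState state) {
        if (state.dead || state.internalIp == null) {
            return; // a dead state must not redirect traffic to a stale internal IP
        }
        reconnects.add(state.publicIp + " -> " + state.internalIp);
    }
}
```

Under this model the seed would keep answering /54.219.189.161 on its public 
address rather than switching to the decommissioned node's /10.2.1.139.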



[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-8072:

Attachment: 8072.txt

> Exception during startup: Unable to gossip with any seeds
> -
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan Springer
> Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: 8072.txt, 
> cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster 
> in either ec2 or locally, an error occurs sometimes with one of the nodes 
> refusing to start C*.  The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but 
> probably around half of the time on only one of the nodes.  I haven't been 
> able to reproduce this error with DSC 2.0.9, and there have been no code or 
> definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed 
> fixes since I'm the only person able to reproduce reliably so far.





[jira] [Updated] (CASSANDRA-8072) Exception during startup: Unable to gossip with any seeds

2015-04-10 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-8072:

Attachment: (was: 8072.txt)

> Exception during startup: Unable to gossip with any seeds
> -
>
> Key: CASSANDRA-8072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ryan Springer
> Assignee: Brandon Williams
> Fix For: 2.0.15, 2.1.5
>
> Attachments: 8072.txt, 
> cas-dev-dt-01-uw1-cassandra-seed01_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra-seed02_logs.tar.bz2, 
> cas-dev-dt-01-uw1-cassandra02_logs.tar.bz2, 
> casandra-system-log-with-assert-patch.log, trace_logs.tar.bz2
>
>
> When Opscenter 4.1.4 or 5.0.1 tries to provision a 2-node DSC 2.0.10 cluster 
> in either ec2 or locally, an error occurs sometimes with one of the nodes 
> refusing to start C*.  The error in the /var/log/cassandra/system.log is:
> ERROR [main] 2014-10-06 15:54:52,292 CassandraDaemon.java (line 513) Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1200)
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:444)
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:655)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:609)
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:52,326 Gossiper.java (line 1279) Announcing shutdown
>  INFO [StorageServiceShutdownHook] 2014-10-06 15:54:54,326 MessagingService.java (line 701) Waiting for messaging service to quiesce
>  INFO [ACCEPT-localhost/127.0.0.1] 2014-10-06 15:54:54,327 MessagingService.java (line 941) MessagingService has terminated the accept() thread
> This error does not always occur when provisioning a 2-node cluster, but 
> probably around half of the time on only one of the nodes.  I haven't been 
> able to reproduce this error with DSC 2.0.9, and there have been no code or 
> definition file changes in Opscenter.
> I can reproduce locally with the above steps.  I'm happy to test any proposed 
> fixes since I'm the only person able to reproduce reliably so far.





[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

2015-04-10 Thread Carlos Diaz (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490677#comment-14490677
 ] 

Carlos Diaz commented on CASSANDRA-3486:


+1 for this fix.  I run into this issue quite regularly and have no good 
solution other than restarting the cassandra nodes one by one.  However, this 
causes a significant performance impact when I do it. 

> Node Tool command to stop repair
> 
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Environment: JVM
> Reporter: Vijay
> Assignee: Jason Brown
> Priority: Minor
> Labels: repair
> Fix For: 2.1.5
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, if the validation compaction is stopped, the repair 
> will hang. This ticket will allow users to kill the original repair.


