[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261468#comment-14261468 ]

Tyler Hobbs commented on CASSANDRA-8245:

bq. What's odd is that the cassandra process continues running despite the OutOfMemory exception. You'd expect it to exit.
bq. Prior to getting OutOfMemory, I notice that such nodes are slow in responding to commands and queries (e.g., jmx).

OOMs are handled in a better (more consistent) way with CASSANDRA-7507. That ticket may answer a few questions for you.

> Cassandra nodes periodically die in 2-DC configuration
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>            Reporter: Oleg Poleshuk
>            Assignee: Brandon Williams
>            Priority: Minor
>         Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt
>
>
> We have 2 DCs with 3 nodes in each.
> Second DC periodically has 1-2 nodes down.
> Looks like it loses connectivity with other nodes and then Gossiper starts to accumulate tasks until Cassandra dies with OOM.
> WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
> WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
> WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
> ...
> WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
> WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
> java.lang.OutOfMemoryError: Java heap space
> Also these lines, though I'm not sure they are relevant:
> DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
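The growth of the Gossip stage backlog in the quoted WARN lines can be tabulated with a short script. This is only a minimal sketch: the regular expression is written against the four sample lines copied from the issue description, and the names `LOG_LINES`, `PATTERN` and `pending_tasks` are made up for illustration, not part of Cassandra or its tooling.

```python
import re
from datetime import datetime

# Sample lines copied from the issue description (first four GossipTasks warnings).
LOG_LINES = [
    "WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)",
    "WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)",
    "WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)",
    "WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)",
]

# Hypothetical pattern: captures the timestamp and the pending-task count.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Gossip stage has (\d+) pending tasks"
)


def pending_tasks(lines):
    """Return (timestamp, pending_count) pairs for lines that match PATTERN."""
    samples = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((stamp, int(match.group(2))))
    return samples


if __name__ == "__main__":
    for stamp, count in pending_tasks(LOG_LINES):
        print(stamp.isoformat(), count)
```

Run against a full system.log, the same function would show whether the backlog ever drains between warnings or only grows.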
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238 ]

Donald Smith commented on CASSANDRA-8245:

We're getting a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes on one DC are down now.
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214638#comment-14214638 ]

Oleg Poleshuk commented on CASSANDRA-8245:

There are 6 nodes total: 3 per DC.
Time difference is 2 seconds between the 2 DCs. Within one DC it's <1 second.
Anyway, I upgraded to 2.1 and the error is gone.
I would recommend adding more debug info to FailureDetector; a hostname would be much more useful than just "Ignoring interval time ".
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212700#comment-14212700 ]

Brandon Williams commented on CASSANDRA-8245:

All of these dumps show the nodes blocked on flushing the system table as a result of processing a new node joining, which means that a lot of nodes must be joining. The interesting message here is:

{noformat}
DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047
{noformat}

Which means it didn't see a heartbeat from that node for *241 days*, which almost certainly points to a system clock problem of some sort. I strongly suspect this is environmental, not a bug in Cassandra.
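For reference, the interval from the FailureDetector DEBUG line can be converted into human-readable durations. The log message does not state its unit, so the sketch below simply shows the conversion under two assumed units (milliseconds and nanoseconds); which one applies to this Cassandra version is not confirmed here, and the helper name `as_timedelta` is made up for illustration.

```python
from datetime import timedelta

# Raw value from the "Ignoring interval time of ..." DEBUG line.
LOGGED_INTERVAL = 2085963047


def as_timedelta(value, unit):
    """Interpret a raw interval under an assumed unit ('ms' or 'ns')."""
    if unit == "ms":
        return timedelta(milliseconds=value)
    if unit == "ns":
        # timedelta has microsecond resolution, so nanoseconds are rounded.
        return timedelta(microseconds=value / 1000)
    raise ValueError(f"unsupported unit: {unit}")


if __name__ == "__main__":
    for unit in ("ms", "ns"):
        print(unit, as_timedelta(LOGGED_INTERVAL, unit))
```

Comparing the two readings against the observed clock offsets between nodes is one way to judge which interpretation is plausible.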
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194735#comment-14194735 ]

Michael Shuler commented on CASSANDRA-8245:

To me, this appears to be symptomatic of something more basic to cluster/node configuration, thus minor. Critical would indicate a reproducible bug causing something like data loss. Provide the steps to reproduce, along with verifiable data loss, and we can talk about the severity :) It's just a setting that can be changed to prioritize bugs for developer eyes.
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194729#comment-14194729 ]

Oleg Poleshuk commented on CASSANDRA-8245:

The dates start in August because it went down several times; I probably grepped a previous event.
The datacenters communicate properly. In any case, isn't it supposed to survive a DC link failure?
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194726#comment-14194726 ]

Oleg Poleshuk commented on CASSANDRA-8245:

Why did it go to Minor? A datacenter dies periodically because of Gossiper memory leaks; to me this is not Minor...
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194725#comment-14194725 ]

Oleg Poleshuk commented on CASSANDRA-8245:

I will provide a thread dump when it dies next time.
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194724#comment-14194724 ]

Oleg Poleshuk commented on CASSANDRA-8245:

Processors => 8
Memory => 15.58 GB

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
stepping : 1
cpu MHz : 2400.000
cache size : 20480 KB
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194715#comment-14194715 ]

Michael Shuler commented on CASSANDRA-8245:

Additionally, those dates go from 2014-08-12 to 2014-10-06 - are the datacenters communicating properly at all?
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194711#comment-14194711 ]

Michael Shuler commented on CASSANDRA-8245:

I asked the devs about this ticket, and a thread dump to see what's blocking gossip would be very helpful with troubleshooting this.
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194704#comment-14194704 ]

Michael Shuler commented on CASSANDRA-8245:

What are the hardware specs of your nodes and the memory-related configurations? Nodes that die due to heap exhaustion aren't really a datacenter issue, but a more basic node-level setup problem that should be addressed with larger machines and configuration tuning, typically.

> Cassandra nodes periodically die in 2-DC configuration
> ------------------------------------------------------
>
>                 Key: CASSANDRA-8245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Scientific Linux release 6.5
> java version "1.7.0_51"
> Cassandra 2.0.9
>            Reporter: Oleg Poleshuk
>            Priority: Critical