[jira] [Comment Edited] (CASSANDRA-7307) New nodes mark dead nodes as up for 10 minutes

Richard Low (JIRA) Tue, 27 May 2014 16:10:33 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010478#comment-14010478
 ]


Richard Low edited comment on CASSANDRA-7307 at 5/27/14 11:09 PM:
------------------------------------------------------------------

The 'Cannnot (sic) replace a live node' error appeared about 1 minute after boot,
even with a 5-minute RING_DELAY, so I don't think raising RING_DELAY will help:

{code}
INFO [main] 2014-05-23 19:51:16,934 CassandraDaemon.java (line 119) Logging initialized
INFO [main] 2014-05-23 19:51:20,038 StorageService.java (line 105) Overriding RING_DELAY to 300000ms
ERROR [main] 2014-05-23 19:52:25,189 CassandraDaemon.java (line 464) Exception encountered during startup
java.lang.UnsupportedOperationException: Cannnot replace a live node... 
{code}

I was surprised by this; I expected it to wait for RING_DELAY before fetching the
host replacement info. Is this expected behaviour?

(These logs are from 1.2.15)
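To put numbers on the timing in the log above, here is a minimal Java sketch that parses the "Logging initialized" and exception timestamps and compares the gap with the 300000 ms RING_DELAY override. It assumes both timestamps come from the same node's clock (they are from one log), and the class and method names are illustrative only, not part of Cassandra.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RingDelayCheck {
    // Matches the timestamp layout of the log lines, e.g. "2014-05-23 19:51:16,934".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two timestamps taken from the same log.
    static long elapsedMillis(String from, String to) {
        LocalDateTime start = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime end = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long ringDelayMs = 300_000L; // from "Overriding RING_DELAY to 300000ms"
        long bootToError = elapsedMillis("2014-05-23 19:51:16,934", "2014-05-23 19:52:25,189");
        System.out.println("boot -> exception: " + bootToError + " ms");
        System.out.println("RING_DELAY elapsed before exception? " + (bootToError >= ringDelayMs));
    }
}
```

Running it prints a gap of 68255 ms, i.e. the exception was thrown roughly 232 seconds before a full RING_DELAY could have elapsed.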



> New nodes mark dead nodes as up for 10 minutes
> ----------------------------------------------
>
>                 Key: CASSANDRA-7307
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7307
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Richard Low
>            Assignee: Brandon Williams
>             Fix For: 1.2.17
>
>
> When doing a node replacement while other nodes are down, we see the down nodes
> marked as up for about 10 minutes. This means requests are routed to the dead
> nodes, causing timeouts. It also means that replacing a node when multiple nodes
> from a replica set are down is extremely difficult: the node usually tries to
> stream from a dead node and the replacement fails.
> This isn't limited to host replacement. I did a simple test:
> 1. Create a 2 node cluster
> 2. Kill node 2
> 3. Start a 3rd node with a unique token (I used auto_bootstrap=false, but I don't think this is significant)
> The 3rd node lists node 2 (127.0.0.2) as up for almost 10 minutes:
> {code}
> INFO [main] 2014-05-27 14:28:24,753 CassandraDaemon.java (line 119) Logging initialized
> INFO [GossipStage:1] 2014-05-27 14:28:31,492 Gossiper.java (line 843) Node /127.0.0.2 is now part of the cluster
> INFO [GossipStage:1] 2014-05-27 14:28:31,495 Gossiper.java (line 809) InetAddress /127.0.0.2 is now UP
> INFO [GossipTasks:1] 2014-05-27 14:37:44,526 Gossiper.java (line 823) InetAddress /127.0.0.2 is now DOWN
> {code}
> I reproduced on 1.2.15 and 1.2.16.
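The length of the window in which the 3rd node reported 127.0.0.2 as UP can be read off the two Gossiper lines in the reproduction log. A small illustrative Java sketch (the class and method names are hypothetical, not Cassandra code) that computes it:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class GossipUpWindow {
    // Same timestamp layout as the Cassandra log lines, e.g. "2014-05-27 14:28:31,495".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Duration between two timestamps from the same node's log.
    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS), LocalDateTime.parse(to, LOG_TS));
    }

    // Renders a duration as "M min S.mmm s"; Locale.ROOT keeps ASCII digits.
    static String describe(Duration d) {
        return String.format(Locale.ROOT, "%d min %d.%03d s",
                d.toMinutes(), d.getSeconds() % 60, d.toMillis() % 1000);
    }

    public static void main(String[] args) {
        // "is now UP" at 14:28:31,495 and "is now DOWN" at 14:37:44,526
        Duration up = between("2014-05-27 14:28:31,495", "2014-05-27 14:37:44,526");
        System.out.println("127.0.0.2 marked UP for " + up.toMillis() + " ms (" + describe(up) + ")");
    }
}
```

This gives 553031 ms (9 min 13.031 s), consistent with the "almost 10 minutes" in the description.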



--
This message was sent by Atlassian JIRA
(v6.2#6252)
