[ https://issues.apache.org/jira/browse/CASSANDRA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams resolved CASSANDRA-4373.
-----------------------------------------

    Resolution: Not A Problem

Closing since this is working as intended and any solution would be incorrect and break other things. "Don't do that" is the right way to handle this.

> Gossip can surreptitiously mark a node UP twice without marking it DOWN
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4373
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.1.2
>
>
> As evidenced by dtests:
> {noformat}
> INFO [GossipStage:1] 2012-06-25 17:19:21,999 Gossiper.java (line 770) Node
> /127.0.0.2 has restarted, now UP
> INFO [GossipStage:1] 2012-06-25 17:19:22,000 Gossiper.java (line 738)
> InetAddress /127.0.0.2 is now UP
> INFO [GossipStage:1] 2012-06-25 17:19:22,001 StorageService.java (line 1103)
> Node /127.0.0.2 state jump to normal
> INFO [GossipStage:1] 2012-06-25 17:19:22,002 Gossiper.java (line 770) Node
> /127.0.0.3 has restarted, now UP
> INFO [GossipStage:1] 2012-06-25 17:19:22,004 Gossiper.java (line 738)
> InetAddress /127.0.0.3 is now UP
> INFO [GossipStage:1] 2012-06-25 17:19:22,005 StorageService.java (line 1103)
> Node /127.0.0.3 state jump to normal
> INFO [RMI TCP Connection(2)-50.57.224.92] 2012-06-25 17:19:24,809
> StorageService.java (line 1933) Starting repair command #1, repairing 3
> ranges.
> INFO [AntiEntropySessions:1] 2012-06-25 17:19:24,818 AntiEntropyService.java
> (line 620) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] new session: will
> sync /127.0.0.1, /127.0.0.2, /127.0.0.3 on range
> (Token(bytes[00]),Token(bytes[0113427455640312821154458202477256070484])] for
> ks.[cf]
> INFO [AntiEntropySessions:1] 2012-06-25 17:19:24,823 AntiEntropyService.java
> (line 825) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] requesting merkle
> trees for cf (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
> INFO [GossipStage:1] 2012-06-25 17:19:24,925 Gossiper.java (line 770) Node
> /127.0.0.3 has restarted, now UP
> INFO [GossipStage:1] 2012-06-25 17:19:24,926 Gossiper.java (line 738)
> InetAddress /127.0.0.3 is now UP
> INFO [GossipStage:1] 2012-06-25 17:19:24,926 StorageService.java (line 1103)
> Node /127.0.0.3 state jump to normal
> ERROR [AntiEntropySessions:1] 2012-06-25 17:19:24,927 AntiEntropyService.java
> (line 670) [repair #d21b8bd0-bf13-11e1-0000-fe8ebeead9ff] session completed
> with the following error
> java.io.IOException: Endpoint /127.0.0.3 died
> {noformat}
> It appears that given nodes X, Y, and Z, X sees Z as up via Y even though Z
> is still down, but the FD does not ever mark it down. Later when Z actually
> does come up, this triggers another handleMajorStateChange as a restart,
> which causes an onRestart event, which in turn fails the repair even though
> it succeeds.
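
The sequence described in the report can be replayed with a small toy model. This is only a sketch under stated assumptions, not Cassandra's Gossiper: the class names `ToyGossiper` and `ToyRepairSession`, the generation numbers, and the rule that a newer generation for an already-known endpoint counts as a restart are all hypothetical simplifications chosen for illustration. It shows how two restart notifications can arrive with no DOWN event in between, and how a listener that treats a restart as a death would fail a running repair.

```python
class ToyGossiper:
    """Toy stand-in for a gossiper (hypothetical, not Cassandra code)."""

    def __init__(self):
        self.generations = {}  # endpoint -> last generation seen
        self.events = []       # ordered (event, endpoint) pairs
        self.listeners = []    # callables invoked as listener(endpoint) on restart

    def apply_remote_state(self, endpoint, generation):
        known = self.generations.get(endpoint)
        if known is not None and generation <= known:
            return  # nothing newer than what we already hold
        self.generations[endpoint] = generation
        if known is not None:
            # Assumption: a newer generation for a known endpoint is a restart.
            self.events.append(("RESTART", endpoint))
            for listener in self.listeners:
                listener(endpoint)
        self.events.append(("UP", endpoint))


class ToyRepairSession:
    """Toy repair session that fails when a participant is reported restarted."""

    def __init__(self, endpoints):
        self.endpoints = set(endpoints)
        self.error = None

    def on_restart(self, endpoint):
        if endpoint in self.endpoints and self.error is None:
            self.error = f"Endpoint {endpoint} died"


def replay():
    gossiper = ToyGossiper()
    z = "/127.0.0.3"
    gossiper.apply_remote_state(z, generation=1)  # state held from before Z went down
    gossiper.apply_remote_state(z, generation=2)  # learned via Y; no DOWN was recorded
    session = ToyRepairSession(["/127.0.0.1", "/127.0.0.2", z])
    gossiper.listeners.append(session.on_restart)
    gossiper.apply_remote_state(z, generation=3)  # Z actually comes up mid-repair
    return gossiper.events, session.error
```

Calling `replay()` returns the event list and the session error, which makes it easy to check that no DOWN event ever appears between the two restarts in this model.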