[ https://issues.apache.org/jira/browse/CASSANDRA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Shuler updated CASSANDRA-13144:
---------------------------------------
    Fix Version/s:     (was: 2.1.2)
                       2.1.x

> Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13144
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13144
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: Centos 6
> Java 8
>            Reporter: sai k potturi
>            Priority: Major
>             Fix For: 2.1.x
>
>
> In Cassandra versions 2.1.12 - 2.1.16, after we decommission a node or
> datacenter, the decommissioned nodes are marked as DOWN in the cluster
> when we run "nodetool describecluster". The nodes, however, do not show up
> in the output of "nodetool status".
> The decommissioned node also does not show up in the "system.peers" table
> on the remaining nodes.
> The workaround we follow is a rolling restart of the cluster, which removes
> the decommissioned nodes from the "UNREACHABLE" state and shows the actual
> state of the cluster. The workaround is tedious for huge clusters.
> We also verified the decommission process with the CCM tool and observed the
> same issue for clusters with versions from 2.1.12 to 2.1.16. The issue was
> not observed in versions prior to or later than the ones mentioned above.
> Below are the observed logs from a version without the bug and from a
> version with the bug.
>
> Cassandra 2.1.1 logs showing the decommissioned node:
> 2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 2049943233 for /X.X.X.X
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X state left, tokens [ 59353109817657926242901533144729725259, 60254520910109313597677907197875221475, 75698727618038614819889933974570742305, 84508739091270910297310401957975430578]
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time for endpoint : /X.X.X.X (1485116334088)
> 2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing tokens [100434964734820719895982857900842892337, 114144647582686041354301802358217767299, 132090888860517964702932350041942412177, 138409460913927199437556572481804704749] for /X.X.X.X
> 2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager Deleting any stored hints for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting version for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint /X.X.X.X
> 2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection attempting to connect to /X.X.X.X
> 2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection Handshaking version with /X.X.X.X
> 2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting version 7 for /X.X.X.X
> 2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 2074454222 for /X.X.X.X
> 2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 4302985797 for /X.X.X.X
> 2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 60000 elapsed, /X.X.X.X gossip quarantine over
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval time of 3047826501 for /X.X.X.X
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state change for dead or unknown endpoint: /X.X.X.X
>
> Cassandra 2.1.16 logs showing the decommissioned node (the 2.1.16 logs match
> the 2.1.1 logs up to "DEBUG Gossiper 60000 elapsed, /X.X.X.X gossip
> quarantine over", and are then followed by "is now DOWN"):
> 2017-01-19 19:52:23,687 [GossipStage:1] DEBUG StorageService.java:1883 - Node /X.X.X.X state left, tokens [-1112888759032625467, -228773855963737699, -3114550423754381391, -4848625944949064281, -6920961603460018610, -8566729719076824066, 1611098831406674636, 7278843689020594771, 7565410054791352413, 9166885764, 8654747784805453046]
> 2017-01-19 19:52:23,688 [GossipStage:1] DEBUG Gossiper.java:1520 - adding expire time for endpoint : /X.X.X.X (1485114743567)
> 2017-01-19 19:52:23,688 [GossipStage:1] INFO StorageService.java:1965 - Removing tokens [-1112888759032625467, -228773855963737699, -3114550423754381391, -4848625944949064281, -6920961603460018610, 5690722015779071557, 6202373691525063547, 7191120402564284381, 7278843689020594771, 7565410054791352413, 8524200089166885764, 8654747784805453046] for /X.X.X.X
> 2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
> 2017-01-19 19:52:23,689 [GossipStage:1] DEBUG MessagingService.java:840 - Resetting version for /X.X.X.X
> 2017-01-19 19:52:23,690 [GossipStage:1] DEBUG Gossiper.java:417 - removing endpoint /X.X.X.X
> 2017-01-19 19:52:23,691 [GossipStage:1] DEBUG StorageService.java:1552 - Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection.java:488 - Handshaking version with /X.X.X.X
> 2017-01-19 19:52:31,619 [MessagingService-Outgoing-/X.X.X.X] DEBUG MessagingService.java:826 - Setting version 8 for /X.X.X.X
> 2017-01-19 19:53:09,699 [GossipStage:1] DEBUG FailureDetector.java:423 - Ignoring interval time of 4001002966 for /X.X.X.X
> 2017-01-19 19:53:13,910 [GossipStage:1] DEBUG FailureDetector.java:423 - Ignoring interval time of 4210611081 for /X.X.X.X
> 2017-01-19 19:53:19,914 [GossipStage:1] DEBUG FailureDetector.java:423 - Ignoring interval time of 6004119075 for /X.X.X.X
> 2017-01-19 19:53:23,702 [GossipTasks:1] DEBUG Gossiper.java:795 - 60000 elapsed, /X.X.X.X gossip quarantine over
> 2017-01-19 19:53:23,985 [GossipStage:1] DEBUG StorageService.java:1552 - Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:53:26,223 [GossipStage:1] DEBUG FailureDetector.java:423 - Ignoring interval time of 6309159352 for /X.X.X.X
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive true
> 2017-01-19 19:53:50,709 [GossipTasks:1] INFO Gossiper.java:1008 - InetAddress /X.X.X.X is now DOWN
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG MessagingService.java:429 - Resetting pool for /X.X.X.X
> 2017-01-19 19:53:51,710 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:52,710 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:53,711 [MessagingService-Outgoing-/X.X.X.X] DEBUG OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:53:53,711 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:54,711 [GossipTasks:1] DEBUG Gossiper.java:336 - Convicting /X.X.X.X with status LEFT - alive false
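
The difference between the two log excerpts can be checked mechanically. The
following is a minimal Python sketch, not part of the original report, that
scans log lines for an endpoint that first logged "state left" and was later
reported as "is now DOWN". The function name and the regular expressions are
assumptions based only on the message text quoted in the report; other
Cassandra versions may word these messages differently.

```python
import re

# Patterns derived from the message text quoted in this report (assumption:
# each log entry sits on a single line in the real log file).
LEFT_RE = re.compile(r"Node (/\S+) state left")
DOWN_RE = re.compile(r"InetAddress (/\S+) is now DOWN")


def find_left_nodes_marked_down(lines):
    """Return endpoints that logged 'state left' and were later marked DOWN.

    Endpoints are returned in the order they were first marked DOWN,
    without duplicates.
    """
    left = set()      # endpoints seen leaving the ring so far
    affected = []     # endpoints marked DOWN after having left
    for line in lines:
        match = LEFT_RE.search(line)
        if match:
            left.add(match.group(1))
            continue
        match = DOWN_RE.search(line)
        if match:
            endpoint = match.group(1)
            if endpoint in left and endpoint not in affected:
                affected.append(endpoint)
    return affected


if __name__ == "__main__":
    # Hypothetical sample; the report redacts real addresses as /X.X.X.X.
    sample = [
        "DEBUG StorageService.java:1883 - Node /X.X.X.X state left, tokens [...]",
        "DEBUG Gossiper.java:795 - 60000 elapsed, /X.X.X.X gossip quarantine over",
        "INFO Gossiper.java:1008 - InetAddress /X.X.X.X is now DOWN",
    ]
    print(find_left_nodes_marked_down(sample))
```

Run against the 2.1.1 excerpt, a check like this would report nothing, because
that excerpt contains no "is now DOWN" line for the decommissioned node.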