[ 
https://issues.apache.org/jira/browse/CASSANDRA-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Shuler updated CASSANDRA-13144:
---------------------------------------
    Fix Version/s:     (was: 2.1.2)
                   2.1.x

> Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13144
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13144
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: Centos 6
> Java 8
>            Reporter: sai k potturi
>            Priority: Major
>             Fix For: 2.1.x
>
>
> In Cassandra versions 2.1.12 - 2.1.16, after we decommission a node or 
> datacenter, the decommissioned nodes are reported as DOWN in the cluster 
> when we run "nodetool describecluster". The nodes, however, do not show up in 
> the output of "nodetool status".
> The decommissioned node also does not show up in the "system.peers" table 
> on the remaining nodes.
> Our workaround is a rolling restart of the cluster, which clears the 
> decommissioned nodes from the "UNREACHABLE" list and shows the actual state 
> of the cluster. This is tedious for large clusters.
> We also reproduced the decommission process with the CCM tool and observed 
> the same issue for clusters on versions 2.1.12 to 2.1.16. The issue was not 
> observed in versions earlier or later than that range.
> Below are the observed logs from the versions without the bug, and with the 
> bug.
> Cassandra 2.1.1 Logs showing the decommissioned node :
> 2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
> time of 2049943233 for /X.X.X.X
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X 
> state left, tokens [ 59353109817657926242901533144729725259, 
> 60254520910109313597677907197875221475, 
> 75698727618038614819889933974570742305, 
> 84508739091270910297310401957975430578]
> 2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time for 
> endpoint : /X.X.X.X (1485116334088)
> 2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing tokens 
> [100434964734820719895982857900842892337, 
> 114144647582686041354301802358217767299, 
> 132090888860517964702932350041942412177, 
> 138409460913927199437556572481804704749] for /X.X.X.X
> 2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager Deleting 
> any stored hints for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting 
> version for /X.X.X.X
> 2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint 
> /X.X.X.X
> 2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state 
> change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection 
> attempting to connect to /X.X.X.X
> 2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection 
> Handshaking version with /X.X.X.X
> 2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting 
> version 7 for /X.X.X.X
> 2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
> time of 2074454222 for /X.X.X.X
> 2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
> time of 4302985797 for /X.X.X.X
> 2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 60000 elapsed, 
> /X.X.X.X gossip quarantine over
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring interval 
> time of 3047826501 for /X.X.X.X
> 2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state 
> change for dead or unknown endpoint: /X.X.X.X
> Cassandra 2.1.16 Logs showing the decommissioned node : (The logs in 2.1.16 
> match those of 2.1.1 up to "DEBUG Gossiper 60000 elapsed, /X.X.X.X gossip 
> quarantine over", which is then followed by "NODE is now DOWN")
> 2017-01-19 19:52:23,687 [GossipStage:1] DEBUG  StorageService.java:1883 - Node 
> /X.X.X.X state left, tokens [-1112888759032625467, -228773855963737699, 
> -3114550423754381391, -4848625944949064281, -6920961603460018610, 
> -8566729719076824066, 1611098831406674636, 7278843689020594771, 
> 7565410054791352413, 9166885764, 8654747784805453046]
> 2017-01-19 19:52:23,688 [GossipStage:1] DEBUG  Gossiper.java:1520 - adding 
> expire time for endpoint : /X.X.X.X (1485114743567)
> 2017-01-19 19:52:23,688 [GossipStage:1] INFO   StorageService.java:1965 - 
> Removing tokens [-1112888759032625467, -228773855963737699, 
> -3114550423754381391, -4848625944949064281, -6920961603460018610, 
> 5690722015779071557, 6202373691525063547, 7191120402564284381, 
> 7278843689020594771, 7565410054791352413, 8524200089166885764, 
> 8654747784805453046] for /X.X.X.X
> 2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO   
> HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
> 2017-01-19 19:52:23,689 [GossipStage:1] DEBUG  MessagingService.java:840 - 
> Resetting version for /X.X.X.X
> 2017-01-19 19:52:23,690 [GossipStage:1] DEBUG  Gossiper.java:417 - removing 
> endpoint /X.X.X.X
> 2017-01-19 19:52:23,691 [GossipStage:1] DEBUG  StorageService.java:1552 - 
> Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG  
> OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO   
> OutboundTcpConnection.java:488 - Handshaking version with /X.X.X.X
> 2017-01-19 19:52:31,619 [MessagingService-Outgoing-/X.X.X.X] DEBUG  
> MessagingService.java:826 - Setting version 8 for /X.X.X.X
> 2017-01-19 19:53:09,699 [GossipStage:1] DEBUG  FailureDetector.java:423 - 
> Ignoring interval time of 4001002966 for /X.X.X.X
> 2017-01-19 19:53:13,910 [GossipStage:1] DEBUG  FailureDetector.java:423 - 
> Ignoring interval time of 4210611081 for /X.X.X.X
> 2017-01-19 19:53:19,914 [GossipStage:1] DEBUG  FailureDetector.java:423 - 
> Ignoring interval time of 6004119075 for /X.X.X.X
> 2017-01-19 19:53:23,702 [GossipTasks:1] DEBUG  Gossiper.java:795 - 60000 
> elapsed, /X.X.X.X gossip quarantine over
> 2017-01-19 19:53:23,985 [GossipStage:1] DEBUG  StorageService.java:1552 - 
> Ignoring state change for dead or unknown endpoint: /X.X.X.X
> 2017-01-19 19:53:26,223 [GossipStage:1] DEBUG  FailureDetector.java:423 - 
> Ignoring interval time of 6309159352 for /X.X.X.X
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting 
> /X.X.X.X with status LEFT - alive true
> 2017-01-19 19:53:50,709 [GossipTasks:1] INFO   Gossiper.java:1008 - 
> InetAddress /X.X.X.X is now DOWN
> 2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG  MessagingService.java:429 - 
> Resetting pool for /X.X.X.X
> 2017-01-19 19:53:51,710 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting 
> /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:52,710 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting 
> /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:53,711 [MessagingService-Outgoing-/X.X.X.X] DEBUG  
> OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
> 2017-01-19 19:53:53,711 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting 
> /X.X.X.X with status LEFT - alive false
> 2017-01-19 19:53:54,711 [GossipTasks:1] DEBUG  Gossiper.java:336 - Convicting 
> /X.X.X.X with status LEFT - alive false
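
The symptom in the 2.1.16 excerpt above is the repeated "Convicting ... with status LEFT" message after the quarantine period ends. As a rough aid for spotting affected nodes without reading the logs by hand, the following is a minimal sketch that counts those messages per endpoint. It assumes unwrapped log lines (one entry per line, as in the log file itself rather than in this email) and that the message text matches the excerpt; the function name count_left_convictions is hypothetical and not part of Cassandra.

```python
import re
from collections import Counter

# Message text copied from the 2.1.16 excerpt above; other versions may
# word this differently (assumption).
CONVICT_RE = re.compile(r"Convicting (/\S+) with status LEFT")


def count_left_convictions(lines):
    """Count 'Convicting <endpoint> with status LEFT' entries per endpoint."""
    counts = Counter()
    for line in lines:
        match = CONVICT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the excerpt above, joined back onto single lines.
sample = [
    "2017-01-19 19:53:23,702 [GossipTasks:1] DEBUG  Gossiper.java:795 - "
    "60000 elapsed, /X.X.X.X gossip quarantine over",
    "2017-01-19 19:53:50,709 [GossipTasks:1] DEBUG  Gossiper.java:336 - "
    "Convicting /X.X.X.X with status LEFT - alive true",
    "2017-01-19 19:53:51,710 [GossipTasks:1] DEBUG  Gossiper.java:336 - "
    "Convicting /X.X.X.X with status LEFT - alive false",
]

if __name__ == "__main__":
    for endpoint, n in count_left_convictions(sample).items():
        print(endpoint, n)
```

Any endpoint with a non-zero count on a live node is a candidate for the behaviour described in this issue. In practice the lines would come from the node's debug-level log file rather than an inline list.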



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
