[ https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068186#comment-15068186 ]
Didier commented on CASSANDRA-10371:
------------------------------------

Hi Stefania,

You are perfectly right! I had just fixed my issue when you wrote your answer. My problem was that many nodes were affected by this mess (not just one: multi-DC, Europe / US).

I set up these entries in log4j-server.properties on one node:
{code}
log4j.logger.org.apache.cassandra.gms.GossipDigestSynVerbHandler=TRACE
log4j.logger.org.apache.cassandra.gms.FailureDetector=TRACE
{code}

With this trick I found the culprit nodes with a simple tail on system.log:
{code}
tail -f system.log | grep "TRACE" | grep -A 10 -B 10 "192.168.136.28"
{code}
{code}
TRACE [GossipStage:1] 2015-12-22 14:25:10,262 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /10.0.2.110
TRACE [GossipStage:1] 2015-12-22 14:25:10,262 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are : /10.10.102.97:1448271725:7650177 /10.10.2.23:1450793863:1377 /10.0.102.190:1448275278:7636527 /10.0.2.36:1450792729:4816 /192.168.136.28:1449485228:258388
{code}

Every time I found a match with a phantom node's IP in the gossip SYN digests, I ran this on the affected node (in this example, 10.0.2.110):
{code}
nodetool drain && /etc/init.d/cassandra restart
{code}

After doing this on some nodes (15 nodes), I checked whether any entries with the phantom nodes remained in system.log... and voilà! No more phantom nodes.

Thanks for your help ;)

Didier

> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
>                 Key: CASSANDRA-10371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Brandon Williams
>            Assignee: Stefania
>            Priority: Minor
>
> This may apply to other dead states as well. Dead states should be expired
> after 3 days. In the case of decom we attach a timestamp to let the other
> nodes know when it should be expired.
> It has been observed that sometimes a subset of nodes in the cluster never
> expire the state, and through heap analysis of these nodes it is revealed
> that the epstate.isAlive check returns true when it should return false,
> which would allow the state to be evicted. This may have been affected by
> CASSANDRA-8336.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
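The manual scan Didier describes (grep the TRACE output for the phantom IP, note which node sent the gossip SYN, then drain and restart it) could be sketched as a small shell helper. This is a hypothetical sketch, not part of the ticket: the function name, paths, and IPs are assumptions, and it only parses the two TRACE line formats shown in the comment above.

```shell
# Hypothetical helper: scan_phantom_senders <phantom-ip> <system.log>
# Prints each node that is still advertising the phantom IP in its
# gossip SYN digests, so it can be drained and restarted.
scan_phantom_senders() {
  phantom="$1"; log="$2"
  awk -v phantom="$phantom" '
    # Remember the sender of each GossipDigestSynMessage (last field, e.g. /10.0.2.110)
    /Received a GossipDigestSynMessage from/ { sender = $NF }
    # If the following digest line mentions the phantom IP, report the sender
    /Gossip syn digests/ && index($0, phantom) { print sender }
  ' "$log" | sort -u
}
```

Usage (paths are assumptions; adjust to your install):

```shell
scan_phantom_senders 192.168.136.28 /var/log/cassandra/system.log
# then, on each node it reports:
#   nodetool drain && /etc/init.d/cassandra restart
```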