[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh McKenzie updated CASSANDRA-13308:
--------------------------------------
    Bug Category: Parent values: Availability(12983) Level 1 values: Unavailable(12994)

> Gossip breaks, Hint files not being deleted on nodetool decommission
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-13308
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints, Legacy/Streaming and Messaging
>         Environment: Using Cassandra version 3.0.9
>            Reporter: Arijit Banerjee
>            Assignee: Jeff Jirsa
>            Priority: Normal
>             Fix For: 3.0.14, 3.11.0, 4.0
>
>         Attachments: 28207.stack, logs, logs_decommissioned_node
>
>
> How to reproduce the issue I'm seeing:
> Shut down Cassandra on one node of the cluster and wait until we accumulate a
> ton of hints. Start Cassandra on the node and immediately run "nodetool
> decommission" on it.
> The node streams its replicas and marks itself as DECOMMISSIONED, but other
> nodes do not seem to see this message. "nodetool status" shows the
> decommissioned node in state "UL" on all other nodes (it is also present in
> system.peers), and Cassandra logs show that gossip tasks on nodes are not
> proceeding (number of pending tasks keeps increasing). Jstack suggests that a
> gossip task is blocked on hints dispatch (I can provide traces if this is not
> obvious). Because the cluster is large and there are a lot of hints, this is
> taking a while.
> On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint
> files for the decommissioned node. Documentation seems to suggest that these
> hints should be deleted during "nodetool decommission", but it does not seem
> to be the case here. This is the bug being reported.
> To recover from this scenario, if I manually delete hint files on the nodes,
> the hints dispatcher threads throw a bunch of exceptions and the
> decommissioned node is now in state "DL" (perhaps it missed some gossip
> messages?). The node is still in my "system.peers" table.
> Restarting Cassandra on all nodes after this step does not fix the issue (the
> node remains in the peers table). In fact, after this point the
> decommissioned node is in state "DN".
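
The hint-directory inspection described in the report can be approximated with a small script. This is only a sketch, not part of the ticket or of Cassandra: it assumes hint files are named `<host-id>-<timestamp>-<version>.hints`, and the function names `group_hint_files` and `hints_for_host` are hypothetical helpers introduced for illustration. It only lists files; it does not delete anything.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed file-name layout: <host-id>-<timestamp>-<version>.hints
# (an assumption for this sketch; verify against your Cassandra version).
HINT_NAME = re.compile(
    r"^(?P<host_id>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"-(?P<timestamp>\d+)-(?P<version>\d+)\.hints$"
)


def group_hint_files(names):
    """Group hint file names by the target host ID embedded in each name.

    Names that do not match the assumed layout are ignored.
    """
    grouped = defaultdict(list)
    for name in names:
        match = HINT_NAME.match(name)
        if match:
            grouped[match.group("host_id").lower()].append(name)
    return dict(grouped)


def hints_for_host(hints_dir, host_id):
    """Return the hint file names in hints_dir addressed to host_id."""
    names = sorted(p.name for p in Path(hints_dir).iterdir() if p.is_file())
    return group_hint_files(names).get(host_id.lower(), [])
```

Calling `hints_for_host("/var/lib/cassandra/hints", host_id)` on each remaining node, with the host ID of the decommissioned node, would show whether hint files for that node are still present after "nodetool decommission" completes.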