[jira] [Updated] (CASSANDRA-13308) Gossip breaks, Hint files not being deleted on nodetool decommission

Josh McKenzie (Jira) Sat, 18 Apr 2020 12:42:22 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Josh McKenzie updated CASSANDRA-13308:
--------------------------------------
    Bug Category: Parent values: Availability(12983) Level 1 values: Unavailable(12994)

> Gossip breaks, Hint files not being deleted on nodetool decommission
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-13308
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints, Legacy/Streaming and Messaging
>         Environment: Using Cassandra version 3.0.9
>            Reporter: Arijit Banerjee
>            Assignee: Jeff Jirsa
>            Priority: Normal
>             Fix For: 3.0.14, 3.11.0, 4.0
>
>         Attachments: 28207.stack, logs, logs_decommissioned_node
>
>
> How to reproduce the issue I'm seeing:
> Shut down Cassandra on one node of the cluster and wait until the other 
> nodes accumulate a large number of hints for it. Start Cassandra on that 
> node and immediately run "nodetool decommission" on it.
> The node streams its replicas and marks itself as DECOMMISSIONED, but other 
> nodes do not seem to see this message. "nodetool status" shows the 
> decommissioned node in state "UL" on all other nodes (it is also present in 
> system.peers), and Cassandra logs show that gossip tasks on nodes are not 
> proceeding (the number of pending tasks keeps increasing). A jstack dump 
> suggests that a gossip task is blocked on hints dispatch (I can provide 
> traces if this is not obvious). Because the cluster is large and there are 
> a lot of hints, the dispatch takes a long time. 
> On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint 
> files for the decommissioned node. Documentation seems to suggest that these 
> hints should be deleted during "nodetool decommission", but it does not seem 
> to be the case here. This is the bug being reported.
> To recover from this scenario, if I manually delete the hint files on the 
> nodes, the hints dispatcher threads throw a number of exceptions and the 
> decommissioned node moves to state "DL" (perhaps it missed some gossip 
> messages?). The node is still in my "system.peers" table.
> Restarting Cassandra on all nodes after this step does not fix the issue (the 
> node remains in the peers table). In fact, after this point the 
> decommissioned node is in state "DN".
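
The two symptoms described above (a node stuck in a non-normal state in
"nodetool status", and hint files left behind for it) can be checked with a
small script. The following is a minimal sketch, not part of the original
report. It assumes the usual "nodetool status" row layout, where the first two
characters are the Up/Down and Normal/Leaving/Joining/Moving codes, and it
assumes hint files are named <host-id>-<timestamp>-<version>.hints. The
function names are hypothetical.

```python
import re
from collections import Counter

# A node row in `nodetool status` starts with two letters:
# Up/Down, then Normal/Leaving/Joining/Moving, followed by the address.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")

# ASSUMPTION: hint file names look like <host-id>-<timestamp>-<version>.hints
HINT_FILE = re.compile(
    r"^([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-\d+-\d+\.hints$"
)


def non_normal_nodes(status_output):
    """Map address -> two-letter code for every node that is not "UN"."""
    result = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match is None:
            continue  # header, separator, or blank line
        code = match.group(1) + match.group(2)
        if code != "UN":
            result[match.group(3)] = code
    return result


def hint_files_per_host(file_names):
    """Count .hints files per target host ID, ignoring other files."""
    counts = Counter()
    for name in file_names:
        match = HINT_FILE.match(name)
        if match is not None:
            counts[match.group(1)] += 1
    return dict(counts)
```

In practice the first function would be fed the captured output of
"nodetool status" and the second the result of
os.listdir("/var/lib/cassandra/hints") on each node. A host ID that still has
hint files after the node reports DECOMMISSIONED matches the behaviour
reported here.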



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
