[ https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732552#comment-17732552 ]
Stefan Miklosovic commented on CASSANDRA-18555:
-----------------------------------------------

Aha... well, to put my way of thinking under scrutiny, let's imagine that a decommission fails, we kill the node, and we start it again (that scenario itself is quite improbable, but anyway). My point is: "this is dangerous, so we need to save the state". OK, so we save the state, we see that the previous decommission has failed, and now what? What are we going to do about that? When we see that a node has failed to decommission, what other course of action could we take than to try to decommission it again? So knowing that it failed to decommission, _and persisting that state across a possible restart_, is kind of useless. If decommission is meant to be repeatable when it fails in the middle, as you suggested, then knowing this across restarts is not helpful.

The whole decommissioning logic basically comes down to two methods in StorageService: startLeaving() and unbootstrap(). startLeaving() just gossips that the node's status is LEAVING so other nodes know about it. unbootstrap() repairs some Paxos topology, starts batchlog replay and hints replay, and streams data to other nodes, all of which seems to be repeatable without issues. I do not see any dtest that exercises a failed decommission, which would show that it is indeed a repeatable operation.

I will check what it would take to gossip an unsuccessful decommission operation. I don't have a clue how complex that would be, but my gut feeling is that it won't be easy. Let's see.
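The argument that a failed decommission can simply be re-run rests on every step being idempotent. The following is a toy Java model under that assumption only: the class, its fields, and the injected failure are hypothetical, and its startLeaving()/unbootstrap() are stand-ins for the StorageService methods named in the comment, not their real behavior. Streaming is modeled as adding ranges to a set, so re-sending a range that already arrived leaves the receiver unchanged.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, NOT Cassandra's StorageService. It only illustrates why
// re-running a failed decommission is safe when every step is idempotent.
class DecommissionSketch {
    enum Status { NORMAL, LEAVING, LEFT }

    private Status status = Status.NORMAL;
    private final List<String> localRanges;               // ranges this node hands off
    private final Set<String> received = new TreeSet<>(); // data now held by other nodes
    private final int failAtIndex;                         // where the injected failure hits
    private boolean failOnce = true;                       // only the first attempt fails

    DecommissionSketch(List<String> localRanges, int failAtIndex) {
        this.localRanges = localRanges;
        this.failAtIndex = failAtIndex;
    }

    // Stand-in for startLeaving(): announcing LEAVING a second time changes nothing.
    void startLeaving() {
        if (status != Status.LEFT) {
            status = Status.LEAVING;
        }
    }

    // Stand-in for unbootstrap(): streams every range. Re-sending a range that
    // already arrived is a no-op because the receiver is modeled as a set.
    void unbootstrap() {
        for (int i = 0; i < localRanges.size(); i++) {
            if (failOnce && i == failAtIndex) {
                failOnce = false;
                throw new IllegalStateException("streaming failed at " + localRanges.get(i));
            }
            received.add(localRanges.get(i));
        }
        status = Status.LEFT;
    }

    // The operator-facing operation: on failure, the remedy is to call it again.
    void decommission() {
        startLeaving();
        unbootstrap();
    }

    Status status() { return status; }

    Set<String> received() { return received; }

    public static void main(String[] args) {
        DecommissionSketch node = new DecommissionSketch(List.of("r1", "r2", "r3"), 2);
        try {
            node.decommission();
        } catch (IllegalStateException e) {
            // prints: first attempt: streaming failed at r3, status=LEAVING, received=[r1, r2]
            System.out.println("first attempt: " + e.getMessage()
                    + ", status=" + node.status() + ", received=" + node.received());
        }
        node.decommission(); // retry: same operation, no saved failure state needed
        // prints: second attempt: status=LEFT, received=[r1, r2, r3]
        System.out.println("second attempt: status=" + node.status() + ", received=" + node.received());
    }
}
```

Note that after the failed first attempt the status is still LEAVING, which looks the same as an attempt that is still in progress; that gap is what the ticket's status command is meant to close.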
> A new nodetool/JMX command that tells whether node's decommission failed or
> not
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
>             Project: Cassandra
>          Issue Type: Task
>          Components: Observability/JMX
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, when a node is being decommissioned and any failure happens,
> an exception is thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to
> hours to days. There are various scenarios in which the caller may need to
> probe the status again:
>  * The caller times out
>  * It is not possible to keep the caller hanging for such a long time
> And if the caller does not know what happened internally, then it cannot
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be
> invoked by the caller at any time, and it will return the correct status.
> It might look like a small change, but when we need to operate Cassandra
> in a large-scale fleet, this becomes a bottleneck and requires
> constant operator intervention.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
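The ticket's proposal, a status that callers can query at any time instead of waiting on the long-running call, can be sketched as a minimal Java illustration under stated assumptions: DecommissionStateTracker, its State values, and getState() are hypothetical names, not Cassandra's actual StorageService MBean or nodetool command, and the decommission itself is stubbed as a Runnable run on a background executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the ticket's idea, NOT Cassandra's actual MBean or nodetool
// command: the long-running operation records how it ended, so a caller that timed
// out (or never waited) can ask for the outcome later.
class DecommissionStateTracker {
    enum State { NOT_STARTED, IN_PROGRESS, FAILED, DECOMMISSIONED }

    private final AtomicReference<State> state = new AtomicReference<>(State.NOT_STARTED);

    // Runs the operation and records its outcome. Starting is allowed from NOT_STARTED
    // or FAILED (a retry); it is refused while a run is in progress or after success.
    void run(Runnable decommission) {
        if (!state.compareAndSet(State.NOT_STARTED, State.IN_PROGRESS)
                && !state.compareAndSet(State.FAILED, State.IN_PROGRESS)) {
            throw new IllegalStateException("cannot start decommission from " + state.get());
        }
        try {
            decommission.run();
            state.set(State.DECOMMISSIONED);
        } catch (RuntimeException e) {
            state.set(State.FAILED);
            throw e; // a caller still waiting on the call itself still sees the exception
        }
    }

    // What a status getter would expose: cheap and safe to call at any time.
    State getState() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DecommissionStateTracker tracker = new DecommissionStateTracker();
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // The caller fires off the operation instead of blocking on it; this attempt fails.
        executor.submit(() -> tracker.run(() -> {
            throw new IllegalStateException("streaming failed");
        }));
        executor.shutdown();

        // Later, the caller polls until the operation has finished one way or the other.
        State s;
        while ((s = tracker.getState()) == State.NOT_STARTED || s == State.IN_PROGRESS) {
            Thread.sleep(10);
        }
        System.out.println("after first attempt: " + s); // FAILED

        // Seeing FAILED, the remaining course of action is to try again.
        tracker.run(() -> { });
        System.out.println("after retry: " + tracker.getState()); // DECOMMISSIONED
    }
}
```

The state in this sketch lives only in memory, so it would not survive a restart. Stefan's comment questions whether persisting it would add anything, since the action taken on FAILED is a retry either way.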