[ 
https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729613#comment-17729613
 ] 

Stefan Miklosovic edited comment on CASSANDRA-18555 at 6/6/23 6:58 AM:
-----------------------------------------------------------------------

In general, we are trying to move away from introducing new nodetool
commands and to expose such things via CQL instead. Here, however, that is a
bit problematic: CQL will probably no longer be available after decommission,
as it would already be turned off, so we could not connect to it and only JMX
would be available.

That being said, could we not reuse an existing nodetool command and put this
information there instead of creating a brand new nodetool command? Something
like "info", "status" or "describecluster"?

The "status" command already shows various statuses like UJ, UN, UL ... could
there not be a status which reflects the fact that a node has failed to
decommission itself?

The advantage of seeing it in "nodetool status" is that you would see _all
nodes which failed to decommission_ at once, instead of going to each
particular node and asking for its decommission status via the proposed command.
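
To make the idea concrete, here is a minimal sketch of how such a status could
be rendered. Everything in it is hypothetical: the StatusSketch class, the
DECOMMISSION_FAILED state and the "F" letter are assumptions made up for
illustration, not existing Cassandra code or existing nodetool output. Only the
Up/Down prefix and the N, L, J, M state letters mirror what "nodetool status"
prints today.

```java
// Illustrative sketch only; none of these names exist in Cassandra.
class StatusSketch {
    // Node states as shown by "nodetool status", plus one hypothetical addition.
    enum State { NORMAL, LEAVING, JOINING, MOVING, DECOMMISSION_FAILED }

    // Builds the two-letter code: Up/Down followed by the state letter.
    static String code(boolean up, State state) {
        char letter;
        switch (state) {
            case NORMAL:  letter = 'N'; break;
            case LEAVING: letter = 'L'; break;
            case JOINING: letter = 'J'; break;
            case MOVING:  letter = 'M'; break;
            // Hypothetical letter for a node whose decommission failed.
            case DECOMMISSION_FAILED: letter = 'F'; break;
            default: throw new IllegalArgumentException("unknown state: " + state);
        }
        return (up ? "U" : "D") + letter;
    }
}
```

With something like this, a node whose decommission failed would show up as
"UF" in the cluster-wide listing, next to UN, UL and so on.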


> A new nodetool/JMX command that tells whether node's decommission failed or 
> not
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
>             Project: Cassandra
>          Issue Type: Task
>          Components: Observability/JMX
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>
> Currently, when a node is being decommissioned and any failure happens,
> an exception is thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to
> hours to days. There are various scenarios in which the caller may need to
> probe the status again:
>  * The caller times out
>  * It is not possible to keep the caller hanging for such a long time
> If the caller does not know what happened internally, then it cannot
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be
> invoked by the caller anytime, and it will return the correct status.
> It might look like a small change, but when we need to operate Cassandra at
> scale in a large fleet, this becomes a bottleneck and requires
> constant operator intervention.


