[jira] [Commented] (CASSANDRA-18555) A new nodetool/JMX command that tells whether node's decommission failed or not

Stefan Miklosovic (Jira) Wed, 14 Jun 2023 09:01:08 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732601#comment-17732601
 ]


Stefan Miklosovic commented on CASSANDRA-18555:
-----------------------------------------------

[~chovatia.jayd...@gmail.com] yeah, I think we need to return to that, tweak a 
few things, and propagate it to nodetool info. I do not see that in your PR.

I will keep you as a co-author to reflect the ideas you contributed.

The PR is here [https://github.com/apache/cassandra/pull/2390/files]

It looks like this in nodetool info:
{code:java}
Bootstrap state        : COMPLETED
Decommission failed    : false
{code}
When decommission has failed, it will be:
{code:java}
Bootstrap state        : COMPLETED
Decommission failed    : true
{code}
and when the node is decommissioned, it will be:
{code:java}
Bootstrap state        : DECOMMISSIONED
Decommission failed    : false
{code}
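For tooling that scrapes nodetool info rather than using JMX, the new line can be read with a small parser. This is only a minimal sketch: it assumes the "Decommission failed" label and the "key : value" layout shown above stay as they are, and the class and method names are hypothetical.

```java
import java.util.Optional;

final class DecommissionInfoParser {
    /**
     * Returns the value of the "Decommission failed" line, or empty when the
     * line is absent (for example, output from a node without this change).
     * The label text is taken from the sample output in this comment.
     */
    static Optional<Boolean> parseDecommissionFailed(String nodetoolInfoOutput) {
        for (String line : nodetoolInfoOutput.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key : value" line
            }
            String key = line.substring(0, colon).trim();
            if (key.equals("Decommission failed")) {
                String value = line.substring(colon + 1).trim();
                return Optional.of(Boolean.parseBoolean(value));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "Bootstrap state        : COMPLETED\n"
                      + "Decommission failed    : true\n";
        System.out.println(parseDecommissionFailed(sample));
    }
}
```

Returning an empty Optional instead of false lets the caller tell "not failed" apart from "this node does not report the field".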
There is also an interesting corner case when we try to decommission a node 
which was already decommissioned successfully. That should just log and return; 
no additional logic should be executed.

An "isDecommissionFailed" method was also added to StorageServiceMBean so that 
this can be queried via JMX in general.
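A rough sketch of how a caller might read this flag over JMX follows. It relies on two assumptions: that the MBean is registered under the usual StorageService object name, and that the boolean getter "isDecommissionFailed" surfaces as the JMX attribute "DecommissionFailed" (the standard mapping for "is" getters). The stand-in interface and helper names are hypothetical and exist only so the example runs without a Cassandra node.

```java
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

final class DecommissionJmxCheck {
    // Object name under which Cassandra registers its StorageServiceMBean.
    static final String STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService";

    /** Hypothetical stand-in that exposes only the getter discussed here. */
    public interface StandInStorageService {
        boolean isDecommissionFailed();
    }

    /** Reads the flag; "DecommissionFailed" is the assumed attribute name. */
    static boolean isDecommissionFailed(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName(STORAGE_SERVICE), "DecommissionFailed");
        return (Boolean) value;
    }

    /** Builds an in-process MBean server with the stand-in registered. */
    static MBeanServer standInServer(boolean failed) throws Exception {
        MBeanServer server = MBeanServerFactory.newMBeanServer();
        StandInStorageService standIn = () -> failed;
        server.registerMBean(new StandardMBean(standIn, StandInStorageService.class),
                             new ObjectName(STORAGE_SERVICE));
        return server;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isDecommissionFailed(standInServer(true)));
    }
}
```

Against a real node, the MBeanServerConnection would instead come from JMXConnectorFactory.connect with the node's JMX service URL; the rest of the lookup should stay the same if the assumptions above hold.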

I will kindly re-assign you to this ticket and ask you to see if it works for 
you, as you are the person who is going to use this feature primarily.

> A new nodetool/JMX command that tells whether node's decommission failed or 
> not
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
>             Project: Cassandra
>          Issue Type: Task
>          Components: Observability/JMX
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, when a node is being decommissioned and if any failure happens, 
> then an exception is thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to 
> hours to days. There are various scenarios in which the caller may need to 
> probe the status again:
>  * The caller times out
>  * It is not possible to keep the caller hanging for such a long time
> And if the caller does not know what happened internally, then it cannot 
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be 
> invoked by the caller anytime, and it will return the correct status.
> It might look like a small change, but when we need to operate Cassandra at 
> scale in a large fleet, this becomes a bottleneck and requires 
> constant operator intervention.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
