[ 
https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732589#comment-17732589
 ] 

Jaydeepkumar Chovatia edited comment on CASSANDRA-18555 at 6/14/23 3:35 PM:
----------------------------------------------------------------------------

[~smiklosovic] [~brandon.williams]  

If you look at the original pull request, it does not persist any state and 
provides a JMX endpoint. It has been tested and has been running in my 
production environment for more than a year now without any issues. Please take 
a look and let us know your comments: [https://github.com/apache/cassandra/pull/2374]

 


was (Author: chovatia.jayd...@gmail.com):
[~smiklosovic] [~brandon.williams]  

If you look at the original pull request, it does exactly what you have 
just proposed. It does not persist any state and provides a JMX endpoint. It 
has been tested and has been running in my production environment for more than 
a year now without any issues. Please take a look and let us know your comments: 
[https://github.com/apache/cassandra/pull/2374]

 

> A new nodetool/JMX command that tells whether node's decommission failed or 
> not
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
>             Project: Cassandra
>          Issue Type: Task
>          Components: Observability/JMX
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, if a failure happens while a node is being decommissioned, an 
> exception is thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to 
> hours to days. There are various scenarios in which the caller may need to 
> probe the status again:
>  * The caller times out
>  * It is not possible to keep the caller hanging for such a long time
> And if the caller does not know what happened internally, then it cannot 
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be 
> invoked by the caller at any time, and it will return the correct status.
> It might look like a small change, but when we need to operate Cassandra at 
> scale in a large fleet, the lack of such a command becomes a bottleneck and 
> requires constant operator intervention.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
