[
https://issues.apache.org/jira/browse/CASSSIDECAR-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013787#comment-18013787
]
Andres Beck-Ruiz commented on CASSSIDECAR-274:
----------------------------------------------
Following the existing Sidecar API structure, which organizes keyspace and table
operations under the {{/keyspace}} and {{/table}} resources respectively, we can
create a new {{/cluster}} resource for cluster-wide operations. In the future,
this resource could also host operations such as upgrades and configuration
changes.
----
{{POST /api/v1/cassandra/cluster/restart}}
Creates a restart job. By default, the job is created in the "PENDING" state. If
{{executeImmediately}} is set to {{true}} while another restart is already
active ("RUNNING" or "PAUSED"), the request fails with 409 Conflict.
||Field||Type||Optional||Default Value||Description||
|hosts|List<String>|true|[]|Subset of hosts to restart. By default, all hosts are restarted.|
|rackParallelism|int|true|1|Number of nodes that can be restarted in parallel within a rack. The maximum is the number of nodes in the rack.|
|executeImmediately|boolean|true|false|Skip the "PENDING" state and set the restart job to "RUNNING" immediately.|
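The admission rules above can be sketched as follows. This is a hypothetical illustration, not Sidecar's implementation. The class and method names are invented. Returning 400 when {{rackParallelism}} is below 1 is also an assumption, because the proposal does not say which inputs are rejected with 400 Bad Request.
{code:java}
import java.util.List;

// Hypothetical sketch of the POST /cluster/restart admission rules; not Sidecar's actual code.
final class RestartAdmission
{
    enum JobState { PENDING, RUNNING, PAUSED, ABORTED, COMPLETED }

    /** HTTP status for the create request plus the initial job state (null when rejected). */
    static final class Decision
    {
        final int status;
        final JobState initialState;
        final String error;

        Decision(int status, JobState initialState, String error)
        {
            this.status = status;
            this.initialState = initialState;
            this.error = error;
        }
    }

    static Decision create(List<JobState> existingJobStates, int rackParallelism, boolean executeImmediately)
    {
        // Assumption: non-positive parallelism is the kind of input rejected with 400.
        if (rackParallelism < 1)
        {
            return new Decision(400, null, "rackParallelism must be at least 1");
        }
        // "Active" means RUNNING or PAUSED, as defined above.
        boolean activeRestart = existingJobStates.contains(JobState.RUNNING)
                                || existingJobStates.contains(JobState.PAUSED);
        if (executeImmediately && activeRestart)
        {
            return new Decision(409, null, "a restart is already RUNNING or PAUSED");
        }
        // Without executeImmediately the job is parked in PENDING, even if another restart is active.
        JobState initial = executeImmediately ? JobState.RUNNING : JobState.PENDING;
        return new Decision(201, initial, null);
    }
}
{code}
In this sketch, a request without {{executeImmediately}} is still accepted while another job is active. It waits in PENDING, which matches how the description reads.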
{{RestartJob}} Object
{{hosts_pending}} is a list of batches. The batches are restarted in the order
listed, and the hosts within a single batch can be restarted in parallel.
{code:java}
{
id: int,
state: String,
start_time: String,
hosts_succeeded: List<String>,
hosts_failed: List<String>,
hosts_restarting: List<String>,
hosts_pending: List<List<String>>,
last_update: String (message containing job updates)
}
{code}
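One way to derive {{hosts_pending}} from {{hosts}} and {{rackParallelism}} is to group hosts by rack and split each rack into batches of at most {{rackParallelism}} hosts. This sketch rests on several assumptions. The host-to-rack mapping is passed in (in practice it would come from cluster topology). Racks are processed one after another, so a batch never mixes racks. {{RestartPlanner}} is a hypothetical name.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch planner for hosts_pending; assumes batches never span racks.
final class RestartPlanner
{
    static List<List<String>> plan(List<String> hosts, Map<String, String> rackOf, int rackParallelism)
    {
        if (rackParallelism < 1)
        {
            throw new IllegalArgumentException("rackParallelism must be at least 1");
        }
        // Group hosts by rack, keeping racks (and hosts within a rack) in input order.
        Map<String, List<String>> byRack = new LinkedHashMap<>();
        for (String host : hosts)
        {
            String rack = rackOf.get(host);
            if (rack == null)
            {
                throw new IllegalArgumentException("unknown rack for host " + host);
            }
            byRack.computeIfAbsent(rack, r -> new ArrayList<>()).add(host);
        }
        // Split each rack into consecutive batches of at most rackParallelism hosts.
        List<List<String>> batches = new ArrayList<>();
        for (List<String> rackHosts : byRack.values())
        {
            for (int i = 0; i < rackHosts.size(); i += rackParallelism)
            {
                int end = Math.min(i + rackParallelism, rackHosts.size());
                batches.add(new ArrayList<>(rackHosts.subList(i, end)));
            }
        }
        return batches;
    }
}
{code}
Suppose the six sample hosts are spread across three hypothetical racks of two nodes each and {{rackParallelism}} is 2. The planner then yields the three two-host batches shown in the sample output below.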
Response
* 201 Created
** {{restart_job :: RestartJob}}
* 400 Bad Request
** {{error :: string}}
* 409 Conflict
** {{error :: string}}
* 500 Internal Server Error
** {{error :: string}}
Sample input:
{code:java}
{
"hosts": ["192.168.1.10", "192.168.1.11", "192.168.1.12",
"192.168.1.13", "192.168.1.14", "192.168.1.15"],
"rackParallelism": 2,
"executeImmediately": true
}
{code}
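A client could submit the sample input with the JDK's built-in {{java.net.http}} client (Java 11+). This is a sketch. The base URL is supplied by the caller, and any authentication a real deployment may require is omitted.
{code:java}
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a client call to the proposed endpoint; the base URL is supplied by the caller.
final class RestartClientSketch
{
    static final String RESTART_PATH = "/api/v1/cassandra/cluster/restart";

    /** Builds (but does not send) the POST that creates a restart job. */
    static HttpRequest createRestartRequest(String baseUrl, String jsonBody)
    {
        return HttpRequest.newBuilder(URI.create(baseUrl + RESTART_PATH))
                          .header("Content-Type", "application/json")
                          .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                          .build();
    }

    /** Sends the request; the proposal lists 201, 400, 409, and 500 as possible statuses. */
    static HttpResponse<String> send(HttpRequest request) throws IOException, InterruptedException
    {
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
{code}
For example, passing {{"http://localhost:9043"}} as the base URL targets a Sidecar on the local host. The host and port are placeholders, not values taken from this proposal.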
Sample output:
{code:java}
{
"id": 123,
"state": "RUNNING",
"start_time": "2025-08-04T22:55:00+00:00",
"hosts_succeeded": [],
"hosts_failed": [],
"hosts_restarting": [],
"hosts_pending": [["192.168.1.10", "192.168.1.11"], ["192.168.1.12",
"192.168.1.13"], ["192.168.1.14", "192.168.1.15"]],
"last_update": "Starting restart 123"
}
{code}
----
{{PATCH /api/v1/cassandra/cluster/restart/\{id\}}}
Updates the state of the restart job with the given ID. This endpoint is used to
start, pause, and abort a restart job. Returns 409 Conflict if the requested
state transition is invalid, or if the job is set to "RUNNING" while another
restart is already active. Valid state transitions are depicted below, and the
request body follows [RFC 6902 JavaScript Object Notation (JSON) Patch|https://www.rfc-editor.org/rfc/rfc6902#section-4.3].
||Field||Type||Optional||Default Value||Description||
|op|String|false|N/A|Patch operation (must be "replace")|
|path|String|false|N/A|Path of the field being replaced (must be "/state")|
|value|String|false|N/A|Target state for the restart job: "RUNNING", "PAUSED", or "ABORTED"|
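Server-side validation of a single patch operation might look like this. It is a hypothetical sketch that assumes the request body has already been parsed into a flat map of strings. It uses {{"/state"}} as the path, as in the sample input. A failure message would be returned as the {{error}} of a 400 Bad Request.
{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical validation of one JSON Patch operation for PATCH /cluster/restart/{id}.
final class RestartPatchValidator
{
    private static final Set<String> PATCHABLE_STATES = Set.of("RUNNING", "PAUSED", "ABORTED");

    /** Returns an error message for a 400 response, or empty if the operation is acceptable. */
    static Optional<String> validate(Map<String, String> operation)
    {
        if (!"replace".equals(operation.get("op")))
        {
            return Optional.of("op must be \"replace\"");
        }
        if (!"/state".equals(operation.get("path")))
        {
            return Optional.of("path must be \"/state\"");
        }
        String value = operation.get("value");
        // Check for null first: Set.of(...).contains(null) throws NullPointerException.
        if (value == null || !PATCHABLE_STATES.contains(value))
        {
            return Optional.of("value must be one of RUNNING, PAUSED, ABORTED");
        }
        return Optional.empty();
    }
}
{code}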
Valid state transitions:
!Screenshot 2025-08-13 at 12.34.43 PM.png|width=590,height=454!
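The transition diagram is only available as the attached screenshot. The transition table in this sketch is therefore an assumption inferred from the text (start, pause, and abort), not a copy of the diagram. The sketch also applies the stated rule that a job cannot be set to "RUNNING" while another restart is active. {{COMPLETED}} is treated as terminal and assumed to be reached only by the coordinator, never through PATCH.
{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical PATCH state machine; the ALLOWED table is an assumption, not the attached diagram.
final class RestartStateMachine
{
    enum State { PENDING, RUNNING, PAUSED, ABORTED, COMPLETED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

    static
    {
        ALLOWED.put(State.PENDING, EnumSet.of(State.RUNNING, State.ABORTED));
        ALLOWED.put(State.RUNNING, EnumSet.of(State.PAUSED, State.ABORTED));
        ALLOWED.put(State.PAUSED, EnumSet.of(State.RUNNING, State.ABORTED));
        // Terminal states accept no PATCH transitions.
        ALLOWED.put(State.ABORTED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
    }

    /**
     * Returns the HTTP status a PATCH would produce.
     * anotherJobActive: whether a different job is currently RUNNING or PAUSED.
     */
    static int patchStatus(State current, State target, boolean anotherJobActive)
    {
        if (!ALLOWED.get(current).contains(target))
        {
            return 409; // invalid state transition
        }
        if (target == State.RUNNING && anotherJobActive)
        {
            return 409; // only one restart may be active at a time
        }
        return 200;
    }
}
{code}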
Response
* 200 OK
** {{restart_job :: RestartJob}}
* 400 Bad Request
** {{error :: string}}
* 409 Conflict (Invalid state transition or active restart ongoing)
** {{error :: string}}
* 500 Internal Server Error
** {{error :: string}}
Sample input:
{code:java}
{
"op": "replace",
"path": "/state",
"value": "PAUSED"
}
{code}
----
{{GET /api/v1/cassandra/cluster/restart/\{id\}}}
Get the restart job specified by the ID.
Response
* 200 OK
** {{restart_job :: RestartJob}}
* 404 Not Found
** {{error :: string}}
* 500 Internal Server Error
** {{error :: string}}
----
{{GET /api/v1/cassandra/cluster/restart}}
Gets the history of restart jobs, including any active restart. How much restart
history is retained is configurable in {{sidecar.yaml}}.
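A count-based retention policy could look like the following sketch. {{maxRetained}} is a hypothetical parameter standing in for whatever {{sidecar.yaml}} setting is chosen. The proposal neither names that setting nor says whether retention is by count or by age. The sketch keeps only job IDs, simply retains the N most recently created jobs, and returns them newest first.
{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded restart history; maxRetained stands in for an unnamed sidecar.yaml setting.
final class RestartHistory
{
    private final int maxRetained;
    private final Deque<Integer> jobIds = new ArrayDeque<>(); // newest at the head

    RestartHistory(int maxRetained)
    {
        if (maxRetained < 1)
        {
            throw new IllegalArgumentException("maxRetained must be at least 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Records a newly created job and evicts the oldest entries beyond the limit. */
    synchronized void record(int jobId)
    {
        jobIds.addFirst(jobId);
        while (jobIds.size() > maxRetained)
        {
            jobIds.removeLast();
        }
    }

    /** Snapshot of retained job IDs, newest first. */
    synchronized List<Integer> list()
    {
        return new ArrayList<>(jobIds);
    }
}
{code}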
Response
* 200 OK
** {{restart_jobs :: [RestartJob]}} (can be empty)
* 500 Internal Server Error
** {{error :: string}}
Sample response:
{code:java}
{
"restart_jobs": [
{
"id": 123,
"state": "ABORTED",
"start_time": "2025-08-04T22:55:00+00:00",
"hosts_succeeded": ["192.168.1.10", "192.168.1.11"],
"hosts_failed": [],
"hosts_restarting": [],
"hosts_pending": [["192.168.1.12", "192.168.1.13"], ["192.168.1.14",
"192.168.1.15"]],
"last_update": "Restart 123 aborted"
},
{
"id": 456,
"state": "COMPLETED",
"start_time": "2025-08-03T22:55:00+00:00",
"hosts_succeeded": ["192.168.1.10", "192.168.1.11", "192.168.1.12",
"192.168.1.13", "192.168.1.14", "192.168.1.15"],
"hosts_failed": [],
"hosts_restarting": [],
"hosts_pending": [],
"last_update": "Restart 456 completed"
}
]
}
{code}
> Enable rolling restarts of Cassandra clusters via Sidecar
> ---------------------------------------------------------
>
> Key: CASSSIDECAR-274
> URL: https://issues.apache.org/jira/browse/CASSSIDECAR-274
> Project: Sidecar for Apache Cassandra
> Issue Type: Improvement
> Reporter: Isaac Reath
> Priority: Major
> Attachments: Screenshot 2025-08-13 at 12.34.43 PM.png
>
>
> Rolling restarts are frequently used in Cassandra to apply changes to a
> cluster such as configuration changes, or version upgrades. In
> CASSSIDECAR-266, we are adding functionality to safely start and stop a
> single Cassandra node via Sidecar. This ticket will build on that work to
> implement a coordinated rolling restart.
> The scope of this effort includes:
> * Adding API endpoints to enable operators to start, monitor, pause and stop
> a rolling restart.
> * Updating Sidecar to orchestrate start and stop operations across the
> cluster, allowing for a configurable amount of nodes to be offline
> simultaneously.
> * Creating safeguards to ensure that a rolling restart is safe to perform
> and does not interfere with other operations ongoing in the cluster such as
> node bootstraps or decommissions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)