[ https://issues.apache.org/jira/browse/CASSANDRA-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051320#comment-18051320 ]

Isaac Reath commented on CASSANDRA-21024:
-----------------------------------------

Thank you for the initial review [~smiklosovic]!

This is already implemented for this guardrail in 
{{ModificationStatement#validateDiskUsage}} 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java#L424).
 That method calls {{Guardrails.replicaDiskUsage.guard}} for each replica 
that owns a token for the (keyspace, token) pair in the write request, which in 
turn calls {{DiskUsageBroadcaster#isFull}}, the method this patch updates. The 
patch changes {{DiskUsageBroadcaster#isFull}} so that it returns true if the 
node, or any other node in that node's datacenter, is full.
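In rough terms, the change amounts to the following sketch. The class name, method signatures, and data structures here are illustrative only; the real {{DiskUsageBroadcaster}} tracks gossiped disk usage state per endpoint rather than by plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the patched behavior: a node is "full"
// when it has exceeded the disk usage failure threshold, and with the patch
// isFull also reports true when any peer in the same datacenter is full.
public class DiskUsageBroadcasterSketch
{
    // node -> has this node exceeded the disk usage failure threshold?
    private final Map<String, Boolean> fullByNode = new HashMap<>();
    // datacenter -> nodes belonging to that datacenter
    private final Map<String, Set<String>> nodesByDatacenter;

    public DiskUsageBroadcasterSketch(Map<String, Set<String>> nodesByDatacenter)
    {
        this.nodesByDatacenter = nodesByDatacenter;
    }

    public void markFull(String node, boolean full)
    {
        fullByNode.put(node, full);
    }

    // Pre-patch behavior: only the queried node's own state matters.
    public boolean isFullPrePatch(String node)
    {
        return fullByNode.getOrDefault(node, false);
    }

    // Patched behavior: full if this node, or any other node in the same
    // datacenter, has exceeded the failure threshold.
    public boolean isFull(String node, String datacenter)
    {
        if (isFullPrePatch(node))
            return true;
        for (String peer : nodesByDatacenter.getOrDefault(datacenter, Set.of()))
            if (fullByNode.getOrDefault(peer, false))
                return true;
        return false;
    }
}
```

With this shape, {{Guardrails.replicaDiskUsage.guard}} fails the write for every replica in a datacenter as soon as one node in that datacenter crosses the threshold, rather than only for the node itself.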

> Add configuration to disk usage guardrails to stop writes across all replicas 
> of a keyspace when any node replicating that keyspace exceeds the disk usage 
> failure threshold.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21024
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21024
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Feature/Guardrails
>            Reporter: Isaac Reath
>            Assignee: Isaac Reath
>            Priority: Normal
>             Fix For: 6.x
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> [CASSANDRA-17150|https://issues.apache.org/jira/browse/CASSANDRA-17150] 
> introduced disk usage guardrails that stop writes for specific tokens when 
> any replica responsible for those tokens exceeds the configured failure 
> threshold. This mechanism protects individual nodes from running out of disk 
> space but can result in inconsistent write availability when only a subset of 
> replicas or token ranges are affected. This in turn pushes the responsibility 
> onto the application owner to decide how to handle the partial write 
> unavailability. 
> We propose adding a new configuration option, 
> data_disk_usage_stop_writes_for_keyspace_on_fail, that extends this behavior 
> to the keyspace level. When enabled, if any node that participates in 
> replication for a keyspace exceeds the disk usage failure threshold, writes 
> to that keyspace will be stopped across all nodes that replicate that keyspace.
> This change provides operators with finer control over guardrail enforcement, 
> allowing them to choose between the current per-token behavior or a stricter, 
> keyspace-wide policy that prioritizes simplicity and operational 
> predictability over partial write availability.
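For context, the proposed option would presumably sit alongside the existing disk usage guardrail settings in cassandra.yaml. The sketch below is illustrative: the existing threshold option names are recalled from the current guardrails, the values are arbitrary, and the placement of the new flag is an assumption.

```yaml
# Existing disk usage guardrails (illustrative values; -1 disables).
data_disk_usage_percentage_warn_threshold: 70
data_disk_usage_percentage_fail_threshold: 80

# Proposed (this ticket): when true, any node replicating a keyspace that
# exceeds the fail threshold stops writes to that keyspace on all replicas.
data_disk_usage_stop_writes_for_keyspace_on_fail: true
```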



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
